There have been multiple accounts created with the sole purpose of posting advertisement posts or replies containing unsolicited advertising.

Accounts which solely post advertisements, or persistently post them may be terminated.

theshatterstone54 , (edited )

Why are people downvoting? This is huge and should make LLMs more power efficient and memory efficient.

yogthos OP ,
@yogthos@lemmy.ml avatar

Indeed, this seems like a big step forward, and here’s a link to the model github.com/ridgerchu/matmulfreellm

cygnus ,
@cygnus@lemmy.ca avatar

Finally some good “AI” news. Those things aren’t going away, so I’m happy to see any improvements to their energy efficiency.

autotldr Bot ,

This is the best summary I could come up with:


The researchers’ approach involves two main innovations: first, they created a custom LLM and constrained it to use only ternary values (-1, 0, 1) instead of traditional floating-point numbers, which allows for simpler computations.

Second, the researchers redesigned the computationally expensive self-attention mechanism in traditional language models with a simpler, more efficient unit (that they called a MatMul-free Linear Gated Recurrent Unit—or MLGRU) that processes words sequentially using basic arithmetic operations instead of matrix multiplications.

These changes, combined with a custom hardware implementation to accelerate ternary operations through the aforementioned FPGA chip, allowed the researchers to achieve what they claim is performance comparable to state-of-the-art models while reducing energy use.

Researchers claim the MatMul-free LM achieved competitive performance against the Llama 2 baseline on several benchmark tasks, including answering questions, commonsense reasoning, and physical understanding.

The researchers project that their approach could theoretically intersect with and surpass the performance of standard LLMs at scales around 10²³ FLOPS, which is roughly equivalent to the training compute required for models like Meta’s Llama-3 8B or Llama-2 70B.

The article was updated on June 26, 2024 at 9:20 AM to remove an inaccurate power estimate related to running a LLM locally on a RTX 3060 created by the author.


The original article contains 570 words, the summary contains 206 words. Saved 64%. I’m a bot and I’m open source!

  • All
  • Subscribed
  • Moderated
  • Favorites
  • [email protected]
  • random
  • lifeLocal
  • goranko
  • All magazines