Researchers upend AI status quo by eliminating matrix multiplication in LLMs

theshatterstone54 , 1 day ago (edited 1 day ago)

Why are people downvoting? This is huge and should make LLMs more power efficient and memory efficient.

yogthos OP , 1 day ago

Indeed, this seems like a big step forward. Here’s a link to the model: github.com/ridgerchu/matmulfreellm

cygnus , 2 days ago

Finally some good “AI” news. Those things aren’t going away, so I’m happy to see any improvements to their energy efficiency.

autotldr Bot , 2 days ago

This is the best summary I could come up with:

The researchers’ approach involves two main innovations. First, they created a custom LLM and constrained its weights to ternary values (-1, 0, 1) instead of traditional floating-point numbers, so multiplying by a weight reduces to adding, subtracting, or skipping a value.
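To make the ternary idea concrete, here is a minimal Python sketch (not the researchers' code). It assumes the weights are already restricted to -1, 0 and 1, and shows that a matrix-vector product then needs only additions and subtractions. The `quantize_ternary` helper is a hypothetical absmean-style rounding scheme, similar in spirit to BitNet b1.58; the actual recipe in github.com/ridgerchu/matmulfreellm may differ.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix by a vector using only additions and subtractions."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1 weight: add the input
            elif w == -1:
                acc -= xi   # -1 weight: subtract the input
            # 0 weight: skipped entirely, no work at all
        out.append(acc)
    return out


def quantize_ternary(weights, eps=1e-8):
    """Hypothetical absmean ternarization (an assumption, not the paper's exact code).

    Scales the matrix by its mean absolute value, rounds to the nearest integer
    and clips to [-1, 1]. Returns the ternary matrix and the scale factor.
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) + eps
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale


# Usage: ternarize a small float matrix, then apply it without multiplications.
q, scale = quantize_ternary([[0.9, -0.05, -1.2], [0.3, 0.0, 0.6]])
y = ternary_matvec(q, [2.0, 3.0, 5.0])
```

Multiplying `y` by `scale` (a single multiply per output, not per weight) gives an approximation of the original full-precision product.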

Second, the researchers replaced the computationally expensive self-attention mechanism of traditional language models with a simpler, more efficient unit (which they call a MatMul-free Linear Gated Recurrent Unit, or MLGRU) that processes words sequentially using basic arithmetic operations instead of matrix multiplications.
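The key property of a recurrent unit like the MLGRU is that the hidden state is updated with element-wise operations rather than a hidden-to-hidden weight matrix. The sketch below is a rough illustration under stated assumptions: GRU-style sigmoid gates, a SiLU candidate state, ternary input projections, and no output projection. `linear_gated_recurrence` is a hypothetical name, and these formulas are not necessarily the paper's exact equations or the repository's API.

```python
import math


def ternary_proj(weights, x):
    # Ternary "matmul": add where the weight is +1, subtract where it is -1, skip 0.
    return [sum(xi for w, xi in zip(row, x) if w == 1)
            - sum(xi for w, xi in zip(row, x) if w == -1)
            for row in weights]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def silu(v):
    return v * sigmoid(v)


def linear_gated_recurrence(inputs, w_f, w_c, w_g):
    """Hypothetical MLGRU-style recurrence over a sequence of input vectors.

    w_f, w_c, w_g are ternary matrices (assumed shapes: hidden x input) for the
    forget gate, candidate state and output gate. Returns one output per step.
    """
    h = [0.0] * len(w_f)
    outputs = []
    for x in inputs:
        f = [sigmoid(v) for v in ternary_proj(w_f, x)]  # forget gate
        c = [silu(v) for v in ternary_proj(w_c, x)]     # candidate state
        # Hidden update uses element-wise products only; there is no
        # hidden-to-hidden matrix and no attention over earlier tokens.
        h = [fi * hi + (1.0 - fi) * ci for fi, hi, ci in zip(f, h, c)]
        g = [sigmoid(v) for v in ternary_proj(w_g, x)]  # output gate
        outputs.append([gi * hi for gi, hi in zip(g, h)])
    return outputs
```

The element-wise products are still multiplications, but their cost grows linearly with the hidden size per token, whereas a dense matrix product grows quadratically. Because each step depends only on the previous hidden state, no attention matrix over the whole sequence is ever built.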

These changes, combined with a custom hardware implementation on an FPGA chip to accelerate ternary operations, allowed the researchers to achieve what they claim is performance comparable to state-of-the-art models while reducing energy use.
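Part of the memory savings comes from storage: a ternary weight carries log2(3) ≈ 1.58 bits of information, so it fits in 2 bits instead of the 16 bits of an fp16 weight. Below is a minimal sketch of one simple 2-bit packing scheme (four weights per byte). The encoding and the `pack_ternary`/`unpack_ternary` names are assumptions for illustration, not necessarily what the researchers' FPGA design or kernels use.

```python
# Assumed 2-bit codes for each ternary value (0b11 is unused).
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}
DECODE = {code: value for value, code in ENCODE.items()}


def pack_ternary(values):
    """Pack a list of -1/0/1 values into bytes, four values per byte."""
    packed = bytearray((len(values) + 3) // 4)
    for i, v in enumerate(values):
        # Value i goes into byte i // 4, at bit offset 2 * (i % 4).
        packed[i // 4] |= ENCODE[v] << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary values from packed bytes."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]
```

At 2 bits per weight this is 8x smaller than fp16 storage. Denser encodings (for example five ternary values per byte, since 3^5 = 243 ≤ 256) get closer to the 1.58-bit limit at the cost of more decoding work.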

The researchers claim the MatMul-free LM achieved competitive performance against the Llama 2 baseline on several benchmark tasks, including question answering, commonsense reasoning, and physical understanding.

The researchers project that their approach could theoretically intersect with and surpass the performance of standard LLMs at scales around 10²³ FLOPs of training compute, which is roughly equivalent to the compute required to train models like Meta’s Llama-3 8B or Llama-2 70B.

The article was updated on June 26, 2024, at 9:20 AM to remove an inaccurate power estimate, made by the author, for running an LLM locally on an RTX 3060.

The original article contains 570 words, the summary contains 206 words. Saved 64%. I’m a bot and I’m open source!
