theterrasque , 1 month ago

It’s less about the calculations and more about memory bandwidth. To generate a token you need to read through all of the model data, and that’s usually many gigabytes. So the time it takes to read it from memory is usually longer than the compute time. GPUs have gigabytes of RAM that’s many times faster than the CPU’s RAM, which is the main reason they’re faster for LLMs.
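To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every generated token needs one full pass over the model data, so memory bandwidth divided by model size gives an upper bound on tokens per second. The function name and all the figures are made up for illustration (a 14 GB model, 50 GB/s for CPU RAM, 500 GB/s for GPU memory), so plug in your own hardware's numbers.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on generation speed if each token requires reading
    the whole model from memory once (compute time is ignored)."""
    if model_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("model size and bandwidth must be positive")
    return bandwidth_bytes_per_s / model_bytes


# Illustrative assumptions, not measured specs of any real device.
MODEL_BYTES = 14e9        # e.g. 7 billion parameters at 2 bytes each
CPU_RAM_BANDWIDTH = 50e9  # assumed 50 GB/s
GPU_RAM_BANDWIDTH = 500e9 # assumed 500 GB/s

if __name__ == "__main__":
    for name, bandwidth in [("CPU RAM", CPU_RAM_BANDWIDTH),
                            ("GPU RAM", GPU_RAM_BANDWIDTH)]:
        rate = max_tokens_per_second(MODEL_BYTES, bandwidth)
        print(f"{name}: at most {rate:.1f} tokens/s")
```

Real speeds will be lower than this bound, but the ratio between the two results tracks the ratio of the memory bandwidths, which is the point.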

Most TPUs don’t have much RAM, especially the cheap ones.
