
PumpkinEscobar, 5 days ago

Taking Ollama for instance, either the whole model runs in VRAM and compute is done on the GPU, or it runs in system RAM and compute is done on the CPU. Running models on the CPU is horribly slow. You won’t want to do it for large models.

LM Studio and others allow you to run part of the model on the GPU and part on the CPU, which splits the memory requirements, but it’s still pretty slow.
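To get a feel for what that split looks like, here is a rough sketch of how many layers would land on the GPU. It assumes all layers are the same size and ignores context/cache memory, and the function name and example numbers are made up for illustration, not taken from any particular tool.

```python
def gpu_layers_that_fit(model_gb: float, n_layers: int, free_vram_gb: float) -> int:
    """Estimate how many layers fit in VRAM.

    Assumption: every layer takes an equal share of the model's size,
    and nothing else (KV cache, buffers) competes for VRAM.
    """
    per_layer_gb = model_gb / n_layers
    # Whole layers only, and never more layers than the model has.
    return min(n_layers, int(free_vram_gb // per_layer_gb))


# Hypothetical example: a 35 GB model with 80 layers on a 12 GB card.
on_gpu = gpu_layers_that_fit(model_gb=35, n_layers=80, free_vram_gb=12)
print(f"{on_gpu} layers on GPU, {80 - on_gpu} layers left on CPU")
```

Whatever doesn’t fit stays in system RAM and runs on the CPU, which is why a split setup is still slow.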

Even the smaller 7B parameter models run pretty slowly on the CPU, and the huge models are orders of magnitude slower.

So technically, more system RAM will let you run some larger models, but you will quickly figure out you just don’t want to do it.
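A back-of-the-envelope sketch of why, in Python. It uses the common rule of thumb that generating each token means reading roughly all the weights once, so speed is capped by memory bandwidth. The bandwidth figures and the 4-bit quantization are assumptions for illustration, not measurements, so plug in your own hardware’s numbers.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (ignores KV cache and overhead)."""
    return params_billion * bits_per_weight / 8


def tokens_per_second_ceiling(size_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound, assuming each token reads all the weights once."""
    return bandwidth_gb_s / size_gb


# Assumed, illustrative bandwidth figures; real hardware varies a lot.
SYSTEM_RAM_GB_S = 50.0
VRAM_GB_S = 500.0

for params in (7, 70):
    size = model_size_gb(params, bits_per_weight=4)
    cpu = tokens_per_second_ceiling(size, SYSTEM_RAM_GB_S)
    gpu = tokens_per_second_ceiling(size, VRAM_GB_S)
    print(f"{params}B @ 4-bit: ~{size:.1f} GB, "
          f"CPU ceiling ~{cpu:.1f} tok/s, GPU ceiling ~{gpu:.1f} tok/s")
```

Real speeds come in under these ceilings, but the pattern holds: more RAM makes the bigger model fit, and it does nothing for how fast the CPU can read it.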
