
admin ,
@admin@lemmy.my-box.dev avatar

Technically correct ™

Before you get your hopes up: Anyone can download it, but very few will be able to actually run it.

chiisana ,
@chiisana@lemmy.chiisana.net avatar

What are the resource requirements for the 405B model? I did some digging but couldn’t find any documentation in my cursory search.

Blaster_M ,

As a general rule of thumb, you need about 1 GB per 1B parameters, so you’re looking at about 405 GB for the full size of the model.

Quantization can compress it down to 1/2 or 1/4 that, but “makes it stupider” as a result.

modeler ,

Typically you need about 1GB graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model. Ouch.

Edit: you can try quantizing it. This reduces the amount of memory required per parameter to 4 bits, 2 bits or even 1 bit. As you reduce the size, the performance of the model can suffer. So in the extreme case you might be able to run this in under 64GB of graphics RAM.
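As a rough sketch of that arithmetic (weights only; it ignores KV cache and activation overhead, and the bit widths are just illustrative):

```python
def estimate_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Very rough memory estimate: model weights only, no KV cache or activations."""
    return params_billion * bits_per_param / 8  # 1B params at 8 bits ~= 1 GB

for bits in (16, 8, 4, 2, 1):
    print(f"405B at {bits}-bit: ~{estimate_vram_gb(405, bits):.0f} GB")
# Roughly 810 / 405 / ~200 / ~100 / ~50 GB respectively.
```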

TipRing ,

When the 8-bit quants hit, you could probably lease a 128GB system on RunPod.

1984 ,
@1984@lemmy.today avatar

Can you run this in a distributed manner, like with kubernetes and lots of smaller machines?

sunzu ,

405B ain't running local unless you've got a proper setup that's enterprise grade lol

I think 70B is possible, but I haven't found anyone confirming it yet.

Also would like to know the specs of whoever did it.

Voyajer ,
@Voyajer@lemmy.world avatar

I’ve run quantized 70B models on CPU with 32 gigs but it is very slow

sunzu ,

I'm gonna add some RAM in the hope I can split the original 70B between GPU and RAM. 8B is great for what it is, as is.

Looks like it should be possible; just not sure how much of a performance hit offloading to RAM will cause. FAFO.
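If it helps, this is roughly what that GPU/RAM split looks like with llama-cpp-python; the file name and n_gpu_layers value below are placeholders, you'd tune the layer count to whatever fits in your VRAM:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with GPU support)

# Quantized 70B GGUF: some layers live on the GPU, the rest stay in system RAM.
llm = Llama(
    model_path="llama-3.1-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=30,  # guess: raise until you run out of VRAM; -1 tries to offload everything
    n_ctx=4096,       # context window
)

out = llm("How much slower is partial offloading to system RAM?", max_tokens=128)
print(out["choices"][0]["text"])
```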

abcdqfr ,

Wake me up when it works offline.

“The Llama 3.1 models are available for download through Meta’s own website and on Hugging Face. They both require providing contact information and agreeing to a license and an acceptable use policy, which means that Meta can technically legally pull the rug out from under your use of Llama 3.1 or its outputs at any time.”

sunzu ,

I was able to set up a small one via Open WebUI.

It did ask to make an account but I didn't see any pinging home when I did it.

What am I missing here?

Fiivemacs ,

Through Meta…

That’s where I stop caring

RandomLegend ,
@RandomLegend@lemmy.dbzer0.com avatar

It’s available through ollama already. I am running the 8B model on my little server with its 3070 as of right now.

It’s really impressive for an 8B model.
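For anyone who wants to poke at it from code rather than the CLI, a minimal sketch with ollama's Python client (the model tag is what ollama uses for the 8B; the prompt is just an example):

```python
import ollama  # pip install ollama; assumes a local ollama server is already running

# Chat with the local 8B model; nothing leaves your machine.
response = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Summarize Llama 3.1 in one sentence."}],
)
print(response["message"]["content"])
```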

abcdqfr ,

Intriguing. Is that an 8GB card? Might have to try this after all.

RandomLegend ,
@RandomLegend@lemmy.dbzer0.com avatar

Yup, 8GB card

It’s my old one from the gaming PC after switching to AMD.

It now serves as my little AI hub and Whisper server for Home Assistant.

admin ,
@admin@lemmy.my-box.dev avatar

WAKE UP!

It works offline. When you use it with ollama, you don’t have to register or agree to anything.

Once you have downloaded it, it will keep on working; Meta can’t shut it down.

sunzu ,

Did anyone get 70B to run locally?

If so, what hardware specs?

DarkThoughts ,

Afaik you need about 40GB of VRAM for a 70B model.

sunzu ,

Can't you offload some of it to RAM?

DarkThoughts ,

Same requirements, but much slower.

sunzu ,

I guess it's time to buy some RAM after spending a decade at 16GB.

hperrin ,

Yo, this is big. In both that it is momentous, and holy shit that’s a lot of parameters. How many GB is this model?? I’d be able to run it if I had a few extra $10k bills lying around to buy the required hardware.

Ripper ,

It's around 800GB.
