
admin ,
@admin@lemmy.my-box.dev avatar

Technically correct ™

Before you get your hopes up: Anyone can download it, but very few will be able to actually run it.

chiisana ,
@chiisana@lemmy.chiisana.net avatar

What are the resource requirements for the 405B model? I did some digging but couldn’t find any documentation in my cursory search.

Blaster_M ,

As a general rule of thumb, you need about 1 GB per 1B parameters, so you’re looking at about 405 GB for the full size of the model.

Quantization can compress it down to 1/2 or 1/4 that, but “makes it stupider” as a result.

modeler ,

Typically you need about 1GB graphics RAM for each billion parameters (i.e. one byte per parameter). This is a 405B parameter model. Ouch.

Edit: you can try quantizing it. This reduces the amount of memory required per parameter to 4 bits, 2 bits or even 1 bit. As you reduce the size, the performance of the model can suffer. So in the extreme case you might be able to run this in under 64GB of graphics RAM.
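As a rough sketch of that arithmetic (weights only; it ignores KV cache and activation overhead, and the bit widths are just illustrative):

```python
def estimate_vram_gb(params_billion: float, bits_per_param: int) -> float:
    """Very rough memory estimate: model weights only, no KV cache or activations."""
    return params_billion * bits_per_param / 8  # 1B params at 8 bits ~= 1 GB

for bits in (16, 8, 4, 2, 1):
    print(f"405B at {bits}-bit: ~{estimate_vram_gb(405, bits):.0f} GB")
# Roughly 810 / 405 / ~200 / ~100 / ~50 GB respectively.
```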

TipRing ,

When the 8-bit quants hit, you could probably lease a 128GB system on RunPod.

1984 ,
@1984@lemmy.today avatar

Can you run this in a distributed manner, like with kubernetes and lots of smaller machines?

sunzu ,

405B ain't running local unless you've got a proper setup that's enterprise grade lol

I think 70B is possible, but I haven't found anyone confirming it yet.

Also would like to know the specs of whoever did it.

Voyajer ,
@Voyajer@lemmy.world avatar

I’ve run quantized 70B models on CPU with 32 gigs but it is very slow

sunzu ,

I'm gonna add some RAM in the hope I can split the original 70B between GPU and RAM. 8B is great for what it is, as is.

Looks like it should be possible; just not sure how much of a performance hit offloading to RAM will cause. FAFO.
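If it helps, this is roughly what that GPU/RAM split looks like with llama-cpp-python; the file name and n_gpu_layers value below are placeholders, you'd tune the layer count to whatever fits in your VRAM:

```python
from llama_cpp import Llama  # pip install llama-cpp-python (built with GPU support)

# Quantized 70B GGUF: some layers live on the GPU, the rest stay in system RAM.
llm = Llama(
    model_path="llama-3.1-70b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=30,  # guess: raise until you run out of VRAM; -1 tries to offload everything
    n_ctx=4096,       # context window
)

out = llm("How much slower is partial offloading to system RAM?", max_tokens=128)
print(out["choices"][0]["text"])
```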

abcdqfr ,

Wake me up when it works offline.

“The Llama 3.1 models are available for download through Meta’s own website and on Hugging Face. They both require providing contact information and agreeing to a license and an acceptable use policy, which means that Meta can technically legally pull the rug out from under your use of Llama 3.1 or its outputs at any time.”

sunzu ,

I was able to set up a small one via Open WebUI.

It did ask to make an account but I didn't see any pinging home when I did it.

What am I missing here?

Fiivemacs ,

Through Meta…

That’s where I stop caring

RandomLegend ,
@RandomLegend@lemmy.dbzer0.com avatar

It’s available through ollama already. I am running the 8B model on my little server with its 3070 as of right now.

It’s really impressive for an 8B model.
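For anyone who wants to poke at it from code rather than the CLI, a minimal sketch with ollama's Python client (the model tag is what ollama uses for the 8B; the prompt is just an example):

```python
import ollama  # pip install ollama; assumes a local ollama server is already running

# Chat with the local 8B model; nothing leaves your machine.
response = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Summarize Llama 3.1 in one sentence."}],
)
print(response["message"]["content"])
```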

abcdqfr ,

Intriguing. Is that an 8GB card? Might have to try this after all.

RandomLegend ,
@RandomLegend@lemmy.dbzer0.com avatar

Yup, 8GB card

It’s my old one from the gaming PC after switching to AMD.

It now serves as my little AI hub and Whisper server for Home Assistant.

admin ,
@admin@lemmy.my-box.dev avatar

WAKE UP!

It works offline. When you use it with ollama, you don’t have to register or agree to anything.

Once you have downloaded it, it will keep on working; Meta can’t shut it down.

sunzu ,

Did anyone get 70B to run locally?

If so, what hardware specs?

DarkThoughts ,

Afaik you need about 40GB of VRAM for a 70B model.

sunzu ,

Can't you offload some of it to RAM?

DarkThoughts ,

Same requirements, but much slower.

sunzu ,

I guess it's time to buy some RAM after spending a decade at 16GB.

hperrin ,

Yo, this is big. In both that it is momentous, and holy shit that’s a lot of parameters. How many GB is this model?? I’d be able to run it if I had a few extra $10k bills lying around to buy the required hardware.

Ripper ,

It's around 800GB.
