Onihikage (@Onihikage@beehaw.org)

I have a fairly substantial 16 GB AMD GPU, and when I load Llama 3.1 8B Instruct 128k (Q4_0), it gives me about 12 tokens per second. That’s fast enough for me, but only 50% faster than CPU (which I test by loading mlabonne’s abliterated Q4_K_M version, which runs on CPU in GPT4All, though I have no idea if the two are actually meant to be comparable in performance).

Then I load Nous Hermes 2 Mistral 7B DPO (also Q4_0) and it blazes through at 50+ tokens per second. So I don’t really know what’s going on there. Seems like performance varies a lot from model to model, but I don’t know enough to speculate why. I can’t even try Gemma2 models; GPT4All just crashes with them. I should probably test Alpaca to see if these perform any differently there…
