
theterrasque , 1 year ago

You can probably run a 7b LLM comfortably in system RAM, maybe one of the smaller 13b ones.
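To sanity-check whether a model fits, here is a rough back-of-the-envelope sketch. The bits-per-weight figures and the flat 1 GB overhead are my own ballpark assumptions, not exact values for any particular file, and `estimate_ram_gb` is just an illustrative helper name.

```python
# Rough RAM estimate for a quantized model. All numbers are approximations.

# Assumed average bits per weight for a few GGML quantization types.
# These are ballpark figures, not exact values for any given file.
BITS_PER_WEIGHT = {
    "q3_K_M": 3.9,
    "q4_0": 4.5,
    "q8_0": 8.5,
}


def estimate_ram_gb(params_billions, quant, overhead_gb=1.0):
    """Size of the weights plus a flat allowance for context and buffers.

    overhead_gb is an assumed fudge factor, not a measured value.
    """
    weight_bytes = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return weight_bytes / 1e9 + overhead_gb


if __name__ == "__main__":
    for size in (3, 7, 13):
        for quant in BITS_PER_WEIGHT:
            print(f"{size}b {quant}: ~{estimate_ram_gb(size, quant):.1f} GB")
```

Compare the result against your free RAM with everything else you normally run still open, since that is what actually decides whether the 13b ones fit.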

Software to use

github.com/ggerganov/llama.cpp - command line. Basic, flexible.

github.com/LostRuins/koboldcpp - Precompiled llama.cpp with a UI. Easy to start with.

Models

In general, you want small GGML models. huggingface.co/TheBloke has a lot of them. There are some SuperHOT versions of models, but I’d avoid them for now. They’re trained to handle bigger context sizes, but it seems that made them dumber too. There’s a lot of new work coming out on bigger context lengths, so you should probably revisit that when you need it.

huggingface.co/TheBloke/orca_mini_v2_13b-GGML - the q3_K_M.bin perhaps - might still be too big, depending on what you’re running in the background

huggingface.co/TheBloke/orca_mini_3B-GGML - very small model. Not sure how well it’ll do

huggingface.co/…/airoboros-7B-gpt4-1.4-GGML

huggingface.co/TheBloke/vicuna-7B-v1.3-GGML

huggingface.co/…/WizardLM-7B-V1.0-Uncensored-GGML

Each has different strengths: orca is supposed to be better at reasoning, airoboros is good at longer and more story-like answers, vicuna is a very good all-rounder, and wizardlm is also a notably good all-rounder.

For training, there are some tricks like QLoRA, but the results aren’t impressive from what I’ve read. Also, when training LLMs it can be pretty difficult to get the results you want. You should probably start with just running them and getting comfortable with that, maybe try few-shot prompts (prompts that include a few examples of the writing style you want), and then go from there.
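A few-shot prompt is just text with a handful of worked examples in front of the new request. A minimal sketch of putting one together, assuming plain "Input:/Output:" labels (my own choice for illustration; each model has its own preferred prompt template, so check the model card) and a hypothetical helper named `build_few_shot_prompt`:

```python
def build_few_shot_prompt(examples, new_input, instruction=""):
    """Join (input, output) example pairs and a new input into one prompt.

    The "Input:" / "Output:" labels are an assumption for illustration,
    not a format that any particular model requires.
    """
    parts = []
    if instruction:
        parts.append(instruction.strip())
    for user_text, reply in examples:
        parts.append(f"Input: {user_text}\nOutput: {reply}")
    # Leave the last output empty so the model continues from there.
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    examples = [
        ("The weather is nice.", "Arr, fair skies ahead, matey."),
        ("I am hungry.", "Me belly be growlin' like a kraken."),
    ]
    print(build_few_shot_prompt(
        examples,
        "Where is the library?",
        instruction="Rewrite each input in pirate speak.",
    ))
```

Paste the resulting text into whatever you are running the model with. Two or three examples is usually a reasonable place to start, and longer prompts eat into the context size.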
