Local LLMs can be compressed (quantized) to fit on consumer hardware. Model formats like GGUF and EXL2 can be loaded with an offline, locally hosted backend like KoboldCpp or Oobabooga (text-generation-webui). These formats store the weights at lower precision than the full floating-point model, so the model gets a bit “dumber”, but it’s good enough for many uses.
Also worth noting: these local models are around 7, 11, or 20 billion parameters, while hosted models like ChatGPT are rumored to run closer to 8x220 billion (that figure has never been officially confirmed).
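
For a rough sense of why lower precision matters, here's a back-of-the-envelope sketch in Python. It only estimates the memory taken by the weights themselves (parameters times bits per weight) and ignores context/KV cache and runtime overhead, so real usage will be higher. The function name `weight_memory_gib` is made up for illustration, and the 16/8/4-bit values are just example precisions, not the exact bits-per-weight of any particular GGUF or EXL2 quant.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width.
    Real quant formats mix widths and add metadata, so treat this as a floor.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Parameter counts from the comment above; bit widths are illustrative.
for params in (7, 11, 20):
    for bits in (16, 8, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{params}B @ {bits}-bit: ~{gib:.1f} GiB")
```

By this estimate a 7B model goes from about 13 GiB at 16-bit to a bit over 3 GiB at 4-bit, which is roughly the difference between needing a workstation card and fitting on a midrange consumer GPU.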