I’m in the early stages of this myself and haven’t actually run an LLM locally, but the term that steered me in the right direction for what I was trying to do was ‘RAG’ (Retrieval-Augmented Generation).
ragflow.io (terrible name but good product) seems to be a good starting point, but it’s mainly set up for APIs at the moment. That said, I found this link for local LLM integration and I’m going to play with it later today: github.com/infiniflow/…/deploy_local_llm.md
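In case the term is new to anyone else: as I understand it, RAG means you look up the most relevant bits of your own documents first and paste them into the prompt, so the model answers from that context. Here is a rough toy sketch of just the retrieval and prompt-building part. To be clear, this is not how RAGFlow does it; the function names (`retrieve`, `build_prompt`), the sample documents, and the simple word-overlap scoring are all my own assumptions for illustration. Real setups typically use embeddings and a vector store, and then send the prompt to an actual LLM, which this sketch does not do.

```python
import math
import re
from collections import Counter

# Hypothetical sample documents, standing in for your own files.
DOCS = [
    "The office wifi password is rotated on the first Monday of each month.",
    "Expense reports are due by the 5th and go to the finance team.",
    "The cafeteria serves lunch from 11:30 to 14:00 on weekdays.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return the k documents whose wording overlaps most with the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Put the retrieved documents into a prompt as context for the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # The printed prompt is what you would hand to a local LLM.
    print(build_prompt("When are expense reports due?", DOCS, k=1))
```

The string from `build_prompt` is what would get sent to whatever model you end up running locally; swapping the word-overlap scoring for proper embeddings is the part tools like RAGFlow handle for you.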