GPT-4 is apparently the model to beat. I haven't seen all that much difference in practice between GPT-4 and 4o. I've heard various claims about various other models outperforming it (notably including Claude) but I haven't seen the claims materialize over the long haul as yet.
I have however heard that Mistral can get quite close to GPT-4, run for free locally with the right hardware, if you build up a hand curated set of around 100 query/response pairs from GPT-4 that are what you want it to do, and then fine-tune Mistral against that training set. I haven't tried it but that's what I've heard.