Yeah, this seems like a really tough problem with LLMs. From memory, OpenAI have said they are hoping to see a big improvement next year, which is a pretty long time given the rapid pace of everything else in the AI space.
I really hope they or others can make some big strides here, because this problem really limits the usefulness of these models.