I’m still not sold that dynamic text generation is going to be the major near-term application for LLMs, much less in games. Like, don’t get me wrong, it’s impressive what they’ve done. But I’ve also found it to be the least practically useful of the generative-model categories.

You can make real, honest-to-God solid, usable graphics with Stable Diffusion. You can do pretty impressive speech generation with TortoiseTTS. I imagine that someone will make a locally-runnable music generation model and software at some point if they haven’t yet; I’m pretty impressed with what the online services do there.

I think that there are a lot of neat applications for image recognition; the other day I wanted to identify a tree and seedpod. Nobody has built software to do that yet (that I’m aware of), but I’m sure that they will; the ability to map images back to text is pretty impressive. I’m also amazed by the AI image upscaling that Stable Diffusion can do, and I suspect that there’s still a lot of room for improvement there, since upscaling isn’t Stable Diffusion’s main goal. And once someone has done a good job of building a big set of annotated 3D models, I think that there’s a whole new world of generated 3D content.
I will bet that before we see that become the norm in games, we’ll see generative models regularly used for speech synthesis, whether pre-generated or run in-game, so that characters can speak text that might be procedurally generated rather than coming from static pre-recorded samples, but that isn’t necessarily produced by an LLM. Like, it’s not practical to have a human voice actor record static samples of every possible phrase you might want an in-game character to speak.
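The kind of non-LLM procedural dialogue I have in mind could be as simple as template filling; the resulting strings would then be fed to whatever speech-synthesis engine the game uses. This is just a hypothetical sketch (the templates, item names, and function here are all made up for illustration):

```python
import random

# Hypothetical dialogue templates for a quest-giver NPC.
TEMPLATES = [
    "Bring me {count} {item}s and I'll pay you {gold} gold.",
    "I heard there are {item}s near the {place}.",
]

def generate_line(rng: random.Random) -> str:
    """Produce one procedurally generated dialogue line.

    No LLM involved: just a template picked at random with
    slots filled in. The output string would be handed to a
    TTS engine to be voiced at runtime or baked ahead of time.
    """
    template = rng.choice(TEMPLATES)
    return template.format(
        count=rng.randint(2, 10),
        item=rng.choice(["wolf pelt", "iron ore", "healing herb"]),
        gold=rng.randint(5, 50),
        place=rng.choice(["old mill", "north gate"]),
    )

# Seeded RNG so a given NPC can deterministically reuse its lines.
line = generate_line(random.Random(0))
print(line)
```

The combinatorics are the point: even a handful of templates and slot values yields far more distinct phrases than a voice actor could practically record as static samples, which is exactly where synthesized speech earns its keep.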