The fuck is an English-language AI supposed to be trained on, if not popular examples of the English language? Churning the entire global corpus down to a mess of algebra is extremely transformative. Never mind that the end goal is a generalized program to produce anything, in limitless quantities, based on high-level descriptions and incomplete examples.
We can’t even offer to use texts from thirty-odd years ago, in the public domain… because copyright maximalism like this shit right here has ruined the public domain. Surprisingly, A Game of Thrones would still be in copyright. It’s from 1996. But there’s another generation of living authors, and a whole mess of dead people, who haven’t written much since well before then, and who made quite a lot of money off what they did. That culture belongs to its audience now. That’s what the money was for.