A LOT. You can barely run 13b parameter models on a 24gb gfx card and outputs are like a page or so of text. Translate that over to audio and it would have to be broken down into discrete chunks that the model could use as “prompts” to output a section of audio that fit into the models available output. It might compress better, but it would be exceedingly painful and slow to extract even on AI focused cards. And it would use OODLES of watts to get just a little bit better than flac.