I think that training models on scraped internet data should be legal if and only if those models’ weights are released as open source. It’d be like slapping a copyleft license on the internet: you can do what you want with public data, but you have to give what you build with it back to the public.