I can already tell this is going to be an unpopular opinion judging by the comments, but this is my ideology on it.
It’s totally true. I’m indifferent on it: if it was acquired from a public-facing source I don’t really care, but I’m definitely against using data dumps or data that wasn’t available to the public in the first place. The whole thing with AI is ridiculous; it’s the same as someone going to a website and making a mirror, or a reporter writing an article that talks about what’s in it. The last three web-search-based AIs even gave sources for where they got the info. I don’t get the argument.
If it’s image-based AI, well, it’s the equivalent of an artist going to an art museum and deciding they want to replicate the art style seen in a painting. Maybe they shouldn’t be in a publishing field if they don’t want their work seen/used. That’s my ideology on it. It’s not like the AI is taking a one-to-one copy and selling the artwork as its own, which in my opinion is a much more harmful instance and already happens commonly in today’s art world. It’s analyzing existing artwork which was available through the same means everyone else had: going online, loading up images, and scraping the data. By this logic, artists should not be allowed to enter any art-based websites, museums, or galleries, since by looking at others’ art they are able to adjust their own art, which is stealing the author’s work. I’m not for or against it, but the ideology is insane to me.
@Pika@flop_leash_973 These are largely my thoughts on the whole thing: the process of actually training the AI is no different from a human learning
The thing about that, is that there's likely enough precedent in copyright law to actually handle that, with most copyright law it's all about intent and scale and I think that's likely where this will all go
Here the intent is to replace and the scale is astronomical, whereas an individual's intent is to add and the scale is minimal
The process of training the model is arguably similar to a human learning, and if the model just sat on a server doing nothing but knowing, there’d be no problem. Taking that knowledge and selling it to the public en masse is the issue.
This is precisely what copyrights and patents are here to safeguard. Is there already a book like A Song of Ice and Fire? Write something else, maybe better! There’s already a patent for an idea you have? Change and improve upon it and get your own patent!
You see, copyrights and patents are supposed to spur creativity, not hinder it. OpenAI should improve upon its system so that it actually thinks and is creative itself rather than regurgitating copyrighted materials, themes and ideas. Then they wouldn’t have this problem.
OpenAI wants literally all of human knowledge and creativity for free so that they can sell it back to you. And you’re okay-ish with it?
@Subverb that is, quite impressively, the opposite of what I said
Is a person infringing on copyright by producing content? No. It’s about intent and scale. Humans don’t just sit on this knowledge, they do something with it
There is nothing illegal about WHAT it’s doing, there is everything illegal about HOW and WHY
I very clearly stated that OpenAI’s intent and the scale at which they operate amount to blatant copyright infringement, and that this is backed up by decades of precedent
@zbyte64 with everything you see you are scraping data from your environment whether you want to or not
How does a child learn what pain is? How does a teenager learn what heartbreak is? It’s certainly not because they made the decision to find that out themselves
I bring up agency and I get an exemplary response of what I mean.
Raising a child well requires someone who is able to engage in the child’s own theory of mind. If you just treat a child as an information sponge they will need more therapy than usual. A good parent takes interest in their child’s ability to exercise agency.
Then I guess my original point of agency being an essential element in human learning had nothing to do with your conversation about how AI learns like humans. Carry on.
Agreed. I don’t understand how training an LLM on publicly available data is an issue. As you say, it doesn’t copy the work. Rather, the data is used as “inspiration”, to stay in the art analogy.
Maybe I’m ignorant. Would love to be proven wrong. Right now it seems to me that failing media publishers are trying to do a money grab and use copyright as an argument, even though their data/material isn’t getting illegally reproduced.
I don’t mind him using copyrighted materials as long as it leads to OpenAI becoming truly open source. Humans can replicate anything found in the wild with minor variations, so AI should have the same access. This is how human creativity builds upon itself. Why limit AI? We already know all the jobs people have will be replaced anyway eventually.
That’s a good point. AIs/LLMs will exist and will necessarily learn from copyrighted materials without traceability back to the copyright owners to compensate them.
Sounds to me like AIs/LLMs can’t and shouldn’t be proprietary systems owned by private entities for profit, then.
Humans can replicate anything found in the wild with minor variations, so AI should have the same access
But that’s not what OpenAI is asking though. They want free access for the type of content you or I need to pay for. And they want it so they can then sell the resulting “variation” they produce
That’s not exactly true. They are selling tools for people to recreate with variation.
I propose an analogy: Let’s imagine a company sells brushes that are used by painters to create art. Now imagine the employees of this company go out to the street to watch how a street artist creates amazing art pieces on the ground for everyone to see (the artist does ask for donations in a hat next to the art pieces). Now let’s imagine the employees stay there for hours to study his techniques and design a new kind of brush that will make it way easier to create the same kind of art.
Would you argue that the company should not be allowed to sell their newly designed brush without giving money to the street artist?
Should all your teachers be paid for everything you produce throughout your life?
Should your parents get compensated every time you use the knowledge you acquired from them?
In case anyone reading is interested in my opinion: I think intellectual property is the dumbest concept, and one of the biggest scams of capitalism. Nobody should own any ideas. Everybody should be legally able to use anyone else’s ideas and build on them. I think we’ve been deprived of an infinity of great stories, images, lore, design, music, movies, shapes, clothes, games, etc., because of this dumb rule that you can’t use other people’s ideas.
I propose an analogy: Let’s imagine a company sells brushes…
That would be analogous to any content publicly available for free (or via donation). OpenAI wants free access to the art being sold. They also don’t really create the brush; they produce slightly modified versions of the art produced by artists who do not receive money or credit
Should all your teachers be paid for everything you produce throughout your life?
They definitely should be paid more. But your analogy is completely off track here since, unlike AI, humans can actually possess and develop intelligence, not just parrot combinations of the same things we have seen before
Should your parents get compensated every time you use the knowledge you acquired from them?
Ok now you are just flailing but even then, yes and most do as it is a general thing that kids take care of their parents when the kids are grown and parents cannot look after themselves
In case anyone reading is interested in my opinion…
This is your best paragraph and I would agree with it. It’s not compatible with capitalism as you allude but I’d be open to radical new thinking
However, that’s not what’s at play here either. OpenAI wants something we all have to pay for, for free, so they can then resell something else. Worse yet, the value in what OpenAI wants to sell rests basically on never again paying the people who produce the stuff it wants for free
But then if we agree on IP, we should not complain that OpenAI wants free access to copyrighted materials; we should use their own logic to force them to make their model open source, and free for anyone to execute on their own hardware.
They get free access to data, so we should get free access to the compilation of the data. Then they can charge us for the hardware cost of running the model, but they’ll have to charge us no more than what it costs, because they will be competing with other companies running the exact same model and driving the price down.
Yeah, but because our government views technological dominance as a National Security issue we can be sure that this will come to nothing bc China Bad™.
Seems the same as a band being influenced by other bands that came before them. How many bands listened to Metallica and used those ideas to create new music?
Those claiming AI training on copyrighted works is “theft” are misunderstanding key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.
This process is more akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.
This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.
Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was found to be legal despite protests from authors and publishers. AI training is arguably even more transformative.
While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.
So the issue is that, in general, to be influenced by someone else’s work you would typically have supported that work… like… literally at all. Purchasing it, or even simply discussing and sharing it with others who may purchase it, is worth a lot more than giving no support at all and directly competing without crediting source material, influences, etc.
If it is on the open internet and visible to anyone with a web browser and you have an adblocker like most people, you are not paying to support that work. That’s what it was trained on.
Fucking Christ I am so sick of people referencing the Google books lawsuit in any discussion about AI
The publishers lost that case because the judge ruled that Google Books was copying a minimal portion of the books, and that Google Books was not competing against the publishers, thus the infringement was ruled as fair use.
AI training does not fall under this umbrella, because it’s using the entirety of the copyrighted work, and the purpose of this infringement is to build a direct competitor to the people and companies whose works were infringed. You may as well talk about OJ Simpson’s criminal trial, it’s about as relevant.
The internet has been primarily derivative content for a long time. As much as some haven’t wanted to admit it, it’s true. These fancy algorithms now take it to an exponential degree.
Original content had already become scarce as monetization ramped up. And then this generation of AI algorithms arrived.
In the several years prior to LLMs becoming a thing, the internet was basically just regurgitating data from API calls or scraping someone else’s content and representing it in your own way.