The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.

This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.

Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.

While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.

For those interested, this argument is nicely laid out by Damien Riehl in FLOSS Weekly episode 744. twit.tv/shows/floss-weekly/episodes/744

Loki ,

Even if you come to the conclusion that these models should be allowed to “learn” from copyrighted material, the issue is that they can and will reproduce copyrighted material.

They might not recreate a picture of Mickey Mouse that exists already, but they will draw a picture of Mickey Mouse. Just like I could, except I’m aware that I can’t monetize it in any way. Well, new Mickey Mouse.

LANIK2000 ,

This process is akin to how humans learn…

I’m so fucking sick of people saying that. We have no fucking clue how humans LEARN. Aka gather understanding aka how cognition works or what it truly is. On the contrary we can deduce that it probably isn’t very close to human memory/learning/cognition/sentience (or any other buzzwords that are stand-ins for things we don’t understand yet), considering human memory is extremely lossy and tends to infer its own bias, as opposed to LLMs that do neither and religiously follow patterns to their own fault.

It’s quite literally a text prediction machine that started its life as a translator (and still does amazingly at that task), it just happens to turn out that general human language is a very powerful tool all on its own.

I could go on and on as I usually do on lemmy about AI, but your argument is literally “Neural network is theoretically like the nervous system, therefore human”, I have no faith in getting through to you people.

calcopiritus ,

I’ll train my AI on just the bee movie. Then I’m going to ask it “can you make me a movie about bees”? When it spits out the whole movie, I can just watch it or sell it or whatever, it was a creation of my AI, which learned just like any human would! Of course I didn’t even pay for the original copy to train my AI, it’s for learning purposes, and learning should be a basic human right!

stephen01king ,

That would be like you writing out the bee movie yourself after memorizing the whole movie and claiming it is your own idea, or using it as proof that humans memorizing a movie is violating copyright. Just because an AI is violating copyright by outputting the whole bee movie, it doesn’t mean training the AI on copyrighted stuff is violating copyright.

Let’s just punish the AI companies for outputting copyrighted stuff instead of for training on it. Maybe that way they would actually go out of their way to make their LLM intelligent enough to not spit out copyrighted content.

Or, we can just make it so that any output made by an AI that is trained on copyrighted stuff cannot be copyrighted.

calcopiritus ,

If the solution is making the output non-copyrighted it fixes nothing. You can sell the pirating machine on a subscription. And it’s not like Netflix where the content ends when the subscription ends, you have already downloaded all the not-copyrighted content you wanted, and the internet would be full of non-copyrighted AI output.

Instead of selling the bee movie, you sell a bee movie maker, and a spiderman maker, and a titanic maker.

Sure, file a copyright infringement claim each time you manage to make an AI output copyrighted content. Just run it on a loop and it’s a money making machine. That’s fine by me.

Danterious ,

There is actually already a website where people just recreated the bee movie by hand so idk it might actually work as a legal argument.

https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en

FatCat OP ,

I am thrilled to see the output you get!

ContrarianTrail ,

I never fully figured out how the people who are against AI companies using copyrighted content in the training data fit that in with their general attitude towards online piracy. Seems contradictory to be against one but not the other.

Big_Boss_77 ,

Is the pirate valued at $100,000,000,000? Will the pirate ever make enough of a dent to be considered a rounding error in a $100bn valuation? Is the pirate even attempting to turn a profit?

If the training data was for personal consumption, knock yourself out. When you try to say you’re worth billions but can’t afford to pay for the material? Fuck all the way off. I’m sure fucknuts at the top of this is gonna get a fat fucking pay day, so scrape a few fucking zeros off their quarterly bonus and pay the people actually making the fucking content you are ABSOLUTELY going to turn around and try to make a profit off of.

ContrarianTrail ,

I don’t see how this addresses my question. Just because someone is causing bigger harm it doesn’t justify causing a little harm. Stealing a lollipop is less bad than stealing a car but both are still stealing. AI companies can afford paying for the material just like an online pirate can afford paying for the movie.

daellat ,

Because the small thief in this example is not making money from the theft

ContrarianTrail ,

No, but they’re saving money which is effectively the same thing. There’s no practical difference between earning 50 bucks and getting a 50 buck discount.

keegomatic ,

That’s not quite true, though, is it?

$50 earned is yours to spend on anything. A $50 discount is offered by a vendor to entice you to spend enough of your money on them to make the discount worthwhile.

Pirates don’t pirate because they’re trying to save money on something they would have bought otherwise… typically they pirate because the amount they consume would bankrupt them if they purchased it through legitimate means, so they would never have been a paying customer in the first place.

So, if they wouldn’t have bought it anyway, and they’re not reselling it, did they really harm the vendor? Whether they pirated it or not, it wouldn’t affect the vendor either way.

That’s not really the same thing, in my opinion.

If you were able to pay for everything handily but pirated anyway, or if you resold pirated content, then yeah you have something similar to theft going on. But that’s not really the norm; those people are doing something bad irrespective of the piracy itself, aren’t they?

petrol_sniff_king ,

It’s not, because what they’re against is the consolidation of power.

If the principle “information is free” can lead to systems where information is not free, then that’s not really desirable, is it.

If free information to inspire more creative works can lead to systems with less creative works, then that’s not really desirable, is it.

toddestan ,

Your average pirate isn’t looking to profit from their copyright infringement.

In a similar way, someone getting busted for downloading a movie is a civil matter, but if they get busted for selling unauthorized copies on DVD then it can become a criminal matter.

ContrarianTrail ,

They’re saving money which is effectively the same thing.

toddestan ,

The pirate is looking to save money with their copyright infringement.

These AI companies are looking to make money from it.

Varyk ,

tweet is good, your body argument is completely wrong

helenslunch ,

Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology.

Or maybe they’re not talking about copyright law. They’re talking about basic concepts. Maybe copyright law needs to be brought into the 21st century?

arin ,

Kids pay for books, openAI should also pay for the material access used for training.

FatCat OP ,

OpenAI, like other AI companies, keeps its data sources confidential. But there are services and commercial databases for books that people understand are commonly used in the AI industry.

Veneroso ,

We have hundreds of years of out of copyright books and newspapers. I look forward to interacting with old-timey AI.

“Fiddle sticks! These mechanical horses will never catch on! They’re far too loud and barely more faster than a man can run!”

“A Woman’s place is raising children and tending to the house! If they get the vote, what will they demand next!? To earn a Man’s wage!?”

That last one is still relevant to today’s discourse somehow!?

VerbFlow ,

There are a few problems, tho. [1] [2] [3] [4] [5] [6]

NeoNachtwaechter ,

The sad news is:

Their argument could fall on fertile ground.

The Usamerican legal system protects a running business. When such a rich and famous corporation argues (and it would be highly paid lawyers arguing) that their business could be in jeopardy, the courts are going to listen, no matter how ridiculous the reasoning.

In other countries, they would just make a judge laugh out loud.

kibiz0r ,

Not even stealing cheese to run a sandwich shop.

Stealing cheese to melt it all together and run a cheese shop that undercuts the original cheese shops they stole from.

TheKMAP ,

Whatever happened to copying isn’t stealing?

I think the crux of the conversation is whether or not the world is better with ChatGPT. I say yes. We can tackle the disinformation in another effort.

calcopiritus ,

When you copy to consume yourself it’s way different than when you copy to sell the copy for a lower price.

renrenPDX ,

Then OpenAI should pay for a copy, like we do.

mightyfoolish ,

Is there an official statement on whether OpenAI pays for at least one copy of whatever they throw into the bots?

spacesatan ,

Am I the only person that remembers that it was “you wouldn’t steal a car”, or has everyone just decided to pretend it was “you wouldn’t download a car” because that’s easier to dunk on?

roguetrick ,

You wouldn’t shoot a policeman and then steal his helmet.

C126 ,

These anti piracy commercials have gotten really mean.

JasonDJ ,

I’m pretty sure it’s either Mandela Effect or a massive gaslighting conspiracy. Though I guess that’s true for everything that’s collectively misremembered.

Cornelius_Wangenheim ,

People remember the parody, which is usually modified to be more recognizable. Like Darth Vader never said “Luke, I am your father”; in the movie it’s actually “No, I am your father”.

ShittyBeatlesFCPres ,

Maybe add a spoiler alert next time. Jeez.

stephen01king ,

Maybe it’s being confused with the “download more ram” thing.

EldritchFeminity ,

The argument that these models learn in a way that’s similar to how humans do is absolutely false, and the idea that they discard their training data and produce new content is demonstrably incorrect. These models can and do regurgitate their training data, including copyrighted characters.

And these things don’t learn styles, techniques, or concepts. They effectively learn statistical averages and patterns and collage them together. I’ve gotten to the point where I can guess what model of image generator was used based on the same repeated mistakes that they make every time. Take a look at any generated image, and you won’t be able to identify where a light source is because the shadows come from all different directions. These things don’t understand the concept of a shadow or lighting, they just know that statistically lighter pixels are followed by darker pixels of the same hue and that some places have collections of lighter pixels.

I recently heard about an AI that scientists had trained to identify pictures of wolves that was working with incredible accuracy. When they went in to figure out how it was telling wolves from dogs like huskies so well, they found that it wasn’t even looking at the wolves at all. 100% of the images of wolves in its training data had snowy backgrounds, so it was simply searching for concentrations of white pixels (and therefore snow) in the image to determine whether or not a picture was of wolves.

Riccosuave ,

Even if they learned exactly like humans do, like so fucking what, right!? Humans have to pay EXORBITANT fees for higher education in this country. Arguing that your bot gets socialized education before the people do is fucking absurd.

v_krishna ,

That seems more like an argument for free higher education rather than restricting what corpuses a deep learning model can train on

Malfeasant ,

Tomato, tomato…

nickwitha_k ,

¿Por qué no los dos? (Why not both?) Allowing major corps to put even more downward pressure on workers doesn’t help anyone but the rich. LLMs aren’t going to save the world or become sentient.

interdimensionalmeme ,

The solution is that any AI must always be released under a strong copyleft, and possibly to abolish copyright outright, as it has only served the powerful by allowing them to enclose humanity’s common intellectual heritage (see Disney’s looting and enclosing of ancestral children’s stories). If you choose to strengthen the current regime, don’t expect things to improve for you as an irrelevant atomised individual.

Dran_Arcana ,

Devil’s Advocate:

How do we know that our brains don’t work the same way?

Why would it matter that we learn differently than a program learns?

Suppose someone has a photographic memory, should it be illegal for them to consume copyrighted works?

EldritchFeminity ,

Because we’re talking pattern recognition levels of learning. At best, they’re the equivalent of parrots mimicking human speech. They take inputs and output data based on the statistical averages from their training sets - collaging pieces of their training into what they think is the right answer. And I use the word think here loosely, as this is the exact same process that the Gaussian blur tool in Photoshop uses.

This matters in the context of the fact that these companies are trying to profit off of the output of these programs. If somebody with an eidetic memory is trying to sell pieces of works that they’ve consumed as their own - or even somebody copy-pasting bits from CliffsNotes - then they should get in trouble; the same as these companies.

Given A and B, we can understand C. But an LLM will only be able to give you AB, A(b), and B(a). And they’ve even been just spitting out A and B wholesale, proving that they retain their training data and will regurgitate the entirety of copyrighted material.

ricecake ,

Basing your argument around how the model or training system works doesn’t seem like the best way to frame your point to me. It invites a lot of mucking about in the details of how the systems do or don’t work, how humans learn, and what “learning” and “knowledge” actually are.

I’m a human as far as I know, and it’s trivial for me to regurgitate my training data. I regularly say things that either directly reference things I’ve heard or accidentally copy them, sometimes with errors.
Would you argue that I’m just a statistical collage of the things I’ve experienced, seen or read? My brain has as many copies of my training data in it as the AI model, namely zero, but “Captain Picard of the USS Enterprise sat down for a rousing game of chess with his friend Sherlock Holmes, and then Shakespeare came in dressed like Mickey mouse and said ‘to be or not to be, that is the question, for tis nobler in the heart’ or something”. Direct copies of someone else’s work, as well as multiple copyright infringements.
I’m also shit at drawing with perspective. It comes across like a drunk toddler trying their hand at cubism.

Arguing about how the model works or the deficiencies of it to justify treating it differently just invites fixing those issues and repeating the same conversation later. What if we make one that does work how humans do in your opinion? Or it properly actually extracts the information in a way that isn’t just statistically inferred patterns, whatever the distinction there is? Does that suddenly make it different?

You don’t need to get bogged down in the muck of the technical to say that even if you concede every technical point, we can still say that a non-sentient machine learning system can be held to different standards with regard to copyright law than a sentient person. A person gets to buy a book, read it, and then carry around that information in their head and use it however they want. Not-A-Person does not get to read a book and hold that information without consent of the author.
Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.

I think all the AI stuff is cool, fun and interesting. I also think that letting it train on everything regardless of the creators’ wishes has too much opportunity to make everything garbage. Same for letting it produce content that isn’t labeled or cited.
If they can find a way to do and use the cool stuff without making things worse, they should focus on that.

keegomatic ,

I’m not the above poster, but I really appreciate your argument. I think many people overcorrect in their minds about whether or not these models learn the way we do, and they miss the fact that they do behave very similarly to parts of our own systems. I’ve generally found that that overcorrection leads to bad arguments about copyright violation and ethical concerns.

However, your point is very interesting (and it is thankfully independent of that overcorrection). We’ve never had to worry about nonhuman personhood in any amount of seriousness in the past, so it’s strangely not obvious despite how obvious it should be: it’s okay to treat real people as special, even in the face of the arguable personhood of a sufficiently advanced machine. One good reason the machine can be treated differently is because we made it for us, like everything else we make.

I think there still is one related but dangling ethical question. What about machines that are made for us but we decide for whatever reason that they are equivalent in sentience and consciousness to humans?

A human has rights and can take what they’ve learned and make works inspired by it for money, or for someone else to make money through them. They are well within their rights to do so. A machine that we’ve decided is equivalent in sentience to a human, though… can that nonhuman person go take what it’s learned and make works inspired by it so that another person can make money through them?

If they SHOULDN’T be allowed to do that, then it’s notable that this scenario is only separated from what we have now by a gap in technology.

If they SHOULD be allowed to do that (which we could make a good argument for, since we’ve agreed that it is a sentient being) then the technology gap is again notable.

I don’t think the size of the technology gap actually matters here, logically; I think you can hand-wave it away pretty easily and apply it to our current situation rather than a future one. My guess, though, is that the size of the gap is of intuitive importance to anyone thinking about it (I’m no different) and most people would answer one way or the other depending on how big they perceive the technology gap to be.

petrol_sniff_king ,

Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.

I agree, but the fact that shills for this technology are also wrong about it is at least interesting.

Rhetorically speaking, I don’t know if that’s useless.

I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy,

I do like this point a lot.

If they can find a way to do and use the cool stuff without making things worse, they should focus on that.

I do miss when the likes of Cleverbot were just a fun novelty on the Internet.

Eatspancakes84 ,

I am also not really getting the argument. If I as a human want to learn a subject from a book I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.

The issue is of course that it’s not at all similar to how humans learn. It needs VASTLY more data to produce something even remotely sensible. Develop AI that’s truly transformative, by making it as efficient as humans are in learning, and the cost of paying for copyright will be negligible.

petrol_sniff_king ,

If I as a human want to learn a subject from a book, I buy it

xD
That’s good.

Deathcrow ,

Dude never heard of a library. I only bought a handful of books during my degree, I would’ve been homeless if I had to buy a copy of every learning source

stephen01king ,

If I as a human want to learn a subject from a book I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.

You’re on Lemmy, where people casually say “piracy is morally the right thing to do”, so I’m not sure this argument works on this platform.

macrocephalic ,

It’s an interesting area. Are they suggesting that a human reading copyrighted material and learning from it is a breach?
