
The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.

This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.

This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.

Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.

While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.

For those interested, this argument is nicely laid out by Damien Riehl in FLOSS Weekly episode 744. twit.tv/shows/floss-weekly/episodes/744

Veneroso ,

We have hundreds of years of out of copyright books and newspapers. I look forward to interacting with old-timey AI.

“Fiddle sticks! These mechanical horses will never catch on! They’re far too loud and barely faster than a man can run!”

“A Woman’s place is raising children and tending to the house! If they get the vote, what will they demand next!? To earn a Man’s wage!?”

That last one is still relevant to today’s discourse somehow!?

VerbFlow ,
@VerbFlow@lemmy.world avatar

There are a few problems, tho.

NeoNachtwaechter ,

The sad news is:

Their argument could fall on fertile ground.

The American legal system protects a running business. When such a rich and famous corporation argues (and it would be highly paid lawyers arguing) that its business could be in jeopardy, they are going to listen, no matter how ridiculous the reasoning.

In other countries, that would just make a judge laugh out loud.

kibiz0r ,

Not even stealing cheese to run a sandwich shop.

Stealing cheese to melt it all together and run a cheese shop that undercuts the original cheese shops they stole from.

renrenPDX ,

Then OpenAI should pay for a copy, like we do.

mightyfoolish ,

Is there an official statement on whether OpenAI pays for at least one copy of whatever they throw into the bots?

spacesatan ,

Am I the only person who remembers that it was “you wouldn’t steal a car”, or has everyone just decided to pretend it was “you wouldn’t download a car” because that’s easier to dunk on?

roguetrick ,

You wouldn’t shoot a policeman and then steal his helmet.

C126 ,

These anti piracy commercials have gotten really mean.

EldritchFeminity ,

The argument that these models learn in a way that’s similar to how humans do is absolutely false, and the idea that they discard their training data and produce new content is demonstrably incorrect. These models can and do regurgitate their training data, including copyrighted characters.

And these things don’t learn styles, techniques, or concepts. They effectively learn statistical averages and patterns and collage them together. I’ve gotten to the point where I can guess which model of image generator was used based on the same repeated mistakes they make every time. Take a look at any generated image, and you won’t be able to identify where a light source is because the shadows come from all different directions. These things don’t understand the concept of a shadow or lighting; they just know that statistically lighter pixels are followed by darker pixels of the same hue and that some places have collections of lighter pixels.

I recently heard about an AI that scientists had trained to identify pictures of wolves, which was working with incredible accuracy. When they went in to figure out how it was distinguishing wolves from dogs like huskies so well, they found that it wasn’t looking at the wolves at all. 100% of the images of wolves in its training data had snowy backgrounds, so it was simply searching for concentrations of white pixels (and therefore snow) to determine whether or not a picture was of wolves.

Riccosuave ,
@Riccosuave@lemmy.world avatar

Even if they learned exactly like humans do, like so fucking what, right!? Humans have to pay EXORBITANT fees for higher education in this country. Arguing that your bot gets socialized education before the people do is fucking absurd.

v_krishna ,
@v_krishna@lemmy.ml avatar

That seems more like an argument for free higher education than for restricting what corpora a deep learning model can train on.

Malfeasant ,

Tomato, tomato…

interdimensionalmeme ,

The solution is that any AI must always be released under a strong copyleft, and possibly that copyright should be abolished outright, as it has only served the powerful by allowing them to enclose humanity’s common intellectual heritage (see Disney’s looting and enclosing of ancestral children’s stories). If you choose to strengthen the current regime, don’t expect things to improve for you as an irrelevant, atomised individual.

Dran_Arcana ,

Devil’s Advocate:

How do we know that our brains don’t work the same way?

Why would it matter that we learn differently than a program learns?

Suppose someone has a photographic memory, should it be illegal for them to consume copyrighted works?

macrocephalic ,

It’s an interesting area. Are they suggesting that a human reading copyrighted material and learning from it is a breach?

Shanedino ,

Maybe if you paid for the training data, they would let you use copyrighted data or something?

T156 ,

Had the company paid for the training data and/or left it as voluntary, there would be less of a problem with it to begin with.

Part of the problem is that they didn’t, but are still using it for commercial purposes.

andrew_bidlaw ,
@andrew_bidlaw@sh.itjust.works avatar

Their business strategy is built on the assumption that they won’t. They don’t want this door opened at all. It was a great deal for Google to buy Reddit’s data for some $mil., because it is a huge collection behind one entity. Now imagine communicating with each individual site owner whose resources they scraped.

If that had been how it started, the development of these AI tools could have been much slower because of (1) data being added to the pile only after an agreement, (2) more expenses meaning less money for hardware expansion, and (3) investors and companies being less hyped up about the whole thing because it doesn’t grow like a mushroom cloud while following legal procedures. Also, (4) the ability to investigate and collect a public list of which sites they have agreements with is pretty damning, generating its own news stories and conflicts.

TunaCowboy ,

I wouldn’t say I’m on OAI’s side here, but I’m down to eliminate copyright. New economic models will emerge, especially if more creatives unionize.

randon31415 ,

I finally understand Trump supporters “Fuck it, burn it all to the ground cause we can’t win” POV. Only instead of democracy, it is copyright and instead of Trump, it is AI.

uriel238 ,
@uriel238@lemmy.blahaj.zone avatar

I personally am down for this punch-up between Alphabet and Sony. Microsoft v. Disney.

🍿

overload ,

Surely it’s coming. We have the music publishing cartel vs. Suno already.

roofuskit ,

The Times’ lawyers must be chuffed reading this.

PenisDuckCuck9001 , (edited )

Honestly, if this somehow results in regulators being like “fuck it, piracy is legal now” it won’t negatively impact me in any way…

Corporations have abused copyright law for decades, they’ve ruined the internet, they’ve ruined media, they’ve ruined video games. I want them to lose more than anything else.

The shitty and likely situation is they’ll be like “fuck it corporate piracy is legal but individuals doing it is still a crime”.

Starbuncle ,

I think that training models on scraped internet data should be legal if and only if those models’ weights are required to be open-source. It’d be like slapping a copyleft license on the internet - you can do what you want with public data, but you have to give what you use it for back to the public.

xenomor , (edited )

This take is correct although I would make one addition. It is true that copyright violation doesn’t happen when copyrighted material is inputted or when models are trained. While the outputs of these models are not necessarily copyright violations, it is possible for them to violate copyright. The same standards for violation that apply to humans should apply to these models.

I entirely reject the claim that there should be one standard for humans and another for these models. Every time this debate pops up, people claim some special status based on ‘intelligence’ or ‘consciousness’ or ‘understanding’ or ‘awareness’. This is a meaningless argument because we have no clear understanding of what those things are. I’m not claiming anything about the nature of these models. I’m just pointing out that people love to apply an undefined standard to them.

We should apply the same copyright standards to people, models, corporations, and old-school algorithms.
