The AI-focused COPIED Act would make removing digital watermarks illegal (as well as training any kind of AI on copyrighted content)

A bipartisan group of senators introduced a new bill to make it easier to authenticate and detect artificial intelligence-generated content and protect journalists and artists from having their work gobbled up by AI models without their permission.

The Content Origin Protection and Integrity from Edited and Deepfaked Media Act (COPIED Act) would direct the National Institute of Standards and Technology (NIST) to create standards and guidelines that help prove the origin of content and detect synthetic content, like through watermarking. It also directs the agency to create security measures to prevent tampering and requires AI tools for creative or journalistic content to let users attach information about their origin and prohibit that information from being removed. Under the bill, such content also could not be used to train AI models.

Content owners, including broadcasters, artists, and newspapers, could sue companies they believe used their materials without permission or tampered with authentication markers. State attorneys general and the Federal Trade Commission could also enforce the bill, which its backers say prohibits anyone from “removing, disabling, or tampering with content provenance information” outside of an exception for some security research purposes.

(A copy of the bill is in the article, here is the important part imo:

Prohibits the use of “covered content” (digital representations of copyrighted works) with content provenance to either train an AI- /algorithm-based system or create synthetic content without the express, informed consent and adherence to the terms of use of such content, including compensation)

autotldr Bot ,

This is the best summary I could come up with:


A bipartisan group of senators introduced a new bill to make it easier to authenticate and detect artificial intelligence-generated content and protect journalists and artists from having their work gobbled up by AI models without their permission.

Content owners, including broadcasters, artists, and newspapers, could sue companies they believe used their materials without permission or tampered with authentication markers.

State attorneys general and the Federal Trade Commission could also enforce the bill, which its backers say prohibits anyone from “removing, disabling, or tampering with content provenance information” outside of an exception for some security research purposes.

Senate Majority Leader Chuck Schumer (D-NY) led an effort to create an AI roadmap for the chamber, but made clear that new laws would be worked out in individual committees.

“The capacity of AI to produce stunningly accurate digital representations of performers poses a real and present threat to the economic and reputational well-being and self-determination of our members,” SAG-AFTRA national executive director and chief negotiator Duncan Crabtree-Ireland said in a statement.

“We need a fully transparent and accountable supply chain for generative Artificial Intelligence and the content it creates in order to protect everyone’s basic right to control the use of their face, voice, and persona.”


The original article contains 384 words, the summary contains 203 words. Saved 47%. I’m a bot and I’m open source!

Grimy OP , (edited )

This is essentially regulatory capture. The article is very lax on calling it what it is.

A few things to consider:

  • Laws can’t be applied retroactively; this would essentially close the door behind OpenAI, Google, and Microsoft. OpenAI with Sora, in conjunction with the big Hollywood companies, will be the only ones able to do proper video generation.
  • Individuals will not be getting paid, databrokers will.
  • They can easily pay pennies to a third world artist to build them a dataset copying a style. Styles are not copyrightable.
  • The open source scene is completely dead in the water and so is fine tuning for individuals.

Edit: This isn’t entirely true, there is more leeway for non commercial models, see comments below.

  • AI isn’t going away, all this does is force us and the economy into a subscription model.
  • Companies like Disney, Getty and Adobe reap everything.

In a perfect world, this bill would be aiming to make all models copyleft instead but sadly, no one is lobbying for that in Washington and money talks.

just_another_person ,

deleted_by_moderator

    cm0002 ,

    Yup, I fucking knew it. I knew this is what would happen with everyone bitching about copyright this and that. I knew any legislation that came as a result was going to be bastardized and dressed up to make it look like it’s for everyone when in reality it’s going to mostly benefit big corps that can afford licensing fees and teams of lawyers.

    People could not/would not understand how these AI models actually process images/text, or the concept of “If you post publicly, expect it to be used publicly”, and here we are…

    LainTrain , (edited )

    As always, the anprims/luddites/ecofashies (who downvoted me) are like an anvil to left-wing ideas of progress, we’re too busy arguing amongst ourselves to make a stand to protect open source AI from regulation.

    Honestly I blame Hbomberguy personally. People were a lot more open-minded before he tacked on that shitty little AI snark at the end of his plagiarism video.

    catloaf ,

    This sounds exactly like existing copyright law and DRM.

    Grimy OP ,

    It’s strengthening copyright laws by negating the transformative clause when dealing with AI.

    admin ,
    @admin@lemmy.my-box.dev avatar

    Hopefully the next step: force every platform that deals in user generated content to give users the choice to exploit that content for a fraction of the profit, or to exclude their content from processing.

    It’s amazing how many people don’t realize that they themselves also hold copyright over their content, and that laws like these protect them as well.

    just_another_person ,

    Don’t see an issue with this. People who scrape copyrighted content should pay for it.

    Badeendje ,
    @Badeendje@lemmy.world avatar

    Closing the door behind the ones that already did it means only the current groups that have the data will make money off it.

    just_another_person ,

    Is it copyrighted content?

    v_krishna ,
    @v_krishna@lemmy.ml avatar

    This regulation (and similar being proposed in California) would not be applied retroactively.

    just_another_person ,

    Never mentioned any sort of retroactive measures.

    snooggums ,
    @snooggums@midwest.social avatar

    Since no retroactive measures are mentioned, the companies that already scraped the web won’t be stopped from continuing to use the AI models already trained on that data, but anyone else would be stopped by the law.

    It is like making it illegal to rob banks after someone already robbed all the banks and letting them keep all the money.

    The law could have made it illegal to use models trained on copyrighted materials without permission, instead of only targeting the process of collecting the data.

    just_another_person ,

    Downvote all you want. If your entire business or personal model includes stealing content from other people, then you need to rethink that.

    Badeendje ,
    @Badeendje@lemmy.world avatar

    “stealing” implies the owner does not have it anymore… It is large studio speak.

    And I get what you are trying to say; I just think the copyright system is so broken that this shows it is in need of reform. Because if the qualm is with people doing immoral shit as a business model, there are long lists of corporations that will ask you to hold their beer.

    And since the training of the models already occurred on these materials, the owners of the current models are probably training on generated datasets by now, meaning that by the time this actually hits court, the datasets with the original copyrighted materials will be obsolete.

    just_another_person ,

    deleted_by_moderator

    LainTrain ,

    No, you are. You’ll be in a hell of your own making when you’re going to be paying a subscription to the corpo AIs just to remain employable.

    Honestly reading this shit I know why structures of oppression like capitalism etc exist, it is because of dumb motherfuckers like you and I feel absolutely zero sympathy, you deserve all of it and more for simping for Musk, Altman et al.

    just_another_person ,

    deleted_by_moderator

    Badeendje ,
    @Badeendje@lemmy.world avatar

    Well that’s a well articulated reply.

    I don’t understand why you would take this position. The small artists will never be able to avoid being included in training sets, and if they are, what are they going to do against a VC-backed corp like OpenAI? All the while the big copyright “owners” will be excluded, meaning this only cements the position of the mega corps.

    GBU_28 ,

    Comment breaks community standards.

    fuzzzerd ,

    Regarding obsolete models, that’s only partially true. There’s loads of content that is effectively “finished” and won’t be changing, and it will grow obsolete at a fairly slow pace, meaning it’ll be useful in the models for years once trained.

    Obviously new technology and similar ideas/content that didn’t exist when the model was created won’t be there, but the amount that changes and or is new is relatively small each year compared to all the historical content.

    GBU_28 ,

    OpenAI exec: oh shit damn. Damn. I gotta call my mom.

    afraid_of_zombies ,

    Stealing: depriving you of what you own

    Copying: taking a picture of what you made.

    Stealing is not copying. You still have whatever you started with.

    afraid_of_zombies ,

    Yes it is perfectly appropriate for someone who burned a backup copy of a DVD they paid for to go to prison for ten years

    toothbrush , (edited )
    @toothbrush@lemmy.blahaj.zone avatar

    They did it. They’re passing the worst version of the AI law. That’s the end for open source AI! If this passes, all AI will be closed source, and only from giant tech companies. I’m sure they will find a way to steal your stuff “legally”.

    LainTrain ,

    To the cheer of so-called progressives who never understood the tech and continue to be wilfully ignorant of it the corporations win again.

    2xsaiko ,
    @2xsaiko@discuss.tchncs.de avatar

    This is exactly what OpenAI etc. wanted to achieve with all the “AI safety” bullshit doomer talk. I really hope this doesn’t pass.

    trashgirlfriend ,

    No open source plagiarism machine :(

    LainTrain , (edited )

    Dw artbros and other corporation defenders will get curbstomped by the closed-source ones instead, not only will you be out of employment, but you will be unemployable without a ChatGPT subscription, and Altman/Musk/whoever will be worth trillions as a result. But at least it won’t be “plagiarism” because the lobbyists will ensure that it’s all nice and legal.

    And the worst part is you honestly deserve it for not listening to us.

    Also, this is you:

    https://lemmy.dbzer0.com/pictrs/image/6ca542a8-98ea-4864-bb33-ca59d9459493.webp

    ZILtoid1991 ,

    Okay, then lets ban art generators, problem solved!

    LainTrain ,

    That’s also not rational, but at least it’s consistent so it’s an improvement.

    Anyway banning them is impossible, even if one country bans it, all the other countries will still have them - the internet is the whole world, remember? And even then, LLMs would still exist too, and arguably those are far more significant.

    ZILtoid1991 ,

    “Why ban bad thing if bad country allows bad thing?”

    People are saying the same about raising the minimum wage, implementing labor protections, etc. “Okay, advocate for fair wages, but then that minimum wage job of yours will be outsourced to China/India/Vietnam/etc.!”

    LainTrain ,

    GenAI isn’t a bad thing though.

    girsaysdoom ,

    Did you read the documents? It’s not as bad as what you’re saying.

    It looks like the prohibited acts (section 6) specifically mention for commercial purposes where attribution markers are separated from the content. So, commercial AI software that doesn’t retain these markers or copyright marker removal done to mislead or affect in a commercial way would be against the law in 2 years.

    I don’t see how this affects anything open source related. The way I understand it is that this will just force commercial applications to adapt to this and move on.

    toothbrush ,
    @toothbrush@lemmy.blahaj.zone avatar

    oh cool, nevermind then. However, most open source AI is done for commercial purposes, so it will still cripple the ecosystem.

    Kolanaki ,
    @Kolanaki@yiffit.net avatar

    This is actually pretty cool for small artists, but how would it handle things like iFunny and such adding watermarks to shit they don’t own in the first place?

    mke , (edited )

    I don’t think that’s the kind of watermark being talked about here, Kol.

    The National Institute of Standards and Technology would be called upon to, quoting the COPIED Act summary, “facilitate development of guidelines for voluntary, consensus-based standards and for detection of synthetic content, watermarking and content provenance information, including evaluation, testing and cybersecurity protection.” I believe we’re talking about the unseen, math-y, certification and (I imagine) cryptography kind of digital watermark, not the crappy visual edits made by iFunny and co.

    In fact, since it also says:

    Prohibits removing, altering, tampering with, or disabling content provenance information, with a limited exception for security research purposes.

    The content in question might reach e.g. iFunny already “signed” and they wouldn’t be able to remove that.
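    To make the distinction concrete, here’s a toy sketch (my own illustration in Python; real provenance schemes like C2PA use public-key signatures and a standardized manifest format, so every function name here is hypothetical). The idea is that the provenance data is a cryptographic binding to the exact content bytes, so editing the content, including slapping an iFunny-style logo over it, breaks the binding rather than silently replacing it:

```python
import hashlib
import hmac
import os

def attach_provenance(content: bytes, signing_key: bytes) -> dict:
    """Bind a 'manifest' to the exact content bytes.
    (Toy version: HMAC instead of a real public-key signature.)"""
    sig = hmac.new(signing_key, content, hashlib.sha256).hexdigest()
    return {"content": content, "manifest": {"alg": "HMAC-SHA256", "sig": sig}}

def verify_provenance(asset: dict, signing_key: bytes) -> bool:
    """Recompute the binding; any change to the bytes makes it fail."""
    expected = hmac.new(signing_key, asset["content"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, asset["manifest"]["sig"])

key = os.urandom(32)
asset = attach_provenance(b"original artwork bytes", key)
assert verify_provenance(asset, key)             # intact content verifies

asset["content"] = b"re-watermarked by iFunny"   # any edit breaks the binding
assert not verify_provenance(asset, key)
```

    The point of the sketch: unlike a visual watermark, there’s nothing to “crop out”. The manifest either verifies against the bytes or it doesn’t.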

    Of course, I’m saying this without actually fully understanding what fits under covered content (digital representations of copyrighted works). Does my OC on deviantart count as covered content? I think so, but I couldn’t tell you for certain. If anyone can help me understand this, please, that’d be really nice.

    And finally, as was already said by others, I think this does nothing about all the crap companies already did to artists, since the law can’t affect them retroactively. It’s not that cool for small artists, since they’ll still be abused, except big tech would have the legal monopoly on abuse.

    I mean no disrespect by this: did you read the article? I’m genuinely curious how you got the iFunny idea.

    General_Effort ,

    covered content?

    That is any digital representation of any copyrighted content. If it’s on DeviantArt, it is a digital representation. By creating something, you own the copyright. A notable exception is when you do the work for hire, in which case it probably belongs to your employer. (Or if the work lacks human creativity, in which case it is public domain.)

    Anything on DeviantArt is almost certainly covered content, with the possible exception of AI generated images, or re-uploads of public domain content.


    (5) COVERED CONTENT .—The term ‘‘covered content’’ means a digital representation, such as text, an image, or audio or video content, of any work of authorship described in section 102 of title 17, United States Code.

    For reference: www.law.cornell.edu/uscode/text/17/102

    mke ,

    Thank you for clarifying, and with a reference, too. That’s pretty much what I thought. It’s great to have confirmation, though.

    General_Effort ,

    cool for small artists

    It’s certainly very bad for small artists. Can I ask why you think it would be good?

    To answer why it is bad for small artists: Money for license fees will mainly go to major content owners like Getty, or Disney. Small artists will have to go through platforms like Shutterstock or Adobe, which will keep most of the fees. At the same time, AI tools like generative fill are becoming ever more important. Such licensing regimes make artists’ tools more expensive. Major corporations will be able to extract more value.

    Look at Adobe. It has a reputation for abusing its monopoly against small artists, right? Yet Adobe pays license fees for images on its platform, that it used for training its AI tools. Adobe has also created a provenance standard, such as this bill wants to make mandatory. This bill would make Adobe’s preferred business model the mandatory standard by law.

    conciselyverbose ,

    Disney won’t sell licenses.

    They’ll keep their monopoly on 95% of the training data on the market for entertainment content, so they can accelerate their workflows using those tools, and everyone else is now trying to compete with their giant wallet and their extra tooling you’re not allowed to compete with.

    afraid_of_zombies ,

    I am not understanding. They will now be even more effective at what they do and will be building a tech stack that no one else has access to and you think this is a win for the little guy?

    If you owned a store, and then a Walmart set up next to it and rolled out some new barcode system that your scanners didn’t work on, how the hell would that benefit you?

    Not only are they already bigger, they are getting efficiencies they aren’t sharing with your business. You want open protocols, you want tool sharing, you want the market as a whole to be getting more efficient. You don’t want some company that already won able to pulverize, by sheer size, anyone who competes.

    conciselyverbose ,

    No, it’s very obviously not a win for the little guy.

    If you can’t learn from public performances, art dies. That’s all of art for thousands of years.

    sab ,

    So what will happen to art if we only disallow AI models from learning from copyrighted performances, whilst still allowing them to do so from public domain and licensed works (and obviously not changing anything for humans who seek inspiration)?

    conciselyverbose ,

    Disney will own all of it.

    Because they already own most of it and now have a massive productivity advantage.

    fuzzzerd ,

    Nah. They will cross licence with the other big players effectively closing the market to anyone they don’t bless.

    conciselyverbose ,

    They already own all the big players.

    Kowowow ,

    Sure would be fun to expand things to include a section to not let normal people make art of copyrighted material or be an excuse to mess with fair use

    cyd ,

    If this passes, this would have the perverse effect of making China (and maybe to a lesser extent the Middle East) the leading suppliers of open source / open weight AI models…

    Melt ,

    China would be the world leader in making AI model trained on copyrighted content

    catloaf ,

    And as the vast majority of content is not licensed for AI model training, they would have an immensely larger dataset to train on.

    Petter1 ,

    Well, there is also Europe ✌🏻

    General_Effort ,

    No. In the EU, the lobbyists have already won. Major countries, like Germany, have always had very conservative copyright laws. I believe it’s one reason why their cultures are losing so hard.

    Surprisingly, Japan has adopted a very sensible law on AI training.

    afraid_of_zombies ,

    I am just sitting here with my eye twitching thinking of all the code I have had to deal with from German companies over the years.

    _sideffect ,

    A bit late now, isn’t it?

    All the big corporations have already trained most of their current ai, so all this does is put the up and comers at a disadvantage.

    MagicShel ,

    It could halt the progress of improving their models and stagnate the whole technology.

    That being said, it only halts progress for American companies. Other countries will happily ignore this law and grow beyond our capabilities. I’m not sure if that’s better or worse than the current situation.

    Kuvwert ,

    From what I understand the next rounds of ai are being trained on further refined versions of the same datasets and supplemented with synthetic data.

    The damage to existing copyrighted content is already done.

    Source: I’m a random internet user

    General_Effort ,

    It’s all still there. No damage was done.

    Kuvwert ,

    Well, perceived damage anyway. I can’t speak to how IP owners have been affected by LLMs, and I don’t believe it would be easy to quantify.

    bionicjoey ,

    Reminds me of Russia before WWI began. They realized they had fallen horribly behind the rest of the world in terms of military technology, so they called an arms limitation treaty conference where they pushed for basically every country in the world to agree to stop inventing any new weapons of any kind.

    fuzzzerd ,

    How’d that work out for them? Answer? Not well. History repeats itself, so here we go!

    admin ,
    @admin@lemmy.my-box.dev avatar

    Seeing as laws can’t be applied retroactively, what would have been the alternative?

    AlexanderESmith ,

    People's attention spans are 5 seconds long, and art/culture change constantly.

    If you prohibit them from training on new content, the models will age super poorly, and they'll fall into disuse.

    General_Effort ,

    It wouldn’t be prohibited. It would just mean that the likes of Reddit or Facebook can charge more for “consent” to train on their content.

    AlexanderESmith ,

    So stop using reddit.

    General_Effort ,

    You want to convince everyone to stop using Reddit, Facebook, etc. so that LLMs go away? You know that’s not going to work.

    AlexanderESmith ,

    Not "go away" so much as "become dated and useless".

    afraid_of_zombies ,

    Well as long as you are honest about your motivations I can give you that much.

    I don’t want Disney destroyed. I want them to pay creatives well and stop with their legal/lobbying games. That’s the difference: I want people to do the morally correct thing; you want to punish people.

    AlexanderESmith ,

    I'm not sure what the dishonest motivations would be; I don't really have a problem with content generators, other than;

    • They're trained on data that trainers don't have rights to
    • They are awful, inaccurate, hallucinating garbage

    To the first point: If they (OpenAI, Adobe, Disney, et al) hired a bunch of people, paid them a fair wage to generate art (text, images, whatever), got permission (contractual, with residuals), trained a model, then used it responsibly (for concepts and drafts), then sure; have your models and use 'em.

    To the second point: I mentioned that the models aren't good, and it's because they aren't actually creating anything, just mashing old content together. I also mentioned before that the models need to be used responsibly; you can't just hit "generate" and ship it as final product. You need editors and artists to follow up on the model output. The model should be used to make tedious work easier, not replace talented artists.

    0laura ,
    @0laura@lemmy.world avatar

    you could use the models to train the models to get better at making new things.

    AlexanderESmith ,

    Not really. They start hallucinating pretty quick.
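    For what it’s worth, that degradation shows up even in a toy simulation (my own sketch, not anything from the article): repeatedly fit a distribution to a small sample drawn from the previous fit, and the estimated spread collapses across generations, a bare-bones analogue of the “model collapse” effect people describe for models trained on their own output.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def next_generation(mu, sigma, n=10):
    """'Train' a new model on n samples generated by the previous model."""
    samples = [random.gauss(mu, sigma) for _ in range(n)]
    return statistics.fmean(samples), statistics.stdev(samples)

mu, sigma = 0.0, 1.0    # generation 0: fit to real data
for _ in range(200):    # every later generation sees only synthetic output
    mu, sigma = next_generation(mu, sigma)

# The estimated spread decays toward zero over generations: the "model"
# keeps the typical cases and progressively forgets the tails.
print(f"spread after 200 generations: {sigma:.6f}")
```

    The spread shrinks because each small synthetic sample slightly underestimates the tails, and the errors compound generation after generation.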

    Doomsider ,

    If you put something on the Internet you are giving up ownership of it. This is reality and companies taking advantage of this for AI have already proven this is true.

    You are not going to be able to put the cat back in the bag. The whole concept of ownership over art, ideas, and our very culture was always ridiculous.

    It is past time to do away with the joke of the legal framework we call IP law. It is merely a tool for monied interests to extract more obscene profit from our culture at this point.

    There is only one way forward and that is sweeping privacy protections. No more data collection, no more targeted advertising, no more dark patterns. The problem is corporations are not going to let that happen without a fight.

    nasi_goreng ,
    @nasi_goreng@lemmy.zip avatar

    deleted_by_author

    afraid_of_zombies ,

    Yeah in theory but in practice that isn’t happening. In theory the laws could be structured such that creatives are being paid fairly and distributors make some money and that the general public knows the stuff will be public domain in a relatively short period of time.

    No one is doing it and they had hundreds of years to figure out how to do it. You are asking us to take it on faith and I personally will not.

    LainTrain ,

    Incredibly well-put. IP is just land for the wannabe landlords of information and culture.

    They are just attempting to squeeze the working class dry, take the last freedoms we have so we have to use their corporate products.

    NeoNachtwaechter ,

    LOL

    So I take your photo, remove your watermark, put my own watermark on it, and then I sue you for removing my watermark.

    General_Effort ,

    Don’t be a fool. Of course, content corporations like Disney or the NYT are able to prove just when something was made.

    NeoNachtwaechter ,

    Don’t be a fool either.

    Of course I am going to do this to you, not to Disney etc. because I am way better at creating proof than you are.

    And of course Disney etc. are going to do this to you and me, because they are even better at creating proof than you and me are.

    That’s how foolish this law is.

    Womble ,

    So what you’re saying is that this is a law designed to extend corporate control over information and culture even further?

    General_Effort ,

    This bill reads like it was written by Adobe.

    This provenance labelling scheme already exists. Adobe was a major force behind it. (see here: en.wikipedia.org/…/Content_Authenticity_Initiativ… ). This bill would make it so that further development will be tax-funded through organizations like DARPA.

    Of course, they are also against fair use. They pay license fees for AI training. For them, it means more cash flow.

    explodicle ,

    It’s pretty cheap to just time stamp everything.
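    A sketch of what cheap timestamping can look like in practice (Python, purely illustrative): publish only a hash of the work somewhere with a trusted date, and later reveal the work to prove you had it by then, without exposing the work itself up front.

```python
import hashlib

def commitment(work: bytes) -> str:
    """Digest to publish somewhere with a trusted date (a public post,
    a newspaper ad, a timestamping service); the work stays private."""
    return hashlib.sha256(work).hexdigest()

def proves_possession(work: bytes, published_digest: str) -> bool:
    # Revealing the work later shows it matches the digest published earlier.
    return commitment(work) == published_digest

art = b"my original piece"
published = commitment(art)  # made public at some trusted time T

assert proves_possession(art, published)        # holds for the owner
assert not proves_possession(b"forgery", published)
```

    This only proves possession at the publication date, not authorship, which is part of why provenance standards layer signatures and metadata on top.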

    riodoro1 ,

    So the rich have already scalped what they could. Now it can be made illegal.

    admin ,
    @admin@lemmy.my-box.dev avatar

    Because even when some of the water has gotten out, you still plug the dam.

    The best moment was earlier. The second best moment is now.

    Grimy OP ,

    This is more akin to diverting a public river into private land so the landowner can charge everyone what they were getting for free.

    The river cannot be dammed and this bill doesn’t aim to even try.

    A better solution would be to make all models copyleft, so even if corporations dip their cup in the water, whatever they produce has to be thrown back in.

    trollbearpig ,

    Maybe I’m missing something, but I don’t understand what you guys mean by “the river cannot be dammed”. The LLM models need to be retrained all the time to include new data and in general to get them to change their behavior in any way. Wouldn’t this bill apply to all these companies as soon as they retrain their models?

    I mean, I get the point that old models would be exempt from the law since laws can’t be retroactive. But I don’t get how that’s such a big deal. These companies would be stuck with old models if they refuse to train new ones. And as much hype as there is around AI, current models are still shit for the most part.

    Also, can you explain why you guys think this would stop open source models? I have always thought that the best solution to stop these fucking plagiarism machines was for the open source community to create an open source training set where people contribute their art/text/whatever. Does this law prevent this? Honestly, to me this panic sounds like people without any artistic talent wanted to steal the work of artists and they are now mad they can’t do it.

    Grimy OP ,

    The game right now is about better training methods and curating current datasets, new data is not needed.

    Obviously though, eventually they will want new data so their models aren’t stuck in the past but this won’t stop them from getting it. There isn’t a future where individuals negotiate with google on how much they get paid, all that data is already owned by the platform it’s being posted on. Almost all websites slap on their own copyright or something similar, even for images. Deviant art and even Cara, the platform that’s suppose to be artist friendly, does this. Anything uploaded to Google maps gets a copyright on it if I’m not mistaken, Reddit as well. This data will be prohibitively expensive as to create a moat and strengthen soft monopolies.

    Public datasets are great but aren’t enough in most cases. This is also the equivalent of saying “well, they diverted the river, why don’t you build yourself a stream”. It’s also problematic since, by its public nature, it means corporations can come over, dip their cup in the water, and throw it into their river. It brings down their costs while making sure nothing can actually compete with them.

    Also worth noting that there is no worthy public dataset for videos. 98% of the data is owned by YouTube or Hollywood.

    trollbearpig ,

    My man, I think you are mixing a lot of things. Let’s go by parts.

    First, you are right that almost all websites get some copyright rights when you post on their platforms. At best, some license the content as Creative Commons or similar licenses. But that’s not new, that has been this way forever. If people are surprised that they are paying with their data at this point I don’t know what to say hahaha. The change with this law would be that no one, big tech companies or open source, gets to use this content for free to train new models right?

    Which brings me back to my previous question: this law applies to old data too, right? You say “new data is not needed” (which is not true for chat LLMs that want to include new data, for example), but old data is still needed to use the new methods or to curate the datasets. And most of this old data was acquired by ignoring copyright laws. What I get from this law is that no one, including these companies, gets to keep using this “illegally” acquired data now, right? I mean, I’m pretty sure this is the case since movie studios and similar are the ones pushing for this law, they will not go like “it’s ok you stole all our previous libraries, just don’t steal the new stuff” hahahaha.

    I do get your point that the most likely end result is that movie studios, record labels, social media platforms, etc., will just start selling the rights to train on their data, and the only companies who will be able to afford this are the big tech companies. But still, I think this is a net positive (weird times for me to be on the side of these awful companies hahaha).

    First of all, it means no one, including big tech companies, gets to steal content that is not theirs or given to them willingly. I’m particularly interested in open source code, but the same applies to indie art and any other form of art outside of the big companies. When we say that we want to stop the plagiarism, it’s not a joke. Tech companies are using LLMs to attack the open source community by stealing the code under the excuse of LLMs being transformative (bullshit of course). Any law that stops this is a positive to me.

    And second of all, consider the 2 futures we have in front of us. Option one: we get laws like this, forcing AI to comply with copyright law, which basically means we maintain the current status quo for intellectual property. Not great obviously, but the alternative is so much worse. Option two: we allow people to use LLMs to steal all the intellectual property they want, which puts an end to basically any market incentive to produce art by humans. Again, the current copyright system is awful. But why do you guys want a system where we as individuals have to keep complying with copyright but any company can bypass that with an LLM? Or how do you guys think this is going to pan out if we just don’t regulate AI?

    Grimy OP ,

    Google already paid 6 million to Reddit for their dataset (preemptively, since I’m guessing they are lobbying for laws like this), and I didn’t get a dime. Who do you think this helps here?

    The change with this law would be that no one, big tech companies or open source, gets to use this content for free to train new models right?

    My point is that this essentially ensures that ONLY big tech companies will get to use the content. Do you think they mind spending a few million if it gives them a monopoly? They actively want this.

    If it’s between the platform I used getting paid for my content while I get nothing (and then having to pay OpenAI to use a tool built with my content), or the platform and me both getting nothing while I get free AI, I will choose the latter.

    There are two scenarios, and in both, AI massively boosts productivity and huge layoffs happen. The difference is that in one scenario, the tools are priced low enough that it’s economical to replace 5 workers with them, but high enough that those same workers can’t afford them and compete with the business that just fired them. A situation where no company can remain competitive without paying OpenAI or Google 50k a month is a dystopian nightmare.

    Open source is the best way to make sure this doesn’t happen and while these laws are the smallest of speed bumps for big tech companies, it is a literal wall for FOSS.

    The best solution would be to copyleft all models using public data, the second best would be to leave things as is. This isn’t a solution but regulatory capture.

    trollbearpig ,

    My man, I think you are delusional hahahaha. You are giving way too much credit to a technology that’s just a glorified autocomplete. But I guess I get your point: if you think that AI (and LLMs in particular hahahaha) is the way of the future and all that, then this is apocalyptic hahahahaha.

    But you are delusional, my man. The only practical use so far for these stupid LLMs is autocomplete, which works great when it works. And bypassing copyright law by pretending it’s producing novel shit. But that’s a whole other discussion; time will show this is just another bubble like crypto hahahaha. For now, I hope they at least force everyone to stop plagiarising other people’s work with AI.

    Grimy OP ,

    Prohibits the use of “covered content” (digital representations of copyrighted works) with content provenance to either train an AI- /algorithm-based system or create synthetic content without the express, informed consent and adherence to the terms of use of such content, including compensation

    This affects a lot more than just LLMs and essentially fucks any use of machine learning. You do not understand what you are defending. This kills Kaggle and Hugging Face overnight, since I figure corporations will be able to keep already-created datasets for internal use, but distribution will be a no-go.

    You also have to be willfully blind to seriously think LLMs have no use cases. Ignoring the entertainment value, they’re a huge productivity boost, and chatbots using them are now commonplace on websites (I preferred when it was actual people, but that’s beside the point). I work in research and we are currently building a bunch of internal tools to use with our data.

    Hahaha all you want but you are defending something completely against your own self interests and those of society.

    trollbearpig ,

    So you are saying that content scraped before the law is fair game to train new models? If so, that’s fucking terrible. But again, I doubt this is the case since it would be against the interests of the big copyright holders. And if it’s not the case, you are just creating a storm in a glass of water, since this affects the companies too.

    As a side point, I’m really curious about LLM uses. As a programmer, the only useful product I have seen so far is Copilot and similar tools. And I ended up disabling the fucking thing because it produces too much garbage hahaha. But I’m the first to admit I haven’t been following this hype cycle hahahaha, so I’m really curious what the big things will be. You clearly know so much, so do you want to enlighten me?

    Grimy OP ,

    This bill is being built with the interests of the big tech companies in mind imo, big copyright holders are just an afterthought. I figure since big tech spent quite a bit of money building those datasets and since they were built before the law, they will be able to keep using them as long as they don’t add anything new but I can’t be certain.

    The use cases are vast. This is a huge boon for the indie gaming and animation industry. I’m seriously excited to have NPCs running on LLMs and don’t want to be forced into a subscription just to play my games. It’s also going to bring smart homes to another level. Systems can be built that are much stronger than Alexa without having to send all that insanely private data to Amazon. There’s a huge privacy issue if all the available models only run on Google’s or OpenAI’s cloud, but I won’t get into that (not to mention that these corporate LLMs will eventually be trained for advertisement and will essentially be poisoned to prefer whoever is paying their creator).

    I’ll give a more concrete example from my work, but it will be a bit vague to preserve my anonymity.

    I work in research (I originally studied software engineering and robotics) and we have about 20 years’ worth of projects. None of it is standardized and it’s honestly a mess. I built a system in the space of a few days that grabs every one of those docs, reads through it with an LLM, and then classifies them doc by doc into an Excel sheet with a SharePoint link. I’ve got 20 columns in there: it summarizes them, chooses from a list of 30 types of documents I gave it, extracts related towns, people, companies, and domains, extracts the columns if there are any tables inside, and generally establishes a bunch of different relationships. It doesn’t sound like much, but doing it by hand would have been weeks of tedious work. My computer did it in 20 minutes using a local LLM, so any sensitive client data doesn’t leave the building.
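    (A pipeline like the one described above is easy to sketch. Below is a minimal, illustrative Python version with the local LLM call stubbed out; the `DOC_TYPES` list, the JSON schema, and `call_local_llm` are placeholders of mine, not the actual setup.)

```python
import csv
import io
import json

# Illustrative placeholders; the real list described above has ~30 types.
DOC_TYPES = ["report", "proposal", "invoice", "meeting notes"]

def build_prompt(text):
    # Ask the model for a fixed JSON schema so every doc becomes one row.
    return (
        "Classify this document. Return JSON with keys 'doc_type' "
        f"(one of {DOC_TYPES}), 'summary', 'towns', 'people'.\n\n"
        + text[:4000]  # truncate to fit the model's context window
    )

def call_local_llm(prompt):
    # Stub for a locally hosted model (e.g. behind an HTTP endpoint);
    # hardcoded here so the sketch runs standalone.
    return json.dumps({
        "doc_type": "report",
        "summary": "Survey of regional businesses.",
        "towns": ["Springfield"],
        "people": ["J. Doe"],
    })

def classify_docs(docs):
    # docs maps filename -> extracted text; output is one CSV row per doc.
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["file", "doc_type", "summary", "towns", "people"])
    for name, text in docs.items():
        row = json.loads(call_local_llm(build_prompt(text)))
        writer.writerow([name, row["doc_type"], row["summary"],
                         "; ".join(row["towns"]), "; ".join(row["people"])])
    return out.getvalue()

print(classify_docs({"survey_2021.docx": "Survey of businesses..."}))
```

    A real version would swap `call_local_llm` for a request to the locally hosted model and write an actual .xlsx instead of CSV, but the loop is the whole trick.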

    Right now I’m working on a GraphRAG system that will take all those documents and turn them into vectors, then an LLM adds relationships to those vectors. It will be incorporated into an internal chatbot so people can ask questions and not only get a natural-language answer but also the references where the information was found, with quick access to them. It’s vector search on steroids and will cost nothing to run. I’m planning on eventually training the chatbot itself on our data so it can have a better understanding of our research sector as well as direct access to all the documents.
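    (The vector-search part of that is less magic than it sounds. Here’s a toy sketch; I’ve swapped the neural embedding model for a plain bag-of-words counter so it runs standalone, so treat it as the shape of the idea, not the real thing.)

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "vector"; a real setup would use a neural
    # embedding model and keep the vectors in a vector store.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs):
    # Return the best-matching document name, i.e. the reference a
    # chatbot answer would cite alongside its natural-language reply.
    qv = embed(query)
    return max(docs, key=lambda name: cosine(qv, embed(docs[name])))

docs = {
    "soil_report.txt": "soil acidity measurements for northern farms",
    "robot_spec.txt": "actuator torque specification for the arm robot",
}
print(retrieve("farm soil acidity", docs))  # → soil_report.txt
```

    The GraphRAG twist is that an LLM also links the chunks into a graph of relationships, but the retrieve-then-cite loop above is the backbone.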

    Next is building something that gets info automatically from the web. Sometimes we have to create long Excel sheets with a bunch of different data points. We stay at a state level usually, but it can sometimes mean 1000 businesses, and we have to google each one manually and find the info. It’s sometimes weeks of work and honestly sucks to do. LLMs are entirely capable of doing this kind of work and would take a few hours at most, again at no cost.

    These things are seriously great whenever it’s dealing with data that isn’t just numbers and is hard to quantify. I hate Reddit and will never create an account there after what happened but I still go daily to the localllama subreddit, it’s a great source of information if you want to keep abreast with what’s happening.

    trollbearpig ,

    I figure since big tech spent quite a bit of money building those datasets and since they were built before the law, they will be able to keep using them as long as they don’t add anything new but I can’t be certain.

    This is a very weird assumption you are making, man. The quoted text you sent above pretty much says the opposite. It says everyone who wants to train their models with copyrighted data needs to get permission from the copyright holders. That is great for me, period. No one, not a big company nor the open source community, gets to steal the work of people producing art, code, etc. I honestly don’t get why you assume all the data scraped before would be exempt. Again, very weird assumption.

    As for ML algorithms having uses, of course they have. Hell, pretty much every company I have worked with has used them for decades. But take a look at the examples you provided. None of them requires you or your company scraping a bunch of information from randoms on the internet. Especially not copyrighted art, literature, or code. And that’s the point here: you are acting like all of that stops with these laws, but that’s ridiculous.

    Grimy OP ,

    The article is pro corpo, I’m looking at the bill and it’s quite clear where it’s headed.

    None of what I mentioned is possible without the LLM that’s at its heart. Just training an LLM is a million or two in compute power. We don’t get the next generation for free if laws like this tack on an extra 80 million. It was 6 million for Reddit, and that was when you could still scrape it for free, and that’s just a drop in the bucket.

    afraid_of_zombies ,

    Yeah, it is really messed up that Disney made untold tens of billions of dollars on public domain stories, effectively cut us off from our own culture, then extended the duration to indefinite. I wonder why nearly everyone was silent about this issue for multiple decades, until it became cliché to pretend to care about furry porn creators.

    Creatives have always been screwed, we are the first civilization to not only screw them but screw the general public. As shit as it was in the past you could just copy a freaken scroll.

    Anyway you guys have fun defending some of the worst assholes in human history while acting like you care about people you weren’t even willing to give a buck a month to on patreon.

    96VXb9ktTjFnRi ,

    I don’t like AI, but I hate intellectual property. And the people who want to restrict AI don’t seem to understand the implications that has. I am ok with copying, as I think copyright is a load of bollocks. But they aren’t even reproducing the content verbatim, are they? They’re ‘taking inspiration’ if you will, transforming it into something completely different. Seems like fair use to me. It’s just that people hate AI, and hate the companies behind it (and don’t get me wrong, rightfully so), but that shouldn’t stop us all from thinking critically about intellectual property laws.

    admin ,
    @admin@lemmy.my-box.dev avatar

    I’m the opposite, actually. I like generative AI. But as a creator who shares his work with the public for their (non-commercial) enjoyment, I am not okay with a billionaire industry training their models on my content without my permission, and then using those models as a money machine.

    interdimensionalmeme ,

    This law will ensure only giant tech companies have this power. Hobbyists and home players will be locked out.

    admin ,
    @admin@lemmy.my-box.dev avatar

    What are you basing that on?

    Content owners, including broadcasters, artists, and newspapers, could sue companies they believe used their materials without permission or tampered with authentication markers.

    Doesn’t say anything about the right just applying to giant tech companies, it specifically mentions artists as part of the protected content owners.

    interdimensionalmeme ,

    That’s like saying you are just as protected regardless of which side of the moat you stand on.

    It’s pretty clear the way things are shaping up is only the big tech elite will control AI and they will lord us over with it.

    The worst thing that could happen with AI, it falling into the hands of the elites, is happening.

    admin ,
    @admin@lemmy.my-box.dev avatar

    I respectfully disagree. I think small time AI (read: pretty much all the custom models on hugging face) will get a giant boost out of this, since they can get away with training on “custom” data sets - since they are too small to be held accountable.

    However, those models will be worthless at the enterprise level, since businesses wouldn’t be able to account for their legality. In other words, once you make big bucks off of AI, you’ll have to prove your models were sourced properly. But if you’re just creating a model for small-time use, you can get away with a lot.

    interdimensionalmeme ,

    I am skeptical that this is how it will turn out. I don’t really believe there will be a path from $0 to challenging big tech without a roadblock of lawyers shutting you down along the way.

    admin ,
    @admin@lemmy.my-box.dev avatar

    I don’t think so either, but to me that is the purpose.

    Somewhere between small time personal-use ML and commercial exploitation, there should be ethical sourcing of input data, rather than the current method of “scrape all you can find, fuck copyright” that OpenAI & co are getting away with.

    interdimensionalmeme ,

    I mean, this is exactly the kind of regulation that Microsoft/OpenAI is begging for to cement their position. Then it’s going to be just a matter of digesting their surviving competitors until only one remains, similar to the Intel/AMD relationship. Then they can have a 20-year period of stagnation while they progressively screw over customers and suppliers.

    I think that’s the bad ending. By desperately trying to keep the old model of intellectual property going, they’re going to bring about the real AI nightmare: an elite few in control of the technology, with an unconstrained ability to leverage its benefits, further solidifying their lead over everyone else.

    The collective knowledge of humanity is not their exclusive property. It also isn’t the property of whoever is the latest person to lay a claim to an idea in effective perpetuity.

    admin ,
    @admin@lemmy.my-box.dev avatar

    Why?

    Once this passes, OpenAI can’t build ChatGPT on the same (“stolen”) dataset. How does that cement their position?

    Taking someone’s creation (without their permission) and turning it into a commercial venture, without giving payment or even attribution is immoral.

    If a creator (in the widest meaning of the word) is fine with their works being used as such - great, go ahead. But otherwise you’ll just have to wait before the work becomes public domain (which obviously does not mean publicly available).

    rekorse ,

    Just because intellectual property laws can currently be exploited doesn’t mean there is no place for them at all.

    96VXb9ktTjFnRi ,

    That’s an opinion you can have, but I can just as well hold mine, which is that restricting any form of copying is unnatural and harmful to society.

    rekorse ,

    Do you believe no one should be able to charge money for their art?

    96VXb9ktTjFnRi ,

    That’s right. They can put their art up for sale, but if someone wants to take a free copy nothing should be able to stop them.

    rekorse ,

    That effectively makes all art free. At best, it’s donation-based.

    96VXb9ktTjFnRi ,

    Yes, that would be best.

    rekorse ,

    That would lead to most art being produced by people who are wealthy enough to afford to produce it for free, wouldn’t it?

    What incentive would a working person have to work on becoming an artist? It’s not like artists are decided at birth or something.

    96VXb9ktTjFnRi ,

    Most people who make art don’t make any money from it. Some make a little bit of money. A small number of people can afford a living just by making art, and just a fraction of those actually get most of the money that’s being earned by artists; and then of course there is a lot of money being paid for art that never reaches the artist. The business as it is is not working very well for anyone except some big media companies.

    The complete lack of commercial success hasn’t stopped a lot of artists, and it won’t stop them in the future. Thank god, because it wouldn’t be the first time that, after decades of no commercial success whatsoever, such an outsider is discovered by the masses. Sure, lack of commercial success has stopped others, but that’s happening now just as it would happen without copyright laws.

    If donating to artists out of free will were the norm, and people knew that that’s the main source of income for certain types of artists, then I’m sure a lot of people would do so. And aside from private donations, there could be governments and all sorts of institutions financing art. And if someone still can’t make a living, then still none of that could legitimize copyright in my view.

    We should strive for a world where everyone who wants to follow up on their creative impulses has the time and opportunity to do so, irrespective of their commercial success. But there should also be unrestricted access to knowledge, ideas, art, etc. Brilliant research, photography or music shouldn’t be reserved for those who can afford access. The public domain should be the norm so that our shared project of human creativity can reach its maximum potential.

    Copyright seems to me a rather bizarre level of control over the freedom of others. It’s making something public for others to see, but then telling these people: you’re not allowed to be inspired by it, you can’t take a free copy to show others, you can’t take the idea and do with it as you please. It’s severely limiting us culturally; it’s harming human creativity.

    And at the same time it’s hypocritical. Artistic ideas are often completely based on the ideas of others; everyone can see that the output is the result of a collective effort. The Beatles didn’t invent pop music, they just made some songs, precisely copying all that came before them, and then added a tiny bit of their own. And that’s not a criticism, that’s how human creativity functions. That’s what people should strive for. To limit copying is to limit humanity at its core.

    Again, human creativity is very clearly a collective effort. But despite this fact, when someone gets successful, suddenly it’s a personal achievement and they are allowed to ask for a lot of money for it. Well, my answer is: yes, they are allowed to ask, and I am very willing to pay, but they shouldn’t be allowed to go beyond asking; they shouldn’t be allowed to restrict access to something that has been published.

    rekorse ,

    What if there was some sort of model that would pay an artist outright for their contributions now and into the future. Like crowdsourcing art from your favorite artists.

    It might cost a lot if a lot of people want something from them of course, if demand is high. They might even work out a limited payment scheme where you pay for limited access to the art for less.

    Sounds a lot like what we have now?

    And right now, I have to disagree: most artists create with the hope they can make big money, which wouldn’t exist without artists who make big money. All artists should be making more money, and even the wealthy artists now have people above them making more money than them who have nothing to do with art.

    We don’t need to throw out all of our ideas, we just need to keep increasing visibility into industries and advocating for the artist (or the entry-level worker, or the 9-to-5ers, or any others who produce everything a company profits off of but are unfairly compensated for it).

    For you to argue AI will help artists is absurd. They’ve been stolen from, and now the result of that theft is driving them out of work. It is only good for artists if by “artists” you mean yourself, and anyone else who only cares about the self. The same people who tend to use societal arguments only when it benefits them somehow, which is ironic, isn’t it?

    afraid_of_zombies ,

    True but you people have had hundreds of years to fix the system and have not.

    Adderbox76 ,

    They’re ‘taking inspiration’ if you will, transforming it into something completely different.

    That is not at all what takes place with A.I.

    An A.I. doesn’t “learn” like a human does. It aggregates multiple chunks from multiple sources. It’s just really, really tiny chunks, so it’s hard to tell sometimes.

    That’s why you can ask two AIs to write a story based on the same prompt and some of their lines will be exactly the same. Because it’s not taking inspiration from, it’s literally copying bits and pieces of other works, and it just happens that they both chose that particular bit.

    If you do that when writing a paper in university, it’s called plagiarism.

    Get the fuck out of here with your “A.I. takes inspiration…”; it copies, nothing more. It doesn’t add anything new to the sum total of the creative zeitgeist because it’s just remixes of things that already exist.

    ricdeh ,
    @ricdeh@lemmy.world avatar

    You just reiterate what other anti-ML extremists have said like a sad little parrot. No, LLMs don’t just copy. They network information and associations and can output entirely new combinations of them. To do this, they make use of neural networks, which are computational concepts analogous to the way your brain works. If, according to you, LLMs just copy, then that’s all that you do as well.

    LainTrain , (edited )

    it copies nothing more

    it’s just remixes of things that already exist.

    So it does do more than copying? Because as you said - it remixes.

    It sounds like the line you’re trying to draw is not only arbitrary, but you yourself can’t even stick with it for more than one sentence.

    Everything new is some unique combination of things that already exist, the elements it draws from are called sources and influences, and rules according to which they’re remixed are called techniques/structures e.g. most movies are three acts, and many feature specific techniques like J-cuts.

    Heck even re-arranging elements of just one thing is a unique and different thing, or is your favourite song and a remix of it literally the same? Or does the remix not have artistic value, even though someone out there probably likes the remix, but not the original?

    I think your confusion stems from the fact you’re a top shelf, grade-A Moron.

    You’re an organic, locally sourced and ethically produced idiot. You need to learn how basic ML works, what “new” is, and glance at some basic epistemology and metaphysics, because you don’t even understand what “new” entails, before your reactionary rhetoric leads us all straight down to cyberpunk dystopias.

    NikkiDimes ,

    Damn, attack the argument, not the person, homie.

    LainTrain ,

    Yeah, sorry

    afraid_of_zombies ,

    You can do the same thing with the Hardy Boys. You can find the same page word for word in different books. You can also do that with the Bible. The authors were plagiarizing each other.

    It doesn’t add anything new to the sum total of the creative zeitgeist because it’s just remixes of things that already exist.

    Do yourself a favor and never ever go into design of infrastructure equipment or eat at a Pizza Hut or get a drink from Starbucks or work for an American car company or be connected to Boeing.

    Everyone has this super impressive view of human creativity and I am waiting to see any of it. As far as I can tell, the less creative you are, the more success you will have. But let me guess: you ride a Segway, wear those shoes with toes, have gone through every recipe of Julia Child’s, and compose novels that look like Finnegans Wake got into a car crash with E. E. Cummings and Gravity’s Rainbow.

    Now leave me alone while I eat the same burger as everyone else and watch reruns of Family Guy in my house that looks like all the other ones on the street.

    IzzyJ ,
    @IzzyJ@lemmy.world avatar

    Consider YouTube poop, I’m serious. Every clip in them is sourced from preexisting audio and video, and mixed or distorted in a comedic format. You could make an AI to make YouTube poops using those same clips and other “poops” as training data. What it outputs might be of lower quality, but in a technical sense it would be made in an identical fashion. And, to the chagrin of Disney, Nintendo, and Viacom, these are considered legally distinct works, because I don’t watch Frying Nemo in place of Finding Nemo. So why would it be any different when an AI makes it?

    General_Effort ,

    This is a brutally dystopian law. Forget the AI angle and turn on your brain.

    Any information will get a label saying who owns it and what can be done with it. Tampering with these labels becomes a crime. This is the infrastructure for the complete control of the flow of all information.

    Throw_away_migrator ,

    Maybe I’m missing something, but my read is that it creates a mechanism/standard for labeling content. If content is labeled under this standard, it is illegal to remove the labeling or use it in a way the labeling prohibits. But I don’t see a requirement to label content with this mechanism.

    If that’s the case I don’t see a problem. Now, if all the content is required to be labeled, then yes it’s a privacy nightmare. But my interpretation was that this is a mechanism to prevent AI companies from gobbling up content without consent and saying, “What? There’s nothing saying I couldn’t use it.”

    IzzyJ ,
    @IzzyJ@lemmy.world avatar

    Most everyone from corporations to tumblr artists will be opting into that. While it doesn’t guarantee an information dystopia, it does enable it.

    I download images from the internet and remove watermarks to edit them into YouTube videos as visual aids. I add a credit in the description because I’m not a cunt; I just do it to make the video look better. I don’t monetize the content. Utterly and totally harmless, and it would be illegal with such a label.

    General_Effort ,

    It’s rather more than that. In the very least, it is a DRM system, meant to curtail fair use. We’re not just talking about AI training. The AutoTLDR bot here would also be affected. Manually copy/pasting articles while removing the metadata becomes illegal. Platforms have a legal duty to stop copyright infringement. In practice, they will probably have to use the metadata label to stop reposts and re-uploads of images and articles.

    This bill was obviously written by lobbyists for major corpos like Adobe. It wants to make the C2PA standard legally binding. They have been working on this for the last couple of years. OpenAI already uses it.
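    (To make the “provenance information” point concrete, here’s a toy sketch of what attaching, reading, and stripping such a label amounts to. This is not the actual C2PA format, which embeds cryptographically signed manifests inside the media file itself; it only shows that removing the label is technically trivial, which is exactly why the bill leans on legal penalties rather than technical barriers.)

```python
import json

MARK = b"\n---PROVENANCE---\n"

def attach_provenance(content, creator):
    # Toy stand-in for a C2PA-style manifest: real C2PA embeds a
    # cryptographically signed JSON manifest inside the file format.
    manifest = json.dumps({"creator": creator, "tool": "camera-x"}).encode()
    return content + MARK + manifest

def read_provenance(blob):
    # Returns the manifest dict, or None if no label is present.
    if MARK not in blob:
        return None
    _, manifest = blob.split(MARK, 1)
    return json.loads(manifest)

def strip_provenance(blob):
    # Technically trivial; the bill makes it legally dangerous instead.
    return blob.split(MARK, 1)[0]

tagged = attach_provenance(b"image bytes", "Jane Artist")
print(read_provenance(tagged))  # {'creator': 'Jane Artist', 'tool': 'camera-x'}
print(read_provenance(strip_provenance(tagged)))  # None
```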

    In the very least, this bill will entrench the monopolies of the corporations behind it; at the expense of the rights of ordinary people.


    I don’t think it’ll stop there. Look at age verification laws in various red states and around the world. Once you have this system in place, it would be obvious to demand mandatory content warnings in the metadata. We’re not just talking about erotic images but also about articles on LGBTQ matters.

    More control over the flow of information is the way we are going anyway. From age-verification to copyright enforcement, it’s all about making sure that only the right people can access certain information. Copyright used to be about what businesses can print a book. Now it’s about what you can do at home with your own computer. We’re moving in this dystopian direction, anyway, and this bill is a big step.


    The bill talks about “provenance”. The ambition is literally a system to track where information comes from and how it is processed. If this was merely DRM, that would be bad enough. But this is an intentionally dystopian overreach.

    E.g., you have cameras that automatically add the tracking data to all photos, and then Photoshop adds data about all post-processing. Obviously, this can’t be secure. (NB: This is real and not hypothetical.)

    The thing is, a door lock isn’t secure either. It takes seconds to break down a door, or to break a window instead. The secret ingredient is surveillance and punishment. Someone hears or sees something and calls the police. To make the ambition work, you need something at the hardware level in any device that can process and store data. You also need a lot of surveillance to crack down on people who deal in illegal hardware.

    I’m afraid, this is not as crazy as it sounds. You may have heard about the recent “Chat Control” debate in the EU. That is a proposal, with a lot of support, that would let police scan the files on a phone to look for “child porn” (mind that this includes sexy selfies that 17-year-olds exchange with their friends). Mandatory watermarking, that let the government trace a photo to the camera and its owner, is mild by comparison.


    The bill wants government agencies like DARPA to help in the development of better tracking systems. Nice for the corpos that they get some of that tax money. But it also creates a dynamic in the government that will make it much more likely that we continue on a dystopian path. For agencies, funding will be on the line; plus there are egos. Meanwhile, you still have the content industry lobbying for more control over its intellectual “property”.

    msgraves ,

    Exactly, this isn’t about any sort of AI, this is the old playbook of trying to digitally track images, just with the current label slapped on. Regardless of your opinion on AI, this is a terrible way to solve this.

    ArchRecord ,

    It’s like applying DRM law to all media ever. And we know the problems with DRM already, as exemplified 2 decades ago by Cory Doctorow in his talk at Microsoft to convince them not to endorse and use it.

    werefreeatlast ,

    Introducing Chat-Stupid! It’s just like Chat-GPT, but it wishes for any conversation with humans so it can legally learn… don’t disclose company secrets or it will legally learn those too.
