Activity - AI models don’t resynthesize their training data. They use their training data... - kbin.life

BluesF , 1 month ago

AI models don’t resynthesize their training data. They use their training data to determine parameters which enable them to predict a response to an input.

Consider a simple model (too simple to be called AI, but the underlying concepts are very similar) - a linear regression. In linear regression we produce a model which follows a straight line through the “middle” of our training data. We can then use this to predict values outside the range of the original data - albeit with less certainty about the likely error.
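To make that concrete, here is a rough sketch in plain Python. The data points are made up for illustration, and `fit_line` and `predict` are just names I picked, not from any library. It fits a least-squares line and reports the textbook standard error of a prediction, which grows the further the input is from the training data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard error, using n - 2 degrees of freedom.
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sse / (n - 2))
    return slope, intercept, sigma, x_mean, sxx, n

def predict(model, x0):
    """Return (prediction, standard error of that prediction) at x0."""
    slope, intercept, sigma, x_mean, sxx, n = model
    y0 = slope * x0 + intercept
    # The (x0 - x_mean)^2 term is what makes extrapolation less certain.
    se = sigma * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
    return y0, se

# Made-up training data: roughly y = 2x + 1 with a little noise, x from 1 to 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
model = fit_line(xs, ys)

# One point inside the training range, two increasingly far outside it.
for x0 in (3.0, 10.0, 50.0):
    y0, se = predict(model, x0)
    print(f"x={x0:5.1f}  predicted y={y0:7.2f}  standard error={se:5.2f}")
```

The model never saw x = 10 or x = 50, but it still gives an answer for both. The standard error just gets larger as you move away from the data it was fitted on.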

In the same way, an LLM can give answers to questions that were never asked in its training data - it’s not taking that data and shuffling it around, it’s synthesising an answer by predicting tokens. Also similarly, it does this less well the further outside the training data you go. Feed it the right gibberish and it doesn’t know how to respond. ChatGPT is very good at dealing with nonsense, but if you’ve ever worked with simpler LLMs you’ll know that typos can throw them off noticeably. They still respond OK, but things get weirder as they go.

Now it’s certainly true that (at least some) models were trained on CSAM, but it’s also definitely possible that a model that wasn’t could still produce sexual content featuring children. Its training set need only contain enough disparate elements for it to correctly predict what the prompt is asking for. For example, if the training set contains images of children it will “know” what children look like, and if it contains pornography it will “know” what pornography looks like - conceivably it could mix these two together to produce generated CSAM. It will probably look odd, if I had to guess? Like LLMs struggling with typos, and regression models being unreliable outside their training range, image generation of something totally outside the training set is going to be a bit weird, but it will still work.

None of this is to defend generating AI CSAM, to be clear, just to say that it is possible to generate things that a model hasn’t “seen”.
