
Etterra ,

How many times did you say this went through a copy machine?

andallthat ,

I only have a limited and basic understanding of Machine Learning, but doesn’t training models basically work like: “you, machine, spit out several versions of stuff and I, programmer, give you a way of evaluating how ‘good’ they are, so over time you ‘learn’ to generate better stuff”? Theoretically giving a newer model the output of a previous one should improve on the result, if the new model has a way of evaluating “improved”.

If I feed an ML model pictures of eldritch beings and tell it that “this is what a human face looks like”, I don’t think it’s surprising that quality deteriorates. What am I missing?
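A minimal sketch of that feedback loop (not from the article; the target string, `score`, and `mutate` are made-up stand-ins for real training data and a real objective):

```python
import random

# Toy "generate and evaluate" loop: candidates are guesses at a target string,
# score() measures how many characters match, and each round keeps the better
# candidate. Improvement only happens because score() compares against real,
# external ground truth (the target).

TARGET = "hello world"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def score(candidate: str) -> int:
    """How 'good' a candidate is, judged against the ground truth."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def mutate(candidate: str) -> str:
    """Spit out a new version by changing one random character."""
    i = random.randrange(len(candidate))
    return candidate[:i] + random.choice(ALPHABET) + candidate[i + 1:]

best = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
for _ in range(5000):
    challenger = mutate(best)
    if score(challenger) > score(best):
        best = challenger

print(best)  # converges towards "hello world"
```

The catch the study points at: if the “ground truth” inside `score` is itself a previous model’s output, the loop can only converge on that output’s errors.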

takeda ,

I find it surprising that anyone is surprised by it. This was my initial reaction when I learned about it.

I thought that since they know the subject better than I do, they must have figured this one out and I simply didn’t understand it. But if a model needs to be trained before it can create something, you can’t use the model itself to train it. It’s similar to not being able to generate truly random numbers algorithmically without some external input.
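As a concrete version of that analogy (standard library only, nothing specific to the article): a purely algorithmic generator with no external input is fully determined by its seed, so it can’t produce information it didn’t already contain.

```python
import random

# Two PRNGs seeded identically produce identical "random" sequences:
# without external entropy, the algorithm adds no new information.
a = random.Random(42)
b = random.Random(42)

print([a.randint(0, 9) for _ in range(5)])  # prints some sequence
print([b.randint(0, 9) for _ in range(5)])  # prints the exact same sequence
```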

veganpizza69 ,

GOOD.

This “informational incest” is present in many aspects of society and needs to be stopped (one of the worst places is in the Intelligence sector).

Etterra ,

Informational Incest is my least favorite IT company.

Whirling_Cloudburst ,

When you fed your AI too much mescaline.

Willy ,

Huh. Who would have thought talking mostly or only to yourself would drive you mad?

Evotech ,

As long as you verify the output to be correct before feeding it back, it’s probably not bad.
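A minimal sketch of that idea, assuming you actually have an independent correctness check (the `verifier` and sample strings below are illustrative, not from the study):

```python
def filter_for_retraining(generated_samples, verifier, threshold=0.9):
    """Keep only generated samples that an independent check scores highly enough."""
    return [s for s in generated_samples if verifier(s) >= threshold]

# Toy usage: the "verifier" stands in for whatever external check you trust
# (human review, tests, comparison against ground truth).
samples = ["plausible answer", "obvious nonsense!!!", "another plausible answer"]
toy_verifier = lambda s: 0.0 if "!!!" in s else 1.0
print(filter_for_retraining(samples, toy_verifier))  # drops the nonsense
```

The hard part, of course, is that the verification has to come from outside the model, otherwise you’re back in the same loop.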

AllNewTypeFace ,

The Habsburg Singularity

Boxscape ,
metaStatic ,

“we have to be very careful about what ends up in our training data”

Don't worry, the big tech companies took a snapshot of the internet before it was poisoned, so they can easily profit from LLMs without allowing competitors into the market. That's who "we" is, right?

veganpizza69 ,

The retroactive enclosure of the digital commons.

Hamartiogonic , (edited)

A few years ago, people assumed that these AIs would continue to get better every year. It seems we are already hitting some limits, and improving the models keeps getting harder and harder. It’s like the linewidth limits we have in CPU design.

ArcticDagger OP ,

I think that hypothesis still holds, as it has always assumed training data of sufficient quality. This study is saying, rather, that the places where we’ve traditionally harvested training data from are beginning to be polluted by low-quality training data.

HowManyNimons ,

It’s almost like we need some kind of flag on AI-generated content to prevent it from ruining things.

ArcticDagger OP ,

From the article:

To demonstrate model collapse, the researchers took a pre-trained LLM and fine-tuned it by training it using a data set based on Wikipedia entries. They then asked the resulting model to generate its own Wikipedia-style articles. To train the next generation of the model, they started with the same pre-trained LLM, but fine-tuned it on the articles created by its predecessor. They judged the performance of each model by giving it an opening paragraph and asking it to predict the next few sentences, then comparing the output to that of the model trained on real data. The team expected to see errors crop up, says Shumaylov, but were surprised to see “things go wrong very quickly”, he says.
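Structurally, the generational loop described there looks roughly like this (a sketch only; `finetune`, `generate_articles`, and `evaluate` are placeholders for the paper’s actual procedure, not real library calls):

```python
# Each generation starts from the same pre-trained base model, but is
# fine-tuned on text produced by the previous generation instead of on the
# original Wikipedia-based data.

def run_generations(pretrained_lm, wikipedia_data, n_generations,
                    finetune, generate_articles, evaluate):
    data = wikipedia_data
    scores = []
    for gen in range(n_generations):
        model = finetune(pretrained_lm, data)   # always the same base model
        data = generate_articles(model)         # becomes the next generation's training set
        scores.append((gen, evaluate(model)))   # compare continuations to the real-data model
    return scores
```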
