
Willy ,

Huh. Who would have thought talking mostly or only to yourself would drive you mad?

Evotech ,

As long as you verify the output is correct before feeding it back, it's probably not bad.

AllNewTypeFace ,

The Habsburg Singularity

metaStatic ,

we have to be very careful about what ends up in our training data

Don't worry, the big tech companies took a snapshot of the internet before it was poisoned so they can easily profit from LLMs without allowing competitors into the market. That's who "We" is right?

Hamartiogonic , (edited)

A few years ago, people assumed that these AIs would continue to get better every year. It seems we’re already hitting some limits, and improving the models keeps getting harder and harder. It’s like the line-width limits we’re running into in CPU design.

ArcticDagger OP ,

I think that hypothesis still holds, as it has always assumed training data of sufficient quality. This study is more saying that the places where we’ve traditionally harvested training data are beginning to be polluted by low-quality, AI-generated content.

HowManyNimons ,

It’s almost like we need some kind of flag on AI-generated content to prevent it from ruining things.

ArcticDagger OP ,

From the article:

To demonstrate model collapse, the researchers took a pre-trained LLM and fine-tuned it by training it using a data set based on Wikipedia entries. They then asked the resulting model to generate its own Wikipedia-style articles. To train the next generation of the model, they started with the same pre-trained LLM, but fine-tuned it on the articles created by its predecessor. They judged the performance of each model by giving it an opening paragraph and asking it to predict the next few sentences, then comparing the output to that of the model trained on real data. The team expected to see errors crop up, says Shumaylov, but were surprised to see “things go wrong very quickly”, he says.
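The recursive setup described in the quote can be sketched with a toy stand-in for the paper's actual LLM pipeline: here each "model" is just a unigram word distribution fit to the previous generation's output (all names and numbers below are illustrative, not from the study). Words that happen not to be sampled in one generation can never reappear, so diversity only shrinks:

```python
import random
from collections import Counter

def train(corpus):
    # "Train" a toy model: record the empirical word distribution.
    counts = Counter(corpus)
    words = list(counts)
    weights = [counts[w] for w in words]
    return words, weights

def generate(model, n):
    # Sample a synthetic corpus from the model.
    words, weights = model
    return random.choices(words, weights=weights, k=n)

random.seed(0)
# Generation 0: "real" data with a long tail of rare words.
real = [f"w{i}" for i in range(100) for _ in range(101 - i)]

data = real
vocab_sizes = []
for gen in range(10):
    model = train(data)              # fit on the previous generation's text
    data = generate(model, len(real))  # next generation trains on synthetic output
    vocab_sizes.append(len(set(data)))

# Vocabulary can only stay flat or shrink across generations:
print(vocab_sizes)
```

Rare ("tail") words are the first to vanish, which mirrors the loss of low-probability content the researchers observed, just in a drastically simplified setting.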
