hedge OP ,
@hedge@beehaw.org

Oops, sorry, forgot: archive.ph/ylJHc

FaceDeer ,
@FaceDeer@kbin.social

For those who can't get through the paywall, this is an article about a system called Kudurru that is monitoring a bunch of websites with images listed in the LAION-5B metadata set. When it sees the same IP address downloading images from those websites simultaneously, it assumes that it must be a bot that's scraping the data in order to train an AI with it and either blocks them or "poisons" the scrape by sending incorrect images back.
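To make the detection idea concrete, here is a rough Python sketch of the behavior described above. It is not Kudurru's actual code, which the article summary does not show. The class name `CrossSiteScrapeDetector`, the `respond` helper, the 10-second window, and the three-site threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class CrossSiteScrapeDetector:
    """Flags an IP that requests images from several distinct sites
    within a short time window (hypothetical thresholds)."""

    def __init__(self, window_seconds=10.0, site_threshold=3):
        self.window_seconds = window_seconds  # assumed window length
        self.site_threshold = site_threshold  # assumed number of distinct sites
        self._recent = defaultdict(deque)     # ip -> deque of (timestamp, site)

    def observe(self, ip, site, timestamp):
        """Record one image request; return True if the IP now looks like a scraper.

        Timestamps are assumed to arrive in non-decreasing order.
        """
        events = self._recent[ip]
        events.append((timestamp, site))
        # Discard requests that fell out of the time window.
        while events and timestamp - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_sites = {s for _, s in events}
        return len(distinct_sites) >= self.site_threshold


def respond(flagged, real_image, decoy_image, poison=True):
    """Serve the real image to normal visitors; for a flagged IP,
    either send a decoy ("poison") or send nothing (block)."""
    if not flagged:
        return real_image
    return decoy_image if poison else None
```

A participating site would call `observe` on every image request and pass the result to `respond`. A real deployment would also need shared state across the monitored sites, which this sketch leaves out.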

Frankly, I don't see much likely impact from this. AI training has moved beyond simply using LAION-5B; we're discovering that a smaller, higher-quality dataset is better than just throwing mountains of data at the AI in training. So anything a trainer is downloading is going to be extensively curated before being used for training, and this sort of obstruction will be fixed or filtered out.

Moonrise2473 ,

Thanks

mkhoury ,
@mkhoury@lemmy.ca

But the main result is achieved anyway, right? The picture that the system tried to download did not make it into the training set.

FaceDeer ,
@FaceDeer@kbin.social

Unless the "this sort of obstruction will be fixed" part means the image is downloaded anyway. This is the weakest sort of DRM.
