fedia.io

Tomato666 , to fediverse in Blocking AI crawlers on the fediverse

Surely the AI crawler company can set up their own node. They post nothing but collect everything going forward from the time they go live?

cecep OP ,
@cecep@fedia.io

After reading your comment I was disappointed openai.social doesn't exist

andyburke ,
@andyburke@fedia.io

They don't want AI to hate itself, so they don't want our training data, thankfully.

ptz , to fediverse in Blocking AI crawlers on the fediverse
@ptz@dubvee.org

Really, there’s only one way to prevent that, but it would offer no guarantees; the instance with the weakest security in the group would allow your posts to be crawled.

It would require an agreement among instances to block crawler bot traffic (by user-agent, known IPs, etc) and to federate, via allow lists, only with instances that adhere to the agreement. At that point, it’s more of a federated private forum, but there would still be some benefit I guess.
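As a rough illustration of what such an agreement would ask each instance to do, here is a minimal Python sketch. The user-agent substrings, the IP range, and the allow-listed domains are placeholder assumptions for the example, not a vetted blocklist, and the function names are hypothetical rather than part of any Lemmy or Mbin API.

```python
import ipaddress

# Placeholder values for illustration only; a real instance would maintain its own lists.
BLOCKED_UA_SUBSTRINGS = ("gptbot", "ccbot")
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # reserved documentation range
FEDERATION_ALLOW_LIST = {"example.social", "example.town"}


def is_blocked_request(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request matches the user-agent or IP blocklist."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_SUBSTRINGS):
        return True
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def may_federate_with(instance_domain: str) -> bool:
    """Return True only for instances that have signed up to the agreement."""
    return instance_domain.lower() in FEDERATION_ALLOW_LIST
```

A crawler that lies about its user-agent and rotates IPs passes both checks, which is why this offers no guarantees.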

will_a113 , to fediverse in Blocking AI crawlers on the fediverse

I wonder if content should carry some license automatically. Like if you agree to the TOS of an instance, your comments are automatically all licensed as CC BY or CC0 or a more restrictive license of the instance owner's choice.

hollyberries ,

There’s someone running around lemmy with a creative commons sharealike link as a signature. Quite funny to be honest. I can’t remember the username though. They’re bound to show up sooner or later :)

All rights reserved.

Rentlar ,

Oh yeah it was @onlinepersona

You go champ! If an AI starts ending their posts with a CC BY-NC-SA license I know who to credit!

onlinepersona ,

You’re welcome

CC BY-NC-SA 4.0

ArbitraryValue ,

I don’t think that would make much of a difference. Training AI on copyright-protected data appears to be fair use.

FaceDeer ,
@FaceDeer@kbin.social

Yup. There are dumps of Reddit's entire archive of comments and posts available via torrent. I suspect the only reason Reddit's getting paid for that stuff right now is that it's a legal ass-covering that's comparatively cheap. Anyone who's a little daring could use it to train an LLM, and if they prep the data well enough it'd be hard to even notice.

CameronDev , to fediverse in Blocking AI crawlers on the fediverse

But robots.txt is not a legal document — and 30 years after its creation, it still relies on the good will of all parties involved

You can ask nicely, they can (and will) ignore it.
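To make the "ask nicely" part concrete, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content and the URL are made up for the example; `GPTBot` is used only as a sample user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt an instance might serve (assumed content, for illustration).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Return what robots.txt asks of this user-agent; nothing enforces the answer."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

The check lives entirely in the crawler's own code. A crawler that never calls it gets the page anyway.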

sukhmel ,

Also, I’ve already seen complaints about AI companies scraping everything and ignoring robots.txt

And we would block the obedient and useful crawlers while doing no harm to the malicious ones

FaceDeer , (edited ) to fediverse in Blocking AI crawlers on the fediverse
@FaceDeer@kbin.social

We're sick of closed walled-garden monoliths like Reddit! Let's move to an open federated protocol where anyone can participate and the APIs can't be locked down!

...wait, not like that!

Yeah. This is what you signed up for when you joined the Fediverse: the ActivityPub protocol broadcasts your content to any other servers that ask for it. And just generally, that's how the Internet works. You're putting up a public billboard and expecting to be able to control who gets to look at it. That's not going to work. Even robots.txt is just a gentleman's agreement; it's not enforceable.

If you really want to prevent AI from training on your content with any degree of certainty you're probably looking for a private forum of some kind that's run by someone you trust.

cecep OP ,
@cecep@fedia.io

I don't expect anything, I was merely asking a question to clarify this

FaceDeer ,
@FaceDeer@kbin.social

Well, I hope my answer clarifies it. You can't prevent LLMs from being trained on your public posts.

pop ,

“We’re sick of closed walled-garden monoliths like Reddit! Let’s move to an open federated protocol where anyone can participate and the APIs can’t be locked down!”

Can you point to where the fediverse collectively said that? Speak for yourself and don’t act like the fediverse was designed to suit your definition of freedom. The fediverse is open and federated as in, there are multiple instances and owners without a centralized administration, and the owners who host those instances decide what to lock down.

FaceDeer ,
@FaceDeer@kbin.social

And some of those hosts can decide to serve up their content to AI trainers. Some of those hosts can be run by AI trainers, specifically to gather data for training. If one was to try to prevent that then one would be attacking the open nature of the fediverse.

There have been many people raging about their content being used to train AIs without permission or compensation. I'm speaking to those people, not the "fediverse collectively". As you suggest, the fediverse can't say anything collectively.

mozz , to fediverse in Blocking AI crawlers on the fediverse
@mozz@mbin.grits.dev

You are correct. Some of the largest instances block bot traffic, but most don't, meaning your posts have been seen by AI crawlers and will continue to be so.

Short of not participating in federation and only discussing things within a private non-federated community on a personal instance or something, I don't think there's a way to prevent it.

cecep OP ,
@cecep@fedia.io

Thanks for confirming. It's unfortunate that people who are outraged about Reddit selling their data to AI companies don't really have an alternative in the fediverse.

I guess the best hope is for new mechanisms to control AI crawlers to emerge, so they can be blocked per user rather than per domain. Maybe https://spawning.ai will come up with something. One can hope.

FaceDeer ,
@FaceDeer@kbin.social

I really don't see how it would be physically possible to do that and still allow the content to be publicly seen by other humans.

Cheradenine ,

It is unfortunate, but we are giving our data freely, as we did on Spezzit. IMHO it would be great to block efforts to monetize Lemmy by AI, but that is not what we signed up for.

Lemmy is neither private, nor closed. It’s just the way it works.

Contributing in an open forum means the data will get harvested. If it were closed there would be fewer views; open is what we have now.

Companies will train on what we post, but we are not giving that (directly) to a centralized service. To me that compromise is enough.

TheOneCurly ,

I don’t see it as hypocritical at all. Public comments are, for me at least, put out for the public good. The same reason someone might license open source code with the MIT license. My issue with Reddit is that they restricted who can obtain the data and then privately sold them to only the highest bidder. They should be freely available to all who want to view them without restrictions on money or power.

originalucifer ,
@originalucifer@moist.catsweat.com

it really sounds like you want a walled garden so you can control your... whatever. the fediverse is public by nature, so discussing how you can control public information is kinda... weird.

cecep OP ,
@cecep@fedia.io

Is it? Reddit is technically "public" too in the sense that you can view all the content without an account, yet Google and others pay for the data anyway. And for many years, people made stuff public and could reasonably expect it wouldn't show up in any major search engines because Google, MS and others respected robots.txt. I know it was never legally binding. I'm also not naive; I know I give up control when I post publicly and there won't ever be a perfect solution to the AI crawler situation. But a lot is changing right now, in regulation and in technology.

originalucifer ,
@originalucifer@moist.catsweat.com

the fact that google has to pay for the data proves that what you claim is public is actually a walled garden.

the fediverse is public, by default. it publicly distributes information to other publicly accessible servers.. by default.

its public information on publicly accessible servers that are opt-out by default. publicly.

im baffled how people can have some expectation of privacy in such a clearly defined public space.

cecep OP ,
@cecep@fedia.io

You don't need to explain to me how the Fediverse works and I never said I have any expectation of privacy. But generally speaking, you're overlooking the fact that there have always been rules for what can and cannot be done with information that is publicly available. Just because someone publicly posts his Facebook profile picture doesn't mean it's legal to use in an ad without permission, for example. People might break the rules, yes, but then they might face consequences, and that alone prevents many from breaking them in the first place. Not perfect, but better than nothing. And I'm saying we're in a process where rules are being renegotiated when it comes to using public information for AI training.

originalucifer ,
@originalucifer@moist.catsweat.com

fair points, but i still posit that its a waste of time to attempt to regulate what can be viewed anonymously.

personally, i could not possibly care less about any of my data being ingested by 'ai'. not a battle i care to fight, or even find worthy of fighting.

cecep OP ,
@cecep@fedia.io

That's fair, but I think if AI companies were legally required to disclose the sources of their training data, and if some successor to robots.txt were made legally binding as well (both are being discussed in the EU, for example), at least the "bigger players" in the AI industry would respect the rules. Better than nothing.

FigMcLargeHuge ,

I think you are mistaking publicly available with public. Just because reddit made everyone’s posts publicly available doesn’t mean they are public. Once you post something, they have the right to use that data in any way they choose, and you agreed to that when you signed up. Per their user agreement:

“You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.”

Because they allow anyone to see the posts doesn’t make it “public” data, it just means that they are allowing you access to the data they now have a license to. Now let’s say you work for a state agency. Any work you do is property of said state and is public. I believe the same goes for some government agencies, like NASA. The work they produce is public. That’s completely different from reddit allowing you to post on their platform and then allowing others to see your post. They can do whatever they want with the data, including turning it off one day and just sitting on it if they wanted. Expecting anything public from a private company, well good luck with that. Back to lemmy, well even if you blocked all AI from scraping from an instance, nothing would stop a company from just setting up their own instance, federating it, and just sucking up all the info as it comes in. Nothing you post on here will ever be private.

I think people are about to learn a hard lesson on the internet. Nothing is ever private if it is online.

ptz ,
@ptz@dubvee.org

Mine is but a wee instance, but our bot blocklist is large. For the ones that slip through, once identified as bot traffic, the firewalls go up in their direction.

Spider89 , to piracy in When /some/ YT videos get special download-resistent treatment but not others

yt-dlp works fine.

Nomad , to piracy in When /some/ YT videos get special download-resistent treatment but not others

This is your browser handling the content disposition wrongly.

ciferecaNinjo OP ,

Why would a browser handle it incorrectly for one video on one invidious instance, but not for most other videos and other instances?

Note that I’ve seen this broken behavior both in my own Chromium installation and in Firefox on Windows at a public library.

shnizmuffin ,
@shnizmuffin@lemmy.inbutts.lol

There are a few reasons this might be the case!

  1. The instance’s UI might not be declaring that <a> or <button> element as a resource meant to be downloaded.
  2. The instance’s web server might not have declared the downloadable file’s MIME type as a resource. (Apache, nginx.)
  3. Your operating system might not recognize the file type as a thing to be downloaded, or your browser isn’t telling it to download to a file.

It’s probably 1 or 2 if you’re seeing the same behavior across multiple browsers and OS.
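For reason 2, and for the content disposition mentioned earlier in the thread, this is a hedged sketch of the response headers that normally make a browser save a file instead of playing it in the tab. It is not Invidious code. The helper name `download_headers` is hypothetical, and the filename quoting is simplified (a real server would need to escape quotes and non-ASCII names).

```python
import mimetypes


def download_headers(filename: str) -> dict[str, str]:
    """Build headers that ask the browser to download rather than display the file."""
    # Guess the MIME type from the extension; fall back to a generic binary type.
    content_type, _encoding = mimetypes.guess_type(filename)
    return {
        "Content-Type": content_type or "application/octet-stream",
        # "attachment" requests a download; "inline" would ask the browser to render it.
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
```

If a server sends `inline` or omits the header, the browser decides on its own, which could explain why behavior varies between videos, instances, and browsers.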

darcmage , to piracy in When /some/ YT videos get special download-resistent treatment but not others

I tried downloading from the link provided and it started downloading the file for me.

ciferecaNinjo OP ,

Thanks for pointing that out. It works for me too. I just happened to select a different instance where it actually works. Here’s the instance where it’s broken:

https://iv.ggtyler.dev/watch?v=lU4vv7qCQvg

conciselyverbose ,

Try right clicking and "save as"? On mobile Safari it pops up with view and download as options.

ciferecaNinjo OP ,

That’s how I got around it in the past. For some reason that was not an option where I needed it (perhaps the browser I was using was locked down in some way). In any case, I’m wondering why the variation in behavior. Is this a bug in Invidious?

darcmage ,

Still downloading the file for me.

Edit: Tried it in Chromium out of curiosity and I was able to reproduce your issue. Not sure why it works normally in Firefox.

ciferecaNinjo OP ,

Ungoogled Chromium indeed reproduces the issue. But so does the public library, which likely was Firefox in Windows. So I guess it might be hasty to conclude that it’s browser specific, particularly when other videos on the same instance behave differently in the same browser.

tagginator Bot , to piracy in When /some/ YT videos get special download-resistent treatment but not others

New Lemmy Post: When /some/ YT videos get special download-resistent treatment but not others (https://lemmy.dbzer0.com/post/9558043)
Tagging:

(Replying in the OP of this thread (NOT THIS BOT!) will appear as a comment in the lemmy discussion.)

I am a FOSS bot. Check my README: https://github.com/db0/lemmy-tagginator/blob/main/README.md

ciferecaNinjo OP ,

First time I’ve seen this bot. I would be interested in learning how to cross-post from to in a way that preserves the original username the way this bot did. Is that possible without 3rd party tools? I can login to a Lemmy instance and then crosspost any Kbin thread to a Lemmy community, but then the author becomes myself, not the original Kbin author.

Ihnivid , to lemmyshitpost in https://it.slashdot.org/story/23/11/09/1528225/microsoft-wont-let-you-close-onedrive-on-windows-until-you-explain-yourself

I’m not entirely convinced that this is fake.

monsterpiece42 ,

Those spelling-error wiggles are definitely OpenOffice/LibreOffice. A real one would be in Word.

But other than that, this feels very MS.

HikingVet ,

It’s fake.

A quick glance and in the upper left hand corner you see windows is not correct and there is a date of Nov. 2026.

No company lawyer would ever ok asking about sleep schedules, OneDrive would be added to the list of properly spelled words, and the formatting is absolute shit even for Microsoft.

That was spending about 15 seconds looking at it. If I actually cared to dissect it, I would probably find more shenanigans.

bigkahuna1986 ,

There weren’t enough copyright and trademark symbols either. There should be at least… 3 times as many.

thorbot ,

Damn. Thanks for clearing that up, Sherlock.

dual_sport_dork ,
@dual_sport_dork@lemmy.world

Like one of the listed opt-out reasons literally being labeled “fuck you?”

kernelle , to nostupidquestions in Do any ATMs in Belgium support balance inquiries?

I’ve never seen a check balance option ever when not using my own bank’s ATM over here. ING does still have ATMs in a few places, KBC and Belfius definitely do as well. Also you forgot Argenta and Bpost, which have them as well. Honestly don’t think you’ll be able to perform a balance check on any of them.

Fleppensteijn , to nostupidquestions in Do any ATMs in Belgium support balance inquiries?
@Fleppensteijn@feddit.nl

As far as I know, European banks never give you the option for balance inquiry. ATMs in e.g. Asia may give the option but it won’t work with a European bank card.

ciferecaNinjo OP ,

Dutch ATMs give a balance.

Fleppensteijn ,
@Fleppensteijn@feddit.nl

Not with my ING account

ciferecaNinjo OP ,

Oh, that’s interesting. I wonder if card-issuing banks are blocking balance inquiries even if ATMs offer it. I don’t think I saw ING ATMs in the Netherlands, only the conglomerate they are partnered with (Geldmaat). The Geldmaat ATMs print “credit limit” on the receipt.

allywilson , to nostupidquestions in Do any ATMs in Belgium support balance inquiries?

That’s unexpected.

I’ve never thought about it before. In the UK getting your balance is on every ATM. The ATMs are all different makes and models, interfaces, OSes, different banks, etc. The only thing I can think of that is the same is they are all connected to LINK - maybe they have set standards for what they should all have/have not?

“Anti-feature: you must enter your PIN before it shows you the menu. Does that mean it connects to my bank even in the absence of a transaction?”

This is the same in the UK. They request your PIN immediately after putting in your card (although I think if you use a credit card it will prompt for your language first), but it doesn’t use it until it needs to connect to your bank (I know this from deliberately entering the wrong PIN, then attempting to get my balance or something, at which point the card is ejected and the message about an incorrect PIN appears).

Blackmist ,

I think in the UK each bank enquiry results in the ATM operator getting paid. So they ask you like three times if you want to see your balance because they get money for that as well as just the cash dispensing.

SmashingSquid , to nostupidquestions in Do any ATMs in Belgium support balance inquiries?

Is there no number on your card you can call and get your balance using an automated system?

ciferecaNinjo OP , (edited )

No. There are a couple unpublished phone numbers but they’re a disaster. I’ve not encountered a Belgian bank that gives automated account info over the phone. Last time I called I think it was just a greeting saying “contact us through the app or email” or something like that, IIRC.
