Really, there’s only one way to prevent that, but it would offer no guarantees; the instance with the weakest security in the group would allow your posts to be crawled.
It would require an agreement among instances to block crawler bot traffic (by user-agent, known IPs, etc.) and to federate, via allow lists, only with instances that adhere to the agreement. At that point, it’s more of a federated private forum, but there would still be some benefit I guess.
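A rough sketch of what the two checks in such an agreement might look like, assuming an instance filters on user-agent substrings and keeps an allow list of federation peers. The token list, the domains, and the function names are all made up for illustration; nothing here is how Lemmy or any real instance implements it.

```python
# Hypothetical examples of crawler user-agent substrings to refuse.
# A real instance would maintain and update its own list.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot")

# Hypothetical allow list: instances that signed up to the agreement.
ALLOWED_INSTANCES = {"example-a.social", "example-b.social"}


def is_blocked_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def may_federate(instance_domain: str) -> bool:
    """Return True only for instances on the allow list."""
    return instance_domain.lower() in ALLOWED_INSTANCES
```

Note that the user-agent check only stops crawlers that identify themselves honestly, which is exactly why this offers no guarantees.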
I wonder if content should carry some license automatically. Like if you agree to the TOS of an instance, your comments are automatically all licensed as CC BY or CC0 or the more restrictive license of choice of the instance owner.
There’s someone running around lemmy with a creative commons sharealike link as a signature. Quite funny to be honest. I can’t remember the username though. They’re bound to show up sooner or later :)
Yup. There are dumps of Reddit's entire archive of comments and posts available via torrent. I suspect the only reason Reddit's getting paid for that stuff right now is that it's a legal ass-covering that's comparatively cheap. Anyone who's a little daring could use it to train an LLM and if they prep the data well enough it'd be hard to even notice.
We're sick of closed walled-garden monoliths like Reddit! Let's move to an open federated protocol where anyone can participate and the APIs can't be locked down!
...wait, not like that!
Yeah. This is what you signed up for when you joined the Fediverse: the ActivityPub protocol broadcasts your content to any other servers that ask for it. And just generally, that's how the Internet works. You're putting up a public billboard and expecting to be able to control who gets to look at it. That's not going to work. Even robots.txt is just a gentleman's agreement; it's not enforceable.
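To illustrate the robots.txt point, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `instance.example` URL are invented for the example. The thing to notice is that the check runs on the crawler's side: the file only states a wish, and a crawler that never calls `can_fetch` is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt for a hypothetical instance: ask GPTBot to stay out,
# allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching; a badly behaved one just skips this.
gptbot_allowed = parser.can_fetch("GPTBot", "https://instance.example/post/1")
browser_allowed = parser.can_fetch("SomeBrowser", "https://instance.example/post/1")
```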
If you really want to prevent AI from training on your content with any degree of certainty you're probably looking for a private forum of some kind that's run by someone you trust.
We’re sick of closed walled-garden monoliths like Reddit! Let’s move to an open federated protocol where anyone can participate and the APIs can’t be locked down!
Can you point to where the fediverse collectively said that? Speak for yourself and don’t act like the fediverse was designed to suit your definition of freedom. The fediverse is open and federated as in, there are multiple instances and owners without a centralized administration, and the owners who host those instances decide what to lock down.
And some of those hosts can decide to serve up their content to AI trainers. Some of those hosts can be run by AI trainers, specifically to gather data for training. If one was to try to prevent that then one would be attacking the open nature of the fediverse.
There have been many people raging about their content being used to train AIs without permission or compensation. I'm speaking to those people, not the "fediverse collectively". As you suggest, the fediverse can't say anything collectively.
You are correct. Some of the largest instances block bot traffic, but most don't, meaning your posts have been seen by AI crawlers and will continue to be so.
Short of not participating in federation and only discussing things within a private non-federated community on a personal instance or something, I don't think there's a way to prevent it.
Thanks for confirming. It's unfortunate that people who are outraged about Reddit selling their data to AI companies don't really have an alternative in the fediverse.
I guess the best hope is for new mechanisms to control AI crawlers to emerge, so they can be blocked per user rather than per domain. Maybe https://spawning.ai will come up with something. One can hope.
It is unfortunate, but we are giving our data freely, as we did on Spezzit. IMHO it would be great to block efforts to monetize Lemmy by AI, but that is not what we signed up for.
Lemmy is neither private, nor closed. It’s just the way it works.
Contributing in an open forum means the data will get harvested. If it were closed there would be fewer views; open is what we have now.
Companies will train on what we post, we are not giving that (directly) to a centralized service though. To me that compromise is enough.
I don’t see it as hypocritical at all. Public comments are, for me at least, put out for the public good. The same reason someone might license open source code with the MIT license. My issue with Reddit is that they restricted who can obtain the data and then privately sold it to only the highest bidder. It should be freely available to all who want to view it without restrictions on money or power.
it really sounds like you really want a walled garden so you can control your.. .whatever. the fediverse is public by nature, so discussing how you can control public information is kinda.. weird.
Is it? Reddit is technically "public" too in the sense that you can view all the content without an account, yet Google and others pay for the data anyway. And for many years, people made stuff public and could reasonably expect it wouldn't show up in any major search engines because Google, MS and others respected robots.txt. I know it was never legally binding. I'm also not naive, I know I give up control when I post publicly and there won't ever be a perfect solution to the AI crawler situation. But a lot is changing right now, both in regulation and in technology.
You don't need to explain to me how the Fediverse works and I never said I have any expectation of privacy. But generally speaking, you're overlooking the fact that there have always been rules for what can and cannot be done with information that is publicly available. Just because someone publicly posts his Facebook profile picture doesn't mean it's legal to use in an ad without permission, for example. People might break the rules, yes, but then they might face consequences, and that alone prevents many from breaking them in the first place. Not perfect, but better than nothing. And I'm saying we're in a process where rules are being renegotiated when it comes to using public information for AI training.
That's fair, but I think if AI companies were legally required to disclose the sources of their training data and if some successor to robots.txt were made legally binding as well (both are being discussed in the EU, for example), at least the "bigger players" in the AI industry would respect the rules. Better than nothing.
I think you are mistaking publicly available with public. Just because reddit made everyone’s posts publicly available doesn’t mean they are public. Once you post something, they have the right to use that data in any way they choose, and you agreed to that when you signed up. Per their user agreement:
“You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.”
Just because they allow anyone to see the posts doesn’t make it “public” data, it just means that they are allowing you access to the data they now have a license to. Now let’s say you work for a state agency. Any work you do is property of said state and is public. I believe the same goes for some government agencies, like NASA. The work they produce is public. That’s completely different from Reddit allowing you to post on their platform and then allowing others to see your post. They can do whatever they want with the data, including turning it off one day and just sitting on it if they wanted. Expecting anything public from a private company, well, good luck with that. Back to Lemmy: even if you blocked all AI from scraping an instance, nothing would stop a company from just setting up their own instance, federating it, and just sucking up all the info as it comes in. Nothing you post on here will ever be private.
I think people are about to learn a hard lesson on the internet. Nothing is ever private if it is online.
Mine is but a wee instance, but our bot blocklist is large. For the ones that slip through, once identified as bot traffic, the firewalls go up in their direction.
Thanks for pointing that out. It works for me too. I just happened to select a different instance where it actually works. Here’s the instance where it’s broken:
That’s how I got around it in the past. For some reason that was not an option where I needed it (perhaps the browser I was using was locked down in some way). In any case, I’m wondering why the variation in behavior. Is this a bug in Invidious?
Ungoogled Chromium indeed reproduces the issue. But so does the browser at the public library, which was likely Firefox on Windows. So I guess it might be hasty to conclude that it’s browser specific, particularly when other videos on the same instance behave differently in the same browser.
First time I’ve seen this bot. I would be interested in learning how to cross-post from #kbin to #Lemmy in a way that preserves the original username the way this bot did. Is that possible without 3rd party tools? I can login to a Lemmy instance and then crosspost any Kbin thread to a Lemmy community, but then the author becomes myself, not the original Kbin author.
A quick glance and in the upper left-hand corner you see Windows is not correct and there is a date of Nov. 2026.
No company lawyer would ever ok asking about sleep schedules, OneDrive would be added to the list of properly spelled words, and the formatting is absolute shit even for Microsoft.
That was spending about 15 seconds looking at it. If I actually cared to dissect it, I would probably find more shenanigans.
I’ve never seen a check balance option when not using my own bank’s ATM over here. ING does still have ATMs in a few places, KBC and Belfius definitely do as well. Also you forgot Argenta and Bpost, which have them as well. Honestly don’t think you’ll be able to perform a balance check on any of them.
As far as I know, European banks never give you the option for balance inquiry. ATMs in e.g. Asia may give the option but it won’t work with a European bank card.
Oh, that’s interesting. I wonder if card-issuing banks are blocking balance inquiries even if ATMs offer it. I don’t think I saw ING ATMs in the Netherlands, only the conglomerate they are partnered with (Geldmaat). The Geldmaat ATMs print “credit limit” on the receipt.
I’ve never thought about it before. In the UK getting your balance is on every ATM. The ATMs are all different makes and models, interfaces, OSes, different banks, etc. The only thing I can think of that is the same is they are all connected to LINK. Maybe they have set standards for what they should all have/have not?
Anti-feature: you must enter your PIN before it shows you the menu. Does that mean it connects to my bank even in the absence of a transaction?
This is the same in the UK. They request your PIN immediately after you put in your card (although I think if you use a credit card it will prompt for your language first), but it doesn’t use it until it needs to connect to your bank. I know this from deliberately entering the wrong PIN and then attempting to get my balance or something: only then is the card ejected and the message about the incorrect PIN shown.
I think in the UK each bank enquiry results in the ATM operator getting paid. So they ask you like three times if you want to see your balance because they get money for that as well as just the cash dispensing.
No. There are a couple unpublished phone numbers but they’re a disaster. I’ve not encountered a Belgian bank that gives automated account info over the phone. Last time I called I think it was just a greeting saying “contact us through the app or email” or something like that, IIRC.