hahattpro ,

Let the two of them die together

tal ,

Blocking other search engines will hurt Reddit, all else held equal. But not by that much. Google is seriously dominant in the search engine market.

kagis

Yeah.

gs.statcounter.com/search-engine-market-share

According to this, Google has 91.06% of the search engine market. So for Reddit, they’re talking about cutting themselves off from a little under 9% of people searching out there. Which…I mean, it isn’t insignificant, but it isn’t likely gonna hurt them all that badly.

pewgar_seemsimandroid ,

with threads too

best_username_ever ,

Is there a downside? I’m confused.

SteveFromMySpace ,

Yes. They are making other search engines less useful through what is functionally an exclusivity deal. They are also relying on Reddit to supply useful results, since they ruined Google Search over the past few years. They’ve enshittified their own product and now they are making it everyone else’s problem.

This is bad for anyone who thinks we should be able to search the internet without being locked into google. The door this opens is awful as well - what happens as this practice expands and you suddenly need multiple search engines to find things online? What happens when a search engine cuts a deal with news outlets?

What a mess.

OsaErisXero ,

I'm excited for this to start triggering anti-trust legislation

WhatAmLemmy ,

It obviously should, but it won’t, because the US is a capitalist dictatorship masquerading as a democracy. The oligarchy own the government, and the regulators.

Plopp ,

But other search engines like Bing are also American capitalist corporations and they don’t want this I’m sure.

TheBat ,

letthemfight.gif

Mac ,

“sorry bro, I can’t search that website—it’s not covered by my subscription package”

Captainvaqina ,

Google already signaled they want to charge for their trash AI search.

gedaliyah OP ,

“Would you like to expand your search to include human-created content? Upgrade to Google Advanced* to unlock the power of the human web!”

mannycalavera ,

Makes sense they’ve spent years curating other people’s content and are now selling it… Oh wait 😯.

Vanth ,

I wonder what kind of contract they went with.

I can’t imagine this being a great long-term deal for Google. There’s minimal good new content being created on Reddit. Searching for useful information mostly brings up old posts, while new posts are heavily spam generated or designed to support AI learning.

I imagine buying access to historic Reddit content from creation to ~2020 would be valuable, while paying for ongoing access to new content is going to be far less valuable and turn into AI devolution as we get to the point where AI is learning from other AI and spiraling into progressively worse outputs.

tal , (edited )

I wonder what kind of contract they went with.

reuters.com/…/reddit-ai-content-licensing-deal-wi…

SAN FRANCISCO, Feb 21 (Reuters) - Social media platform Reddit has struck a deal with Google (GOOGL.O) to make its content available for training the search engine giant’s artificial intelligence models, three people familiar with the matter said.

The contract with Alphabet-owned Google is worth about $60 million per year, according to one of the sources.

For perspective:

cbsnews.com/…/google-reddit-60-million-deal-ai-tr…

In documents filed with the Securities and Exchange Commission, Reddit said it reported net income of $18.5 million — its first profit in two years — in the October-December quarter on revenue of $249.8 million.

So if you annualize that, Reddit’s seeing revenue of about $1 billion/year, and net income of about $74 million/year.

Given that Reddit granting exclusive indexing to Google happened at about the same time, I would assume that that AI-training deal included the exclusive indexing agreement, but maybe it’s separate.

My gut feeling is that the exclusivity thing is probably worth more than $60 million/year, and that Google’s probably getting a pretty good deal. Like, Google did not buy Reddit, and Google’s done some pretty big acquisitions, like YouTube, and that’d have been another way for Google to get exclusive access. So I’d think that this deal is probably better for Google than buying Reddit. Reddit’s market capitalization is $10 billion, so Google is maybe paying 0.6% of the value of Reddit per year to have exclusive training rights to their content and to be the only search engine indexing them; aside from Reddit users themselves running into content in subreddits, I’d guess that those two forms are probably the main way in which one might leverage the content there.
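To double-check the back-of-the-envelope numbers above, here is a minimal Python sketch. It uses only the figures quoted in this comment; multiplying a single quarter by four is the same rough annualization used above, and the variable names are just illustrative.

```python
# Figures quoted in the comment above (US dollars).
quarterly_revenue = 249.8e6      # October-December quarter
quarterly_net_income = 18.5e6    # same quarter
deal_per_year = 60e6             # reported value of the Google contract
market_cap = 10e9                # Reddit market capitalization cited above

# Rough annualization: assume every quarter looks like this one.
annual_revenue = quarterly_revenue * 4
annual_net_income = quarterly_net_income * 4

# Yearly deal value as a fraction of market capitalization.
deal_share_of_market_cap = deal_per_year / market_cap

print(f"annualized revenue:    ${annual_revenue / 1e9:.2f} billion")
print(f"annualized net income: ${annual_net_income / 1e6:.0f} million")
print(f"deal / market cap:     {deal_share_of_market_cap:.1%}")
```

A single profitable quarter is a thin basis for a yearly figure, so these are order-of-magnitude numbers at best.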

Plus, my impression is that the idea that a number of companies have – which may or may not be valid – is that this is the beginning of the move away from search engines. Like, the idea is that down the line, the typical person doesn’t use a search engine to find a webpage somewhere that’s a primary source to find material. Instead, they just query an AI. That compiles all the data that it can see and spits out an answer. Saves some human searcher time and reduces complexity, and maybe can solve some problems if AIs can ultimately do a better job of filtering out erroneous information than humans. We definitely aren’t there yet in 2024, but if that’s where things are going, I think that it might make a lot of strategic sense for Google. If Google can lock up major sources of training data, keep Microsoft out, then it’s gonna put Microsoft in a difficult spot if Microsoft is gunning for the same thing.

Vanth ,

Cool, thank you. You seem to know quite a bit about this stuff.

If we do end up at a point without search engines, where AI does the search and summarizes an answer, what do you think their level of ability to tie back to source material will be?

I’m thinking in cases of asking about a technical detail for a hobby, “how do I get x to work”. I don’t necessarily want a response like “connect blue wire to red”. What I really want is the forum posts discussing the troubleshooting and solutions from various people. If an AI search can’t get me to those forums, it’s of little value to me and when I do figure out an answer acceptable to my application, I’m not tied into that forum to share my findings (and generate new content for the AI to index).

Related to that, I’m thinking about these stories of lawyers relying on AI to write their briefs, and the AI cites non-existent cases as if they were real. It seems to me, not at all a programmer, that getting an AI to the point where it knows what’s real and what’s a hallucination would be a challenge. And until we get to that point, it’s hard to put full trust into an AI search.

kate ,

have you tried perplexity? it’s probably the best ai search engine right now although it still misunderstands context sometimes. it’s pretty good at citing its sources though

tal , (edited )

If we do end up at a point without search engines, where AI does the search and summarizes an answer, what do you think their level of ability to tie back to source material will be?

I haven’t used the text-based search queries myself; I’ve used LLM software, but not for this, so I don’t know what the current situation is like. My understanding is that the current approach doesn’t really permit it. And there are two issues with that:

  • There isn’t a direct link between one source and what’s being generated; the model isn’t really structured so as to retain this.
  • Many different sources probably contribute to the answer.

All information contributes a little bit to the probability of the next word that the thing is spitting out. It’s not that the software rapidly looks through all pages out there and then finds a given single reputable source that it could then cite, the way a human might. That is, you aren’t searching an enormous database when the query comes in, but repeatedly making use of a prediction that the next word in the correct response is a given word, and that probability is derived from many different sources. Maybe tens of thousands of people have made posts on a given subject; the response isn’t just a quote from one, and the generated text may appear in none of them.
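As a toy illustration of that point (a sketch only: a word-pair counter is vastly simpler than a real LLM, and the posts and function names here are made up), the following Python builds next-word counts from four short posts and then generates a sentence that none of them contains:

```python
from collections import Counter, defaultdict

# Hypothetical forum posts standing in for the training data.
posts = [
    "connect blue wire to red terminal",
    "connect blue wire to ground pin",
    "solder green wire to red pin",
    "attach black wire to red pin",
]

def train(posts):
    """Count how often each word follows each other word, across all posts."""
    counts = defaultdict(Counter)
    for post in posts:
        words = post.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_words=10):
    """Repeatedly append the most frequent next word until none is known."""
    out = [start]
    while len(out) < max_words and counts.get(out[-1]):
        out.append(counts[out[-1]].most_common(1)[0][0])
    return " ".join(out)

counts = train(posts)
answer = generate(counts, "connect")
print(answer)                                  # connect blue wire to red pin
print(any(answer in post for post in posts))   # False
```

Every word pair in the output came from somewhere, but the first two posts supplied the opening and the last two supplied the ending, so there is no single post to point at as "the source".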

To maybe put that in terms of how a human might think, and to place you in the generative AI’s shoes: suppose I say to you “draw a house”. You draw a house with two windows, a flowerbed out front, whatever. I say “which house is that?” You can’t tell me, because you’re not trying to remember and present one house – you’re presenting me with a synthetic aggregate of many different houses; probably all the houses you’ve seen have mentally contributed a bit to it. Maybe you could think of a given house that you’ve seen in the past that looks a fair bit like that house, but that’s not quite what I’m asking you to tell me. The answer is really “it doesn’t reflect a single house in the real world”, which isn’t really what you want to hear.

It might be possible to basically run a traditional search for a generated response to find an example of that text, if it amounts to a quote (which it may not!)

And if Google produces some kind of “reliability score” for a given piece of material and weights the material in the training set by that (which I will guess that if they don’t now, they will), they could maybe use the reliability score to try to rank various sources when doing that backwards search for relevant sources.

But there’s no guarantee that that will succeed, because they’re ultimately synthesizing the response, not just quoting it, and because it can come from many sources. There may potentially be no one source that says what Google is handing back.
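To make the "backwards search" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the reliability scores are invented numbers, plain word overlap stands in for a real search index, and nothing here reflects how Google actually does or would do this.

```python
def rank_sources(response, sources):
    """Rank sources by word overlap with the response, weighted by reliability."""
    response_words = set(response.lower().split())
    ranked = []
    for name, (text, reliability) in sources.items():
        shared = response_words & set(text.lower().split())
        overlap = len(shared) / len(response_words)
        ranked.append((overlap * reliability, overlap, name))
    # Highest weighted score first.
    return sorted(ranked, reverse=True)

# Hypothetical sources: (text, made-up reliability score between 0 and 1).
sources = {
    "forum_post_a": ("connect blue wire to red terminal", 0.9),
    "forum_post_b": ("solder green wire to red pin", 0.5),
    "spam_page": ("buy cheap wire", 0.1),
}

ranked = rank_sources("connect blue wire to red pin", sources)
for score, overlap, name in ranked:
    print(f"{name}: overlap={overlap:.2f} weighted={score:.2f}")
```

Even the top-ranked source only partly overlaps with the generated answer, which is the failure mode described above: the search can suggest related material, but it can't promise a source that actually says what was handed back.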

It’s possible that there will be other methods than the present ones used for generating responses in the future, and those could have very different characteristics. Like, I would not be surprised, if this takes off, if the resulting system ten years down the road is considerably more complex than what is presently being done, even if to a user, the changes under the hood aren’t really directly visible.

There’s been some discussion about developing systems that do permit this, and I believe that if you want to read up on it, the term used is “attributability”, but I have not been reading research on it.

Vanth ,

Attribution, great term to search. Thank you.

Websearching “attribution + AI” brings up a lot of hits on copyright concerns. Which opens up even more questions. If we get to the point where AI attributes its sources with some sort of scoring, then it’s near certainly going to be using copyrighted materials at times. And depending on the copyright and what profits the AI company is gaining from their use and probably a bunch more detailed copyright stuff beyond my civilian knowledge, there are probably financial and legal reasons for AI searches to not publicly attribute sources. Which loops me back to: I want to see conflicting materials and make a judgement call on the final summary myself in many cases.

I’m sure there are many people much smarter than me with nothing but pure, ethical intentions figuring all this out. Who knows, maybe this will be the tipping point for better copyright and intellectual property protections in the US and elsewhere.

MeatsOfRage ,

Around here we love the idea of Reddit being totally devoid of life, but the fact is it’s still one of the most active public-facing sites on the web. The attrition to sites like Lemmy is pretty negligible compared to overall Reddit activity, and bot AI activity only really affects the largest subreddits, which have always been a bit spammy and clickbaity. The medium and small subreddits are still full of active people. Don’t get me wrong, Lemmy is my daily driver for this content, but I won’t pretend everyone fled Reddit for this.

Additionally, exclusivity with Google isn’t necessarily just about keeping the search results; it’s also about preventing their biggest AI competition, ChatGPT and its ties to Microsoft, from getting access to what is the Internet’s largest database of public-facing conversation.

GreatAlbatross ,

At least on some smaller subs, there seems to be a suspicious number of brand new accounts asking one question to get human answers.
It would not surprise me if Reddit, or some other service, is seeding questions to get more LLM-able content. Of course, this might backfire if people start giving stupid answers to eff up the data.

Wiz ,

Ah, so Google signed a contract with the company that trained their AI to … (checks notes) … suggest putting glue on pizza.

Sounds like a perfect match.

tal , (edited )

I’d look at what will be, rather than what is. I think that it’s probably not controversial to say that AI is going to improve; these are early days. The question is to what extent.

If one is to assume that AI will improve very little over time, that the kind of responses that you’ll get generated by a computer ten years hence in response to a question will be about the same as they are today, then, yeah, it’s probably an error to commit major resources to AI stuff or to expend resources acquiring training data for it.

But that assumption may not hold.

z3rOR0ne ,

I’ve posted this elsewhere, but it bears repeating:

Just use DDG bangs if you use DuckDuckGo and you can search Reddit directly.


    !reddit search term

or:


    !r search term

It still picks up latest posts related to reddit, it just searches reddit directly instead of searching Bing’s results. It’s that simple.

You can even use a redirect extension like Libredirect in conjunction with this DuckDuckGo feature to redirect your search to a privacy-respecting frontend like redlib.
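If you want to script the bang searches above, a minimal Python sketch could look like this. The helper name `ddg_bang_url` is made up for illustration, and it assumes the usual `https://duckduckgo.com/?q=...` query URL; the bang is just part of the query text.

```python
from urllib.parse import urlencode

def ddg_bang_url(bang: str, terms: str) -> str:
    """Build a DuckDuckGo search URL whose query starts with a bang such as !r."""
    # Accept the bang with or without its leading "!".
    query = f"!{bang.lstrip('!')} {terms}"
    # urlencode percent-encodes "!" as %21 and turns spaces into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})

print(ddg_bang_url("r", "search term"))
# https://duckduckgo.com/?q=%21r+search+term
```

Opening that URL in a browser should behave the same as typing `!r search term` into the DuckDuckGo search box.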

Kyouki ,

DDG is awesome, been using it for years.

lennivelkant ,

I used to sneer at the kids in my class that used it. Must have been fairly shortly after it launched, something like fourteen to fifteen years ago. I’m still grappling with a certain inertia when it comes to switching away from something I have relied on for so long, but I’m coming around to the idea of giving DDG a try at least (irrational as it is, I’ve been reluctant to even try - I suspect out of fear of liking it and having to change).

Past Me would be exasperated that Present Me is even toying with the idea. But then, Past Me had a lot of stupid takes anyway.

unconfirmedsourcesDOTgov ,

I went through the same process that you’re describing. In the end, I gave it a shot and, anecdotally, I feel like I find the things I’m looking for faster than I did with Google, and with no shoddy AI summaries.

pcouy ,

With all the botting going on on Reddit, this whole Google AI deal makes me think of the recent paper that demonstrates that, as common sense would suggest, deep learning models collapse when successive generations are trained on the previous generations’ output.
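A toy way to see the effect (a sketch only: this is a simple word-frequency "model", not a deep learning model and not the experiment from the paper, and all names and numbers are made up) is to refit a distribution on its own samples over and over:

```python
import random
from collections import Counter

def next_generation(dist, sample_size, rng):
    """Sample from the current model, then refit a new model on that sample only."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    sample = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(sample)
    return {token: n / sample_size for token, n in counts.items()}

rng = random.Random(0)  # fixed seed so runs are repeatable

# Generation 0: a long-tailed "human" vocabulary of 50 words.
dist = {f"word{i}": 1 / (i + 1) for i in range(50)}

support_sizes = [len(dist)]
for _ in range(20):
    dist = next_generation(dist, sample_size=100, rng=rng)
    support_sizes.append(len(dist))

# Number of distinct words the model can still produce, per generation.
print(support_sizes)
```

A word that fails to show up in one generation's sample gets probability zero and can never come back, so the vocabulary can only shrink: the rare stuff in the tail is what gets lost first.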

card797 ,

Block Reddit!

Plopp ,

But muh porn!

card797 ,

Exactly. You’re addicted, Plopp.

tal ,

The shackles and manacles were made of gold, but they were still there.

JeeBaiChow ,

I’m seldom on reddit after the exodus, but when I am, I noscript the duck out of it.

Plopp ,

You quack.

nyan ,

Actually, he doesn’t, since he’s removing the duck (and shipping it off to DuckDuckGo for reuse, no doubt).

admin ,

How many times is this going to be posted? I’ve seen this several times now over the past few days.

gedaliyah OP ,

Sorry, I haven’t seen it. If it’s been posted here before, send me the link to the previous post, and I’ll take this one down. Even better, you can report the post, and the mods will investigate it.

Thank you!

admin ,

Since you asked, here are the other four times it was posted.

There was a fifth one, but that one has since been removed.

gedaliyah OP ,

Thanks, this looks like different reporting on the same story. That happens with major news, but I can understand why it may seem like excess if it’s not a story you’re interested in.

Binette ,

Oh well. Time to post more questions on lemmy

emb , (edited )

Just like Reddit’s changes last year, this seems like a clear and reasonably expected consequence of the ‘our text is so valuable because AI’ idea.

The web will probably continue to become more gated and more fragmented as a result of that, plus trying to get more control to force ads.

steal_your_face ,

Still seems to work on Kagi

palordrolap ,

Kagi is a search aggregator, so those results are from Google.

steal_your_face ,

You sure you’re not thinking of searxng?

palordrolap ,

No, but SearX does similar things. I've been learning about Kagi recently, and as far as I can tell, they don't index pages on their own, they just use APIs provided by the real search engines.

Kolanaki ,

That just means the dumbasses will get even less traffic. Way to shoot yourself in the foot, Spazz.
