How has no one worked on a new search engine over the last decade or so, when Google has been in clear decline in its flagship product?
I know of the likes of DDG, and Bing has worked hard to catch up, but I’m genuinely surprised that a startup hasn’t risen to find a novel way of attacking reliable web search. Some will say it’s a “solved problem”, but I’d argue it was solved once and no longer is.
A web search engine that crawls and searches historic versions of a web page could be an incredibly useful resource. If someone can also find a novel way to rank and crawl web applications, or to “open” up the closed web, it could pair with web search to be a genuine Google killer.
There’s a lot of startups trying to make better search engines. Brave for example is one of them. There’s even one Lemmy user, but I forget what the name of theirs is.
But it’s borderline impossible. In the old days, Google used web scrapers and keyword search. When people started uploading the whole dictionary in white text on their pages, Google added some anti-spam and context logic. When that got beat, they measured web credibility by the number of “inlinks” from other websites. Then link farms sprang up to game that, the SEO industry followed, and you know the rest from there.
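For flavor, here’s the inlink idea in miniature: a toy PageRank over a made-up link graph, nothing like Google’s actual (and long-since-evolved) system:

```python
# Toy PageRank: credibility flows along "inlinks". The link graph
# below is invented purely for illustration.
links = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
    "farm.com": ["farm.com"],  # a link farm pointing only at itself
}

damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # iterate until the scores settle
    rank = {
        p: (1 - damping) / len(pages)
           + damping * sum(rank[q] / len(links[q])
                           for q in pages if p in links[q])
        for p in pages
    }

print(sorted(rank.items(), key=lambda kv: -kv[1]))
```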
An indexable version of Archive.org is feasible, borderline trivial with Elasticsearch, but the problem is: who wants that? Sure, you and I may, but no one else cares. Also, let’s say you want to search up something specific. Each page could be indexed, with slight differences, thousands of times. Which one will you pick? Maybe you’ll want to set your “search date” to a specific year? Well, guess what, Google has that feature as well.
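To put some shape on “borderline trivial”: a rough sketch of that index in Elasticsearch, where each snapshot is its own document and results collapse to one hit per URL. The index name, field names, and cluster URL are all invented for illustration:

```python
# Rough sketch: index archived snapshots, then search them while
# collapsing duplicates so each URL appears once. All names hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One document per snapshot of a page.
es.index(index="web-archive", document={
    "url": "https://example.com/",
    "captured_at": "2014-06-01",
    "body": "Example Domain ...",
})
es.indices.refresh(index="web-archive")

# Full-text search, newest snapshot per URL, restricted by a
# "search date" much like the Google feature mentioned above.
resp = es.search(
    index="web-archive",
    query={"bool": {
        "must": {"match": {"body": "example"}},
        "filter": {"range": {"captured_at": {"lte": "2015-01-01"}}},
    }},
    collapse={"field": "url.keyword"},  # one hit per URL
    sort=[{"captured_at": "desc"}],     # prefer the latest capture
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["url"], hit["_source"]["captured_at"])
```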
They’ve had a history of controversy over their lifetime, ranging from replacing ads with their own affiliate links to bundling an opt-out crypto miner. Every time something like this happened, the CEO went on a marketing campaign across social media, effectively drowning out the controversial story with an influx of new users. The CEO, meanwhile, has gotten in trouble for his comments on same-sex marriage and COVID-19.
In general, it’s always seemed like it would take a very small sack of money for Brave to sell out its users. Also, their browser is Chromium-based, so it’s still contributing to Google’s market dominance and dictatorial position over web technologies.
Bing’s Copilot is genuinely pretty good: the AI answer is often accurate, and the way it’s able to weave links into its answer is handy. I find it way more useful than Google Search these days, and at this point I’m using it mostly on principle, since Google keeps pissing me off by killing their services, a few of which I’ve used.
I don’t think Microsoft is some saint, but Copilot is just a good product.
Yes, that would be a Google killer. If you somehow find the money to provide it for free.
Finding a novel way of searching is one thing. Finding a novel way of financing the whole endeavor (and not going the exact route Google is) is another.
Fuck. I sometimes use the text-only version to access sites with too many moving elements, or when the site is geoblocked or doesn’t respect cookie choices and denies access. So far, it has been the most reliable option for me.
“It was meant for helping people access pages when, way back, you often couldn’t depend on a page loading,” Google’s Danny Sullivan wrote. “These days, things have greatly improved. So, it was decided to retire it.”
They still go down, Danny. And fairly frequently at that. Y’all are fuckin’ stupid.
I’d say things are much worse than they used to be. Sure, in the past, sites would disappear or completely fail more often. But because most sites were static, those were the only ways they could fail. These days the cache feature is useful for websites with JavaScript bugs that prevent them from displaying properly, or where the content management system still pretends the link works but silently loads different content.
It sucks because it was sometimes (if not very often) useful, but it’s not like they’re under any obligation to support it, or getting any money for doing it.
I dunno, but I suspect that they aren’t using Google’s cache if that’s the case.
My guess is that the site uses its own scraper that acts like a search engine, and because websites want to be seen by search engines, they allow it to see everything. This is just my guess, so it might very well be completely wrong.
At least some of these tools change their “user agent” to be whatever Google’s crawler is.
When you browse in, say, Firefox, one of the headers Firefox sends to the website says “I am using Firefox”, which might affect how the website displays for you, or let the admin know they need Firefox compatibility (or be used to fingerprint you…).
You can just lie in that header, though. Some privacy tools change it to Chrome, since that’s the most common.
Or you say “I am the Google web crawler”, which some sites let past the paywall so their content can be added to Google.
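For illustration, that spoof is literally one header. The UA string below is Googlebot’s real one, but the URL is a placeholder, and plenty of sites will still refuse it (see the next comment):

```python
# Minimal illustration: request a page while claiming to be Googlebot.
# The URL is a placeholder; many sites check more than the header.
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                  "+http://www.google.com/bot.html)"
}
resp = requests.get("https://example.com/paywalled-article", headers=headers)
print(resp.status_code, len(resp.text))
```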
If I’m not wrong, Google publishes a set range of IP addresses for its crawlers, so not all sites will let you through just because your UA claims to be Googlebot.
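For what it’s worth, alongside published IP ranges, what Google documents for site owners is a reverse-DNS check. A simplified sketch of that check (real code would want timeouts and logging):

```python
# Sketch of the reverse-DNS verification Google documents for
# confirming a visitor really is Googlebot. Simplified.
import socket

def is_real_googlebot(ip: str) -> bool:
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    # Forward-confirm: the hostname must resolve back to the same IP.
    try:
        return ip in {info[4][0] for info in socket.getaddrinfo(host, None)}
    except OSError:
        return False

print(is_real_googlebot("66.249.66.1"))  # an address in Googlebot's range
```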
Depends. Not every site, or all of its pages, will be crawled by the Internet Archive. Many pages are available only because someone submitted them to be archived, whereas Google Search would typically cache a page once it was indexed.
Seeing many comments here shitting on this decision by Google: is this really that big of a deal? I’ve personally never used the cached feature of Google, and if I ever needed to see a page that is currently down, it’d be via the Wayback Machine. If nobody used the feature, why have it waste a ton of storage space? Feel free to prove me wrong, though.
It was also useful when the page had changed between Google indexing it and now: if you loaded the page and couldn’t find the text you were searching for because it had been deleted, you could still find it on the cached page.
By the way, I just found out that they removed the button, but typing cache:www.example.com into Google still redirects you to the cached version (if it exists). But who knows for how long. And there’s the question of whether they’ll continue to cache new pages.
I hope they only kill the announced feature but keep the cache part.
Just today I had to use it because some random RSS aggregator website had the search result I wanted but redirected me somewhere completely different…
Quotes are fucking awful now. You have to switch the search to Verbatim mode instead, which takes way fucking longer. Google has enshittified almost everything. I’m just waiting for them to ruin Maps.
My guess is that a cached page is just a byproduct of the page being indexed by the crawler. They need a local copy to parse text, links, etc., and to see the difference from the previous version.
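If that guess is right, the pipeline in miniature would look something like this (purely illustrative; the storage scheme is made up and nothing like how Google really keeps its index):

```python
# Purely illustrative: fetch a page, diff it against the last fetch,
# and keep the copy. The kept copy is, incidentally, a "cache".
import difflib
import pathlib

import requests

CACHE = pathlib.Path("crawl-cache")
CACHE.mkdir(exist_ok=True)

def crawl(url: str) -> None:
    body = requests.get(url, timeout=10).text
    copy = CACHE / (url.replace("://", "_").replace("/", "_") + ".html")
    if copy.exists():
        old = copy.read_text()
        diff = list(difflib.unified_diff(old.splitlines(),
                                         body.splitlines(), lineterm=""))
        print(f"{url}: {len(diff)} diff lines since the previous crawl")
    copy.write_text(body)  # the local copy doubles as the cached page

crawl("https://example.com/")
```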
Beyond that, the money is still going to Google, Yandex, Brave, Bing, etc. via API payments. If they actually created their own search engine that was any good, I’d be more inclined to pay for access.
I have been looking at Kagi, but their pricing is clearly designed to push people toward the $10 Professional plan.
100 or even 300 searches a day would be unusable for me: you can easily burn 10 searches refining a query for something specific, and when developing you do something like 5-10 searches an hour, which alone works out to 40-80 over a workday.