HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
Disturbing Proof They're Quietly Deleting the Internet...

Bright Insight · YouTube · 18 HN points · 4 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Bright Insight's video "Disturbing Proof They're Quietly Deleting the Internet...".
YouTube Summary
Try this 30 second experiment and see firsthand just how much trouble we're in...

Thanks for watching! I'm Jimmy Corsetti, and my channel is called Bright Insight.
Follow and Support me on these other platforms where I can speak my mind and share TRUTH!
https://brightinsight.locals.com/support
https://www.subscribestar.com/bright-insight
https://www.patreon.com/BrightInsight
Follow me on Instagram: https://www.instagram.com/bright_insight/
Follow me on Rumble: https://rumble.com/c/BrightInsight
Follow me on Odysee: https://odysee.com/@BrightInsight:c
Follow me on TikTok: https://vm.tiktok.com/ZM8u3XBhL/
Or, Tip me on Venmo! @bright_insight

Must see Podcast I had with Danica Patrick! https://youtu.be/Nihxp-Vkk-U
Check out my podcast with Joe Rogan for FREE on Spotify: https://open.spotify.com/episode/1dELONn67xCHBPpWYytMDO?si=5G33crypSyynvgTMX7tr5g

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
https://youtu.be/bWbytHBp0zI

Google search is severely broken.

xaedes
I was feeling this as well, but didn't pay attention to _how_ bad it is. I mean it really is... Is there ANY search engine left that returns more than 500 results for any search? Are there any community-driven search engines? I mean, at this point it can't be hard to build a better alternative...
Melatonic
Kagi is great so far!
xaedes
Tried it, but returning lots of results doesn't seem to be their focus.

Now I have installed OpenSearchServer to see how far a local index gets me. The simple query "a" gets me at most 394 results on Google; after letting OpenSearchServer crawl and index for just 2h, I already get over 900 results for "a". Well... I guess it really is not hard to beat that meager 394 results.

Yeah, indeed. And not only on controversial topics either.

Check these videos out.

TruthStream's production: https://www.youtube.com/watch?v=6zyJB45ewvU

Jimmy from BrightInsight: https://www.youtube.com/watch?v=bWbytHBp0zI

https://www.youtube.com/watch?v=8O_NvPpbsbw

Aug 15, 2022 · 14 points, 16 comments · submitted by kk6mrp
PaulHoule
Google's search results have never been particularly good past the first page. Now that they are stuffed full of ads, even the first page sucks.
rekabis
uBlock Origin. It doesn’t remove “promoted content” and other in-line crap, but it does remove all the content from their ad network.
mrguyorama
Google has openly stated the first "number of results" numbers are literally made up, and have no relevance to actual things. It was way too expensive to calculate for every query when the vast majority of them ended after the first page, so they don't even try.

Nothing is being "deleted" especially not actively. This is also why you shouldn't use "number of results" as data in research, because it is meaningless.

This video is worse than misinformation and clickbait.

pauldenton
So Google making up the information it presents to users is not misinformation and clickbait, but this YouTube video is?
ninju
citation please
hammyhavoc
Yes, this requires a source. I've never heard anything like it, especially considering all I hear from Google about AI.
mrguyorama
I can't edit anymore, but for anyone looking: I found Stack Exchange references to unnamed Googlers and other third-hand accounts. I was wrong, Google is NOT open about this fact, but I am still confident in stating that you cannot use the "estimated results" for anything.

My bet is that you can compare the original "estimated" results number to the actual number Google gives you at around page ten, for thousands of search terms and queries, and find no relation between the two.
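One way to test that bet is to collect (estimated count, results actually served) pairs for many queries and compute a correlation coefficient. The following is a minimal sketch, not a method anyone in the thread describes: the helper names `pearson` and `estimate_vs_served` are hypothetical, and apart from the 6.8 billion / 448 pair quoted elsewhere in this thread, the sample numbers are placeholders rather than measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def estimate_vs_served(samples):
    """samples: (estimated_count, served_count) pairs, one per query.

    Estimates span many orders of magnitude, so they are compared
    on a log10 scale against the raw number of results served.
    """
    estimates = [math.log10(est) for est, _ in samples]
    served = [float(count) for _, count in samples]
    return pearson(estimates, served)


# Placeholder data (assumption): only the first pair appears in the thread.
samples = [
    (6_800_000_000, 448),
    (1_200_000, 310),
    (45_000_000, 402),
    (930_000_000, 295),
]
print(f"r = {estimate_vs_served(samples):.2f}")
```

A coefficient near zero over a large, real sample would support the "no relation" claim; a handful of hand-picked queries would not settle it either way.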

If you are still concerned about things google is ACTUALLY fucking with when it comes to search results, check out the Mozilla organization's research into the matter.

pcdoodle
Break up Google. They are too big.
blinded
The good stuff stays, the not-so-good gets deleted. This isn't an episode of Hoarders, is it?
anon22334556
I think the point is “what is the good stuff”

If it says 6.8 billion but only had 448 total…

I’ve had an issue on Google where a study I found in 2012 is no longer available in 2022, even in their search.

hammyhavoc
And the link to that study is what? Because the domain may have expired, data may have been pulled, company may have gone bust, DNS may be wrong, they may have been hacked and started serving malicious content. I could go on for about 30 minutes as to what might have happened to it.
salawat
The thing about an actual index is it's reversible, and transparent. You've digested a bunch of information, and if I want to see the least relevant result by some query, that should be doable.

I would not be surprised if the folks at Google Search have forgotten that one of the first tenants of organizing information is not to omit it.

Either that, or they implemented their delist functionality as "force relevance zero, and truncate results before you get there".

hammyhavoc
The word is "tenet", not "tenant".
salawat
Internal hash table collisions for English are a real wench at times.
PaulHoule
Google doesn't think that way at all.

Back when Gerard Salton was writing the first papers on IR he had a set of 60 or so documents and kept his index on a deck of punched cards.

With a small set of documents the main problem you run into is that some of the relevant documents don't use the exact words in the query so you might miss most of the relevant results.

With Google, on the other hand, you could have millions or billions of relevant documents, and the challenge is to do so well for the first few results that odds are good (say 70% -- this is limited by the ambiguity of the query) that the first result is a "direct hit".

If you are answering questions like that in a huge distributed system the query process probably looks like a set of funnels that feed answers into funnels that feed answers into funnels. If you want to answer questions quickly at low cost the best you can do is kill low relevance documents as quickly as possible.
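The funnel idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a description of Google's system: the function names, the two scoring functions, and the shard layout are all hypothetical. Each shard keeps only its best few documents by a cheap score, and only those survivors reach a costlier reranking stage.

```python
import heapq

QUERY_TERM = "index"  # hypothetical single-term query


def term_count(doc):
    """Cheap score: how often the query term occurs in the document."""
    return doc.split().count(QUERY_TERM)


def term_density(doc):
    """Costlier stand-in score: share of the document's words that match."""
    words = doc.split()
    return words.count(QUERY_TERM) / len(words)


def funnel_search(shards, cheap_score, costly_score, k_shard, k_final):
    """Two-stage funnel: prune per shard, then rerank the survivors."""
    # Stage 1: every shard discards all but its k_shard best documents.
    survivors = [
        doc
        for shard in shards
        for doc in heapq.nlargest(k_shard, shard, key=cheap_score)
    ]
    # Stage 2: only the survivors are scored with the costlier function.
    return heapq.nlargest(k_final, survivors, key=costly_score)


shards = [
    ["index index", "cat", "index of things"],
    ["dog", "the index"],
]
top = funnel_search(shards, term_count, term_density, k_shard=2, k_final=2)
print(top)
```

Because documents are dropped inside each shard before anything is merged, a system built this way never materializes the full list of matches, which is consistent with result counts being rough estimates.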

I worked on a search engine for patents that had multiple nodes and could get slightly different answers from different nodes because each node had a neural net for semantic indexing of its contents. You might have the system report that there were 15,091 relevant results one time, another time it might be 15,094. Management thought that our customers would lose confidence in our search because of this so we implemented something that made the selection of nodes used for a particular query deterministic, which hurt the scalability of our search.
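The comment does not say how the deterministic selection was implemented. One common way to get that property, shown here purely as an assumed stand-in, is rendezvous-style hashing: rank the nodes by a stable hash of the query combined with the node name and take the first few. The function name `pick_nodes` and the node names are hypothetical.

```python
import hashlib


def pick_nodes(query, nodes, n):
    """Return n nodes for a query; the same query always gets the same nodes.

    hashlib is used instead of the built-in hash() because string hashes
    from hash() are salted per process and would differ between runs.
    """
    if not 0 < n <= len(nodes):
        raise ValueError("n must be between 1 and the number of nodes")

    def rank(node):
        # Stable per-(query, node) key; the NUL byte separates the two parts.
        return hashlib.sha256(f"{query}\x00{node}".encode("utf-8")).digest()

    return sorted(nodes, key=rank)[:n]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
first = pick_nodes("solar panel mounting bracket", nodes, 2)
second = pick_nodes("solar panel mounting bracket", list(reversed(nodes)), 2)
print(first == second)
```

The trade-off matches the one described above: pinning each query to a fixed set of nodes gives repeatable counts, but the system can no longer route a query to whichever nodes happen to be least loaded.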

salawat
Granted, the distributed architecture for massively parallel processing adds complications.

Granted, neural nets add levels of "Oh, what the hell?" That actually explains a lot about the uselessness of Google results unless you get really creative.

I still grade Google as failing to produce a semantically valid index that checks off the usability criteria of an actual index.

If I have a corpus of data, I want to be able to examine the structure, even if only through a window. Way back before they threw neural nets at things (if that is what they are doing now), you could actually get a sense for it.

I used to love trawling Google results into the hundreds of pages, because you'd get real feedback w.r.t. the effects of additional predicates on your query. It was more of a data processing and query refinement exercise than "throw it at their ML models and hope they decide to be useful today."

Organizing information isn't just about vomiting results... It's about imposing enough structure that people can help themselves. It's like architecture. A poorly planned building, or an excellently planned building optimized for discomfort, is a hell on Earth.

One that actually reflects and accommodates the natural flow and needs of the occupants/users is a joy for all to behold.

Google had that. Now it doesn't, and it's increasingly difficult to get the darn thing to stop playing Bayesian/gradient descent/backprop buggers and just show me stuff that matches what I asked for in the Boolean sense, and don't you dare tell me there are only 13 bloody results.

There is search, then there is Search. I prefer the latter.

Aug 13, 2022 · 2 points, 0 comments · submitted by vlindos
Aug 12, 2022 · 2 points, 4 comments · submitted by kvetching
kvetching
Tldr of video: Google truncates 6.5b results for "climate change" to just a few hundred. Same thing occurs on other search terms and other search engines.

You don't know they limit your results until you reach the last page. To try, keep going towards the end of the results, keep going to the farthest page possible and it will end shockingly soon. Only then does the results number change. (I've only tested on desktop)

Once you get to their last page, it will tell you it omitted results and you can click a button to show all, but that only about doubles the count, so instead of 20 pages, you get 40 pages. Out of billions of pages, you get fewer than 1000.

abrichr
Is this just for certain topics (eg climate change) or is it default behavior?
kvetching
All topics. If you try to get it to move farther in the results using a URL, you will get this message:

"Sorry, Google does not serve more than 1000 results for any query."

salawat
I've known this for years.

I've always known there's perception management involved, because they just drop inconvenient things out of the search index altogether. Sorry, can't show you that, Dave.

Aug 12, 2022 · kvetching on ignore
wrong link, can't edit.

https://youtu.be/bWbytHBp0zI

Keep going to the next page until you can't anymore and it changes. If you ask it to add more and do the same thing, it will once again end.

https://youtu.be/bWbytHBp0zI

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.