What's Wacky with Google?
There are always going to be oddities with any big online service, but this one seems to be persisting. Join the discussion in trying to figure out a pattern. For maybe a week, Google has been returning zero results or "1-1 of about xxx,000" for common searches. One-word searches seem unaffected, but there are certain two-word combinations of common words like
candle truck
or
speaker bracelet.
Reversing the order can affect searches too:
motorcycle candles
vs.
candles motorcycle.
The strange thing is that usually the 1 or 2 results found are to commerce sites. Read the
Search Basics,
compare your notes to
GoogleWhack's,
have fun looking for patterns, but remember that Google always returns slightly different results for different IP numbers.
(Update: 13:56 GMT by J : When I first posted this story it said the problems have been occurring "for several weeks at least" -- but it seems to be more like one week.)
I am so glad someone else noticed this!!! I've been so pissed I haven't been able to get any speaker bracelets recently. God google... forcing me to use other search engines to get my fix.
SkyNet is becoming self-aware.
It's just a glitch in The Matrix, of course.
What possesses someone to try such weird random words in Google? That's the real trick... Google wrote an engine to amuse the crazy users.
That's why you can't trust google for anything critical. You are at their mercy, and if they choose to do biased, or screwed up searches, you either don't know, or can't do anything about it...
I propose an opensource web based search engine... No more weirdness, no more screwups, no more censorship!
---
Programming is like sex... Make one mistake and support it the rest of your life.
I am sure the next Google Zeitgeist will show numerous searches for candle truck or speaker bracelet in October 2003. And nobody at Google will have an explanation for this ;-)
Check out this - all 25 hits on the quoted words "candle truck" should be showing up in the non-quoted search ...
Maybe it has something to do with the counter that was mentioned in a Slashdot post earlier today?
This sig was generated by a barrel of trained kittens for SeXy_Red (550409).
For a few weeks, when I do a search on Google Groups, it'll come back with the results just fine - but when I click on View Thread on a result, it tells me it can't display the thread and gives me a link to view that individual message. Then once that message comes up, I click on View Thread on that message, and up pops the whole thread, like it should have before.
:)
Perhaps being on top is getting to their CPUs.
Has anyone else noticed that the "spam" sort of sites that are nothing but link farms and Gator popups are getting much better at finding their way into Google's rankings? I switched to Google back in the day after search engines like altavista became overrun with such sites. Now I've noticed that they occasionally creep into their rankings...I guess entropy is the way of the universe after all.
At the risk of making you look bad, for phrase searches you have to put the phrase in quotes.
For example, I searched for "to be or not to be" phrase origin , and got what I consider to be useful results.
YMMV, of course.
Xentax
You shouldn't verb words.
I realized the other day that although searching for 13 - 867 - 5309 causes google to go into calculator mode, searching for 123 - 867 - 5309 does not cause google to use calculator mode.
All sorts of odd things will both pull up an answer from google's calculator and also do a search - for example, searching for avogadros number or hbar.
So why do searches that might fit US telephone conventions not trigger the calculator? Is it because some design decision makes it impossible to trigger both the calculator and their phone lookup service? (Yes kids, Google is a reverse phone directory, albeit with old data.)
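Here's a rough sketch of one way that could happen, purely as a guess: if the query dispatcher tests for a phone-number shape before it tries arithmetic, a 3-3-4 digit query never reaches the calculator. The patterns and the `classify` function are my own invention for illustration, not anything Google has documented.

```python
import re

# Hypothetical US-style pattern: 3 digits, 3 digits, 4 digits, joined by dashes.
PHONE = re.compile(r"^\s*(\d{3})\s*-\s*(\d{3})\s*-\s*(\d{4})\s*$")
# Integers joined by + or - only; enough for this illustration.
ARITH = re.compile(r"^\s*\d+(\s*[-+]\s*\d+)+\s*$")


def classify(query):
    """Return (mode, value), checking the phone pattern before arithmetic."""
    phone = PHONE.match(query)
    if phone:
        return "phone", "".join(phone.groups())
    if ARITH.match(query):
        total = 0
        # Each match is an optionally signed integer such as "13" or "- 867".
        for sign, digits in re.findall(r"([-+]?)\s*(\d+)", query):
            total += -int(digits) if sign == "-" else int(digits)
        return "calculator", total
    return "search", query


print(classify("13 - 867 - 5309"))   # falls through to arithmetic
print(classify("123 - 867 - 5309"))  # caught by the phone pattern first
```

With this ordering the two modes are mutually exclusive by construction, which would match what I'm seeing.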
"q=site:www.google.com google" - (third result)
This is what I'm seeing...
http://www.sminkybang.com/google.png
should produce about 50% error rate or we are really in trouble ;-)
For any who are interested, Google.ca is behaving correctly. All search results listed (that I've tried so far) from googlewack.com are working properly and returning 1-1 of 1, or displaying as they should.
I wish I could compare to google.com, but for the past year or so, google.com automatically forwards all Canadian IPs to google.ca
0110100100100000011000010110110100100000011000100
Does anybody else see the story change? I'm getting two different versions if I reload. One with the additional lines:
"The order of words matters also, with motorcycle candle revealing different results to candle motorcycle."
"Read the Search Basics, compare your notes to GoogleWhack's"
and one without.
The complete texts of the two versions are:
"There are always going to be oddities with any big online service, but this one seems to be persisting. Join the discussion in trying to figure out a pattern. For several weeks at least, Google has been returning zero results or "1-1 of about xxx,000" for common searches. One-word searches seem unaffected, but certain two-word combinations of common words like candle truck or speaker bracelet are affected. The strange thing is that usually the 1 or 2 results found are to commerce sites. Have fun looking for patterns but remember that Google always returns slightly different results for different IP numbers."
and
"There are always going to be oddities with any big online service, but this one seems to be persisting. Join the discussion in trying to figure out a pattern. For several weeks at least, Google has been returning zero results or "1-1 of about xxx,000" for common searches. One-word searches seem unaffected, but there are certain two-word combinations of common words like candle truck or speaker bracelet. Reversing the order can affect searches too: motorcycle candles vs. candles motorcycle. The strange thing is that usually the 1 or 2 results found are to commerce sites. Read the Search Basics, compare your notes to GoogleWhack's, have fun looking for patterns, but remember that Google always returns slightly different results for different IP numbers."
Strange.
Ok, now I'm a guy who deals with audio equipment on a regular basis. This, of course, includes speakers. I have never, ever, heard of a speaker bracelet, and can't imagine why one would search for it.
Now this isn't to say that these people haven't perhaps discovered an interesting bug in Google, but trying to play it as a conspiracy for "common" search terms is bullshit. The terms listed are things that no normal person would EVER search for. Hell, they are terms that even someone involved with one of the terms would never search for. Bracelets have nothing to do with speakers. If Google was truly trying to push advertisers, well, they'd be doing a shitty job of it since only geeks with too much time on their hands would discover such things.
Give it a rest, the world is not out to get you. It's either a bug, or Google having some fun (something they are known to do). They are certainly not trying to pimp a certain manufacturer of speaker bracelets, since such a thing is something that no one would know about, care about or want to own.
For regular searches, Google continues to work great.
No, stories don't have to move through the cluster, and there's no concurrency bug. We have a front-end cluster of webheads but they all read from the same DBs. The only "moving through" is from our main DB to our replicated slave reader DBs, but they are typically only 0 to 1 seconds behind reality, so that's not an issue.
In this case, the problem was that Hemos and I were both editing the story at the same time. He added an icon and posted it at 9:36 EDT live, then I tweaked the text and posted it at 9:38 which was about 40 seconds in the future, then around 9:39 I went back and edited its time back to 9:36... so there were a few seconds there where the story went from front-page to subscriber-only and back.
The Slash backend is obviously too powerful for idiots like us :)
Not wanting to kill anybody, we wait until the last two guys wander up to the candle truck.
I prefer not to even click on that one, and just speculate.
The coolest voice ever.
It's broke. Just put a sign on it and someone call the super.
Strange women lying in ponds distributing swords is no basis for a system of government.
Um, yeah. Actually, I don't know what you're talking about. Entering the phrase "to be or not to be" -- with quotes, so as to indicate you want the phrase, not just the collection of words -- yielded the first two pages of results all having that phrase. Not all of them were for pages on Shakespeare, but then again, that phrase is now deeply buried in the common memespace. If you make the search phrase
you do indeed get results with the phrase and exclusively referring to Shakespeare. Oh, I get it. You don't like the idea you need to actually construct a reasonable search phrase. You're mad that Google isn't, I don't know, telepathic. Your best bet is the SFWIWNFWIS search engine -- search for what I want, not for what I say.
The Mongrel Dogs Who Teach
Mwahhahahah!
1. Register speakerbracelet.com
2. Be the top 1 of 2 search results on google.
3. ????
4. Profit!
Rocket science is easy. Neurosurgery, now *that's* difficult.
I've read that there's a real-time search monitor in the lobby of Google's HQ. The nastiest words are removed, but other than that you can see exactly what people are searching for.
They have to be pretty confused right now, when thousands of searches for speaker bracelets, motorcycle candles and candle trucks show up on the display!
Martin
What a cockamamy way to run a search engine.
You are kidding, right? There's a reason that Google is by far the most popular search engine on the web, and it's got a lot to do with the "cockamamy" way it's run.
Perhaps you prefer the good old days when you'd have to check half a dozen search engines and trawl through countless useless links until you found something that was useful.
There are a handful of websites that should be in everyone's bookmarks. Top of the list is Google. Nuff said.
Oh, and as several people will have mentioned by now, and as Google's FAQ surely does, putting your search parameter in quotes will give you exact phrase results. This is pretty standard amongst all search engines, so it's amazing that you don't know this already.
Either you're new to the web and search engines in general or you haven't got a clue how to use one. Regardless, if you're going to comment on how "cockamamy" Google is, you should at least have an idea of how to use it first.
"Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
Weird. Very weird. Adding another word to a search should narrow down the result set, not widen it.
Try it.
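To spell out why it's weird: with an implicit AND, each extra word intersects the result set with one more list of pages, so the set can only shrink or stay the same. A toy sketch with made-up page ids:

```python
# Toy inverted index: word -> set of page ids (all values invented).
index = {
    "candle": {1, 2, 3},
    "truck": {2, 3, 4},
    "red": {3, 5},
}


def search_all(index, words):
    """Pages containing every word (implicit AND)."""
    sets = [index.get(word, set()) for word in words]
    return set.intersection(*sets) if sets else set()


two_words = search_all(index, ["candle", "truck"])
three_words = search_all(index, ["candle", "truck", "red"])
# The three-word result is always a subset of the two-word result.
print(two_words, three_words, three_words <= two_words)
```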
Flourescent (adj): smelling like ground wheat.
Obviously, Google has to do a lot of acrobatics to keep its service as fast as possible. One of the things it does is distributing its database over a lot of servers. There is no way that they can dynamically sift through hundreds of millions of pages for each common word, so they obviously just look at the top pages for each word. Which pages are top is probably determined by pagerank or something similar.
When you do this, there is no guarantee that you will get hits for every single combination of words out there. However, it may very well be possible to calculate the probability of relevant results not showing up and using this measure to make a more or less optimal trade-off between response time and user satisfaction.
When you start tweaking this trade-off, certain queries are bound to get screwed up. It probably takes them some time to notice this behavior, gather statistics and re-tweak their formula.
Another thing that crossed my mind recently is that they might be using precooked phrases or word collocations instead of single words. This makes sense since they use an implicit AND operator, it improves statistics and words are often strongly correlated anyway so your vocabulary probably wouldn't swell as much as you'd expect.
Mind you, this is pure speculation. I don't have any intimate knowledge about Google's inner workings.
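To make the truncation idea concrete, here's a minimal sketch under my own assumptions: each word keeps only its top k pages by some rank score, and a two-word query intersects those shortened lists. The pages, scores, and function names are all invented; this is not Google's actual design.

```python
# Invented pages and rank scores; page 5 is the only one with both words.
pages = {
    1: "candle shop",
    2: "candle maker",
    3: "truck dealer",
    4: "truck parts",
    5: "candle truck delivery",
}
rank = {1: 0.9, 2: 0.8, 3: 0.95, 4: 0.85, 5: 0.1}


def build_index(pages):
    """Map each word to the set of page ids containing it."""
    index = {}
    for page_id, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(page_id)
    return index


def top_k(posting, rank, k):
    """Keep only the k highest-ranked pages for one word."""
    return set(sorted(posting, key=lambda p: rank[p], reverse=True)[:k])


def search(index, rank, words, k=None):
    """Implicit AND over per-word lists, optionally truncated to the top k."""
    lists = []
    for word in words:
        posting = index.get(word, set())
        lists.append(posting if k is None else top_k(posting, rank, k))
    return set.intersection(*lists) if lists else set()


index = build_index(pages)
full = search(index, rank, ["candle", "truck"])            # uses complete lists
truncated = search(index, rank, ["candle", "truck"], k=2)  # top 2 per word
print(full, truncated)
```

The low-ranked page that contains both words falls off both truncated lists, so the combined query comes back empty even though each word alone has plenty of hits.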
Being well balanced is overrated. -- John Carmack
... and the crazy users wrote scripts to use the Google engine!
(shameless self plug) It's surprising what sites can appear when querying Google. Try my site that queries Google with random words to find random webpages. It's quite powerful and a good timewaster.
Google uses tons of DB entries to cross-index pages. I wonder if there are some simple hash tables per page that it uses internally to speed things up, ones that make assumptions and don't resolve collisions.
So you can search for one thing, and conceivably the checksum/hashes for each term match those of another page that has nothing to do with it, and it's returned as a relevant match by accident.
This might explain a lot of result silliness.
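A toy version of what I mean, with a deliberately bad hash so the collision is easy to see. The `weak_hash` function and the `LossyIndex` class are made up for illustration; I have no idea what Google actually uses.

```python
def weak_hash(term, buckets=1024):
    """Deliberately poor hash: sum of character codes, so anagrams collide."""
    return sum(ord(ch) for ch in term) % buckets


class LossyIndex:
    """Index keyed only by hash value; the original term is never stored."""

    def __init__(self):
        self.buckets = {}

    def add(self, page_id, text):
        for word in text.lower().split():
            self.buckets.setdefault(weak_hash(word), set()).add(page_id)

    def search(self, term):
        # No collision check: any page whose word hashed here is returned.
        return self.buckets.get(weak_hash(term.lower()), set())


idx = LossyIndex()
idx.add(1, "listen to the speaker")
idx.add(2, "silent candle")
# Page 2 never mentions "listen", but "silent" hashes to the same bucket.
print(idx.search("listen"))
```

Storing the term next to the hash and comparing on lookup would fix it, at the cost of more memory per entry.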
Fuck Beta. Fuck Dice
Man, no wonder... You need to turn Safe Search OFF when you look up nasty stuff like that.
Here's Google's somewhat hilarious cache of the Mamufilms.com page. The page includes links for everything from "Peter Paul and Mary mp3" to "preteen bra images". The text is vaguely reminiscent of actual grammatical English. Here's one sentence:
...Nothing interesting here. Just move along...
If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
If you're looking for the product "VB.NET", you need to search for it as a term.
For ordinary searches, punctuation marks like "." are treated as spaces, which mean logical ANDs. And some words (in this case "vb" and "net") are ignored as being too common. If you search for "vb.net", which I suppose is what you get from an "exact phrase", you find "vb" followed by a space or punctuation and then "net".
Google tries to be intuitive, which means guessing what most people would expect, which of course means that sometimes you're surprised.
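Roughly what I'm describing, as a sketch. The stop-word list and the function names are assumptions for illustration only, and the real rules are surely more involved.

```python
import re

# Hypothetical stop list; the claim that "vb" and "net" are dropped is mine.
STOP_WORDS = {"vb", "net"}


def tokenize(query):
    """Lowercase and split on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", query.lower()) if tok]


def plain_terms(query):
    """Terms an ordinary (unquoted) search would actually AND together."""
    return [tok for tok in tokenize(query) if tok not in STOP_WORDS]


def phrase_match(phrase, text):
    """Quoted search: the phrase's tokens must appear consecutively."""
    words = tokenize(phrase)
    tokens = tokenize(text)
    if not words:
        return False
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


print(tokenize("vb.net"))                             # the dot acts like a space
print(plain_terms("vb.net tutorial"))                 # common words dropped
print(phrase_match("vb.net", "Learn VB .NET today"))  # adjacent tokens match
```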
I too think it sucks that you can't open the window on the airplane.
Ironic, considering that it would suck if you could...
Google doesn't do simplistic phrase matching. If it did, it'd be the same (and as useless) as altavista. Google does relevancy searches. tobeornottobe.com is relevant to a search for "to be or not to be".
I spoke with a friend who helps maintain the Google engine. She said that they were running into some problems with a "cleaning agent." Because of all the sites taking advantage of word relevancy, there are useless sites that simply have a list of words or phrases. It's been posted before that there are many pages designed for spreading GATOR/GAIN or other spyware/adware. She put the percentage of junk pages at 35% to 40%. The cleaning agent was supposed to run through its own searches, check for junk, and keep a log.
She didn't say if the problem was that the cleaning agent was clogging searches or if any logged junk pages had been blocked. If so maybe the agent is flawed. In any case, they've stopped using it for the time being.
The counts have been broken for the last five weeks. A count for the word "the" produced fairly consistent results until then of about 3.4 billion. Then it shifted five weeks ago to 5.2 billion. Lately it has been under 2 billion. Now it's just over 2 billion.
Webmasters who have various directories and know exactly how many pages are in each directory began noticing five weeks ago that Google was reporting approximately twice the number of pages that have ever existed in each directory. Prior to five weeks ago, Google used to be fairly close to the actual number (assuming that you get a full crawl).
GoogleWatch speculates on the reason why Google has been behaving strangely ever since it stopped doing the traditional deep crawl once per month. The last standard deep crawl was in April but it wasn't used -- Google threw out this data (by their own admission) and reverted to earlier data. The speculative piece was written last June.
Since it was written, Google has started showing "supplemental results" on many searches. It looks like they are running a parallel index. Why would they do this? All the problems Google has been having, along with the supplemental index, seem to support GoogleWatch's theory.