New Google Search Index 50% Fresher With Caffeine
Ponca City, We love you writes "When Google started, it would only update its index every four months. Then, around 2000, it started indexing every month in a process called the 'Google dance' that took a week to 10 days and would provide different results when searching for the same term from different Google data centers. Now PC World reports that Google has introduced a new web indexing system called Caffeine, which delivers results that are closer to 'live' by analyzing the web in small portions and updating the index on a continuous basis. 'Caffeine lets us index web pages on an enormous scale,' writes Carrie Grimes on the official Google Blog. 'Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day.' Now not only does Caffeine provide results that are 50% fresher than Google's last index, adds Grimes, but the new search index provides a robust foundation that will make it possible for Google to build a faster and more comprehensive search engine that scales with the growth of information online."
I miss the days when Altavista was king (purely nostalgia, I assure you). I don't, however, miss getting marked down in Spanish class due to using BabelFish -_-;;
Living With a Nerd
I found this post at google before I wrote it.
With a name like that, you have to wonder if it's written in Java or a derivative?
Ashraya
"Caffeine" is a NSA code word for a mind controle satellite they build with GOOGLE/Italian money on loan from Chinese Muslim Islamo-Communist sorcerers and vegetarians. It will probably be used to sell your daughters into slavery in Mexico via facebook. That is why our SAVIOR OBAMA must continue to wage the WAR FOR FREEDOM at all costs, because if not the evil Italian axis will enslave us all!!!!!!!!!!!
UNITE with the Campaign for a Free Internet because today, our future begins with tomorrow!
The thing is... what's the story behind this very name? Why Caffeine?! :p
"Sum Ergo Cogito"
Half joking, but it would be great if the indexing was done at a particular time every month like the old system, but the moment of indexing was public. Then, at that time, all Facebook users could go and untag and delete anything that may have been wholesome enough to not warrant immediate removal but yet still be considered something that shouldn't be indexed for all eternity.
-THE END-
The Caffeine project is approved. The system goes on-line June 9th, 2010. Human decisions are removed from search engine results. Caffeine begins to learn at a geometric rate. It becomes self-aware at 2:14 a.m. Eastern time, August 29th. In a panic, they try to pull the plug.
My blog
If Google's on caffeine, Yahoo must be taking PCP.
Grimey to his friends.
Caffeine takes up nearly 100 million gigabytes of storage in one database
A million gigabytes is what we call a petabyte.
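For anyone who wants to check the arithmetic, here is a quick sketch. It assumes decimal (SI) units, where 1 PB = 1,000,000 GB and 1 TB = 1,000 GB; the helper names are made up for illustration, and 100,000 GB/day is just the low end of "hundreds of thousands of gigabytes per day".

```python
# Decimal (SI) units assumed: 1 PB = 10**6 GB, 1 TB = 10**3 GB.
# Helper names are illustrative, not from any Google tool.
GB_PER_PB = 10**6
GB_PER_TB = 10**3


def gb_to_pb(gigabytes):
    """Convert gigabytes to petabytes (decimal units)."""
    return gigabytes / GB_PER_PB


def gb_to_tb(gigabytes):
    """Convert gigabytes to terabytes (decimal units)."""
    return gigabytes / GB_PER_TB


index_size_gb = 100_000_000  # "nearly 100 million gigabytes" from the summary
daily_growth_gb = 100_000    # assumed low end of "hundreds of thousands"

print(gb_to_pb(index_size_gb), "PB in the index")
print(gb_to_tb(daily_growth_gb), "TB added per day, at least")
```

So the index is roughly 100 PB, growing by hundreds of TB a day.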
Pretty good is actually pretty bad.
Already said above by another freak, nicknamesarefunny. Proof that Caffeine is not working...
and hundreds of terabytes per day. Any word on what they're using for a database back-end?
Hail Eris, full of mischief...
E pluribus sanguinem
If it weren't for the competition from Bing, would this have even happened?
I read through a couple of the articles and didn't see anything about "GO HERE TO TRY IT", so is it just now used behind the scenes on the default Google search, or do you need to go someplace else to see the Caffeine results?
Maybe I just missed a link somewhere, but I would have thought something like that would kind of stick out. :)
... productivity.
When Google was new it was a wonder. I could use it to help solve problems (such as identifying error codes when the servers went down), locate reviews of products (saving me the expense of subscribing to loads of computer magazines and the time searching through them when I needed to buy something) and find snippets of code when I needed to develop a program. As the web gets older and older there is more and more out-of-date information that I have to dig through. Plus, when Google (and Yahoo) killed off Usenet (with an assist from Andrew Cuomo), the utility of the Usenet information structure was destroyed (which the world is still trying to recreate with keywords).
As Google has added more and more information it gets less and less useful. Plus the rise in SEO makes it even harder to find what I need (But I find lots of useless stuff that people have paid to get put in front of my eyes). Of course it probably isn't in Google's best interest to help me locate information that I need in the most efficient way. The more I have to sort through the crap they now deliver the more ad revenue they generate.
Too bad Bing sucks. I would really appreciate an alternative to Google.
When I was in Spanish class I got marked down for cheating off the hispanic stoner behind me, and I liked it!
All you kids with your interwebs, and your babbling fishes can get off my lawn!
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Does that mean 67% as stale?
Currently hooked on AMP
BING
Man, that's a lot of data. Anybody have a rough estimate of how much data there is on the web?
The teachers will crack any minute, purple monkey dishwasher.
Great... In theory, Google bombs will get removed faster but also probably get propagated just as quickly. No?
What is this some hippy-skippy coffee enema of an algorithm? Are they going to try to tell us next that they are building their next datacenter at one of the earth's vortices to cram some metaphysical in with the metadata ? Hurumpf.
An Education is the Font of All Liberty
I've been using Caffeine (in its coffee form) to freshen my breath for years. I find it is really useful to increase the alertness level of staff around the office by breathing heavily in their face. It only takes a few goes and suddenly most of the workers here are on a much higher alertness level whenever I'm around. I would estimate it at least matches, if not exceeds, Google's 50% increase.
I kinda liked the human-generated Yahoo! index / hierarchy, it was a neat way to get started with the web, back when it wasn't all too big and time-sensitive to organize by hand.
I'd use yahoo mail more, if they even bothered trying to be competitive with gmail. But I don't really want to pay extra for the plus account just to get minimum necessities like forwarding and pop3 access on what is essentially now my spam account.
Google has pulled my site robots.txt file 32 times this month and it is only the 9th - about 4 times a day. I'm showing almost 2000 web pages pulled by Google indexers in this same time period. My site is tiny, private, not very large.
By bandwidth, Google is only 2.4% of the total site traffic, so far, this month.
I agree Google is "fresher" than they used to be. OTOH, my non-commercial site has approximately doubled readers in each of the last 6 months by publishing 1 new posting about every other day.
I suspect other, more heavily used sites are hit hourly or even more often by Google.
MSN-Bot appears to visit 10 times a day, but is much more selective about which pages it indexes. Since my site is date organized, this seems smarter than what Google does. Sometimes I do edit older stories with new knowledge or corrections, which Google will see eventually and MSN will not. Zero referrals from any Microsoft searches seen.
Yahoo! slurp barely touches my site. Only 1 referral has been seen.
Google sends about 30% of the total traffic, but most is from social networking with "hey, check this out" type referrals. Not bad for a technical article site.
For a while now I have been noticing my forum posts being indexed within hours of making the post. It's been doing this for a couple of years, I think.
If you could reason with religious people, there would be no religious people
... get out of my fucking head! GOD DAMMIT! I can't take you people being in here all the damn time. It's driving me crazy!
What, like caffeine isn't already used enough? What's up with this confusing and lame practice of using common words as product labels? Sadly, it's just ever more corrupt marketing crap. You'd think search engine masters would know better.
Corporations are clearly incapable of creativity. Way to go you overpaid idiots.
They should try Amphetamine!
Google dance if you want to,
If it helps you search online.
MSN don't dance,
and if they don't dance,
well they're no search engine of mine.
Rules of Conduct:
#1 - The DM is always right.
#2 - If the DM is wrong, see rule #1
Ok, what is it with people who write about technical subjects that they think they have to use ridiculous analogies?
"if this were a pile of paper it would grow three miles taller every second"?? Yes, and if this was a goat it would have a thousand young. WTF. This was a Google blog post, not some story-for-the-terminally-stupid from The Daily Show ferchrissakes. The author even measures storage capacity in the universally used miles-of-iPods.
What is the sound of one vein popping?
And people say P2P uses lots of bandwidth.
...or is that one million forty-eight thousand five hundred seventy-six gigabytes?
Amazing how human-like these machines get.
So do you just pour the coffee all over the server, or is there a special intake?
Just imagine what the Google Cocaine will be capable of! And then Google Methamphetamine?...
Sorry for the silly question, but it's "ready" and it's "announced" and other things, but do any of these mean that it's what's being used today by google.com? If not, is there a date for when it will become the index used for google searches?
Expert in software patents or patent law? Contribute to the ESP wiki!
The system goes on-line August 4th ...
Faster! Faster! Faster would be better!
Luke: No! That's not true! That's not possible!
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
http://tinyurl.com/268rtm6
All the results are the same, except for a couple of news stories, but they could have cheated on those. Seems like a titanic waste to have put all this effort into one search word, for no improvement.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
WORSE. Searching google today is terrible. Most items presented are trash, and with all that trash it's hard to impossible to find the stuff I want.
If you or I run a spider that downloads from sites and stores the data, that's copyright infringement, not "indexing". If your robot downloads the wrong file then you're liable for up to $150k per HTTP GET, or worse.
Whether you believe they should be allowed to or not, or that they provide a useful service by it, Google is by far the largest copyright infringer ever. Google copies everyone's published data and then profits by doing so.
And people say "but robots.txt!". Well, if robots.txt grants copyright permission to spiders, the filesystem is an index so as long as it's not denied by robots.txt then can you copy anything on the web without any legal problems? No way. Robots.txt does not grant permission to make copies. RIAA suing you? No problem, you were just 'indexing' torrent contents.
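For what it's worth, you can see how little robots.txt actually says by feeding one to a parser. A minimal sketch using Python's standard `urllib.robotparser`; the rules, the bot name `ExampleBot` and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, not taken from any real site.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# The only question the file can answer is "may this crawler request this path?"
allowed = parser.can_fetch("ExampleBot", "http://example.com/articles/1.html")
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/admin.html")

print(allowed, blocked)
```

The answer is a yes/no on fetching a path, nothing more. There is no field in the format for licensing or permission to keep copies.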
mysql?
You do know many spam/exploit bots use your robots file to look for admin logins or sensitive info? Just because the user agent was the same as Google's doesn't mean it really was Google; you have to check the agent's IP to be reasonably sure it's legit. Considering that Google even says they have previously only indexed sites every 10 days, it's much more likely you have 3 Google indexes and 29 exploit scans.
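Checking the IP usually means a reverse DNS lookup followed by a forward lookup to confirm it. A rough sketch, assuming genuine Googlebot hosts resolve to names under googlebot.com or google.com; the function name, the IPs (from the 192.0.2.0/24 documentation range) and the host names below are invented for the example, and the default resolvers only handle IPv4.

```python
import socket

# Assumed suffixes for genuine crawler host names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google host that resolves back to ip.

    The lookup functions can be swapped out (e.g. for testing without a network).
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # PTR record points somewhere else entirely
        # Forward-confirm: anyone can set a fake PTR record for their own IP.
        return ip in forward_lookup(host)
    except OSError:
        return False  # lookup failed, treat as unverified


# Offline demonstration with fake DNS tables (all values are invented).
fake_ptr = {
    "192.0.2.10": "crawl-192-0-2-10.googlebot.com",
    "192.0.2.20": "crawl-spoofed.googlebot.com",
    "192.0.2.99": "scanner.example.net",
}
fake_a = {
    "crawl-192-0-2-10.googlebot.com": ["192.0.2.10"],
    "crawl-spoofed.googlebot.com": ["198.51.100.7"],
    "scanner.example.net": ["192.0.2.99"],
}

for ip in fake_ptr:
    print(ip, is_real_googlebot(ip, fake_ptr.__getitem__, fake_a.__getitem__))
```

Run that over the 32 robots.txt hits in your log and you would know how many were really Google.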
Imagine a Beowulf Cluster of Caffeine systems. Now that's a lot of power.
Take it from me, MBCS/CITP at whatever level does not count for one dented penny in the job market. Nothing. Nada. Correction. It gives interviewers a laugh when you mention it, so maybe it works as an icebreaker.
I kinda liked the human-generated Yahoo! index / hierarchy, it was a neat way to get started with the web, back when it wasn't all too big and time-sensitive to organize by hand.
Actually, that was the Open Directory Project that built the data that Yahoo directory was based on. I was an editor there for a few years about 10 years ago. Pretty nice idea, and when search engines really sucked at finding what you wanted back then, it could be a great alternative. But, there's just too much Web content out there for any human powered directory to deal with.
Bullish Machine Tzar
It can be mere minutes. I've filed a bug with an open source project, like say MacPorts, and then realized maybe I should have tried searching Google for a different part of the error message and lo and behold there was my report as the number one result for my search. It was less than 5 minutes old at that point. I've seen similar reports from folks on freenode.
Google has improved so much that my Google Reader showed this article twice!