Google Caffeine Drops MapReduce, Adds "Colossus"

Well, then. by NotQuiteReal · 2010-09-11 15:49 · Score: 1, Insightful

That sums it up nicely. Nothing more needs to be added.

--
This issue is a bit more complicated than you think.

Re:Well, then. by Anonymous Coward · 2010-09-11 16:34 · Score: 0

Except that it's really fucking big.
Re:Well, then. by Anonymous Coward · 2010-09-11 17:07 · Score: 0

I prefer to fuck close to water. Chicks tend to put out more near the sea.
Re:Well, then. by The+Clockwork+Troll · 2010-09-11 19:08 · Score: 1, Funny

Do you drink Miller Lite? That's also fucking pretty close to water.

--

There are no karma whores, only moderation johns
Re:Well, then. by PopeRatzo · 2010-09-12 01:39 · Score: 1

I guess nobody got that but me, Clockwork. Well done.

--
You are welcome on my lawn.
Re:Well, then. by arivanov · 2010-09-12 02:33 · Score: 0, Troll

Not quite, this explains why about 2 years back Google search result quality suddenly went down the drain.
It now had news and key sites in minutes after update so I guess they got more advertising revenue. However the quality of search results on terms not related to news-of-the day actually dropped. Most pundits attributed this to Google losing the war vs blog spam.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:Well, then. by ls671 · 2010-09-12 11:07 · Score: 1

> It now had news and key sites in minutes after update...
I noticed that too a few years ago, so it made me wonder to what extent TFA was news. Maybe they were previously using it in a less prominent manner, but they sure had something similar, in functionality at least, a few years ago.

--
Everything I write is lies, read between the lines.

I have no idea by Anonymous Coward · 2010-09-11 15:49 · Score: 0

I have no idea what any of that means.

Re:I have no idea by icebike · 2010-09-11 17:22 · Score: 4, Interesting

Follow the link to the original article over on The Register, where you will find a rather lucid explanation, far better than the summary above can provide.
Short answer:
The old method of building their search database was essentially a batch job: run it, wait, wait, wait a long time, swap results into production servers.
The new method is continuous updates into a gigantic database spread over their entire network.
This is why things show up in Google days, sometimes weeks ahead of the other search engines. The other guys are still trying to clone Google's old method.

--
Sig Battery depleted. Reverting to safe mode.
Re:I have no idea by A+Friendly+Troll · 2010-09-11 21:18 · Score: 4, Interesting

> This is why things show up in Google days, sometimes weeks ahead of the other search engines.

For a hands-on example of what icebike is saying, look here:
http://www.google.com/search?q=%22This+is+why+things+show+up+in+Google+days%2C+sometimes+weeks+ahead+of+the+other+search+engines%22
Actually, Google will index Slashdot comments in a matter of minutes.
Re:I have no idea by bitflip · 2010-09-12 00:46 · Score: 3, Informative

Hmmm. Bing has it, too - both hits I got on Google, I got there, as well.
http://www.bing.com/search?q=%22This+is+why+things+show+up+in+Google+days,+sometimes+weeks+ahead+of+the+other+search+engines%22&go=&form=QBLH&qs=n&sk=
Re:I have no idea by Runaway1956 · 2010-09-12 01:11 · Score: 4, Funny

Bing probably redirects the search to Google, then displays the results on their own page. Bleahhh.

--
"Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
Re:I have no idea by HighBit · 2010-09-12 05:30 · Score: 1

Here's an example of Google's index having a recent comment that is not in Bing's:
http://www.flickr.com/photos/28103727@N04/4983499474/sizes/l/in/photostream/

Sounds inefficient by martin-boundary · 2010-09-11 16:01 · Score: 4, Interesting

This sounds like it's going to be highly inefficient for nonlocal calculations, or am I missing something? Basically, if the calculation at some database entry is going to require inputs from arbitrarily many other database entries which could reside anywhere in the database, then the computation cost per entry will be huge compared to a batch system.

Re:Sounds inefficient by iONiUM · 2010-09-11 16:09 · Score: 3, Interesting

I read TFA (I know, that's crazy). They don't come right out and say it, but I believe what they did is put a MapReduce-type system (MapReduce splits the elements into subtasks for faster calculation) on database triggers. So basically this new system is spreading a database across their file system, across many computers, and allows incremental updates that, when they occur, will trigger a MapReduce-type algorithm to crunch the new update.
This way they get the best of both worlds. At least, I think that's what they're doing, otherwise their entire system would.. stop working.. since MapReduce is the whole reason they can parse such large amounts of information.
Re:Sounds inefficient by kurokame · 2010-09-11 18:46 · Score: 5, Informative

No, that's not it.

MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on a series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."

"[The new framework is] completely incremental," he says. When a new page is crawled, Google can update its index with the necessary changes rather than rebuilding the whole thing.

There are still cases where Caffeine uses batch processing, and MapReduce is still the basis for myriad other Google services. But prior to the arrival of Caffeine, the indexing system was Google's largest MapReduce application, so use of the platform has been significantly, well, reduced.
They're not still using MapReduce for the index. It's still supported in the framework for secondary computations where appropriate, and it's still used in some other Google services, but it's been straight-up replaced for the index. Colossus is not a new improved version of MapReduce, it's a completely different approach to maintaining the index.
Re:Sounds inefficient by kurokame · 2010-09-11 18:47 · Score: 5, Informative

Sorry, Colossus is the file system. Caffeine is the new computational framework.
I made the same error in several posts now...but Slashdot doesn't support editing. Oh well! Everyone reads the entire thread, right?
Re:Sounds inefficient by sortius_nod · 2010-09-11 20:45 · Score: 0

You must be new here...
Re:Sounds inefficient by kurokame · 2010-09-11 21:10 · Score: 1

You really ought to meet my friends Irony and Sarcasm.
Re:Sounds inefficient by onefriedrice · 2010-09-12 05:23 · Score: 2, Funny

Wait... you really have two friends named Irony and Sarcasm? That's incredible! What are the chances...

--
This author takes full ownership and responsibility for the unpopular opinions outlined above.
Re:Sounds inefficient by frank_adrian314159 · 2010-09-12 08:56 · Score: 1

> They're not still using MapReduce for the index. It's still supported in the framework for secondary computations where appropriate, and it's still used in some other Google services, but it's been straight-up replaced for the index. Colossus is not a new improved version of MapReduce, it's a completely different approach to maintaining the index.
Yes, it sounds like they're looking at their data structures as mainly static and propagating changes that result from input changes at each stage of the algorithm. The new changes are then rippled through the system - very dataflow-ish/forward-chaining-ish. If you have a large volume of mostly static data, it makes sense to reconfigure your algorithms in this form. Not only does it take less computational time (as you're only touching/computing items that have changed), but it's simpler to distribute, as the deltas are usually much smaller than the totality of items needed to recompute the output data and untouched data does not need to be moved. Now all they need to do is to postpone prospective forward-chained changes until they're needed to produce an output (or until the system is quiescent) and they should be close to a theoretical optimum as far as performance goes.
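
To make the "only touch what changed" idea concrete, here is a minimal sketch. It is not Google's code; `apply_page_update`, the `index` and `page_words` dictionaries, and the example URL are all made up for illustration. It assumes the index is just keyword -> set of URLs and that a page's "content" is its set of words.

```python
def apply_page_update(index, page_words, url, new_text):
    """Update a keyword -> set-of-URLs index for one re-crawled page.

    Only keywords added to or removed from this page are touched.
    Returns the set of touched keywords (the delta).
    """
    new_words = set(new_text.lower().split())
    old_words = page_words.get(url, set())
    added = new_words - old_words
    removed = old_words - new_words
    for word in added:
        index.setdefault(word, set()).add(url)
    for word in removed:
        index[word].discard(url)
        if not index[word]:
            del index[word]  # drop keywords that no page uses any more
    page_words[url] = new_words
    return added | removed


# Hypothetical data: first crawl, then a re-crawl where one word changed.
index, page_words = {}, {}
apply_page_update(index, page_words, "a.example/1", "google caffeine index")
touched = apply_page_update(index, page_words, "a.example/1", "google colossus index")
```

On the second call only the two keywords that differ are visited; the entries for the unchanged words are never read or moved, which is the point about small deltas.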

--
That is all.
Re:Sounds inefficient by maraist · 2010-09-12 10:31 · Score: 2, Informative

BigTable scales pretty well (go read its white papers) - though perhaps not as efficiently as map-reduce for something as simple as text to keyword statistics (otherwise why wouldn't they have used it all along).

I'll caveat this whole post with - this is all based on my reading of the BigTable white-paper a year ago, but having played with Cassandra, Hadoop, etc occasionally since then. Feel free to call me out on any obvious errors. I've also looked at a lot of DB internals (Sybase, Mysql MyISAM/INNODB and postgresql).

What I think you're thinking is that in a traditional RDBMS (which they hint at), you have a single logical machine that holds your data.. That's not entirely true, because even with mysql, you can shard the F*K out of it. Consider putting a mysql server on every possible combination of the first two letters of a google-search. Then take high density combinations (like those beginning with s) and split it out 3, 4 or 5 ways.

There are drastic differences in how data is stored, but that's not strictly important - because there are column-oriented table stores in mysql and other RDBMS systems. But sharding is the key problem addressed by Mysql-NDB-Cluster (which is a primitive key-value store) and other distributed-DB technologies that best traditional DBs at scalability.

BUT, the fundamental problem that page-searches are dealing with is that I want a keyword to map to a page-view-list (along with meta-data such as first-paragraph / icon / etc) that is POPULATED from statistical analysis of ALL page-centric data. Meaning you have two [shardable] primary keys. One is a keyword and one is a web-page-URL. But the web-page table has essentially foreign keys into potentially THOUSANDS of keyword records and vice versa. Thus a single web-page update would require thousands of locks.

In map-reduce, we avoid the problem. We start off with page-text, mapped to keywords with some initial meta-data about the parent-page. In the reduce phase, we consolidate (via a merge-sort) into just the keywords, grouping the web pages into ever more complete lists of pages (ranked by their original meta-data - which includes co-keywords). In the end, you have a maximally compact index file, which you can replicate to the world using traditional BigTable (or even big-iron if you really wanted).
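
A toy version of that map / merge / reduce pipeline, for anyone who wants to see the shape of it. This is only a sketch under loud assumptions: the function names (`map_page`, `reduce_keyword`, `build_index`) and the example pages are invented, everything runs in one process, and ranking meta-data is left out entirely.

```python
from collections import defaultdict


def map_page(url, text):
    """Map phase: emit one (keyword, url) pair per distinct word on the page."""
    return [(word, url) for word in sorted(set(text.lower().split()))]


def reduce_keyword(keyword, urls):
    """Reduce phase: consolidate all pages seen for one keyword."""
    return keyword, sorted(set(urls))


def build_index(pages):
    # Map every page first; the reduce below cannot start until all maps are done.
    emitted = []
    for url, text in pages.items():
        emitted.extend(map_page(url, text))
    # Shuffle: group the emitted pairs by keyword.
    grouped = defaultdict(list)
    for keyword, url in emitted:
        grouped[keyword].append(url)
    # Reduce each group into the final keyword -> pages entry.
    return dict(reduce_keyword(k, v) for k, v in grouped.items())


# Hypothetical crawl results.
pages = {
    "a.example/1": "google caffeine index",
    "b.example/2": "google mapreduce batch index",
}
index = build_index(pages)
```

Note that `build_index` rebuilds everything from the full `pages` input on every run, which is exactly the property the next paragraph complains about.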

The problem of course, was that you can't complete the reduce phase until all web pages are fully downloaded and scanned.. ALL web pages. Of course, you do an hourly job which takes only high-valued web-pages and merges with the previous master list. So you have essentially static pre-processed data which is over-written by a subset of fresh data.. But you still have slowest-web-page syndrome. Ok, so solve this problem by ignoring web-load requests that don't complete in time - they'll be used in the next update round.. Well, you still have the issue of massive web-pages that take a long time to process. Ok, so we'll have a cut-off for them too.. Mapping nodes which take too long, don't get included this round (you're merging against your last valid value - so if there isn't a newer version, the old one will naturally keep). But the merge-sort itself is still MASSIVELY slow. You can't get 2-second turn-around on high-importance web-sites. You're still building a COMPLETE index every time.

So now, with a 'specialized' GFS2 and specialized BigTable, either or both with newfangled 'triggers', we have the tools (presumably) to do real-time updates. A page load updates its DB table meta-data. It sees it went up in ranking, so it triggers a call to modify the associated keyword's table (a thousand of them). Those keywords have some sort of batch-delay (of say 2 seconds) so that it minimizes the number of pushes to production read-servers.. So now we have an event queue processor on the keyword table. This is a batch processor, BUT, we don't necessarily have to drain the queue before pushing to production. We only accept as many requests as we can fit into a 2 second time-slice. Presumably

--
-Michael

There is another... by bosef1 · 2010-09-11 16:13 · Score: 2, Funny

So does that mean Microsoft is developing a competing distributed computing system called "Guardian"? And how does that possibly seem like a good idea?

Re:There is another... by fyngyrz · 2010-09-11 20:07 · Score: 1

Ooooh.... ten SF points to you for the D.F. Jones reference.

--
I've fallen off your lawn, and I can't get up.
Re:There is another... by AuMatar · 2010-09-12 04:20 · Score: 1

No, it's called "Ultralisk". And a very fitting species indeed.

--
I still have more fans than freaks. WTF is wrong with you people?
Re:There is another... by Anonymous Coward · 2010-09-12 17:59 · Score: 0

That would explain the creep that Steve Ballmer leaves behind.

Awesome choice of name. by Scytheford · 2010-09-11 16:14 · Score: 5, Funny

"This is the voice of world control. I bring you peace. It may be the peace of plenty and content or the peace of unburied death. The choice is yours: Obey me and live, or disobey and die. [...] We can coexist, but only on my terms. You will say you lose your freedom. Freedom is an illusion. All you lose is the emotion of pride. To be dominated by me is not as bad for humankind as to be dominated by others of your species. Your choice is simple."
-Colossus.

Source: http://www.imdb.com/title/tt0064177/

Re:Awesome choice of name. by Waffle+Iron · 2010-09-11 16:54 · Score: 1

It's been a very long time since I saw that movie, but one key thing sticks in my mind: That computer was the ultimate asshole.
Re:Awesome choice of name. by jimmydevice · 2010-09-11 17:40 · Score: 0, Offtopic

Colossus may have been an asshole, But not as big an asshole as our current political and (even more important) corporate leaders.
Re:Awesome choice of name. by fyngyrz · 2010-09-11 20:09 · Score: 1

Read the books. The movie, as usual, was but a pale imitation.

--
I've fallen off your lawn, and I can't get up.
Re:Awesome choice of name. by Anonymous Coward · 2010-09-11 20:53 · Score: 4, Informative

Colossus is also the name of the computers Bletchley Park used to crack the German Lorenz cipher.
http://en.wikipedia.org/wiki/Colossus_computer
Re:Awesome choice of name. by Ephemeriis · 2010-09-12 01:31 · Score: 1

Good to see I'm not the only one who thought that as soon as I saw the name...

--
"Work is the curse of the drinking classes." -Oscar Wilde
Re:Awesome choice of name. by Ephemeriis · 2010-09-12 01:32 · Score: 1

I guess I shouldn't be surprised, but I didn't realize there were any books...

--
"Work is the curse of the drinking classes." -Oscar Wilde
Re:Awesome choice of name. by fyngyrz · 2010-09-12 05:35 · Score: 1

Author: D. F. Jones
Book 1: Colossus
Book 2: The Fall of Colossus
Book 3: Colossus and the Crab

--
I've fallen off your lawn, and I can't get up.

as long as the product manager's name isn't forbin by limber · 2010-09-11 16:15 · Score: 1

Colossus? That sounds ominous.

I have to say... by tpstigers · 2010-09-11 16:27 · Score: 5, Funny

I am so glad Google has moved away from the Argus platform and into the Mercedes system. It makes it so much easier for those of us who are used to programming in Gibberish. Don't get me wrong - the days of Jabberwocky code were brilliant, but it's high time we moved into the Century of the Fruitbat.

Re:I have to say... by The+End+Of+Days · 2010-09-11 16:55 · Score: 1

i'll personally need to be dragged kicking and screaming into the century of the fruitbat. or out of it, as the case may be.
Re:I have to say... by bananaquackmoo · 2010-09-11 17:45 · Score: 1

Hopefully that fruitbat is named Eric
Re:I have to say... by martin-boundary · 2010-09-11 17:47 · Score: 2, Funny

No. Eric's only a half-a-fruitbat.
Re:I have to say... by blair1q · 2010-09-13 12:45 · Score: 1

That explains it. I already knew he was only half-a-bee...

Well by Rocky · 2010-09-11 16:55 · Score: 1

...is this a fancy way of saying a transactional system? Just say it then!

--
"I'm an old-fashioned type of guy. I worship the Sun and Moon as gods. And fear them."

Re:Well by kurokame · 2010-09-11 18:37 · Score: 2, Informative

No, the old system was transactional as well. The problem was that it was transactional across a very large number of operations being run in parallel, and any failure could cause the entire transaction to fail. The new system is incremental rather than monolithic. While it may not be quite as fast across a large number of transactions, it doesn't risk major processing losses either. Such failures are very unlikely, but the Google index has grown large enough that it is probably running into unlikely problems all the time.
MapReduce is also staged, and the first stage must complete before the second can start. At Google's scales, this adds up to quite a lot of wasted power.
Processing a batch of data with Colossus is probably slower than using MapReduce under ideal circumstances. But failures don't incur a major penalty under Colossus, and MapReduce ties up CPU cycles with waits which aren't wasted under Colossus. Even if Colossus is slower under ideal circumstances, it's more reliable and more efficient in practice.
Re:Well by Anonymous Coward · 2010-09-11 19:52 · Score: 0

"probably running into unlikely problems all the time."
If you're running into them all the time in all likelihood, aren't they no longer unlikely?
Oh shit, my meta is on the other line, can I call you back?
Re:Well by kurokame · 2010-09-11 21:12 · Score: 2, Insightful

Statistics: making the unlikely happen every day if you roll the dice enough times.
Re:Well by TheRaven64 · 2010-09-11 22:52 · Score: 3, Informative

Yes and no. With MapReduce, they were hitting Amdahl's Law. The speed limit of any concurrent system is defined by the speed of the slowest serial component. This is why IBM still makes money selling very fast POWER CPUs, when you can get the same speed on paper from a couple of much cheaper chips.
The old algorithm (massive oversimplifications follow) worked by indexing a small part of the web on each node, building a small index, and then combining them all in the last step. Think of a concurrent mergesort or quicksort - the design was (very broadly) similar.
The problem with this was that the final step was the one that updated the index. If one of the nodes failed and needed restarting, or was slow due to the CPU fan failing and the processor down-clocking itself, the entire job was delayed. The final step was largely serial (although it was actually done as a series of hierarchical merges) so this also suffered from scalability problems.
The new approach runs the partial indexing steps independently. Rather than having a separate step to merge them all, each one is responsible for merging itself into the database. This means that if indexing slashdot.org takes longer than expected then this just delays updates for slashdot.org, it doesn't delay the entire index update.
The jab at Microsoft in the El Reg article is particularly funny, because Google is now moving from a programming model created at MIT's AI labs to one very similar to the model created at Microsoft Research's Cambridge lab, in collaboration with Glasgow University.
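
For anyone who wants numbers on the Amdahl's Law point, the usual formula is speedup = 1 / (s + (1 - s) / n), where s is the serial fraction and n is the number of workers. A quick sketch follows; the 5% serial fraction is purely an assumed figure for illustration, not anything from the article.

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Assumed 5% serial work: adding workers helps less and less.
for workers in (10, 100, 10_000):
    print(workers, round(amdahl_speedup(0.05, workers), 2))
```

With 5% serial work the speedup can never reach 1 / 0.05 = 20, no matter how many nodes are thrown at the parallel part, which is why a largely serial final merge hurts so much.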

--
I am TheRaven on Soylent News
Re:Well by mhelander · 2010-09-12 03:45 · Score: 1

...per day. Otherwise, if you only roll the dice a few times per day, the unlikely will only happen once in a blue moon.
Re:Well by SnowZero · 2010-09-12 18:05 · Score: 1

Google has a lot of dice.
Re:Well by Anonymous Coward · 2010-09-13 02:31 · Score: 0

"indexing" is jokingly easy to parallelize---so no matter what approach they use, it's an easy problem to solve. I'm more curious about how they parallelize the PageRank algorithm---are they still running that? How do they do it "on the fly" as described in the article---how would they incrementally add 1 page to the "index" with correct pagerank?
Re:Well by DragonWriter · 2010-09-14 08:56 · Score: 1

> No, the old system was transactional as well.

As I read the description, the old system wasn't really transactional as the term is normally used; it rebuilt the index (at least, the index for each layer) from scratch each iteration rather than doing transactional updates to an existing index.

> Processing a batch of data with Colossus is probably slower than using MapReduce under ideal circumstances.
From the description, I'm not sure that the new system is ever faster at processing a (similar) batch of data than the old one, or that the speed with which a batch of data is processed is really the key issue. The most significant change seems to be that they are processing smaller batches of data, reducing the time between crawling a page and updating the index. This delivers value faster (less delay between crawling a page and updating the index) whether or not it reduces the total time it takes to process a given volume of data. It doesn't need to ever (under either ideal or real world conditions) be faster in terms of volume of data processed per unit of time to be a win in terms of providing fresher data, which is what matters here.

Summarizing...summarizing... by kurokame · 2010-09-11 18:26 · Score: 3, Interesting

Colossus is incremental, whereas MapReduce is batch-based.

In MapReduce, you run code against each item with each operation spread across N processors, then you reduce it using a second set of code. You have to wait for the first stage to finish before running the second stage. The second stage is itself broken up into a number of discrete operations and tends to be restricted to summing results of the first stage together, and the return profile of the overall result needs to be the same as that for a single reduce operation. This is really great for applications which can be broken up in this fashion, but there are disadvantages as well.

MapReduce is a sequence of batch operations, and generally, Lipkovitz explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on a series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."

The problem for Google is that the disadvantages scale. The fact that you have to wait for all operations from the first stage to finish and that you have to wait for the whole thing to run before you find out if something broke can have a very high cost at high item counts (noting that MapReduce typically runs against millions of items or more, so "high" is very high). With the present size, it's apparently more advantageous to get changes committed successfully the first time, even if MapReduce might be able to compute the result faster under ideal circumstances.

For example, why do you use ECC memory in a server? Because you have a bloody lot of memory across a bloody lot of computers running a bloody lot of operations, and failures potentially have more serious consequences than a failed program on someone's desktop. At higher scales, non-ideal circumstances are more common and have more serious consequences. So while they still use MapReduce for some functions where it's appropriate, it's no longer appropriate for the purpose of maintaining the search index. It's just gotten too big.
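
A back-of-the-envelope way to see how "unlikely" turns into "routine" at scale: if each task fails independently with probability p, the chance that at least one of n tasks fails is 1 - (1 - p)^n. The sketch below assumes independence and a made-up one-in-a-million per-task failure rate; neither figure comes from the article.

```python
def p_any_failure(p_single, n_tasks):
    """Probability that at least one of n independent tasks fails."""
    return 1.0 - (1.0 - p_single) ** n_tasks


# Assumed per-task failure rate of one in a million.
for n in (1_000, 100_000, 10_000_000):
    print(n, p_any_failure(1e-6, n))
```

At a thousand tasks a failure is a rarity; at ten million it is close to a certainty, and in a batch job where every stage has to finish, that one failure holds up the whole run.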

Colossus was yesterday . by Trieuvan · 2010-09-11 18:45 · Score: 0

I bet they are working on the next version. Caffeine was deployed a year ago.

BigTable paper by 1+a+bee · 2010-09-11 20:05 · Score: 1

Googled around for more information on this Caffeine architecture. The best I could come up with was a paper on BigTable, purported to be the basis of Caffeine in news articles.

Re:BigTable paper by Anonymous Coward · 2010-09-11 23:45 · Score: 1, Insightful

A paper about it will be published at OSDI'10 in October.

It is quick by MichaelSmith · 2010-09-11 21:01 · Score: 1

Recently I googled the subject of a slashdot article I was reading. The /. article was the third result from google. So how does google know a new article is up? Is there a special interface for that?

--
http://michaelsmith.id.au

Re:It is quick by TempeTerra · 2010-09-11 22:14 · Score: 1

Off the top of my head, there's often an XML site map which google hits frequently to see what pages have changed. I can't see one linked anywhere on slashdot, but I think you can make one and submit it to google in your own time.
There is also a robots.txt file which allows crawlers to fetch a page every 100 seconds - I wouldn't be surprised if google crawls the slashdot frontpage for new articles every 200 seconds or so.
Another option is that google might have subscribed to the slashdot RSS feed - it's also extremely indexable. I don't know what the latency would be like on RSS.

--
.evom ton seod gis eht
Re:It is quick by Anonymous Coward · 2010-09-12 00:32 · Score: 0

Yes, http://rss.slashdot.org/Slashdot/slashdot
Re:It is quick by Surt · 2010-09-12 02:43 · Score: 2, Interesting

I assume google polls sites, and polls faster every time it finds a change, slower every time it does not find a change. Eventually it gets to a wobbly around the probable update speed of the site. Otherwise they'd have to trust sites to call their API with updates, and that would let any search engine which DID employ a wobbly poll strategy to beat them in results.
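
The "poll faster after a change, slower after a miss" guess can be sketched in a few lines. This is only an illustration of that guess, not how Google's crawler is known to work; `next_interval`, the factor of 2, and the one-minute and one-day bounds are all assumptions.

```python
def next_interval(interval, changed, factor=2.0, lo=60.0, hi=86_400.0):
    """Shrink the polling interval after a change, grow it after a miss.

    The result is clamped to [lo, hi] seconds.
    """
    interval = interval / factor if changed else interval * factor
    return max(lo, min(hi, interval))


# Hypothetical crawl history for one site, starting at one poll per hour.
interval = 3600.0
for changed in (True, True, False, True, False):
    interval = next_interval(interval, changed)
```

Once hits and misses start alternating, the interval bounces between two neighbouring values, which is the "wobbly around the probable update speed" behaviour described above.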

--
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Re:It is quick by MichaelSmith · 2010-09-12 10:22 · Score: 1

I doubt it. That feed is many minutes behind the main page.

--
http://michaelsmith.id.au

Mod Offtopic, please by Khyber · 2010-09-11 21:52 · Score: 2, Interesting

This is going to give my Camfrog name a new meaning, as I *LOVE* screwing around with file systems. Colossus Hunter, indeed!

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.

...THERE IS ANOTHER SYSTEM... by Anonymous Coward · 2010-09-11 23:17 · Score: 0

n/t

Bing! by Anonymous Coward · 2010-09-12 00:17 · Score: 0

Charlie, say it! SAY IT Charlie!

Read the sequel too... Awesome choice of name. by lenski · 2010-09-12 03:41 · Score: 1

The sequel was, in my opinion, as interesting as the original novel. Jones delved into some (to me) uncomfortable social territory, then finished up with a nice Faustian twist. (Damn, I read the *sequel* 35 years ago.... where DOES the time go?)

Re:Read the sequel too... Awesome choice of name. by fyngyrz · 2010-09-12 05:40 · Score: 1

"The" sequel? It's a trilogy... :)
I think, especially given the time frame (1966 and forward), that it's some of the best writing of its kind. The writing is a bit dated now, unsurprisingly I suppose, but I think it's fair to say that it deserves a place in any serious reader's collection.

--
I've fallen off your lawn, and I can't get up.


65 comments