MapReduce — a Major Step Backwards?
The Database Column has an interesting, if negative, look at MapReduce and what it means for the database community. MapReduce is a software framework developed by Google to handle parallel computations over large data sets on cheap or unreliable clusters of computers. "As both educators and researchers, we are amazed at the hype that the MapReduce proponents have spread about how it represents a paradigm shift in the development of scalable, data-intensive applications. MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is: a giant step backward in the programming paradigm for large-scale data intensive applications; a sub-optimal implementation, in that it uses brute force instead of indexing; not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago; missing most of the features that are routinely included in current DBMS; incompatible with all of the tools DBMS users have come to depend on."
I don't know why this article is so harshly critical of MapReduce. They base their critique on the following five tenets, which they elaborate in detail in the article:
If you take the time to read the article you'll find they use axiomatic arguments with lemmas like "schemas are good" and "Separation of the schema from the application is good", etc. First, they make the assumption that these points are relevant and germane to MapReduce. But they mostly aren't.
Also taking the five tenets listed, here are my observations:
They don't offer any proof, merely their view... However, the fact that Google used this technique to re-generate their entire internet index leads me to believe that if this were indeed a giant step backward, we must have been pretty darned evolved to step "back" into such a backwards approach.
Not sure why brute force is such a poor choice, especially given what this technique is used for. From wikipedia:
Again, not sure why something "old" represents something "bad". The most reliable rockets for getting our space satellites into orbit are the oldest ones.
I would also argue their bold approach to applying these techniques in such a massively aggregated architecture is at least a little novel, and based on results of how Google has used it, effective.
They're mistakenly assuming this is for database programming
See previous bullet
Are these guys just trying to stake a reputation based on being critical of Google?
It's a technical step backwards, they're doing it all wrong, experts say you should do it this other way....
And watch. It'll be massively successful because it works.
Once I saw the word paradigm in the summary I just glazed over like I do whenever our CEO gives a speech.
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
Since when did MapReduce have anything to do with databases? It's actually about parallel computations, which are entirely different.
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
5. MapReduce is incompatible with the DBMS tools
A modern SQL DBMS has available all of the following classes of tools:
* Report writers (e.g., Crystal reports) to prepare reports for human visualization
Perl? Really, Perl was made for doing reports. I am sure that somebody will create a report writer for it. I am just amazed that Crystal Reports has become the universal solution for so many things.
This is a pretty new bit of kit. If it catches on then people will start porting tools to it. When it comes to database tech I tend to believe that IBM really knows what they are doing. If this interests them I bet there is something to it.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Perhaps the traditional RDBMS experts will return when they can scale their paradigms to datasets that are measured in the tens of terabytes and stored on thousands of computers. Following the airplane rule the solution needs to be able to withstand a crash in a bunch of those hosts without coming unglued.
Now, this is not to say that a more sophisticated approach wouldn't work. It's just that when you have thousands of boxes in a few ethernet segments, communication overhead becomes really quite large, so large in fact that whatever communication can be avoided through brute-force computation will usually be worth it. Consider that from what I've heard, at Google these thousands of boxes are mostly containers for RAM modules, so there's rather a lot of computation power per gigabyte available to throw away with a brute-force system.
Also, I would like to point out that map/reduce is demonstrated to work. Apparently quite well too. Certainly better than any hypothetical "better" massively parallel RDBMS available in a production quality implementation today.
...entry says;
"You seem to not have noticed that mapreduce is not a DBMS."
Exactly. These are the same sort of criticisms that you hear around memcached - the feature set is smaller, etc - and they make the same mistake. It's not a DBMS, and it's not supposed to be. But it does what it does quite well nonetheless!
The Army reading list
"MapReduce is a software framework developed by Google to handle parallel computations over large data sets on cheap or unreliable clusters of computers."
It ought to be a database, but since it isn't a database, it sucks.
"As God is my witness, I thought turkeys could fly." A. Carlson
I'm not at all certain about this but I'd bet that indexes can't solve every problem. I was working on a search routine that would attempt to pick 5 records at random from a database containing potentially a billion records. The search criteria were quite complex and included full-text search of a TEXT field and geographic proximity to a given zip code, among other things. The client wanted this done in a fraction of a second.
Personally, I'm amazed at what the various google search engines do and would bet that this technique they describe is what ties together their 200,000 servers. I wouldn't dismiss it so quickly.
in that it uses brute force instead of indexing
Isn't the overhead of a distributed index usually not worth the bother? This scheme sounds similar to the way Teradata handles its distribution and it manages to get a lot done with hardly any secondary indexes. I think the thinking in the article indicates standalone database server box thinking.
I'm glad someone finally had the nerve to put MapReduce into real perspective. MapReduce has absolutely none of the "why didn't I think of that" factor.
it represents a specific implementation of well known techniques developed nearly 25 years ago
There are many classic/old techniques which are only now being used - and very successfully - precisely because the hardware simply wasn't there. A recent /. post told of ray-tracing being soon used for real-time 3D gaming, and how it beats the socks off "rasterized" methods when a critical mass of polygons is involved; the techniques were well known and developed nearly 25 years ago, but only now do we have the CPU horsepower and vast fast memory capacities available for those "old" techniques to really shine. Likewise "old" "brute force" database techniques: they may not be clever and efficient like what we've been using for highly stable processing of relatively small-to-medium databases, but they work marvelously well when involving big unreliable networks of processors working on vast somewhat-incoherent databases - systems where modern shiny techniques just crumble and can't handle the scaling.
Sometimes the "old" methods are best - you just need the horsepower to pull it off. Clever improvements only scale so long.
Can we get a "-1 Wrong" moderation option?
This article was written from the perspective that map-reduce based architectures are in competition with common relational database architecture. They're not.
Certainly if you were to implement map-reduce within the confines of the relational database world, there are implementation methodologies that would need to be taken to make it easier for the RDBMS developer to work with the storage and querying mechanisms.
The article implies that map-reduce is bad because it doesn't place restrictions common to the database world on developers. When you get down to programming anything at a basic level, the implementation of standards is an optional step to take.
I would agree that abstraction and structure would be good things because developers would be able to concentrate on higher level problems, but I would strongly disagree that anybody learning about map-reduce algorithms should be confined to a particular implementation methodology.
Well, INDBE, but MapReduce seems like a pretty cool idea (even if it is old [which in my books does not equate to bad]). A similar argument could be made against SQL -- it's not appropriate to all solutions. It's used for most nowadays, in part because it's the simplest to use, but that doesn't make it necessarily better. It (of course) depends on what data you want to represent.
Even more importantly, you can create schemas with MapReduce by how you write your Map/Reduce functions. This is a matter of the datafunction exchange (all data can be represented as a function, likewise all functions can be represented as data). I admit ignorance to how this MapReduce system works, but I would be surprised if you couldn't get a relational database back out.
The advantage you get with MapReduce is that you aren't necessarily tied to a single representation of data. Especially for companies like Google, which may want to create dynamic groups of data, this could be a big win. Again, this is all speculative, as I have very little experience with these systems.
The reaction seems straightforward enough. The MapReduce paradigm has proved to be very effective for a company that lives and breathes scalability, while it apparently ignores a whole bunch of database work that's been going on in academia. The fact that industry was able to produce something so effective without making use of all this knowledge base at least implicitly undercuts the importance of that work, and is thus threatening to the community which produced that work. Is it any surprise that the researchers whose work was completely side-stepped by this approach aren't happy with the current situation?
Lisp
Even if it were an RDBMS, there are damn good reasons for violating the "rules" in certain situations. If the only tool in your toolbox is a hammer, everything looks like a nail. Knowing the rules and guidelines goes hand in hand with knowing the situations where they don't work or work against you... academics are big on the former and short on the latter, and that is a real thing in the real world outside of academia.
I had to write a DB application once to handle about 80 full CDs of telephone records from an RDBMS. I was able to reduce it so it all fit on one CD and was blazingly fast, but I had to violate several "rules" of proper database programming and layout. It happens.
A sub-optimal implementation, in that it uses brute force instead of indexing
As though these are the exclusive choices. TFA goes on to complain about implementing 25 year old ideas, though they are actually rather older than that--they just didn't strike the RDB types until the eighties. They proceed to insist that the system cannot scale. Arguing about Google's scalability is like arguing about gravity.
illegitimii non ingravare
That's a joke, right?
I think Google's already taken care of all the experimental evaluations you'd need.
In the course of every project, it will become necessary to shoot the scientists and begin production.
If you are starting with a good database, MapReduce is definitely a step backwards. But that isn't what MapReduce is designed to replace. In reality, MapReduce replaces the for loop, and viewed from that perspective, it is a major step forward. Most languages (C, C++, Java, etc.) define the for loop and other iteration facilities in such a way that the compiler can seldom safely parallelize the loop. MapReduce gives the programmer an easy way to convert probably 90% of their for loops into highly scalable code.
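To make the for-loop point concrete, here is a minimal sketch in Python (not Google's API, just the shape of the idea). The function names and the thread pool are assumptions for illustration: the loop version hides the fact that each iteration is independent, while the map/reduce version states it outright, so a runtime is free to farm the map calls out to workers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def square(x):
    # "Map" step: independent per-element work, no shared state.
    return x * x


def add(a, b):
    # "Reduce" step: an associative combine of two partial results.
    return a + b


def loop_sum_of_squares(values):
    # Classic for loop: the running total is updated on every iteration,
    # so the independence of each iteration is not visible in the code.
    total = 0
    for v in values:
        total += square(v)
    return total


def mapreduce_sum_of_squares(values, workers=4):
    # Same computation split into a map and a reduce. The pool here is a
    # stand-in for a cluster; a real framework would ship work to machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(square, values))
    return reduce(add, mapped, 0)


data = list(range(1, 11))
loop_result = loop_sum_of_squares(data)
mr_result = mapreduce_sum_of_squares(data)
```

Both versions compute the same number; the difference is only that the second one never says in which order, or on which worker, each element is handled.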
"We spent all these years making these complex, elegant algorithms--see how intricate this wonderful indexing algorithm is?--and then they solve things by simply throwing cheap hardware at it. It's not *fair!*"
The point of MapReduce is that It Works. Cheaply. Reliably. It's not a solution for the Cathedral, it's one for the Bazaar.
Comparing it to a DBMS on fanciness is pointless, because the DBMS solution fails where MapReduce succeeds.
The first thing that came to my mind when I read that was the "evolution of a programmer": when an evolving "program" started to get thinner in lines again, that didn't mean it was a step backwards.
Sheesh, evil *and* a jerk. -- Jade
I wasn't expecting Google to seize control of the world databases and force people to use their software till at least 2012.
The column was copyright by Vertica. Wouldn't they be concerned about the type of competition that MapReduce presents?
Vertica launches database-focused blog: The Database Column.
Data management is becoming so much more than just the data stored in a DBMS. As a data management geek, it's sad that the authors, experts in my field, fail to put MapReduce in its proper context and recognize its value. My bread and butter is DBMS, and even I could see the potential of MapReduce and the failure of the authors' arguments.
I gather this is a publication for DBAs. It seems they are worried about their jobs more than anything. With the map-reduce-style databases there isn't a need for any kind of special database expert. The business logic all happens in the application. There is no need for tuning indexes. You don't even need to define a schema. When things get slow any monkey can drop in another computer and you're back up to speed and ready to go.
Traditional RDBMSes have their place, but we're going to see a lot more applications built on this technology in the near future. The big players (Google, Amazon, etc.) have been doing it for quite some time and we're now finally seeing the technology available to the average Joe. It's a very interesting shift in how data is stored and should lead to some interesting applications that we can only dream of today.
"...I taped twenty cents to my transmission
So I could shift my pair 'a dimes..."
I read through the whole article, and was just bemused. According to the article, MapReduce isn't as good as a real database at doing the sorts of things real databases do well. Um, okay, I guess, but MapReduce can do quite a lot of other things that they seem to have missed.
Also, I had a major WTF moment when I read this:
Given the experimental evaluations to date, we have serious doubts about how well MapReduce applications can scale.
Empirical evidence to date suggests that MapReduce scales insanely well. Exhibit A: Google, which uses MapReduce running on literally thousands of servers at a time to chew through literally hundreds of terabytes of data. (Google uses MapReduce to index the entire World Wide Web!)
This in turn suggests that the authors of TFA are firmly ensconced in the ivory tower.
They complained that brute-force is slower than indexed searches. Well, nothing about MapReduce rules out the use of indexes; and for common problems, Google can add indexes as desired. (Google uses MapReduce to build their index to the Web in the first place.) And because Google adds servers by the rackful, they have quite a lot of CPU power just waiting to be used. Brute force might not be slower if you split it across thousands of servers!
Likewise, they complain that one can't use standard database report-generating tools with MapReduce; but if the Reduce tasks insert their results into a standard database, one could then use any standard report-generating tools.
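A rough sketch of that hand-off, assuming the reduce step produces per-key totals and using an in-memory SQLite database as a stand-in for "a standard database". The table name, the event kinds, and the sample pairs are all made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (key, count) pairs, as map tasks might have emitted them.
mapped_pairs = [("error", 1), ("ok", 1), ("error", 1), ("timeout", 1)]


def reduce_counts(pairs):
    # Reduce step: sum the counts for each key.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Load the reduced output into an ordinary SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_counts (kind TEXT PRIMARY KEY, n INTEGER)")
conn.executemany(
    "INSERT INTO event_counts VALUES (?, ?)",
    reduce_counts(mapped_pairs).items(),
)

# From here on it is plain SQL, so any reporting tool that speaks SQL applies.
report = conn.execute(
    "SELECT kind, n FROM event_counts ORDER BY n DESC, kind"
).fetchall()
```

Once the rows are in the table, nothing downstream needs to know they came out of a MapReduce job.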
MapReduce lets Google folks do crazy one-off jobs like ask every single server they own to check through their system logs for a particular error, and if it's found, return a bunch of config files and log files. Even if you had some sort of distributed database that could run on thousands of machines, any of which might die at any moment, and if you planned ahead and set the machines to copy their system logs into the database, I don't see how a database would be better for that task. That's just a single task I just invented as an example; there are many others, and MapReduce can do them all.
And one of the coolest things about MapReduce is how well it copes with failure. Inevitably some servers will respond very slowly, or will die and not respond; the MapReduce scheduler detects this and sends the Map tasks out to other servers so the job still finishes quickly. And Google keeps statistics on how often a computer is slow. At a lecture, I heard a Google guy explain how there was a BIOS bug that made one server in 50 disable some cache memory, thus greatly slowing down server performance; the MapReduce statistics helped them notice they had a problem, and isolate which computers had the problem.
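For what it's worth, the re-dispatch idea can be sketched in a few lines. This is a toy, single-process simulation and not how Google's scheduler is actually written: the `Worker` class, the `alive` flag, and the round-robin fallback are all assumptions made up for the example.

```python
class Worker:
    """Toy stand-in for a server; `alive=False` simulates a dead machine."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.failures = 0  # per-machine statistic the scheduler can inspect

    def run(self, fn, task):
        if not self.alive:
            self.failures += 1
            raise RuntimeError(f"{self.name} did not respond")
        return fn(task)


def schedule(tasks, workers, fn):
    # Try each task on its "home" worker first, then fall back to the
    # next workers in turn until one of them completes it.
    results = []
    for i, task in enumerate(tasks):
        for offset in range(len(workers)):
            worker = workers[(i + offset) % len(workers)]
            try:
                results.append(worker.run(fn, task))
                break
            except RuntimeError:
                continue
        else:
            raise RuntimeError(f"no worker could run task {task!r}")
    return results


workers = [Worker("w0"), Worker("w1", alive=False), Worker("w2")]
lengths = schedule(["a", "bb", "ccc", "dddd"], workers, len)
```

The job still finishes with every task done, and the failure counter on the dead worker is the kind of statistic that would let an operator spot a bad machine afterwards.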
MapReduce lets you run arbitrary jobs across thousands of machines at once, and all the authors of the article seem to be able to see is that it's not as database-oriented as a real database.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
They should implement their own Google using "modern techniques" and make billions!!!
And, if mapreduce doesn't generate vast license income for Oracle, it must suck. Imagine the per-processor charges Google would be paying!
Sounds like the rumblings of grumpy DBAs.
The whole point of a relational DBMS is to store, link and maintain the integrity of data in tables based on the relationships among the data.
MapReduce is about processing data... it's not focused on maintaining integrity, and the kinds of datasets suitable for MapReduce probably don't have well defined relationships.
Conformity is the jailer of freedom and enemy of growth. -JFK
Indexing works by picking a small slice of the data you have (as a list of hashes), and changing it into a much smaller table mapping the data onto a group of records matching it. The index is smaller and conforms to a certain strict standard, so it's very fast to brute force. Then as you get the list of indices, you brute force them, and this way you get the record.
This works well if you can create such a slice - a piece of data you will match against. It becomes increasingly unwieldy if there are many ways to match the data - multiple columns mean multiple indices. And then if you remove columns entirely, making records just long strings, and start matching random words in the record, the index becomes useless - hashes become bigger than the chunks of data they match against, indexing all possible combinations of words you can match against results in an index bigger than the database, and generally... bummer. An index doesn't work well against freestyle data searchable in random form.
Imagine a database with its main column being VARCHAR(255) and using about full length of it, then search using a lot of LIKE and AND, picking various short pieces out of that column, and the database being terabytes big. Try to invent a way to index it.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
In the intro they mention that
"a few select universities to teach students how to program such clusters using a software tool called MapReduce [1]. Berkeley has gone so far as to plan on teaching their freshman how to program using the MapReduce framework"
and you would assume that the article would argue why this is a bad trend. They may be right that MapReduce might be getting more attention than it deserves, but their article doesn't discuss this at all. Their editor should have pointed out to them that they went way off topic.
1) They don't look like hammers,
2) They don't work like hammers,
3) You can already drive in a screw with a hammer,
4) They aren't good at ripping out nails, and
5) They aren't good at driving nails.
Brought to you by The Hammer Column, a blog written by experts in the hammer industry, and launched by Hammertron, makers of a revolutionary new kind of hammer.
I thought Google searches weren't exact. You know, they were more statistical in nature. The entire algorithm is probably not based on absolute numbers (guessing, but otherwise it would not make sense).
The thing is if Google uses this to create their index-like structure of the internet for their search engine, and it is not exactly like a RDBMS, well, so what? The MapReduce thing seems to be targeted at large sets of data and semi-accurate data mining, not exact results. No one really cares if there are 3,000,000,000 sites or 3,000,000,002 sites with Linux in it somewhere.
Comparing an RDBMS to MapReduce is like comparing a math function to a paper graph of that function. The first one gives you exact results for all data in its domain. The second gives out quick, pain-free and semi-accurate results for some parts of the domain.
Now, I will not be using MapReduce but then I don't see why Google should not. It is their business.
I understand what they're getting at. What makes modern SQL-driven databases so useful is that they optimize queries. If you're asking for every entry in A that's also in B, any modern database will check whether it's faster to look up every A in B, every B in A, or do a match where both databases are read through sequentially by the same key. The best choice depends on the database record counts, available indices, and key types and lengths. The database system figures that out; it's not in the SQL query.
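A toy version of that choice, just to show the shape of it. Real optimizers use statistics and cost models; the rule below (use an index if one exists, otherwise sort and merge) and every name in it are assumptions for illustration, and keys are assumed unique within each table.

```python
def probe(outer_keys, inner_index):
    # Look up every key of one table in an index (here: a set) on the other.
    return sorted(k for k in outer_keys if k in inner_index)


def merge_scan(a_sorted, b_sorted):
    # Read both inputs sequentially in key order, advancing the smaller side.
    i = j = 0
    out = []
    while i < len(a_sorted) and j < len(b_sorted):
        if a_sorted[i] == b_sorted[j]:
            out.append(a_sorted[i])
            i += 1
            j += 1
        elif a_sorted[i] < b_sorted[j]:
            i += 1
        else:
            j += 1
    return out


def choose_and_run(a, b, a_index=None, b_index=None):
    # Toy "optimizer": the caller only says WHAT it wants (keys in both
    # A and B); this function decides HOW, based on sizes and indexes.
    if b_index is not None and (a_index is None or len(a) <= len(b)):
        return "probe B for each A", probe(a, b_index)
    if a_index is not None:
        return "probe A for each B", probe(b, a_index)
    return "merge scan", merge_scan(sorted(a), sorted(b))


a = [3, 1, 4, 5, 9]
b = [2, 7, 1, 8, 9, 4]
plan_indexed, rows_indexed = choose_and_run(a, b, b_index=set(b))
plan_plain, rows_plain = choose_and_run(a, b)
```

All three strategies return the same rows; only the plan differs, and the caller never has to name it.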
So the user says what they want, and the system figures out how to do it. It's "do what I mean" that really works. We don't see enough of that in programming.
Google search itself works much more like a database than a map/reduce system. Think about what has to happen when you search for multiple keywords. That's a join, and joins on big data sets take forever if you don't have the right data structures and an optimizer.
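To illustrate why a multi-keyword search looks like a join, here is a small sketch over a made-up inverted index (term to list of document ids). The postings data and the "start with the rarest term" heuristic are assumptions for the example, not a description of Google's engine.

```python
# Hypothetical inverted index: each term maps to the ids of documents
# that contain it.
postings = {
    "linux": [1, 3, 5, 8],
    "kernel": [3, 4, 8, 9],
    "patch": [2, 3, 8],
}


def search(terms):
    # A multi-keyword query is the intersection (a join on document id)
    # of the posting lists. Starting with the shortest list keeps the
    # intermediate result small, which is a tiny piece of query optimization.
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = set(lists[0])
    for plist in lists[1:]:
        result &= set(plist)
    return sorted(result)


two_terms = search(["linux", "kernel"])
three_terms = search(["linux", "kernel", "patch"])
no_match = search(["linux", "missing"])
```

Answering the query touches only the posting lists for the requested terms, which is exactly the "right data structure" the join needs.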
...at work. We use it to aggregate millions of dumped events every day, and while it may be missing features that are common in RDBMSes or use brute force rather than special magic, the fact is that we can point it at a cluster of machines and get aggregated stuff out with a lot less computational overhead than if we used anything else. It's not an RDBMS, and we don't use it as one, and therefore don't give a rat's ass if it's any good as one -- it does one thing, and it does it at a good price/scalability/performance/modifiability/ease-of-use multiratio. (And at the risk of being redundant: Photoshop is a crap word-processor, but the problem there isn't Photoshop, it's the fucktard who uses it to write letters.)
This sentence no verb.
Seriously, the DB Community calling something 'backwards' is a joke. Before going after others the DB people maybe should get up to date with their technology and maybe just get rid of that ancient, crappy POS PL called SQL. They should spend their time migrating to some up-to-date LGPLd solution for connection and glue-code. 'Them' using an early 70s interactive terminal hack as the cornerstone of their work and calling others 'backwards' is just plain silly.
When rotating hard disks are replaced by SSDs and start going the way of the dodo, then we'll see who's backwards and outdated. Until then I'd tone down any wisecracking about something being 'backwards' compared to DB technology.
We suffer more in our imagination than in reality. - Seneca
... is that they misspelled xapping.
That is all.
The whole point of MapReduce is to take an unindexed stream of data and shrink it down based on some criteria where numerous records can be associated (Map) and aggregated (Reduce). It is a process. The *result* of the process is an indexed database, which is often inserted into a relational or time-series database.
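A minimal single-machine sketch of that process, assuming a tiny made-up document set: map emits (word, document id) pairs from the raw text, a shuffle groups them by word, and reduce collapses each group into a sorted posting list. The result is an index that could be loaded into a database.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    # Map: turn one raw record into (key, value) pairs.
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    # Group all values that share a key (the framework does this for you).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    # Reduce: aggregate one key's values into a single output record.
    return word, sorted(set(doc_ids))


# Hypothetical unindexed input.
docs = {
    1: "map reduce is a process",
    2: "the result is an index",
    3: "map the data",
}

pairs = [p for doc_id, text in docs.items() for p in map_phase(doc_id, text)]
index = dict(reduce_phase(w, ids) for w, ids in shuffle(pairs).items())
```

Each map call sees only one document and each reduce call sees only one word, which is what makes both phases easy to spread over many machines.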
It's an apples and oranges comparison, and the author's never eaten an orange.
There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
All your comments bring back to my mind the criticisms of XML-based messaging technologies (SOAP, Web services). "A huge step backwards", "incompatible with existing technologies and approaches" (BNF, parsers, languages), "inefficient" (compared to binary formats), etc. Those complaints were right, but they fell on deaf ears, just as these will.... IT is driven by fads and the availability of high-productivity gizmos. Ironically, productivity often suffers in the long run, as people then have to deal with the mess that gets created using approaches that are fundamentally wrong.
So of course rating it like one will fail.
I see map reduce as a really great way to take 10,000,000,000,000 bytes of raw data, map it to a set of computers and reduce the data to a set of tables that could then be placed in a regular database and queried.
Or is that not how google is using it?
Paradigm.
Does that mean Paradigm is a Fnord? As in, I can now say stuff you won't be able to consciously read, because it has the Fnord Paradigm in it?
Don't thank God, thank a doctor!
Note how their blog represents the post as having a single author, when, in fact, it has multiple authors?
That does not sound at all like a database expert to me. It's a simple many-to-many relationship!
Don't thank God, thank a doctor!
a sub-optimal implementation, in that it uses brute force instead of indexing;
For Query-by-Example-like tools, often you cannot predict which columns need indexing: they ALL do. At some point it just seems easier to split the data sets up onto dozens or hundreds of hard drives and just do a sequential search on each one in parallel. I cannot say whether it is clearly faster than indexing every column, but it is certainly simpler from a technical standpoint. And it would possibly require less disk space because there would only be one copy of each cell, unlike indexing, which replicates the contents of the indexed column into the index.
What seems conceptually simpler: maintaining 300 indexes, or simply sequentially scanning tables split across many hard drives? (I've thought about this because I've been kicking around how to build a truly dynamic relational database with auto-columns as a proof-of-concept, because the current "Oracle clones" are too stiff for many kinds of nimble apps.)
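A rough sketch of the "no indexes, just scan every shard at once" approach. The rows, the shard count, and the thread pool standing in for separate drives or machines are all assumptions; in CPython the threads will not speed up a CPU-bound scan, so this only shows the structure.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table; any column may appear in a query predicate.
ROWS = [
    {"id": 1, "name": "alice", "city": "Boston", "note": "likes indexes"},
    {"id": 2, "name": "bob", "city": "Quincy", "note": "full scan fan"},
    {"id": 3, "name": "carol", "city": "Boston", "note": "scan everything"},
    {"id": 4, "name": "dave", "city": "Austin", "note": "no opinion"},
    {"id": 5, "name": "erin", "city": "Boston", "note": "prefers a scan"},
]


def partition(rows, n_shards):
    # Spread rows round-robin over shards, one shard per "drive".
    return [rows[i::n_shards] for i in range(n_shards)]


def scan_shard(shard, predicate):
    # Plain sequential scan: test every row, no index consulted.
    return [row for row in shard if predicate(row)]


def parallel_scan(rows, predicate, n_shards=3):
    shards = partition(rows, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partial = list(pool.map(lambda s: scan_shard(s, predicate), shards))
    # Merge the per-shard hits and restore a stable order.
    hits = [row for part in partial for row in part]
    return sorted(hits, key=lambda row: row["id"])


hits = parallel_scan(
    ROWS, lambda r: r["city"] == "Boston" and "scan" in r["note"]
)
hit_ids = [r["id"] for r in hits]
```

The predicate can mix any columns without any index having been planned for it, and each cell is stored exactly once.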
Table-ized A.I.
The authors raise two interesting questions on skew and data interchange wrt scaling in section 2. (poor impl), issues others supposedly have solved. Has anyone run into those problems with MapReduce? Are they not important when using MapReduce in the "real world"?
At the risk of quoting myself,
Proponents of MapReduce highlight two advantages:
1. MapReduce makes it very easy to program data transformations, including ones to which relational structures are of little relevance.
2. MapReduce runs in massively parallel mode "for free," without extra programming.
Based on those advantages, MapReduce would indeed seem to have significant uses, including:
* Specialized indexing of large quantities of data. Obviously, MapReduce was built for text indexing of the Web. But it would likely also be useful for, say, preprocessing satellite telemetry or intelligence intercepts, or for doing early steps in large-scale network traffic analysis. MapReduce may not be good for data management, but it looks good for banging stuff into specialized data management systems.
* Computer-scientific research. If you're trying to figure out better ways to, say, digest and analyze huge amounts of astronomical data, MapReduce seems like a great platform. Today's researchers - even the students - aren't nearly as adept at parallel algorithms as one would hope. Perhaps we should take those complications away to let them focus on the unique parts of their work. Breakthrough programming is hard enough anyway, especially if you're trying to do all the work yourself.
To err is human. To forgive is good system design.
I had a lightbulb moment after rereading this thing a few times. The authors of the paper think MapReduce is a distributed query processor, backed by a datastore of unstructured records. They picture this database where every query kicks off a MapReduce operation. Seriously, reread it from that perspective. It makes sense. Too bad for them, their fundamental assumption is wrong. It helps to have even a small amount of experience working with a technology before writing a critique of it.
There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
"You'd be fools to ignore the Boolean anti-binary least-square approach!"
Tsunami -- You can't bring a good wave down!
Just as soon as someone comes up with something that is a lateral improvement (this type of data architecture is a definite improvement), someone else comes up with the incompatibility argument.
HEY! GET A MAC!
It is a definite MISNOMER to label this type of data architecture 'unreliable.' The failsafe and reliability only make failure a little bit slower. The redundancy is *IMPROVED* by multiple failover.
I hope this technology takes the industry by storm, making all those 350Lb Database admins actually crack a book.
I mentioned to a Java friend of mine how he was adding interoperability in his CRM to SAP, and I said what a PIG SAP was. He said "No one cares about efficiency or formats anymore, it's only interoperability, and most of that is just minor 'get this, configure this'."
Of course, being skeptical, I asked a PeopleSoft programmer about interoperability, and the thing she said was "Interoperability is a done deal, we worked that all out with the y3k problem. It's only us, the programmers, that worry about having the data move between clouds. The DBAs don't really care. The real danger to this industry is the EXPENSIVE house of cards that the database infrastructure is, and how cheap/free upstarts like MySQL are making what costs tens of thousands of dollars available for free."
Can you imagine Google Earth as a database browser, like Apple's ProjectX/Hot Sauce? (Very, very obscure reference, folks.)
This article is even less coherent than a Family Guy episode.
Why the hell are they comparing MapReduce to a DBMS ? I mean, there are some terribly misguided DBMS'es out there (Oracle!), but MapReduce is a distributed computing paradigm.
Saying MapReduce is a crappy DBMS is like saying the Macbook Thin is a crappy pogo stick.
-Billco, Fnarg.com
No, crawling the web isn't a map/reduce type problem. It's a large number of long-running processes feeding a database-like engine.
Map/reduce is for batch-like jobs. Long-running systems with intercommunication have to be organized differently.
The article seems to have been written by someone living in a true IT mindset. It's like suggesting that the space shuttle's systems should be interconnected with web services because the way they communicate today is old and not user-friendly enough. In reality, it works efficiently and meets all the goals. MapReduce is similar in that regard.
While there are strong reasons for high level abstractions in applications where you need hordes of cheap available developers (these days most people you see on the street are considered web, database and php designers), there are applications that require custom tools, custom training. And they will be completely unportable, un-reusable, and it doesn't matter.
I think that all really interesting apps fall in this category and require their own unique tools that often need to work on a very low level, and that capable engineers couldn't care less whether they deal with tools that are known to fifty million or five people.
And that is a prerequisite of true innovation, daring to use tools and methods in spite of IT press hacks.
The article states:
However, the paper on MapReduce clearly states:
The column writers claim to be "educators and researchers" and they can't even read the *only paper* there is on MapReduce?