USPTO Grants Google a Patent On MapReduce
theodp writes "Two years ago, David DeWitt and Michael Stonebraker deemed MapReduce a major step backwards (here are the original paper and a defense of it) that 'represents a specific implementation of well known techniques developed nearly 25 years ago.' A year later, the pair teamed up with other academics and eBay to slam MapReduce again. But the very public complaints didn't stop Google from demanding a patent for MapReduce; nor did it stop the USPTO from granting Google's request (after four rejections). On Tuesday, the USPTO issued U.S. Patent No. 7,650,331 to Google for inventing Efficient Large-Scale Data Processing."
Guess they had to burn some of the karma they earned for standing up to China.....
I want peace on earth and goodwill toward man.
We are the United States Government! We don't do that sort of thing.
Just the other day I couldn't sign up for a gmail account without google demanding my mobile telephone number!
A somewhat optimistic guess is that they'll be restricted to using this defensively. Are they really going to sue Hadoop, the open-source implementation of MapReduce? Hadoop not only implements a version of MapReduce, it even uses its name, so is not at all coy about being a direct infringement of this patent. And yet, I would be surprised if Google sued them, or the many people using it. They certainly haven't said anything yet, as far as I can find--- when things like Amazon Elastic MapReduce were launched, I can't find record of Google saying, "hey, you're stealing our tech!"
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Google has at least 173 issued patents as well as over two hundred pending applications. That doesn't include the various patents (such as the PageRank patent) that it is the exclusive licensee for but does not actually own (Stanford owns it). Google's software patent strategy dates back to at least 1997, when it filed this application, which actually predates the PageRank application.
Your summary of the situation "isn't even wrong". One does not "demand" a patent, one writes an application which is then examined against prior art and other bars to patentability. Seriously, who wrote this?
How do we know the patent is awarded? I'm no expert on reading patents, but I don't see any references to a patent status there.
"You know, Hobbes, some days even my lucky rocketship underpants don't help" -- Calvin
Does this endanger the Hadoop project, or projects using Hadoop? Its MapReduce implementation is a rather crucial part.
Before you go accusing Google of doing Evil (TM), think. If they don't do this, some troll will. The troll will lose, but Google will waste a lot more money defending against it.
This is why IBM takes out so many patents too. Most of them are "defensive" patents.
We (that being everybody except the USPTO) could agree not to take out any more software patents, and the industry would breathe a collective sigh of relief. Trouble is, it only takes a few bad apples to spoil that approach. It's the same reason Communism didn't work.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
We're probably never going to get rid of software patents, odious as they are; at this point there are too many enormous players, of which Google is not at all the worst offender, with way too much invested in them. But it occurs to me that there is one change to patent law that might be politically feasible, and which would really help cut down on clearly frivolous patents like this one:
If any claim in the patent is held to be invalid, the entire patent is invalid.
Claim 1 of the patent is simply an arcane, legalistic description of the operation of pretty much every parallel processing algorithm ever. Some of the subsequent claims actually do describe novel, non-obvious, and useful ways of handling large data sets across multiple processors. If the patent were restricted to these claims, well, it would still be a software patent and therefore Evil, but it might at least have some claim to promoting "the progress of science and the useful arts."
In general, it seems like this would make both patent trolling, and big companies like Google lawyering small independent developers to death, a little more difficult.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
I wrote a parallel application to process scientific data on multiple servers at a previous place I worked, using just SQL statements with a mod function on a primary key. The resume builders there then hired a consultant to help them rewrite the whole thing (excluding the core atomic algorithm part) using Hadoop and MapReduce, because the previous one didn't use Hadoop and MapReduce. They made a total mess and it's so hard to configure and deploy that IT still uses the version I wrote a year before.
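A minimal sketch of the partitioning idea described above, assuming an integer primary key and a fixed number of servers. The table name `samples`, the column names, and both helper functions are hypothetical; the original application is not shown in this thread, and SQLite stands in here for whatever database it used.

```python
import sqlite3


def build_demo_db(num_rows):
    """Create an in-memory table with integer primary keys 1..num_rows (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO samples (id, value) VALUES (?, ?)",
        [(i, float(i) * 0.5) for i in range(1, num_rows + 1)],
    )
    return conn


def rows_for_worker(conn, worker_index, num_workers):
    """Return the ids one worker is responsible for: those with id mod num_workers == worker_index."""
    cur = conn.execute(
        "SELECT id FROM samples WHERE id % ? = ? ORDER BY id",
        (num_workers, worker_index),
    )
    return [row[0] for row in cur]
```

Each server runs the same statement with its own `worker_index`, so the shards are disjoint and together cover the whole table without any coordination beyond agreeing on `num_workers`.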
The greybeards have a point there. In my branch of signal processing we have gone through cycles several times as computer hardware evolves. In my experience we've been through minicomputers, array processors, workstations, clusters, stream processors, multi-cores etc. Each configuration has a different balance of CPU speed, memory size, memory bandwidth, and so on. So we've gone through the difference algorithms, the integral algorithms, the spectral, the local-transform, cyclic matrices, etc. back and forth several times. Sometimes each new generation of grad students feels it has invented something new if sloppy work by their faculty advisor doesn't correct them.
- did the submitter actually read the claims, before asserting that it was obvious and/or anticipated?
Here's claim 1 (it's a monster): 1. A system for large-scale processing of data, comprising:
a plurality of processes executing on a plurality of interconnected processors;
the plurality of processes including a master process, for coordinating a data processing job for processing a set of input data, and worker processes;
the master process, in response to a request to perform the data processing job, assigning input data blocks of the set of input data to respective ones of the worker processes;
each of a first plurality of the worker processes including an application-independent map module for retrieving a respective input data block assigned to the worker process by the master process and applying an application-specific map operation to the respective input data block to produce intermediate data values, wherein at least a subset of the intermediate data values each comprises a key/value pair, and wherein at least two of the first plurality of the worker processes operate simultaneously so as to perform the application-specific map operation in parallel on distinct, respective input data blocks;
a partition operator for processing the produced intermediate data values to produce a plurality of intermediate data sets, wherein each respective intermediate data set includes all key/value pairs for a distinct set of respective keys, and wherein at least one of the respective intermediate data sets includes respective ones of the key/value pairs produced by a plurality of the first plurality of the worker processes; and
each of a second plurality of the worker processes including an application-independent reduce module for retrieving data, the retrieved data comprising at least a subset of the key/value pairs from a respective intermediate data set of the plurality of intermediate data sets and applying an application-specific reduce operation to the retrieved data to produce final output data corresponding to the distinct set of respective keys in the respective intermediate data set of the plurality of intermediate data sets, and wherein at least two of the second plurality of the worker processes operate simultaneously so as to perform the application-specific reduce operation in parallel on multiple respective subsets of the produced intermediate data values.
That's one heck of a detailed claim. Infringement would require some effort; anticipation (every limitation appearing in a single document, arranged in the same manner as the claim) is unlikely.
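To make the structure of the claim easier to follow, here is a rough single-process sketch of the map, partition, and reduce stages it describes. It is only an illustration: there is no master process, no worker processes, and no parallelism, and the names `run_mapreduce`, `word_count_map`, and `word_count_reduce` are made up for this example rather than taken from Google's implementation or from Hadoop.

```python
from collections import defaultdict


def run_mapreduce(input_blocks, map_fn, reduce_fn, num_partitions=2):
    """Application-independent driver; map_fn and reduce_fn are application-specific."""
    # Map stage: apply the application-specific map operation to each input block,
    # collecting intermediate key/value pairs.
    intermediate = []
    for block in input_blocks:
        intermediate.extend(map_fn(block))

    # Partition stage: every pair with the same key lands in the same partition,
    # so each partition holds all values for a distinct set of keys.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in intermediate:
        partitions[hash(key) % num_partitions][key].append(value)

    # Reduce stage: apply the application-specific reduce operation per key.
    output = {}
    for partition in partitions:
        for key, values in partition.items():
            output[key] = reduce_fn(key, values)
    return output


def word_count_map(block):
    """Application-specific map: emit (word, 1) for every word in a block of text."""
    return [(word, 1) for word in block.split()]


def word_count_reduce(key, values):
    """Application-specific reduce: total the counts for one word."""
    return sum(values)
```

For example, `run_mapreduce(["a b a", "b c"], word_count_map, word_count_reduce)` counts words across both blocks. The driver never looks inside the keys or values, which is one way to read the claim's split between an "application-independent" module and an "application-specific" operation.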
My research area is HPC, and I sometimes have toyed with trying to work for Google. They seemed like something special.
Now that they're pursuing unjustifiable software patents, I'm forced to sadly put Google into the same mental category as Microsoft and IBM. Like the other two companies, Google does some cool stuff, but I wouldn't feel much better about working for Google than I would for IBM or Microsoft.
Sad.
I didn't know what MapReduce was so I looked it up:
crazy dynamite monkey
A patent is only worth its strength in court. The USPTO has clearly given up trying to judge if a patent is truly worthy on their own, relying on the courts to decide afterwards when a patent is put to use and put to the test - in court.
What bothers me the most is the fact that anyone can get a patent for anything as long as they keep revising their application.
At the end of the day, those with the biggest wallets will get their patents, and they will also have their guns to fight and win in court.
How do you get a patent awarded on something that has already been released as "open source" (Hadoop)?
This does not add up: either Hadoop is not really open source, or the US patent office is as FCKING stupid as EVERYONE seems to think it is.
Come on people, don't you get tired of the shame of working for such an organization... don't you want to see freedom and democracy restored to the world?
Google has never asked for mobile numbers (except, of course, if you want to sign up for google voice).
This article makes reference to MapReduce detractors. Here is my response to them:
With cloud computing pricing following Moore's Law, the cost of distributed brute force is headed to $0. Most users would prefer this to:
a) getting screwed by Oracle and other proprietary DBMS vendors on licensing costs
b) getting screwed by vertically scaled big iron hardware vendors for running enough horsepower for your large Oracle footprint.
At this point I think I'd read Mac Tablet rumors...
Sorry about the mess.
Actually, it can't be 7 years - It's only been around since 2004.
But the real sentiment is that "MY" culture never makes anything that sucks. Incidentally, I'm also always right. I once thought I was wrong, but that was a mistake. If that ever changes it's because evil democrats "healthcared" my good brain cells out and replaced them with more politically correct ones.
Any technology that has been sold or in use for over a year is unpatentable.
A patent based on such technology may not stand up in court, but to start with, in practice "patentable" means something the USPTO will issue a patent on. And the examiner deciding whether to grant such a patent may not be familiar with relevant prior art, not to mention that they may not even have any particular incentive to examine a given patent application closely.
Tweet, tweet.
the run-on sentence.
No, not every company has to have a patent portfolio; many don't, and thus Google doesn't have the smallest patent portfolio.
It works on a large scale with today's available processing setups, but it's far from 'efficient' in any sense of the term I would consider.
Pyramids were built with (so the theory goes) millions of laborers because that's the only way they could handle such a large scale project. MapReduce is the same thing. On that scale, with today's technology, that's the way we do it.
It works, today, so we use the method, but that's where it ends.
Would you build the pyramids today with a million laborers? No, you'd bring in some heavy equipment and a tiny (relative to the original) team and they'd do it in a couple years or less for FAR FAR less money (even slaves cost money since they don't tend to live long if you never feed or water them.)
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
I'll reserve judgement until this patent is involved, offensively, defensively or otherwise, in litigation.
Google has got a good reputation so I'm not as quick to condemn them as I am to condemn Microsoft which has a PROVEN track record of evil.
It's entirely plausible that this patent is part of a defensive patent portfolio whose sole purpose is to protect Google.
And considering the zany IP landscape, if anyone's going to have a patent on this, I'd rather it be Google than anyone else. If Microsoft had this club in their arsenal you can bet your bottom dollar they'd make their assault on Tom-Tom look like a puny peashooter.
How is this any different from a patent on distributed merge sort?
Since this is Slashdot, the usual knee-jerk reaction to any patent story is "duh, this is obvious". In most cases, people posting such replies haven't even read the claims, or if they did, do not understand how to interpret them properly.
I'm very skeptical that Google had indeed somehow managed to patent the fundamental principle of MapReduce, given that map and reduce (fold) have been basic FP building blocks for several decades, under these very names. I suspect, rather, that Google patented their particular implementation, complete with intelligent load balancing, hot-swapping, automated error checking and removal of faulty nodes, and whatever other fancy stuff they may have there - which is another matter entirely (even if implemented purely in software).
Since there are still some people here who are proficient in legalese (and specifically its dialect used in patent applications), any one of you care to explain what this actually is about to us simple folk?
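For readers who have not met the functional-programming primitives mentioned above, here is a small sketch of plain `map` and `reduce` (fold) using the Python standard library. The function name `sum_of_squares` is just an illustrative choice; this shows the decades-old building blocks, not anything specific to the patent.

```python
from functools import reduce


def sum_of_squares(numbers):
    """Square each element with map, then fold the results into one total with reduce."""
    squares = map(lambda x: x * x, numbers)              # map: transform each element independently
    return reduce(lambda acc, x: acc + x, squares, 0)    # fold: combine elements left to right
```

Because each element is mapped independently, the map step is trivially parallelizable. How the work is distributed, partitioned, and recovered on failure is a separate matter from these two primitives.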
If you remember other stories of silly software patents, please help document this problem here:
(On the public swpat.org documentation wiki)
Thanks.
Please help publicise swpat.org - the software patents wiki
If you hate the redirects (and I sure do.. copying URLs is the best), then push for HTML5. Specifically this feature: the ping attribute.
It takes what Google (and many, many other sites) is doing and makes it possible to implement the ping separately from the target URL. Seems trivial; could make a huge difference.
Of course, the danger is that it gives extension authors an easy target. It's much easier to develop a privacy-enhancing extension that filters out all ping attributes, than it is to perform the same service on a single URL which conflates the ping with the target.
We'll see; I hold out high hopes for it.
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
Michael Stonebraker co-founded Vertica, a column-store database system. In the SIGMOD '09 paper that "slammed" MapReduce, he and the other academics use Vertica alongside another unidentified commercial database system to show the weaknesses of the MapReduce model (using Hadoop, the most popular publicly available implementation).
I mean no offense to Stonebraker and this fact alone certainly does not imply anything, but it should still be noted. It appears to be a clear ethics/conflict-of-interest violation to me and it is unfortunate that (to my knowledge) no one in the database community has spoken up. Posted anonymously for obvious reasons.
1. Buy out enough congresscritters.
2. Get them to write into some new law whatever wrinkle you're afraid the Supremes might vacate.
3. ??? -- Actually, wash, rinse, repeat, as often as necessary until either a) the law you want withstands a Supreme Court challenge, or b) your legal opponents run out of money to fight it in the courts, whichever comes soonest.
4. Profit.
Cynically yours,
Anybody care to explain what it means to have both an application-independent reduce module and an application-specific reduce operation? It would seem that these would generally be mutually exclusive.
If, as DeWitt & Stonebraker claim, MapReduce is a "major step backwards", we ought to be able to skip right past this patent and use whatever the state of the art is... Right?
Gigaom: Michelle Lee, Google Deputy General Counsel, on why Google sought the patent, and whether or not Google would seek to enforce its patent rights: "Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops. While we do not comment about the use of this or any part of our portfolio, we feel that our behavior to date has been inline with our corporate values and priorities."
I would expand on the title but I'll let the absurdity speak for itself.
I call this "fork and merge" and we have been doing it since forever (1993 at least, but we surely didn't invent it).
The general technique is to have multiple processors work on part of the data set, in a potentially wasteful/redundant way, and then when the results are coming in, perform a merge step to arrive at a clean result.
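A rough sketch of that general technique, using sorting as the per-chunk work and a k-way merge as the merge step. This is an assumption-laden illustration: threads stand in for separate processors or forked processes, and the names `split_into_chunks` and `fork_and_merge_sort` are invented for this example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_chunks):
    """Split a list into at most num_chunks contiguous pieces of roughly equal size."""
    size = max(1, -(-len(items) // num_chunks))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def fork_and_merge_sort(items, num_workers=4):
    """Sort each chunk independently (the 'fork'), then merge the sorted chunks."""
    chunks = split_into_chunks(items, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # Merge step: combine the partial results into one clean, ordered result.
    return list(heapq.merge(*sorted_chunks))
```

For example, `fork_and_merge_sort([5, 3, 8, 1, 9, 2, 7])` sorts up to four chunks independently and merges them into a single sorted list. In a real deployment the chunks would go to separate processes or machines, and the merge would run as results arrive.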
Multi-threading must die. Forking is your past, present and future.
Processing chunks is also more effective, because you give other processes a chance to do some work too.
(" and in one sentence-alert :)
I didn't get the joke, either. Pretty please, don't use Offtopic and Troll as substitutes for Dumb/Disagree.
This post contains no rudeness or derision of any kind. All arguments are friendly. Terms and exclusions may apply.
It claims that database technology is better at enforcing data consistency. They make this assumption with applications like banking and payroll in mind, not Web applications where speed matters a lot. This is where MapReduce scores high. If speed is the most important factor, would you design an application with all the referential integrity constraints, or find ways to avoid them altogether? A database schema looks good on paper but does not provide speed; rather, it eats up processing power. MapReduce allows you to process huge amounts of data in parallel. While academics debate the merits of MapReduce, Google builds new systems quickly and processes data at lightning speed. MapReduce is the very reason for the success of Google's infrastructure. It makes it easy to process data, write pipelines to mine data, etc.