USPTO Grants Google a Patent On MapReduce
theodp writes "Two years ago, David DeWitt and Michael Stonebraker deemed MapReduce a major step backwards (here are the original paper and a defense of it) that 'represents a specific implementation of well known techniques developed nearly 25 years ago.' A year later, the pair teamed up with other academics and eBay to slam MapReduce again. But the very public complaints didn't stop Google from demanding a patent for MapReduce; nor did it stop the USPTO from granting Google's request (after four rejections). On Tuesday, the USPTO issued U.S. Patent No. 7,650,331 to Google for inventing Efficient Large-Scale Data Processing."
Google has at least 173 issued patents as well as over two hundred pending applications. That doesn't include the various patents (such as the PageRank patent) that it is the exclusive licensee for but does not actually own (Stanford owns it). Google's software patent strategy dates back to at least 1997, when it filed this application, which actually predates the PageRank application.
Did the submitter actually read the claims before asserting that the patent was obvious and/or anticipated?
Here's claim 1 (it's a monster):
1. A system for large-scale processing of data, comprising:
a plurality of processes executing on a plurality of interconnected processors;
the plurality of processes including a master process, for coordinating a data processing job for processing a set of input data, and worker processes;
the master process, in response to a request to perform the data processing job, assigning input data blocks of the set of input data to respective ones of the worker processes;
each of a first plurality of the worker processes including an application-independent map module for retrieving a respective input data block assigned to the worker process by the master process and applying an application-specific map operation to the respective input data block to produce intermediate data values, wherein at least a subset of the intermediate data values each comprises a key/value pair, and wherein at least two of the first plurality of the worker processes operate simultaneously so as to perform the application-specific map operation in parallel on distinct, respective input data blocks;
a partition operator for processing the produced intermediate data values to produce a plurality of intermediate data sets, wherein each respective intermediate data set includes all key/value pairs for a distinct set of respective keys, and wherein at least one of the respective intermediate data sets includes respective ones of the key/value pairs produced by a plurality of the first plurality of the worker processes; and
each of a second plurality of the worker processes including an application-independent reduce module for retrieving data, the retrieved data comprising at least a subset of the key/value pairs from a respective intermediate data set of the plurality of intermediate data sets and applying an application-specific reduce operation to the retrieved data to produce final output data corresponding to the distinct set of respective keys in the respective intermediate data set of the plurality of intermediate data sets, and wherein at least two of the second plurality of the worker processes operate simultaneously so as to perform the application-specific reduce operation in parallel on multiple respective subsets of the produced intermediate data values.
That's one heck of a detailed claim. Infringement would require some effort; anticipation (every limitation appearing in a single document, arranged in the same manner as the claim) is unlikely.
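For anyone who finds the claim language hard to picture, here is a rough toy sketch of the pieces it names: a coordinating "master", map workers, a partition operator, and reduce workers. This is only an illustration under my own assumptions, not Google's implementation and not a statement of what would or wouldn't infringe. The names run_job, word_count_map, and word_count_reduce are made up for the example, threads stand in for the claim's processes on interconnected processors, and partitioning by CRC32 of the key is just one convenient choice.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def run_job(blocks, map_fn, reduce_fn, num_partitions=2, num_workers=2):
    """Toy 'master': hands blocks to map workers, partitions, then reduces.

    Hypothetical helper for illustration; threads stand in for processes.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Map phase: each worker applies the application-specific map_fn
        # to one input block and produces key/value pairs.
        mapped = list(pool.map(lambda block: list(map_fn(block)), blocks))

        # Partition operator: all pairs for a given key land in exactly one
        # intermediate data set, whichever map worker produced them.
        partitions = [defaultdict(list) for _ in range(num_partitions)]
        for pairs in mapped:
            for key, value in pairs:
                index = crc32(str(key).encode("utf-8")) % num_partitions
                partitions[index][key].append(value)

        # Reduce phase: each worker applies the application-specific
        # reduce_fn to the keys of one intermediate data set.
        def reduce_partition(partition):
            return {key: reduce_fn(key, values) for key, values in partition.items()}

        reduced = list(pool.map(reduce_partition, partitions))

    # Merge the per-partition outputs; key sets are disjoint by construction.
    output = {}
    for part in reduced:
        output.update(part)
    return output


def word_count_map(block):
    # Application-specific map: emit (word, 1) for every word in the block.
    for word in block.split():
        yield word.lower(), 1


def word_count_reduce(key, values):
    # Application-specific reduce: total the counts for one word.
    return sum(values)


if __name__ == "__main__":
    blocks = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(blocks, word_count_map, word_count_reduce))
```

Note how the map and reduce plumbing knows nothing about word counting; only the two small functions passed in are application-specific, which is the split the claim draws between "application-independent" modules and "application-specific" operations.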
All documents at http://patft.uspto.gov/ are issued patents.
I've noticed that when you do a Google search and mouse over the links, the status bar shows the direct link, but that is a lie. If you look at the actual URL in the link properties, you'll see that it redirects through Google. Sneaky.
Disclaimer: IANAL. This post is, however, legal advice, and creates an attorney-client relationship.