USPTO Grants Google a Patent On MapReduce
theodp writes "Two years ago, David DeWitt and Michael Stonebraker deemed MapReduce a major step backwards (here are the original paper and a defense of it) that 'represents a specific implementation of well known techniques developed nearly 25 years ago.' A year later, the pair teamed up with other academics and eBay to slam MapReduce again. But the very public complaints didn't stop Google from demanding a patent for MapReduce; nor did it stop the USPTO from granting Google's request (after four rejections). On Tuesday, the USPTO issued U.S. Patent No. 7,650,331 to Google for inventing Efficient Large-Scale Data Processing."
Just the other day I couldn't sign up for a gmail account without google demanding my mobile telephone number!
They already burned their karma adding the "fade-in" menu bar.
This sounds more stupid than evil, which is interesting, because Google doesn't do obviously stupid things very often.
The patent won't do them any good, because it won't stand up in court. They could use it to attack someone small -- an open source developer who would have to back down because they couldn't handle the legal fees -- but they don't have much of a history of that sort of thing, and there's no reason to think they would in this case, either.
It won't do them any good at all against someone big -- MS and Bing, for example -- because MS would have good lawyers who could demonstrate prior art to a court.
So what's the point?
A somewhat optimistic guess is that they'll be restricted to using this defensively. Are they really going to sue Hadoop, the open-source implementation of MapReduce? Hadoop not only implements a version of MapReduce, it even uses its name, so is not at all coy about being a direct infringement of this patent. And yet, I would be surprised if Google sued them, or the many people using it. They certainly haven't said anything yet, as far as I can find--- when things like Amazon Elastic MapReduce were launched, I can't find record of Google saying, "hey, you're stealing our tech!"
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Google has at least 173 issued patents as well as over two hundred pending applications. That doesn't include the various patents (such as the PageRank patent) that it is the exclusive licensee for but does not actually own (Stanford owns it). Google's software patent strategy dates back to at least 1997, when it filed this application, which actually predates the PageRank application.
Isn't that awful? I can't understand why they did it.
Moving stuff on web pages sucks. Especially on that web page.
The bad thing isn't the fade in itself. It's that Google used to be run by people who knew what sucked and what didn't. Now it seems like there are people who don't know in positions to call some shots. It's a bad omen.
They're probably about 10 years away from their own version of Microsoft's "Bob".
Does this endanger the Hadoop project, or projects using Hadoop? Its MapReduce implementation is a rather crucial part.
Before you go accusing Google of doing Evil (TM), think. If they don't do this, some troll will. The troll will lose, but Google will waste a lot more money defending against it.
This is why IBM takes out so many patents too. Most of them are "defensive" patents.
We (that being everybody except the USPTO) could agree not to take out any more software patents, and the industry would breathe a collective sigh of relief. Trouble is, it only takes a few bad apples to spoil that approach. It's the same reason Communism didn't work.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
We're probably never going to get rid of software patents, odious as they are; at this point there are too many enormous players, of which Google is not at all the worst offender, with way too much invested in them. But it occurs to me that there is one change to patent law that might be politically feasible and would really help cut down on clearly frivolous patents like this one:
If any claim in the patent is held to be invalid, the entire patent is invalid.
Claim 1 of the patent is simply an arcane, legalistic description of the operation of pretty much every parallel processing algorithm ever. Some of the subsequent claims actually do describe novel, non-obvious, and useful ways of handling large data sets across multiple processors. If the patent were restricted to these claims, well, it would still be a software patent and therefore Evil, but it might at least have some claim to promoting "the progress of science and the useful arts."
In general, it seems like this would make both patent trolling, and big companies like Google lawyering small independent developers to death, a little more difficult.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
I wrote a parallel application to process scientific data on multiple servers at a previous place I worked, using just SQL statements with a mod function on a primary key. The resume builders there then hired a consultant to help them rewrite the whole thing (excluding the core atomic algorithm part) using Hadoop and MapReduce, because the previous one didn't use Hadoop and MapReduce. They made a total mess and it's so hard to configure and deploy that IT still uses the version I wrote a year before.
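For anyone wondering what "SQL statements with a mod function on a primary key" might look like, here is a minimal sketch. It is not the poster's code: the table name `samples`, its columns, the `fetch_shard` helper, and the use of SQLite are all assumptions made up for illustration. Each worker takes only the rows whose key leaves its own remainder, so the shards never overlap and together cover the whole table.

```python
import sqlite3


def fetch_shard(conn, worker_id, num_workers):
    """Return the rows this worker owns: those with id % num_workers == worker_id."""
    cursor = conn.execute(
        "SELECT id, value FROM samples WHERE id % ? = ? ORDER BY id",
        (num_workers, worker_id),
    )
    return cursor.fetchall()


# Hypothetical data set: ten rows with integer primary keys 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO samples (id, value) VALUES (?, ?)",
    [(i, i * 0.5) for i in range(1, 11)],
)

# Here the three "workers" are queried one after another in a single process;
# in a real deployment each server would run the query with its own worker_id.
shards = [fetch_shard(conn, worker_id, 3) for worker_id in range(3)]
```

The appeal is that there is nothing to configure beyond the worker count and each worker's number.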
The greybeards have a point there. In my branch of signal processing we have gone through cycles several times as computer hardware evolves. In my experience we've been through minicomputers, array processors, workstations, clusters, stream processors, multi-cores, etc. Each configuration has a different balance of CPU speed, memory size, memory bandwidth, and so on. So we've gone through the difference algorithms, the integral algorithms, the spectral, the local-transform, cyclic matrices, etc., back and forth several times. Sometimes each new generation of grad students feels it has invented something new if sloppy work by their faculty advisor doesn't correct them.
- did the submitter actually read the claims, before asserting that it was obvious and/or anticipated?
Here's claim 1 (it's a monster): 1. A system for large-scale processing of data, comprising:
a plurality of processes executing on a plurality of interconnected processors;
the plurality of processes including a master process, for coordinating a data processing job for processing a set of input data, and worker processes;
the master process, in response to a request to perform the data processing job, assigning input data blocks of the set of input data to respective ones of the worker processes;
each of a first plurality of the worker processes including an application-independent map module for retrieving a respective input data block assigned to the worker process by the master process and applying an application-specific map operation to the respective input data block to produce intermediate data values, wherein at least a subset of the intermediate data values each comprises a key/value pair, and wherein at least two of the first plurality of the worker processes operate simultaneously so as to perform the application-specific map operation in parallel on distinct, respective input data blocks; a partition operator for processing the produced intermediate data values to produce a plurality of intermediate data sets, wherein each respective intermediate data set includes all key/value pairs for a distinct set of respective keys, and wherein at least one of the respective intermediate data sets includes respective ones of the key/value pairs produced by a plurality of the first plurality of the worker processes; and
each of a second plurality of the worker processes including an application-independent reduce module for retrieving data, the retrieved data comprising at least a subset of the key/value pairs from a respective intermediate data set of the plurality of intermediate data sets and applying an application-specific reduce operation to the retrieved data to produce final output data corresponding to the distinct set of respective keys in the respective intermediate data set of the plurality of intermediate data sets, and wherein at least two of the second plurality of the worker processes operate simultaneously so as to perform the application-specific reduce operation in parallel on multiple respective subsets of the produced intermediate data values.
That's one heck of a detailed claim. Infringement would require some effort; anticipation (every limitation appearing in a single document, arranged in the same manner as the claim) is unlikely.
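To make the moving parts of the claim concrete, here is a toy word count arranged the same way: a coordinator hands input blocks to map workers, a partition step gathers all pairs for a given key into one intermediate set, and reduce workers process those sets in parallel. This is only a sketch under loud assumptions: threads in a single process stand in for the claim's processes on interconnected processors, every function name is made up, and it is neither Google's nor Hadoop's implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(block):
    """Application-specific map: emit a (word, 1) pair for each word in a block."""
    return [(word.lower(), 1) for word in block.split()]


def reduce_counts(key, values):
    """Application-specific reduce: sum the values collected for one key."""
    return key, sum(values)


def partition(pairs_per_worker, num_partitions):
    """Build intermediate sets so that all pairs for a key land in the same set."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for pairs in pairs_per_worker:
        for key, value in pairs:
            # Toy deterministic partitioner; a real system would use a proper hash.
            index = sum(key.encode("utf-8")) % num_partitions
            partitions[index][key].append(value)
    return partitions


def reduce_partition(part, reduce_fn):
    """Application-independent reduce module: apply reduce_fn to every key in a set."""
    return [reduce_fn(key, values) for key, values in part.items()]


def run_job(blocks, map_fn, reduce_fn, num_partitions=2):
    """Coordinator: assign blocks to map workers, partition, then run reduce workers."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_fn, blocks))
        parts = partition(mapped, num_partitions)
        reduced = list(pool.map(lambda part: reduce_partition(part, reduce_fn), parts))
    return dict(pair for part in reduced for pair in part)


counts = run_job(["the quick fox", "the lazy dog", "The fox"], map_words, reduce_counts)
```

Only `map_words` and `reduce_counts` know anything about word counting; the rest is the application-independent plumbing that the claim spends most of its words on.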
All documents at http://patft.uspto.gov/ are issued patents.
I didn't know what MapReduce was so I looked it up:
crazy dynamite monkey
The point is probably to create and keep a nice big portfolio of patents to be used the next time Google gets sued for patent infringement. It's common practice for big tech firms (and others, of course) to hold a reserve of patents at the ready in the event that they need to defend against a patent suit. The aggressor company sues for infringement, the defender digs up a few patents that the aggressor is violating, and they settle out of court for a mutual licensing agreement.
Of course it's ridiculous, and sounds stupid, but it's a symptom of the broken patent system, not a peculiarity of Google.
Rule of Slashdot #0: You and people like you are not representative of the larger population. - A.C.
A patent is only worth its strength in court. The USPTO has clearly given up trying to judge if a patent is truly worthy on their own, relying on the courts to decide afterwards when a patent is put to use and put to the test - in court.
What bothers me the most is the fact that anyone can get a patent for anything as long as they keep revising their application.
At the end of the day, those with the biggest wallets will get their patents, and they will also have their guns to fight and win in court.
How do you get a patent awarded on something that has already been released as "open source" (Hadoop)?
This does not add up, either Hadoop is not really open source, or US patent office are as FCKING stupid as EVERYONE seems to think they are.
Come on people, don't you get tired of the shame of working for such an organization....don't you want to see freedom and democracy restored to the world..?>?>
The fade-in is nice. Not so much because it's a fade-in (which is just visually more pleasant than an instant-display), but because you can visit www.google.com and get a very clean page (google logo, search field, and currently a Haiti relief notice), and just type away (as focus is set to the search field) and be done with it. This is very much like how google.com -was- in the very early days.
If you want to access any of the other services that google have started to offer since then, you can move your mouse anywhere within the screen and hey presto those options become available to you. If you don't need them - why clutter up the screen with them?
You can always customize your own google page and set that as your bookmark/start page/whatever and display exactly what you want to have displayed from the get-go.
If anything, the change from direct URLs to google redirects at some point is what I find most annoying. I guess it's what enables them to track clicks better / present "We believe this page is dangerous for your health"-warnings, etc. and I can see how that can be good for them as a business, and for users who go clickhappy on fluffy little bunnies promising them cash. But it annoys me that I can't just 1. google for something, 2. recognize the right place, 3. right-click the result and get the basic URL out of it anymore. Now, I just get this (for slashdot):
http://www.google.com/url?sa=t&source=web&ct=res&cd=1&ved=0CBgQFkAA&url=http%3A%2F%2Fslashdot.org%2F&rct=j&q=slashdot&ei=KAtXS8CCLeLMQAeSx8CbDg&usg=AFQjClHLEL_tF-6ZxylM44KJH54-gaJRnQ&s1g2=U223qDAEXHFbHyOw_p2PzQ
wtf.
I'd much prefer they put the actual URL in the link, and let their redirect flow through an onClick.. yeah, they'd lose the javascript-disabled lot.. tough.
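For what it's worth, the real target is still sitting in the `url` query parameter of a redirect link like that one, percent-encoded. A minimal sketch for pulling it back out, assuming the link has that shape (the `target_of` helper is a made-up name, and the example is a shortened copy of the link above):

```python
from urllib.parse import parse_qs, urlsplit


def target_of(redirect_url):
    """Return the decoded 'url' query parameter, or None if there isn't one."""
    query = parse_qs(urlsplit(redirect_url).query)
    targets = query.get("url")
    return targets[0] if targets else None


# Shortened version of the redirect link quoted above.
example = (
    "http://www.google.com/url?sa=t&source=web&ct=res&cd=1"
    "&url=http%3A%2F%2Fslashdot.org%2F&rct=j&q=slashdot"
)
print(target_of(example))
```

It works, but it's a workaround for something that used to be a plain right-click.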
The backdoor to that system, as we've seen, is to sell off a patent to an investment firm which stands up a patent troll company (or buys a small company in the field and turns it into a patent troll) and have them abuse it. The MAD strategy then no longer works, as the opponent only exists to spend their cash reserves on the lawsuit and to turn over any profits to the investors.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
I agree about the Google redirects. I know they have been there for awhile now, but I first actually "noticed" them (as in they caused me a problem) just the other day when I was trying to get some links to "further reading" to go into some technical document I was writing. I sure didn't want Google redirect links in my document so I actually ended up going to Bing and doing the same search. That worked better as Bing apparently doesn't do the redirect thing and the links are actually links to the site you searched for. Bing doesn't do anything else better, but Google made their links useless for that function.
Why do you think the recent Google-China issue is either all about Google having a conscience or all about Google acting in their own self interest? It's both, and it's complicated.
For one thing, having a conscience is in Google's best self interest. Public image is crucial for a company like that.
For another, companies Google's size (or any size, if they are competent) don't make decisions based on 1 factor. They take into account many, many factors, including conflicting ones, and they arrive at a decision. In this case, clearly both the conscience issue was a factor as well as the self interest factor.
I've noticed that when you do a google search and mouseover the links, it shows the direct link in the status bar, but that is a lie. If you look at the actual URL in the link properties, you'll see that it redirects through google. Sneaky.
Disclaimer: IANAL. This post is, however, legal advice, and creates an attorney-client relationship.
I did look at the url properties. It was the plain url. A search for "houston chronicle" returns the Houston Chronicle, and right-clicking and copying the link location copies "http://www.chron.com".
I'll reserve judgement until this patent is involved, offensively, defensively or otherwise, in litigation.
Google has got a good reputation so I'm not as quick to condemn them as I am to condemn Microsoft which has a PROVEN track record of evil.
It's entirely plausible that this patent is part of a defensive patent portfolio whose sole purpose is to protect Google.
And considering the zany IP landscape, if anyone's going to have a patent on this, I'd rather it be Google than anyone else. If Microsoft had this club in their arsenal you can bet your bottom dollar they'd make their assault on Tom-Tom look like a puny peashooter.
If you hate the redirects (and I sure do.. copying URLs is the best), then push for HTML5. Specifically this feature: the ping attribute.
It takes what Google (and many, many another site) is doing and makes it possible to implement the ping separately from the target URL. Seems trivial; could make a huge difference.
Of course, the danger is that it gives extension authors an easy target. It's much easier to develop a privacy-enhancing extension that filters out all ping attributes, than it is to perform the same service on a single URL which conflates the ping with the target.
We'll see; I hold out high hopes for it.
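To show why a separate ping attribute is such an easy target for a filter, here is a rough sketch that drops every `ping` attribute and leaves the `href` alone. It is plain Python over an HTML string rather than an actual browser extension, the `PingStripper` and `strip_pings` names are made up, and it is a toy: comments and doctypes are not carried over to the output.

```python
from html import escape
from html.parser import HTMLParser


class PingStripper(HTMLParser):
    """Re-emit tags, text, and entities, omitting every ping attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _open_tag(self, tag, attrs, closing=">"):
        kept = [(name, value) for name, value in attrs if name != "ping"]
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in kept
        )
        self.parts.append(f"<{tag}{rendered}{closing}")

    def handle_starttag(self, tag, attrs):
        self._open_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._open_tag(tag, attrs, closing=" />")

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_pings(html_text):
    """Return html_text with all ping attributes removed."""
    parser = PingStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


link = '<a href="http://slashdot.org/" ping="http://example.com/track">Slashdot</a>'
cleaned = strip_pings(link)
```

Doing the same to a redirect link means knowing each site's URL format, which is the harder problem.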
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
So why not follow the money and retaliate against the investors? An attack is an attack regardless of whether it is done by proxy. That is in line with MAD thinking too where an attack by or on an ally is escalated against the parent aggressor.
I did look at the url properties. It was the plain url
Yes, the a href=... bit is a plaintext url. But what do you think the onmousedown="return clk( ... bit does?
Answer: it calls a "window.clk" function, which sends a message to Google to tell them that you clicked on such-and-such a link.
It's not a redirect; it's sneakier. Bing and Ask do exactly the same.