February 11, 2010

More patent nonsense — Google MapReduce

Google recently received a patent for MapReduce. The first and most general claim is (formatting and emphasis mine):

A system for large-scale processing of data, comprising:

  • a plurality of processes executing on a plurality of interconnected processors;
  • the plurality of processes including a master process, for coordinating a data processing job for processing a set of input data, and worker processes;
  • the master process, in response to a request to perform the data processing job, assigning input data blocks of the set of input data to respective ones of the worker processes;
  • each of a first plurality of the worker processes including an application-independent map module for retrieving a respective input data block assigned to the worker process by the master process and applying an application-specific map operation to the respective input data block to produce intermediate data values, wherein at least a subset of the intermediate data values each comprises a key/value pair, and wherein at least two of the first plurality of the worker processes operate simultaneously so as to perform the application-specific map operation in parallel on distinct, respective input data blocks;
  • a partition operator for processing the produced intermediate data values to produce a plurality of intermediate data sets, wherein each respective intermediate data set includes all key/value pairs for a distinct set of respective keys, and wherein at least one of the respective intermediate data sets includes respective ones of the key/value pairs produced by a plurality of the first plurality of the worker processes;
  • and each of a second plurality of the worker processes including an application-independent reduce module for retrieving data, the retrieved data comprising at least a subset of the key/value pairs from a respective intermediate data set of the plurality of intermediate data sets and applying an application-specific reduce operation to the retrieved data to produce final output data corresponding to the distinct set of respective keys in the respective intermediate data set of the plurality of intermediate data sets, and wherein at least two of the second plurality of the worker processes operate simultaneously so as to perform the application-specific reduce operation in parallel on multiple respective subsets of the produced intermediate data values.
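
To make the claim's moving parts concrete, here is a minimal single-machine sketch in Python. It is an illustration only, not Google's implementation and not code from the patent. Threads stand in for the claim's processes on interconnected processors, and every name in it (run_job, partition, map_word_count, reduce_sum) is hypothetical. The word-count map and reduce functions play the "application-specific" roles, and everything else is the "application-independent" plumbing.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_word_count(block):
    """Application-specific map operation: emit a (word, 1) pair per word."""
    return [(word.lower(), 1) for word in block.split()]


def reduce_sum(key, values):
    """Application-specific reduce operation: total the values for one key."""
    return key, sum(values)


def partition(intermediate, num_partitions):
    """Group pairs so that every pair for a given key lands in one partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in intermediate:
        partitions[hash(key) % num_partitions][key].append(value)
    return partitions


def run_job(blocks, map_fn, reduce_fn, num_workers=2, num_partitions=2):
    """Play the 'master' role: assign blocks to map workers, then
    intermediate data sets to reduce workers."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Map phase: workers apply map_fn to distinct input blocks in parallel.
        mapped = list(pool.map(map_fn, blocks))
        intermediate = [pair for pairs in mapped for pair in pairs]

        # Partition step: all pairs sharing a key go to the same intermediate set.
        partitions = partition(intermediate, num_partitions)

        # Reduce phase: workers reduce distinct intermediate sets in parallel.
        def reduce_partition(part):
            return [reduce_fn(key, values) for key, values in part.items()]

        reduced = list(pool.map(reduce_partition, partitions))
    return dict(pair for part in reduced for pair in part)


if __name__ == "__main__":
    blocks = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(blocks, map_word_count, reduce_sum))
```

Swapping in different map and reduce functions changes the job without touching run_job or partition, which is the application-specific versus application-independent split the claim keeps repeating.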

The way a patent works is that you make a big claim and, just in case it’s later invalidated, you also make more specialized sub-claims. What’s more, in a software patent, you claim everything twice, once as a “system” and once as a “method.”

When a patent takes that long to issue and has a core claim that wordy, one can assume there was much back and forth with the PTO (Patent and Trademark Office) to whittle it down to something they felt they could approve. My guess is that the supposedly unique parts of the claim are concentrated in the areas I bolded above, and that the PTO doesn’t think the claim would be patentable unless most or all of them were included.

So should the claim have been approved even so? Let’s consider prior art. Oracle has long been able to parallelize à la MapReduce. I don’t see anything in the claim that isn’t preceded by what Oracle did, except maybe the emphasis on key/value pairs. (And the same statement applies to the other 15 claims in the patent, at least on a quick skim.) I forget the details of SenSage’s quasi-MapReduce, which also preceded the Google patent filing, but I imagine something similar would be true about it.

There is no doubt that Google popularized the ideas of MapReduce, which turns out to have been a worthy public service. In one great example of that popularization, the seminal paper on parallel data mining is almost laughable in how it deviates from MapReduce key/value pair formalism, but it still seems to have been inspired by Google’s MapReduce. But that’s a different matter; popularization != invention, even though there’s a certain connection between the two in patent law. Actually, Google also often does get credit for having “invented” MapReduce, including, regrettably, in the marketing materials of clients I can’t talk out of saying that and who now might be looking down the barrel of the Google patent (hello Aster); but again, saying something doesn’t make it enforceable in court.

So what it all boils down to is:

Should Google’s patent on the idea of parallelizing the handling of sets of application-visible key/value pairs be regarded as valid?

The United States PTO, which is paid to think about these things, has evidently decided Yes. I disagree. In simplest terms, my reason is that key/value pairs have been around for decades, and so:

Anything which was known or obvious without special reference to key/value pairs doesn’t suddenly become non-obvious when key/value pairs are mixed in.

If Google ever tries to enforce its MapReduce patent, I’m available as an expert witness for the other side.

Comments

16 Responses to “More patent nonsense — Google MapReduce”

  1. Nigel Thomas on February 12th, 2010 7:10 am

    Sadly, common sense and the demonstrable presence of prior art do not seem to prevent the USPTO from granting a patent, or judges from later enforcing it. See for example the Teilhard/Juxtacomm ETL patent case – admirably summarised by Vincent McBurney here: http://it.toolbox.com/blogs/infosphere/ibm-settles-and-microsoft-bails-and-teilhard-now-owns-data-integration-35050.

    And since that post, a second round of litigation has started against a range of second-string vendors from Axway to Vitria.

  2. Nathan Watson on February 16th, 2010 12:11 am

    I was involved in the SenSage patent’s tech (http://www.patentstorm.us/patents/7024414.html ), specifically mapping multi-stage abstract query (and load) plans to available physical infrastructure, spraying/mapping raw table data and intermediate node filtered/aggregated/transformed data out to other processing nodes via network, invoking reducing filters/transformers/aggregators, job management, etc.

    This patent, filed in 2001, granted in 2006, includes all that Google claims from what I can tell.

  3. Curt Monash on February 16th, 2010 5:09 am

    Hi Nathan,

    Is that the patent number you had in mind? It looks like a patent for columnar DBMS, not for MapReduce.

  4. Google News « DECISION STATS on February 19th, 2010 8:31 pm
  5. Parag Arora on February 22nd, 2010 9:01 am

    I don’t understand the consequences of this patent. Will that mean that people who are using map/reduce would need to shift or what?

  6. Curt Monash on February 22nd, 2010 4:54 pm

    In principle, Google could publish its own MapReduce distribution, put a price on it, and insist that Hadoop users pay Google that price even if they wanted to keep using Hadoop.

  7. Emil Koutanov on March 29th, 2010 7:14 pm

    Breaking news:
    Google has patented the internet. All you comment posters are infringing!

    But on a slightly more serious note, one argument that most people have left out is that this technology is not only prior art, but it was released by Google itself into the public domain (not the source code, but the concepts and the workings of it), automatically precluding it from being patentable.

  8. Emil Koutanov on March 29th, 2010 7:21 pm

    Following on, it seems that a workaround for this patent is a fairly trivial one. Google’s system and method scope a centralised “master” controller to co-ordinate the mapped jobs. It should be quite straightforward to use a non-centralised approach (where one job could be coordinated by one server, but another job by another server), thereby not infringing. And because I’ve just published it on this forum (and I’m confident others have proposed similar things before me), this very text now constitutes both prior art and public domain exposure, so no-one can patent it.

    You’re all welcome. Don’t mention it. :)

  9. Curt Monash on March 29th, 2010 10:02 pm

    @Emil,

    Under US patent law, you just have to get the filing in before publication. What’s published after you file — even or especially by you — doesn’t hurt patentability.

    Interesting thought that being even more parallel would be a patent workaround. I haven’t read closely enough to see whether I agree.

  10. Emil Koutanov on March 31st, 2010 12:10 am

    Actually Curt, you’re right. But after filing, don’t you need to somehow indicate that a patent is pending on the technology? I think a number of people in the industry may have falsely relied on the fact that MapReduce was free technology and have implemented their own equivalents, some of which (as in Hadoop’s case) infringe on the patent. In this example, common sense didn’t prevail and the US patent office granted something that it never should have.

    I understand the need for patenting when it comes to large pharmaceutical giants spending billions of dollars on R&D, but I think software patents are a form of fascism and a way of holding the world to ransom for what, in the vast majority of cases, doesn’t amount to anything inventive or innovative, and is often the only logical solution to a given problem. Take patents on GUIs, Amazon’s one-click and now this garbage from Google.

    I think if a software house wants to protect its assets, it needs to start by protecting the source code from leaking into the public domain. Otherwise, one could simply copy 90% of the patent and modify the remaining 10% so as not to infringe; I’ve personally seen this happen. And then there are countries like China that don’t give a flying rat’s ass about protecting intellectual property.

  11. Chris on May 24th, 2010 1:19 am

    This is a defensive patent to use against someone like, say, DuckDuckGo or Bing if they conduct a raid.

  12. anonymous on December 20th, 2010 7:54 am

    That was very informative and well written. I look forward to further posts from you. Recently I came across an article titled “A functionality based approach for assessing patentability of software”, which I felt was quite interesting and informative. I would like to bring it to your attention. Below is an excerpt from the article.

    “The whole idea of software is to avoid making specific hardware for every application. We came up with software to be able to dynamically create a new “machine” out of a standard hardware. A software allows the “new” machine to perform a “new” function based on the instructions as part of the software.

    Now, saying that a software invention to be claimed must have a specific machine limitation is like asking the inventor to come up with corresponding hardware embodiment for the software based invention. It really defeats the purpose of…” read more at http://www.sinapseblog.com/2010/12/functionality-based-approach-for.html

  13. A Practical Rant about Software Patents on March 7th, 2011 3:22 am

    […] it has obtained patents for some of its major innovations, such as MapReduce. Let’s put aside questions about the validity of the MapReduce patent — especially since patents enjoy the presumption of validity. The bigger question is to whom […]

  14. Three kinds of software innovation, and whether patents could possibly work for them | DBMS 2 : DataBase Management System Services on June 8th, 2011 11:37 pm

    […] negative comments about patents in the areas of MapReduce and columnar […]

  15. MapReduce Introduction | 采石工人的大教堂 on September 22nd, 2013 9:59 am

    […] Curt Monash. “More patent nonsense — Google MapReduce”. dbms2.com. Retrieved […]

  16. anonymous on April 3rd, 2014 2:05 pm

    Akamai was the actual inventor of the MapReduce paradigm, and had been using it since the late 90s.
