June 10, 2011

Patent nonsense: Parallel Iron/HDFS edition

Alan Scott commented with concern about Parallel Iron’s patent lawsuit attacking HDFS (Hadoop Distributed File System), filed in — where else? — Eastern Texas. The patent in question — US 7,415,565 — seems in essence to cover any shared-nothing block storage that exploits a “configurable switch fabric”; indeed, it’s more oriented to OLTP (OnLine Transaction Processing) than to analytics. For example, the Background section starts:

The present invention relates to data storage, and more particularly, to methods and systems for a high throughput storage device.

A form of on-line transaction processing (OLTP) applications requiring a high number of data block reads or writes are called H-OLTP applications. A large server or mainframe or several servers typically host an H-OLTP application. Typically, these applications involve the use of a real time operating system, a relational database, optical fiber based networking, distributed communications facilities to a user community, and the application itself. Storage solutions for these applications use a combination of mechanical disk drives and cached memory under stored program control. The techniques for the storage management of H-OLTP applications can use redundant file storage algorithms on multiple disk drives, memory cache replications, data coherency algorithms, and/or load balancing.

and ends:

It would be desirable for large capacity storage to provide sufficient throughput for high-volume, real-time applications, especially, for example in emerging applications in financial, defense, research, customer management, and homeland security areas.

The independent claims are:

1. A storage system comprising: one or more memory sections, including one or more memory devices including storage locations that store data, and a memory section controller that provides addresses to the memory devices, the addresses identifying storage locations for a memory device, wherein the memory devices use the provided addresses to perform a function selected from the set of reading out and writing data to/from the memory devices; and one or more switches, comprising a configurable switch fabric, that receive a data request including a data block identifier and switch the data request to one or more of the memory sections determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, the data block identifier identifying a set of storage locations; wherein the memory sections to which the data request was switched forward the received data block identifier to its memory section controller which maps the data block identifier to a set of addresses for the storage locations identified by the data block identifier, and provides the set of addresses to one or more of the memory section’s memory devices.

16. A method for use in a storage system, comprising: storing data in storage locations in a memory device; receiving by a switch comprising a configurable switch fabric, a data request including a data block identifier; the switch switching the data request to a memory section including the memory device determined by applying the data block identifier to an algorithm that selectively configures operation of the switch, the data block identifier identifying a set of storage locations in the memory device; forwarding the received data block identifier to a memory section controller; the memory section controller mapping the data block identifier to a set of addresses for the storage locations identified by the data block identifier; and the memory section controller providing the set of addresses to the memory device; and the memory device using the provided addresses to perform a function selected from the set of reading and writing data to/from the memory device.

26. A storage system, comprising: means for storing, including: means for storing data in storage locations, the means for storing data in storage locations including means for reading data stored in the storage locations using an address; means for controlling the means for storing, the means for controlling including: means for mapping a data block identifier to a set of addresses, means for providing the addresses to the means for storing data in storage locations, the addresses identifying storage locations; means for switching, including means for receiving a data request including a data block identifier; means for switching the data request based on the data block identifier to a means for storing determined by applying the data block identifier to an algorithm that selectively configures operation of the means for switching, the data block identifier identifying a set of storage locations in the means for storing data in storage locations; and means for forwarding the received data block identifier to the means for storing.

27. A storage hub comprising a memory section, including a memory device including storage locations that store data, and a memory section controller that provides an address to the memory device, the address identifying a storage location, wherein the memory device uses the provided address to write data into the memory device; and a switch, comprising a configurable switch fabric, that receives a data request including a data block identifier and transmits the data request to the memory section determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, and that receives write data associated with the data request and transmits the write data to the determined memory section; wherein the memory section forwards the received data block identifier to the memory section controller, which determines from the data block identifier the address of a storage location and provides the address to the memory device, and the memory device stores the write data at the address.
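To see how little those claims actually demand, here is a minimal sketch, in Python, of the two-level structure claim 1 describes: a “switch” applies an algorithm to a data block identifier to pick a memory section, and that section’s controller maps the identifier to storage addresses. Everything in it is my own hypothetical illustration rather than anything from the patent: the class names `MemorySection` and `Switch`, the SHA-256 hash partitioning used as the “algorithm,” the in-memory `bytearray` standing in for memory devices, and the block ID `blk_1001`.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class MemorySection:
    """Hypothetical storage node: a 'memory device' plus its 'section controller'."""
    name: str
    # A flat byte array standing in for the memory device's storage locations.
    device: bytearray = field(default_factory=bytearray)
    # The section controller's table: data block identifier -> storage addresses.
    address_map: dict = field(default_factory=dict)

    def write(self, block_id: str, data: bytes) -> None:
        # Controller assigns a run of addresses; the device stores the data there.
        start = len(self.device)
        self.device.extend(data)
        self.address_map[block_id] = range(start, start + len(data))

    def read(self, block_id: str) -> bytes:
        # Controller maps the block ID to addresses; the device reads them out.
        return bytes(self.device[a] for a in self.address_map[block_id])


class Switch:
    """Hypothetical 'switch': picks a section by applying an algorithm to the block ID."""

    def __init__(self, sections):
        self.sections = sections

    def route(self, block_id: str) -> MemorySection:
        # The "algorithm" here is plain hash partitioning (an assumption).
        digest = hashlib.sha256(block_id.encode()).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.sections)
        return self.sections[index]

    def write(self, block_id: str, data: bytes) -> None:
        self.route(block_id).write(block_id, data)

    def read(self, block_id: str) -> bytes:
        return self.route(block_id).read(block_id)


sections = [MemorySection(f"node{i}") for i in range(3)]
switch = Switch(sections)
switch.write("blk_1001", b"hello")
print(switch.read("blk_1001"))  # b'hello'
```

Nothing in the sketch depends on any particular switching hardware; it is ordinary partitioned block storage with a per-node block index.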

My one thought that could have led to the patent making sense was that maybe the term “configurable switch fabric” was defined in some particularly limited way. But noooo. Indeed, the term is not defined in the patent’s body at all; rather, the patent says (somewhat ungrammatically):

The switches 22 may be any type of switch using any type of switch fabric, such as, for example, a time division multiplexed fabric or a space division multiplexed fabric. As used herein, the term “switch fabric” the physical interconnection architecture that directs data from an incoming interface to an outgoing interface. For example, the switches 22 may be a Fibre Channel switch, an ATM switch, a switched fast Ethernet switch, a switched FDDI switch, or any other type of switch. The switches 22 may also include a controller (not shown) for controlling the switch.

I would be shocked if this patent held up upon reexamination. (If it did, EMC would pretty much be out of business, or at least vulnerable to a considerable cashectomy.) This is a particularly strong example of my belief that performance-enhancement software patents are always bogus. What’s more, it seems strange to worry about this patent’s effect on HDFS in any case, because if you’re that much of a patent wimp, you probably don’t want to run afoul of Google’s (also bogus) MapReduce patent in the first place.

On the whole, I’m somewhat more sympathetic to the idea of replacing HDFS underneath Hadoop than my clients at Cloudera or IBM would wish me to be. But the Parallel Iron patent is not a serious reason in support of such a change.

Comments

8 Responses to “Patent nonsense: Parallel Iron/HDFS edition”

  1. unholyguy on June 10th, 2011 11:49 am

    Just a quick point: I think Google officially exempted Hadoop from its MapReduce patent.

  2. Alan Scott on June 13th, 2011 5:40 pm

    It wasn’t my intent to get folks spun up to determine the validity of the patent. It was to make folks aware, so they could protect their businesses. Until this thing is played out, win or lose, this will have implications on organizations wishing to use HDFS, and the suppliers of HDFS. Indemnification will be required to reduce risk to HDFS customers. Since there is already a legal challenge in process, the size of the indemnification will need to be substantial enough to weather a protracted legal challenge and possible settlement. That’s where the financials (cash reserves) of the HDFS provider come into play. If you want to go with HDFS rather than an alternative, go with a supplier that has BIG indemnification pockets.

  3. Curt Monash on June 13th, 2011 6:49 pm

    Alan,

    Yours is one reasonable opinion.

    Another reasonable opinion is that one shouldn’t hold one’s technological competitiveness hostage to whichever frivolous lawsuits have or haven’t been filed at a given point in time.

    You seem to be arguing that big companies can indemnify better than small companies. But don’t big companies also have more customers to indemnify (that’s how they’re big)? Unless your argument is that all the technology one should buy is UNIMPORTANT products at BIG companies (so that they have deep pockets but have their liability limited by only having sold a few copies), I guess I’m not seeing the argument.

    Of course, I know that large enterprises have CYA policies about all sorts of things. I’m just struggling to see why in this case they make any sense.

  4. Alan Scott on June 14th, 2011 5:16 pm

    It’s an opinion because the world revolves around risk and the impact of risk. Risk is not limited to just “financial implications” but also extends to focus. There are companies that have the “means” to make distractions like this go away, and there are companies that will be saddled with the distraction for some time.

  5. Curt Monash on June 14th, 2011 8:08 pm

    Alan,

    As I’ve posted elsewhere, I’ve been an analyst of the enterprise software industry for 30 years, and I only recall a single example when a patent issue significantly affected customers (Marcam/Ross Systems, in the 1980s). MANY vendors are faced with attempted patent extortion. Some defend themselves; some buy off their attackers. In neither case are their customers harmed.

    Your approach, while understandable, is a little like only shopping at Mafia-approved grocery stores, in a country where the Mafia has almost no power.

  6. Dave on June 15th, 2011 3:23 am

    Does anyone else get the impression “Alan Scott” might in fact be part of the “family”?

  7. Curt Monash on June 15th, 2011 3:41 am

    While such suspicion is always understandable when somebody takes a Greenplum-friendly view, Alan is indeed a real employee of a real end-user enterprise.

  8. Derek P. Moore on September 13th, 2011 2:28 pm

    FYI: Hulu, Amazon.com Inc., Amazon Web Services LLC, Twitter, EMC, and others were named as defendants in the most recent round of legal action regarding this patent and mere usage of Hadoop…

    Should people look at using Accumulo instead of HBase? Parallel Iron’s independent claim is insanely broad, and it seems it would cover many distributed file systems that use more than one level of indirection.

    How should the small fries listed alongside EMC, Amazon, and Twitter defend themselves?

