Alan Scott commented with concern about Parallel Iron’s patent lawsuit attacking HDFS (Hadoop Distributed File System), filed in (where else?) Eastern Texas. The patent in question, US 7,415,565, seems in essence to cover any shared-nothing block storage that exploits a “configurable switch fabric”; indeed, it’s oriented more to OLTP (OnLine Transaction Processing) than to analytics. For example, the Background section starts:
The present invention relates to data storage, and more particularly, to methods and systems for a high throughput storage device.
A form of on-line transaction processing (OLTP) applications requiring a high number of data block reads or writes are called H-OLTP applications. A large server or mainframe or several servers typically host an H-OLTP application. Typically, these applications involve the use of a real time operating system, a relational database, optical fiber based networking, distributed communications facilities to a user community, and the application itself. Storage solutions for these applications use a combination of mechanical disk drives and cached memory under stored program control. The techniques for the storage management of H-OLTP applications can use redundant file storage algorithms on multiple disk drives, memory cache replications, data coherency algorithms, and/or load balancing.
It would be desirable for large capacity storage to provide sufficient throughput for high-volume, real-time applications, especially, for example in emerging applications in financial, defense, research, customer management, and homeland security areas.
The independent claims are:
1. A storage system comprising: one or more memory sections, including one or more memory devices including storage locations that store data, and a memory section controller that provides addresses to the memory devices, the addresses identifying storage locations for a memory device, wherein the memory devices use the provided addresses to perform a function selected from the set of reading out and writing data to/from the memory devices; and one or more switches, comprising a configurable switch fabric, that receive a data request including a data block identifier and switch the data request to one or more of the memory sections determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, the data block identifier identifying a set of storage locations; wherein the memory sections to which the data request was switched forward the received data block identifier to its memory section controller which maps the data block identifier to a set of addresses for the storage locations identified by the data block identifier, and provides the set of addresses to one or more of the memory section’s memory devices.
16. A method for use in a storage system, comprising: storing data in storage locations in a memory device; receiving by a switch comprising a configurable switch fabric, a data request including a data block identifier; the switch switching the data request to a memory section including the memory device determined by applying the data block identifier to an algorithm that selectively configures operation of the switch, the data block identifier identifying a set of storage locations in the memory device; forwarding the received data block identifier to a memory section controller; the memory section controller mapping the data block identifier to a set of addresses for the storage locations identified by the data block identifier; and the memory section controller providing the set of addresses to the memory device; and the memory device using the provided addresses to perform a function selected from the set of reading and writing data to/from the memory device.
26. A storage system, comprising: means for storing, including: means for storing data in storage locations, the means for storing data in storage locations including means for reading data stored in the storage locations using an address; means for controlling the means for storing, the means for controlling including: means for mapping a data block identifier to a set of addresses, means for providing the addresses to the means for storing data in storage locations, the addresses identifying storage locations; means for switching, including means for receiving a data request including a data block identifier; means for switching the data request based on the data block identifier to a means for storing determined by applying the data block identifier to an algorithm that selectively configures operation of the means for switching, the data block identifier identifying a set of storage locations in the means for storing data in storage locations; and means for forwarding the received data block identifier to the means for storing.
27. A storage hub comprising a memory section, including a memory device including storage locations that store data, and a memory section controller that provides an address to the memory device, the address identifying a storage location, wherein the memory device uses the provided address to write data into the memory device; and a switch, comprising a configurable switch fabric, that receives a data request including a data block identifier and transmits the data request to the memory section determined by applying the data block identifier to an algorithm that selectively configures operation of the switch fabric, and that receives write data associated with the data request and transmits the write data to the determined memory section; wherein the memory section forwards the received data block identifier to the memory section controller, which determines from the data block identifier the address of a storage location and provides the address to the memory device, and the memory device stores the write data at the address.
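To illustrate how generic the claimed data path is, here’s a toy Python sketch of claim 1 (and method claim 16) as I read it. It is a hypothetical illustration, not code from the patent or from HDFS: the class names are mine, and a plain hash of the block ID modulo the number of memory sections stands in for the unspecified “algorithm that selectively configures operation of the switch fabric.”

```python
# Hypothetical toy model of the data path in claim 1 of US 7,415,565.
# All names are illustrative; hash-mod-N is an assumed stand-in for the
# claim's unspecified routing "algorithm".
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # storage locations per block (arbitrary toy value)


@dataclass
class MemoryDevice:
    """Reads and writes data at integer addresses ("storage locations")."""
    cells: dict = field(default_factory=dict)

    def write(self, addresses, data):
        for addr, value in zip(addresses, data):
            self.cells[addr] = value

    def read(self, addresses):
        return [self.cells.get(addr) for addr in addresses]


@dataclass
class MemorySection:
    """A memory device plus the section controller that maps block IDs to addresses."""
    device: MemoryDevice = field(default_factory=MemoryDevice)
    block_table: dict = field(default_factory=dict)  # block ID -> base address
    next_free: int = 0

    def map_block(self, block_id):
        # Controller step: map the data block identifier to a set of addresses,
        # allocating a fresh contiguous range the first time a block is seen.
        if block_id not in self.block_table:
            self.block_table[block_id] = self.next_free
            self.next_free += BLOCK_SIZE
        base = self.block_table[block_id]
        return list(range(base, base + BLOCK_SIZE))

    def handle(self, op, block_id, data=None):
        # The controller hands the addresses to the device, which reads or writes.
        addresses = self.map_block(block_id)
        if op == "write":
            self.device.write(addresses, data)
            return None
        return self.device.read(addresses)


class Switch:
    """Sends each request to the memory section picked by applying an algorithm to the block ID."""

    def __init__(self, sections):
        self.sections = sections

    def route(self, block_id):
        # "Applying the data block identifier to an algorithm": a stable hash mod N.
        digest = hashlib.sha256(block_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.sections)

    def request(self, op, block_id, data=None):
        section = self.sections[self.route(block_id)]
        return section.handle(op, block_id, data)


if __name__ == "__main__":
    switch = Switch([MemorySection() for _ in range(3)])
    switch.request("write", "blk_0001", ["a", "b", "c", "d"])
    print(switch.request("read", "blk_0001"))
```

Replace the hash with a lookup table of block locations (roughly what the HDFS NameNode keeps) and the same steps describe block placement in pretty much any distributed block store, which is rather the point.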
My one thought about how the patent might make sense was that perhaps the term “configurable switch fabric” was defined in some particularly limited way. But noooo. “Configurable switch fabric” isn’t defined anywhere in the patent’s body; the closest thing is this extremely broad (and somewhat ungrammatical) gloss on plain “switch fabric”:
The switches 22 may be any type of switch using any type of switch fabric, such as, for example, a time division multiplexed fabric or a space division multiplexed fabric. As used herein, the term “switch fabric” the physical interconnection architecture that directs data from an incoming interface to an outgoing interface. For example, the switches 22 may be a Fibre Channel switch, an ATM switch, a switched fast Ethernet switch, a switched FDDI switch, or any other type of switch. The switches 22 may also include a controller (not shown) for controlling the switch.
I would be shocked if this patent held up upon reexamination. (If it did, EMC would pretty much be out of business, or at least vulnerable to a considerable cashectomy.) This is a particularly strong example of my belief that performance-enhancement software patents are always bogus. What’s more, it seems strange to worry about this patent’s effect on HDFS in any case, because if you’re that much of a patent wimp, you probably shouldn’t be risking Google’s (also bogus) MapReduce patent by using Hadoop in the first place.
On the whole, I’m somewhat more sympathetic to the idea of replacing HDFS underneath Hadoop than my clients at Cloudera or IBM would wish me to be. But the Parallel Iron patent is not a serious reason in support of such a change.