Few areas of technology boast more architectural diversity than data warehousing. Mainframe DB2 is different from Teradata, which is different from the leading full-spectrum RDBMS, which are different from disk-based appliances, which are different from memory-centric solutions, which are different from disk-based MOLAP systems, and so on. What’s more, no two members of the same group are architected the same way; even the market-leading general-purpose DBMS have important differences in their data warehousing features.
The hot new vendor on the block is DATallegro, which is stealing much of the limelight formerly enjoyed by data warehouse appliance pioneer Netezza. (After some good early discussions, Netezza abruptly reneged on a promise a year ago to explain more about its technology workings to me, and I’ve hardly heard from them since. Yes, they’re still much bigger than DATallegro, but I suspect they’ve hit some technical roadblocks, and their star is fading.)
Like Netezza’s, DATallegro’s basic strategy is to stream data on and off disk very quickly, sequentially rather than randomly, resolving queries with full table scans. Unlike Netezza, which actually has chip-level optimizations, DATallegro gets its most direct I/O acceleration from protocol software for talking to InfiniBand storage systems.
The “on” part of “on and off disk” is particularly important in DATallegro’s case. V2, released this week, has optimizations for moving data around quickly. This is useful for rebalancing range partitions to maximize performance. (In general, range partitioning is a big part of DATallegro’s story.) CEO Stuart Frost also claims a much more dramatic benefit when a query creates large intermediate tables, such as the >1 terabyte tables a MicroStrategy query can create along the way to a final result. Supposedly, a query that DATallegro can do in a few minutes takes all night on competing systems.
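To make the rebalancing idea concrete, here is a minimal Python sketch of what it means to rebalance range partitions. It is purely illustrative and is not DATallegro’s mechanism, which I haven’t seen; the function names `balanced_boundaries` and `partition_sizes`, and the sample keys, are all made up for the example. The point is simply that when key values are skewed, fixed boundaries leave some partitions overloaded, and choosing new boundaries means rows have to move.

```python
from bisect import bisect_right


def balanced_boundaries(keys, num_partitions):
    """Pick num_partitions - 1 split points that give near-equal row counts.

    Hypothetical helper: takes every key in the table, sorts them, and
    reads off the values at evenly spaced positions.
    """
    ordered = sorted(keys)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def partition_sizes(keys, boundaries):
    """Count rows per partition; a key equal to a boundary goes to the right."""
    sizes = [0] * (len(boundaries) + 1)
    for key in keys:
        sizes[bisect_right(boundaries, key)] += 1
    return sizes


# Skewed sample data: 90 small keys and 10 large ones (invented for illustration).
keys = list(range(90)) + [1000 + i for i in range(10)]

# Boundaries spaced evenly over the key *range* leave the partitions lopsided.
naive = [250, 500, 750]
naive_sizes = partition_sizes(keys, naive)

# Boundaries chosen from the key *distribution* even things out.
rebalanced = balanced_boundaries(keys, 4)
rebalanced_sizes = partition_sizes(keys, rebalanced)
```

With the naive boundaries, one partition holds 90 of the 100 rows; after rebalancing, each holds 25. Getting from the first layout to the second means physically relocating most of the rows, which is why fast data movement matters here.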
Other new V2 goodness includes a very simple way to prioritize short queries over long ones, and hence give them consistent response time (which is good for “operational BI” and the like). Namely, if a query thread has been around for a long time, it automatically gets lowered in priority. This is a simple matter of operating systems and clocks; the query optimizer is in no way involved.
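A rough Python sketch of that kind of age-based demotion follows. Everything in it is an assumption made for illustration: the `Query` class, the `priority` and `next_to_run` functions, the 10-second step, and the five priority levels are all hypothetical, and DATallegro’s actual thresholds and scheduling details weren’t disclosed to me. What it does show is that the decision needs only a start time and a clock, with no cost estimate from the optimizer.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    started_at: float  # seconds on some shared clock


def priority(query, now, step_seconds=10.0, max_level=4):
    """Return a priority level, where 0 is the highest.

    The query drops one level for every step_seconds it has been running,
    down to max_level. Both parameters are illustrative assumptions.
    """
    age = max(0.0, now - query.started_at)
    return min(max_level, int(age // step_seconds))


def next_to_run(queries, now):
    """Pick the query with the best priority; ties go to the younger query."""
    return min(queries, key=lambda q: (priority(q, now), now - q.started_at))


# A long-running report and a freshly arrived lookup, observed at t = 100s.
report = Query("nightly_report", started_at=0.0)
lookup = Query("customer_lookup", started_at=95.0)
chosen = next_to_run([report, lookup], now=100.0)
```

Here the report has aged down to the lowest level while the lookup is still at the top level, so the lookup runs first. A short query therefore never waits behind a long one for more than a scheduling interval, which is where the consistent response time comes from.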
Ultimately, DATallegro’s technical success story seems a bit paradoxical. It relies on the richer data warehousing functionality found in Ingres vs., say, the PostgreSQL that Netezza relies on. But absent DATallegro, one wouldn’t tend to think of Ingres as a particularly strong data warehousing system at all. Go figure.