Netezza relies on FPGAs. DATallegro essentially uses standard components, but those include Infiniband cards (and there’s a little FPGA action when they do encryption). Greenplum, however, claims to offer a highly competitive data warehouse solution that’s so software-only you can download it from their web site. That said, their main sales mode also seems to be through appliances, specifically ones branded and sold by Sun, combining Greenplum and open source software on a “Thumper” box. And the whole thing supposedly scales even higher than DATallegro and Netezza, because you can manage over a petabyte if you chain together a dozen of the 100 terabyte racks.
As often happens in introductory calls, I came away from my first Greenplum conversation with a much better sense of what they’re claiming than of why it’s believable, how they’ve achieved what they appear to have, or where the “gotchas” are. Anyhow, here are some highlights of their story so far:
- They offer a proprietary, extended version of PostgreSQL, called Bizgres. As is common in open source-based projects, there are a lot of wrinkles as to what’s closed source, what they’ve created themselves and donated to the open source community, etc., etc.
- Bizgres comes in two flavors. Generic Bizgres is free to download and use to manage up to a few hundred gigabytes of data. It runs on a single processor. Bizgres MPP is free to download and develop with, but costs money to deploy.
- The company has had a somewhat checkered history. In its current setup it’s located in San Mateo, has 30 employees, has a partner with 8 engineers providing Tier 1 and Tier 2 support, and has closed 11 customers in the past 5 months.
- They added some basic data warehousing capabilities to PostgreSQL, such as range partitioning, and bitmaps that work for cardinalities up to 10,000 or so. Note that these probably are not used by Netezza, which built its system on an older version of PostgreSQL, although as usual I’m not sure about anything technical at Netezza, due to their lack of interest in having their technology analyzed. (DATallegro is built on Ingres.)
- Like DATallegro, they previously had an architecture in which queries that couldn’t be executed in one partition sent partial results to a “fat head” node that did the rest of the work, but they have since adopted a more sophisticated parallelization strategy. However, they talk of “query shipping,” while DATallegro stresses “repartitioning” of the database, so I suspect their approach is somewhat different, although one way or another lots of data has to be shipped from node to node. I’m not clear on the details of how this works in the Greenplum case.
- Like Netezza and unlike DATallegro, they think gigabit Ethernet is just dandy for the internode data transport. DATallegro prefers Infiniband because it creates almost no processor load (literally no processor load, and no more than a 1% effective processor slowdown), while gigabit Ethernet can slow processors by a factor of two in the worst case.
- The Sun appliance comes in 10, 40, and 100 terabyte sizes. (That’s actual warehouse size. Disk space is more than twice as much, of course.) One customer is seriously evaluating a 200 terabyte configuration. I think the scalability past that is largely theoretical at this point.
- List prices, if they recall correctly, are $370K, $700K, and $1.8 million for the 10, 40, and 100 terabyte versions. I didn’t get a sense of performance. (DATallegro, of course, offers very different data capacities for the same or similar prices, at different performance points.) Uh, I forgot to ask how much of those 100 terabytes is typically data and how much is index, which I should have done because:
- Unlike Netezza and DATallegro, Greenplum thinks indexes are fine and dandy. As an example, they point out that using an index for a time series in no way interferes with sequential data access. In general, they think it’s important that they merely extend the underlying DBMS rather than build on top of it, as that makes it easier to use all the DBMS’s functionality. (I guess in principle DATallegro could use Ingres’s transactional capabilities, albeit only in low doses since performance would be very unoptimized. However, I have no idea whether they actually built the system that way.)
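
For what it’s worth, the list prices and rack sizes quoted above reduce to some quick arithmetic. The sketch below is just that arithmetic in Python; the prices are the figures as Greenplum recalled them on the call (so unverified), and the names `LIST_PRICES`, `price_per_terabyte`, and `chained_capacity_tb` are made up here for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in this post.
# Prices are "if they recall correctly" list prices in US dollars,
# keyed by warehouse size in terabytes; treat them as assumptions.
LIST_PRICES = {10: 370_000, 40: 700_000, 100: 1_800_000}


def price_per_terabyte(prices):
    """Return list price divided by warehouse size for each configuration."""
    return {size_tb: price / size_tb for size_tb, price in prices.items()}


def chained_capacity_tb(racks, rack_tb=100):
    """Total warehouse capacity, in terabytes, of several chained racks."""
    return racks * rack_tb


if __name__ == "__main__":
    for size_tb, per_tb in price_per_terabyte(LIST_PRICES).items():
        print(f"{size_tb:>3} TB appliance: ${per_tb:,.0f} per terabyte")
    # "A dozen of the 100 terabyte racks" from the opening paragraph.
    print(f"12 racks: {chained_capacity_tb(12):,} TB")
```

On those figures the three sizes come out to $37,000, $17,500, and $18,000 per terabyte of warehouse capacity, and a dozen 100 terabyte racks is 1,200 terabytes. None of this says anything about performance, or about how much of each terabyte is data versus index.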