I talked with Greenplum honchos Bill Cook and Scott Yara yesterday. Bill is the new CEO, formerly head of Sun’s field operations. Scott is president, and in effect the marketing-guy co-founder. I still don’t know whether I really believe their technical story. But I do think I have a feel for what they’re trying to do. Key aspects of the Greenplum strategy include:
- Greenplum rewrote a lot of PostgreSQL to parallelize it, in the correct belief that MPP is the best way to go for high-end data warehousing.
- Indeed, Greenplum claims to have a general solution to DBMS parallelization. Unlike Netezza, DATAllegro, Vertica, and Kognitio, Greenplum offers a row-oriented data store with a fairly full set of indexing techniques. You want star indices or bitmaps? They have them. (They even claimed to be used for some text management when last we talked, although that was for O’Reilly, and Mark Logic seems to be O’Reilly’s main text-indexing vendor.)
- Greenplum’s main sales strategy is to be part of Sun’s product line, bundled into Thumper boxes as single-part-number Sun offerings. They certainly could add other hardware OEMs, just as Check Point sells firewalls through multiple appliance vendors. But at least for now it’s all about Sun.
Like every other Sun sales chief, Bill wasn’t very successful at selling software through the Sun sales force when he worked there. So I asked why he thought he’d succeed now, and he actually offered a good answer to that challenge. Bill claims that the Thumper/Greenplum pairing isn’t any different from Sun selling hardware by touting the virtues of Solaris or NFS. That’s a fair distinction; Sun’s multiple generations of (for example) application server or application development tool screw-ups may not be particularly relevant here.
What I discerned about the product architecture was roughly this. As in the other row-oriented MPP offerings, data is distributed via a hash partition key. The optimizer and execution engine are tied together (whatever that means; perhaps just that the optimizer is parallelization-aware?) for a high degree of parallelization goodness. There are “motion nodes” whose job is to ship tuples around.
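To make the hash-distribution idea concrete, here is a minimal Python sketch. It is not Greenplum’s implementation: the segment count, the CRC32 hash, and the function names `segment_for`, `distribute`, and `redistribute` are all assumptions chosen for illustration. The last function is only a toy stand-in for what a “motion” step might do, namely re-hashing rows on a different column so that rows sharing that column’s value end up on the same segment.

```python
import zlib
from collections import defaultdict

# Hypothetical segment count, chosen only for illustration.
NUM_SEGMENTS = 4


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment number using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Place each row on the segment chosen by hashing its key column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments


def redistribute(segments, new_key_column, num_segments=NUM_SEGMENTS):
    """Toy 'motion' step: gather all rows and re-hash them on another column."""
    all_rows = [row for seg_rows in segments.values() for row in seg_rows]
    return distribute(all_rows, new_key_column, num_segments)


if __name__ == "__main__":
    orders = [{"order_id": i, "customer_id": i % 3} for i in range(12)]
    by_order = distribute(orders, "order_id")
    by_customer = redistribute(by_order, "customer_id")
    for seg, seg_rows in sorted(by_customer.items()):
        print(seg, [r["order_id"] for r in seg_rows])
```

After the redistribution, every row with a given `customer_id` sits on one segment, so a join or aggregation on that column could proceed locally on each segment. A real system would ship tuples between nodes rather than gather them in one process.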
In support of the claim to have done a kick-ass job of DBMS parallelization, Scott rattled off a bunch of developer names, each of whom supposedly built a key DBMS subsystem at Tandem, Teradata, Informix, Red Brick, or Microsoft. Impressive though that was, the claim was a little hard to evaluate in real time, as I’d never actually heard of any of the guys.
On the hardware side, each Thumper box has 24 terabytes of raw storage. You run 4 active instances and 4 mirrored segments. One active master segment contains the parallel cost optimizer; there’s also a warm standby staying in sync via log shipping. Gigabit Ethernet interconnects apparently suffice.
As for market activity: Greenplum claims close to 20 customers. Most are software-only, in the 1-10 terabyte range. But most of the sales pipeline is through Sun, in the 10-300 terabyte range. List pricing on the software-only solution is subscription-based, at $25,000/terabyte/year; obviously, that’s not very attractive for large databases unless one negotiates a big quantity discount. The Sun boxes come in three sizes, with hardware/software bundled list prices at $440,000 for 20 terabytes of usable space, $800,000 for 40 terabytes, and $1.8 million for 100 terabytes. I didn’t drill down into exactly what kinds of apps these customers were concentrated in, but that would be an attractive area for follow-up.
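As a quick check on those list prices, the following Python sketch works out the per-terabyte figures. The numbers come straight from the paragraph above; the function name `pricing_table` is made up for illustration. Two assumptions are worth flagging: the software-only subscription is applied here to the same terabyte count as the bundle’s usable space, and nothing is assumed about whether the bundle price is one-time or recurring, since I wasn’t told.

```python
# List prices quoted above (US dollars).
SOFTWARE_ONLY_PER_TB_YEAR = 25_000
BUNDLES = {20: 440_000, 40: 800_000, 100: 1_800_000}  # usable TB -> bundle list price


def pricing_table(bundles=BUNDLES, per_tb_year=SOFTWARE_ONLY_PER_TB_YEAR):
    """Return (usable_tb, bundle_price_per_tb, software_only_cost_per_year) tuples."""
    table = []
    for usable_tb, list_price in sorted(bundles.items()):
        bundle_per_tb = list_price / usable_tb
        # Assumption: subscription terabytes are counted the same way as usable space.
        software_only_year = usable_tb * per_tb_year
        table.append((usable_tb, bundle_per_tb, software_only_year))
    return table


if __name__ == "__main__":
    for usable_tb, bundle_per_tb, software_only_year in pricing_table():
        print(
            f"{usable_tb:>3} TB: bundle ${bundle_per_tb:,.0f}/TB, "
            f"software-only list ${software_only_year:,.0f}/year"
        )
```

On these figures the bundles work out to $22,000, $20,000, and $18,000 per usable terabyte, hardware included, while the undiscounted software-only subscription for the same capacities would list at $500,000, $1,000,000, and $2,500,000 per year.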