After a March 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights – besides some really great sushi at Sakae in Burlingame – start with an eye-opening set of customer proof points, such as:
- 50 total paying Greenplum customers, over half of whom are already in production.
- 6 Greenplum users in production with >100 terabytes of user data. That may beat anybody except Teradata, among SQL data warehouse specialist vendors.
- 2 Greenplum customers expected to be in production within 60 days with >1 petabyte of user data. That may beat even Teradata. Anyhow, it looks as if Greenplum and Teradata will be 1-2 in some order crossing the 1-petabyte line. (Edit: Here’s more detail on >1 petabyte Greenplum users.)
- 5 Greenplum customers with “multiple 100s of users.” That’s not much by the standards of more mature vendors, but it suffices to show that Greenplum has some kind of a handle on concurrency.
- 3 Greenplum customers with 1000s of tables. That suffices to show that Greenplum’s claims to schema agnosticity are more than academic, even if it’s not enough to show that many enterprises care.
- Greenplum customers using tools from the following list, and I quote: SAS, Unica, Datastage, Information Builders, Informatica, Oracle BI, Microstrategy, Microsoft SSIS and SSRS, Business Objects / BODI, SAP, Talend, Pentaho
- (Again I quote) “Tier 1” customers in the following verticals:
  - Retail Banking
  - Health Care
  - Commercial Banking
  - Service Providers
Even though the bulk of Greenplum’s revenue comes from the Sun appliance relationship, 20 paying customers run Greenplum on Linux. Another interesting demographic is that 25-40% of Greenplum’s revenue tends to come from Asia (obviously, the figure fluctuates greatly from quarter to quarter). Perhaps not coincidentally, one of Greenplum’s three salespeople last year was based in Asia. (The current total is 15, and growing fast.)
Technical highlights include:
- Greenplum is row-based, shared-nothing, MPP. It runs on standard hardware and operating systems. (Fortunately for its key partnership, though, Greenplum evidently does run best, at least for now, on the recommended standard Sun appliance configurations.)
- Most or all of the PostgreSQL data access methods are left intact. The big changes to PostgreSQL lie in the areas of query optimization, planning, and execution. I.e., Greenplum has its own way of breaking up a query into pieces – and of course of seeing that data gets shipped among nodes – but the low-level operators for storage and access are from PostgreSQL.
- Greenplum nodes are just connected to a group of standard switches, via standard 1 gigabit Ethernet. Greenplum insists that interconnect bandwidth is not a problem.
- Currently, there’s a boss node, with all the other nodes being peers. Unlike in an early prototype of Greenplum, however, intermediate results are now shipped peer-to-peer rather than back up to the boss node. In the future, compute and storage nodes will be (optionally) split out from each other.
- Compression is being introduced in the next point release, with big numbers (at least by row-based standards) out of the gate. It will initially be just for append-only tables, but that limitation will be lifted later on.
- Also in that release, Greenplum is introducing embedded parallel mathematical packages, such as linear algebra and statistics (specifically, R).
- Greenplum has no current in-the-cloud offering, but one is in the works.
- Greenplum offers an ever-growing variety of administration tools.
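To make the shared-nothing, peer-to-peer shipping idea above a bit more concrete, here is a toy Python sketch of how a grouped sum might be split into pieces. It is an illustration under stated assumptions, not Greenplum's actual planner or executor: the function names, the toy hash, and the sample `SALES` rows are all hypothetical. Each "segment" aggregates only its own rows, partial results are shipped directly to whichever peer owns each group key, and the boss node merely gathers the finished pieces.

```python
from collections import defaultdict


def segment_for(key, num_segments):
    # Deterministic toy hash (hypothetical); a real MPP system uses its own.
    return sum(str(key).encode()) % num_segments


def distribute(rows, dist_col, num_segments):
    # Shared-nothing storage: each row lives on exactly one segment.
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[dist_col], num_segments)].append(row)
    return segments


def local_partial_sums(segment_rows, group_col, value_col):
    # Step 1: every segment aggregates only the rows it stores.
    partial = defaultdict(int)
    for row in segment_rows:
        partial[row[group_col]] += row[value_col]
    return dict(partial)


def redistribute(partials, num_segments):
    # Step 2: partial sums travel peer-to-peer to the segment that owns
    # each group key, instead of funnelling through the boss node.
    inboxes = [defaultdict(int) for _ in range(num_segments)]
    for partial in partials:
        for group, subtotal in partial.items():
            inboxes[segment_for(group, num_segments)][group] += subtotal
    return [dict(inbox) for inbox in inboxes]


def gather(finals):
    # Step 3: the boss node only concatenates disjoint final results.
    result = {}
    for final in finals:
        result.update(final)
    return result


def parallel_group_sum(rows, dist_col, group_col, value_col, num_segments=4):
    segments = distribute(rows, dist_col, num_segments)
    partials = [local_partial_sums(s, group_col, value_col) for s in segments]
    return gather(redistribute(partials, num_segments))


# Made-up sample data, purely for illustration.
SALES = [
    {"order_id": 1, "region": "asia", "amount": 40},
    {"order_id": 2, "region": "emea", "amount": 15},
    {"order_id": 3, "region": "asia", "amount": 30},
    {"order_id": 4, "region": "americas", "amount": 25},
    {"order_id": 5, "region": "americas", "amount": 10},
]

if __name__ == "__main__":
    print(parallel_group_sum(SALES, "order_id", "region", "amount"))
```

The point of the redistribution step is that the boss node never sees raw rows or partial sums, only finished groups, which is one plausible reason interconnect traffic to a single node need not become the bottleneck.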