April 12, 2010

Greenplum Chorus and Greenplum 4.0

Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.

Greenplum 4.0 highlights and related observations include:

The really interesting part of this announcement, however, is Greenplum Chorus. Greenplum agrees with my assertion that Greenplum Chorus is a new kind of data integration/ETL technology. In particular, Greenplum Chorus is designed around a stance I agree with, namely that it’s unrealistic to put everything into a single enterprise data warehouse (EDW); you need to manage data marts as well, preferably in a coordinated way. Mainstream data integration/ETL (Extract/Transform/Load) vendors such as Informatica or Ab Initio would surely say “That’s often quite true, and our technology can handle such scenarios just as it handles single-EDW-data-sink environments.” But Greenplum Chorus offers three capabilities not generally found in traditional data integration products (and offers only those three capabilities), namely:

Greenplum Chorus is heading into early access soon, with general availability slated around midyear. Also in the mix is a Greenplum “Hypervisor” that can somehow relate to an almost unlimited number of nodes or databases; however, I didn’t get a lot of details on the Greenplum Hypervisor technology or on the target dates for delivering and integrating the Hypervisor with other parts of Greenplum’s technology.

When Greenplum first talked about the enterprise data cloud (EDC) idea, it emphasized the spinning out of physical data marts in an easy way, as opposed to the virtual data marts pushed by Oliver Ratzesberger and Teradata. Greenplum Chorus, however, supports both kinds (as, at least directionally, does Teradata), specifically letting you choose between:

Actually, if you want to recopy data in the same Greenplum database instance, you can do that too, via something called “data sets,” but that’s not the main focus. Either option, I presume, can be configured to provide either or both of the two main benefits of spun-out data marts, namely:

in either case without messing up the performance, SLAs, security, or “one truth-ness” of the existing database.

To provide those capabilities in an analytic DBMS, you need sufficiently robust parallel data movement (for the physical sandboxes) and workload management (for the virtual ones). Greenplum obviously believes it has both. Teradata makes the same claim. Other vendors would make similar assertions, and presumably will offer similar capabilities soon. You also want some kind of ability to ingest data from foreign databases, but that can be pretty routine stuff; e.g., in Release 1 of Chorus, Greenplum is content to offer ODBC access to Oracle, SQL Server, et al.
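For readers who want the physical-versus-virtual distinction made concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not how Greenplum Chorus or Teradata implements anything; the table and view names (`sales`, `mart_physical`, `mart_virtual`) are made up for illustration, and a single in-memory SQLite database stands in for the warehouse. The physical mart is a copy taken at creation time; the virtual mart is a view evaluated against the warehouse at query time.

```python
import sqlite3


def build_marts():
    """Create a toy warehouse table plus one physical and one virtual mart."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount INTEGER);
        INSERT INTO sales VALUES ('east', 100), ('west', 250), ('east', 40);

        -- Physical mart: rows are copied out when the mart is created.
        CREATE TABLE mart_physical AS
            SELECT * FROM sales WHERE region = 'east';

        -- Virtual mart: a view that reads the warehouse at query time.
        CREATE VIEW mart_virtual AS
            SELECT * FROM sales WHERE region = 'east';
        """
    )
    return conn


def mart_totals(conn):
    """Return (physical_total, virtual_total) for the 'east' marts."""
    physical = conn.execute("SELECT SUM(amount) FROM mart_physical").fetchone()[0]
    virtual = conn.execute("SELECT SUM(amount) FROM mart_virtual").fetchone()[0]
    return physical, virtual


conn = build_marts()
before = mart_totals(conn)

# New data lands in the warehouse after the marts were spun out.
conn.execute("INSERT INTO sales VALUES ('east', 10)")
after = mart_totals(conn)

print("before:", before)
print("after: ", after)
```

After the extra row is loaded, the virtual mart reflects it while the physical copy does not. That is the trade-off in miniature: the physical mart needs data movement to stay current, while the virtual mart shares the warehouse's resources and therefore leans on workload management.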

The “data discovery” and “social networking” aspects of Greenplum Chorus seem to be quite Release 1 as well. Basically, Greenplum lets people post discussion threads about databases and data marts, discussing what value can be derived from them. I guess somebody could include links to web-technology reports based on those databases, but otherwise there’s no integration with business intelligence tools and their collaboration capabilities. Even so, Greenplum reports that business executives liked this capability in early access testing.

Greenplum Chorus is ETL without a lot of T, and without a lot of performance optimizations either. That may not be much of a problem in its paradigmatic use case, spinning out a data mart quickly for some analysis to see if valuable conclusions can be drawn. Presumably, in the most successful cases, business and technical processes would emerge after the fact to pipe up-to-date versions of that analysis into operational systems, mooting any ETL deficiencies in the initial exploration. In a world where “data exploration” is becoming an increasingly important concept, something like Greenplum Chorus may suffice to provide significant customer value. But whether Greenplum Chorus’s capabilities are eventually co-opted by more fully-featured data integration suites remains an open question.


