Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.
Greenplum 4.0 highlights and related observations include:
- For the most part, Greenplum 4.0 is focused on general robustness catch-up and Bottleneck Whack-A-Mole, much like the latest releases from fellow analytic DBMS vendors Vertica and Aster Data.
- Greenplum has switched its replication approach from logical (execute transactions against two different disks) to block-level (just ship over the blocks that were changed by the original transaction). This seems to increase a Greenplum database’s robustness/performance/uptime in the face of disk/node failure. It also gives Greenplum an ongoing performance advantage, in that data only has to be compressed once for both disk writes, rather than separately for each copy.
- The Greenplum DBMS now has something called “tablespaces,” which sounds as if it extends Greenplum’s “polymorphic storage” to accommodate different kinds of storage device. Everybody has to do this, and for the most part everybody is doing it, e.g. Teradata and Sybase. At least for now, you need to have the same mix of storage technology at every Greenplum node. That said, while Greenplum’s customers will surely want solid-state storage in the future, that’s not yet a major issue.
- The timetable on Greenplum 4.0 is a salami-thin-slicer’s delight:
- Greenplum 4.0 has been used in POCs (Proofs of Concept) for a while.
- Greenplum 4.0 has been in early access for a few weeks.
- Greenplum 4.0 controlled availability is planned for the end of April.
- Greenplum 4.0 general availability is planned around the end of May or early June.
- (Note: Everything in Greenplum 4.0 has been built and is undergoing QA.)
- Greenplum has put together a nice list of big-name customers, including Fox/MySpace, eBay, Sears, and T-Mobile. While Fox/MySpace never got to the predicted 1-petabyte level of user data, T-Mobile is loosely projected to indeed get there. The same 1-petabyte projection is made more confidently about another Greenplum telecom customer (unnamed), which seems to be in the process of acquiring a 300-node Greenplum system.
The really interesting part of this announcement, however, is Greenplum Chorus. Greenplum agrees with my assertion that Greenplum Chorus is a new kind of data integration/ETL technology. In particular, Greenplum Chorus is designed around a stance I agree with, namely that it’s unrealistic to put everything into a single enterprise data warehouse (EDW); you need to manage data marts as well, preferably in a coordinated way. Mainstream data integration/ETL (Extract/Transform/Load) vendors such as Informatica or Ab Initio would surely say “That’s often quite true, and our technology can handle such scenarios just as it handles single-EDW-data-sink environments.” But Greenplum Chorus offers three capabilities not generally found in traditional data integration products (and offers only those three capabilities), namely:
- Spin out data marts, whether by recopying the data or by creating a virtual data mart inside another data warehouse/mart.
- Find/discover data in databases across your enterprise.
- Do social networking around databases/data marts.
Greenplum Chorus is heading into early access soon, with general availability slated around midyear. Also in the mix is a Greenplum “Hypervisor” that can somehow relate to an almost unlimited number of nodes or databases; however, I didn’t get a lot of details on the Greenplum Hypervisor technology or on the target dates for delivering and integrating the Hypervisor with other parts of Greenplum’s technology.
When Greenplum first talked about the enterprise data cloud (EDC) idea, it emphasized the spinning out of physical data marts in an easy way, as opposed to the virtual data marts pushed by Oliver Ratzesberger and Teradata. Greenplum Chorus, however, supports both kinds (as, at least directionally, does Teradata), specifically letting you choose between:
- “Independent sandboxes” – physical copies of the data, in a separate Greenplum database instance.
- “Satellite sandboxes” – virtual data marts, of course managed by the same Greenplum database instance.
Actually, if you want to recopy data in the same Greenplum database instance, you can do that too, via something called “data sets,” but that’s not the main focus. Either option, I presume, can be configured to provide either or both of the two main benefits of spun-out data marts, namely:
- Control over the performance and SLAs (Service-Level Agreements) of your analytic workload
- Ability to mix in new raw data and/or new aggregations
in either case without messing up the performance, SLAs, security, or “one truth-ness” of the existing database.
To provide those capabilities in an analytic DBMS, you need sufficiently robust parallel data movement (for the physical sandboxes) and workload management (for the virtual ones). Greenplum obviously believes it has both. Teradata makes the same claim. Other vendors would make similar assertions, and presumably will offer similar capabilities soon. You also want some kind of ability to ingest data from foreign databases, but that can be pretty routine stuff; e.g., in Release 1 of Chorus, Greenplum is content to offer ODBC access to Oracle, SQL Server, et al.
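That kind of routine ODBC ingestion can be sketched in a few lines. Below is a minimal, hypothetical Python illustration of the general pull-and-load pattern, using DB-API-style connections such as the pyodbc library provides; the DSN names, table names, and batch size are all invented for illustration, and nothing here is Chorus’s actual API:

```python
# Hypothetical sketch: stream rows from a foreign database over an
# ODBC-style connection and load them into a target warehouse in
# batches. Connection details and identifiers are illustrative only.
from itertools import islice

def batched(rows, batch_size):
    """Yield lists of up to batch_size rows from any row iterator."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def copy_table(src_conn, dst_conn, src_table, dst_table, batch_size=1000):
    """Stream src_table's rows into dst_table, committing per batch."""
    src_cur = src_conn.cursor()
    dst_cur = dst_conn.cursor()
    src_cur.execute(f"SELECT * FROM {src_table}")
    ncols = len(src_cur.description)
    # "?" is the qmark placeholder style used by pyodbc (and sqlite3).
    placeholders = ", ".join(["?"] * ncols)
    insert_sql = f"INSERT INTO {dst_table} VALUES ({placeholders})"
    for batch in batched(src_cur, batch_size):
        dst_cur.executemany(insert_sql, batch)
        dst_conn.commit()

# In practice the connections would come from an ODBC bridge, e.g.:
#   import pyodbc
#   src_conn = pyodbc.connect("DSN=OracleSource")   # hypothetical DSN
#   dst_conn = pyodbc.connect("DSN=GreenplumMart")  # hypothetical DSN
```

The point is simply that ODBC-level extraction is commodity plumbing: any DB-API-compatible source works unchanged, which is presumably why Greenplum is content with it for Chorus Release 1.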
The “data discovery” and “social networking” aspects of Greenplum Chorus seem to be quite Release 1 as well. Basically, Greenplum lets people post discussion threads about databases and data marts, discussing what value can be derived from them. I guess somebody could include links to web-technology reports based on those databases, but otherwise there’s no integration with business intelligence tools and their collaboration capabilities. Even so, Greenplum reports that business executives liked this capability in early access testing.
Greenplum Chorus is ETL without a lot of T, and without a lot of performance optimizations either. That may not be much of a problem in its paradigmatic use case, spinning out a data mart quickly for some analysis to see whether valuable conclusions can be drawn. Presumably, in the most successful cases, business and technical processes would emerge after the fact to pipe up-to-date versions of that analysis into operational systems, mooting any ETL deficiencies in the initial exploration. In a world where “data exploration” is becoming an increasingly important concept, something like Greenplum Chorus may suffice to provide significant customer value. But whether Greenplum Chorus’s capabilities are eventually co-opted by more fully featured data integration suites remains an open question.