October 14, 2009

Greenplum is going hybrid columnar as well

Over the past summer, Vertica, VectorWise, and Oracle all announced flavors of hybrid row/columnar storage. Now it’s Greenplum’s turn. Greenplum is actually offering true columnar storage, as opposed to Oracle’s PAX-like scheme — and also as opposed to the kind of Frankencolumn storage Daniel Abadi decries. For example, you don’t have to do a join to retrieve multiple columns; you just ask for them and there they are. Similarly, Greenplum doesn’t maintain explicit row IDs – whether in row-oriented or column-oriented append-only storage – relying instead on block-level header information.
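To make the "no join, no explicit row IDs" point concrete, here is a minimal Python sketch. It is purely illustrative and is not Greenplum's actual on-disk format: the `ColumnBlock` and `ColumnStore` names, the fixed `block_size`, and the single `first_row` header field are all assumptions made for this example. The idea is only that when every column stores its values in the same row order, a row's position is implied by the block header plus the offset within the block, so multiple columns can be read back side by side.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnBlock:
    """One block of a single column's values (hypothetical layout)."""
    first_row: int                       # header: position of the block's first row
    values: list = field(default_factory=list)


class ColumnStore:
    """Append-only column store with no per-row ID stored anywhere."""

    def __init__(self, column_names, block_size=2):
        self.block_size = block_size
        self.blocks = {name: [] for name in column_names}
        self.row_count = 0

    def append(self, row):
        # Each column gets the new value at the same position.
        for name, blocks in self.blocks.items():
            if not blocks or len(blocks[-1].values) == self.block_size:
                blocks.append(ColumnBlock(first_row=self.row_count))
            blocks[-1].values.append(row[name])
        self.row_count += 1

    def get(self, name, row_number):
        # Locate the block arithmetically, then use its header for the offset.
        block = self.blocks[name][row_number // self.block_size]
        return block.values[row_number - block.first_row]

    def scan(self, *names):
        # Columns are aligned by position, so no join on a row ID is needed.
        streams = [
            (value for block in self.blocks[name] for value in block.values)
            for name in names
        ]
        return list(zip(*streams))
```

Calling `scan("region", "amount")` on such a store walks only those two columns and pairs their values up by position. A real system would also have to handle compression, variable-size blocks, and deletes, none of which this sketch attempts.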

Highlights include:

*The term “polymorphic” is somewhat, shall we say, overloaded these days.

Comments

12 Responses to “Greenplum is going hybrid columnar as well”

  1. Osma on October 14th, 2009 2:18 am

    Interesting — the append-only compressed row store sounds kind of like a compressed MySQL/MyISAM table though. I’m curious how they’ve approached indexing in the column store mechanism. Have you found any data on that?

  2. Seth Grimes on October 14th, 2009 6:59 am

    Very helpful write-up!

  3. DW Consultant on October 14th, 2009 12:23 pm

    Nice write-up, although it sounds like Greenplum loves to copy technology rather than innovate. Does this mean that they cannot perform as well as Columnar DBMS? Are they losing business to Columnar Vendors?

  4. Ben Werther on October 14th, 2009 1:35 pm

    DW Consultant —

    - You’d have to agree that every vendor is building from a largely shared pool of ideas. Most of everything that every vendor does is covered in academic literature going back decades. Our goal isn’t being novel in everything we do — it is delivering value to customers.

    - That being said, I think a little credit is due here. We’ve built a flexible enough storage infrastructure to allow us to (1) easily add a very efficient implementation of column-oriented tables, and (2) allow both row- and column-orientation to be used not just in the same database but in different partitions of the same table.

    So why did we add this feature? It is about customer choice. For most analytical queries and mixed workloads – particularly with high-rate continuous microbatched loads – our row processing wins out over columnar approaches. (i.e. There are good reasons why the pure columnar guys aren’t winning mixed-workload EDW deals against Teradata like we are). But there are a lot of cases where columnar processing does great and does have an edge over row processing. Customers wanted the choice, so now we do both.

  5. Daniel Abadi on October 14th, 2009 2:53 pm

    Curt,

    You pretty much predicted everything I was going to say, but nonetheless, my reactions can be found at:

    http://dbmsmusings.blogspot.com/2009/10/greenplum-announces-column-oriented.html

  6. Paul Johnson on October 15th, 2009 3:52 pm

    Well done to Greenplum for offering more choice say I. A hybrid column/row capability is pretty cool.

    We downloaded the new release a few days ago after Luke mentioned during a call that the new column stuff had been made available.

    It’ll be interesting to see how it works once folks start beating on it.

  7. The Top 10 Trends for 2010 in Analytics, Business Intelligence, and Performance Management « Enterprise Information Management on December 3rd, 2009 5:16 am

    [...] Data, and the like with significant innovations in in-memory processing, exploiting parallelism, columnar storage options, and more.  We already starting to see hybrid approaches between the Hadoop players and [...]

  8. Appregatta Blog » 2010: The Year of Business Intelligence on December 10th, 2009 6:08 am

    [...] Data, and the like with significant innovations in in-memory processing, exploiting parallelism, columnar storage options, and more. Additionally, significant opportunities to push application processing into [...]

  9. Aster Data nCluster Version 4.6 | DBMS 2 : DataBase Management System Services on September 15th, 2010 3:35 am

    [...] Aster Data has now joined Greenplum/EMC among row-based analytic DBMS vendors with hybrid row-column stores. Oracle will join them some [...]

  10. Columnar compression vs. column storage | DBMS 2 : DataBase Management System Services on February 6th, 2011 4:23 am

    [...] that truly offer some form of hybrid row/column storage include Vertica, EMC/Greenplum, and Aster Data. Oracle Exadata, in my opinion, does not, but I can see why people might get [...]

  11. The Top 10 Trends for 2010 in Analytics, Business Intelligence, and Performance Management | Analytics Careers on November 11th, 2011 12:41 pm

    [...] Data, and the like with significant innovations in in-memory processing, exploiting parallelism, columnar storage options, and more.  We already starting to see hybrid approaches between the Hadoop players and [...]

  12. Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same : DBMS 2 : DataBase Management System Services on February 18th, 2012 6:43 pm

    [...] neglects to praise Greenplum for true hybrid row/columnar data management, a feature shared by Teradata and Vertica, among others, but not by Oracle, DB2, or [...]
