August 4, 2009

PAX Analytica? Row- and column-stores begin to come together

Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness. Their case generally runs along the lines:

Pushbacks to this argument from row-based vendors include:

plus generous dollops of:

(OK, I made that last one up, but I do hear the other claims frequently.)

However, there are at least two ways in which row- and column-stores are beginning to come together. First, there are lots of rumors about row-store vendors bringing out column-store options, even beyond the recent Ingres/VectorWise announcement. (But anything I may know about them, beyond noticing the rumors fly by, is surely under NDA.) Second, column-store vendors Vertica and VectorWise are bringing out a kind of row/column hybrid storage option.

Vertica 3.5 introduces what Vertica calls “FlexStore.” A key part of FlexStore is the ability to store data not just in pure columnar format, but also to group columns together in what amounts to sub-rows. This is advantageous when the grouped columns are retrieved together and, I presume, when they are updated together. The tradeoff is that grouping gives up some of column stores’ compression advantages, and use of this feature is not recommended for columns that are frequently retrieved independently. Vertica also notes that since it typically uses 1 megabyte block sizes, any table smaller than that shouldn’t be broken into columns at all.
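To make the sub-row idea concrete, here is a minimal Python sketch of column grouping. It is an illustration only, not Vertica's implementation or syntax: the function name `store_with_column_groups`, the `groups` parameter, and the sample table are all hypothetical. Each group of columns is stored as its own sequence of sub-rows, so a group of one column behaves like a pure column, and a group holding every column behaves like a row store.

```python
from typing import Dict, List, Sequence, Tuple


def store_with_column_groups(
    rows: Sequence[tuple],
    column_names: Sequence[str],
    groups: Sequence[Sequence[str]],
) -> Dict[Tuple[str, ...], List[tuple]]:
    """Store each column group as a separate list of sub-rows.

    Hypothetical helper for illustration; not a real product API.
    """
    # Map each column name to its position within a full row.
    position = {name: i for i, name in enumerate(column_names)}
    storage: Dict[Tuple[str, ...], List[tuple]] = {}
    for group in groups:
        idx = [position[name] for name in group]
        # One sub-row per original row, holding only this group's columns.
        storage[tuple(group)] = [tuple(row[i] for i in idx) for row in rows]
    return storage


# Hypothetical sample data: "city" and "state" are usually read together,
# so they share a group; "id" stays in a group of its own.
rows = [(1, "Boston", "MA"), (2, "Austin", "TX")]
storage = store_with_column_groups(
    rows, ["id", "city", "state"], [["id"], ["city", "state"]]
)
```

Reading `city` and `state` together touches one stored sequence instead of two, which is the retrieval benefit. Because values of different types sit side by side in each sub-row, the grouped sequence is harder to compress than two single-column sequences would be.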

VectorWise, of course, doesn’t have a product right now, but has gotten a bunch of recent publicity around the column-store product it plans to ship via its partner Ingres in 2010. When I asked Peter Boncz about row/column hybridization inside VectorWise (not federating between Ingres and VectorWise, but rather truly within VectorWise), he said one of the storage options was PAX, and pointed me at a 2001 paper by a group of academics that includes the ubiquitous Dave DeWitt. PAX turns out to stand, in creative spelling, for Partition Attributes Across.

The PAX idea is to store as many rows of data as can fit into a block, but within the block store them in columns. This preserves some of the compression and cache-efficiency benefits of column stores, while also bringing back whole rows in a single step. (I think Vertica’s FlexStore does something similar to this, but I’m not sure.)
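A minimal Python sketch of a PAX-style layout follows. It rests on simplifying assumptions: a "block" here holds a fixed number of rows (`rows_per_page`) rather than a fixed number of bytes, and all names (`build_pax_pages`, `read_row`, `scan_column`) are hypothetical, taken neither from the PAX paper nor from any product. Each page keeps its rows' values in per-column "minipages."

```python
from typing import Dict, Iterator, List, Sequence


def build_pax_pages(
    rows: Sequence[tuple],
    column_names: Sequence[str],
    rows_per_page: int,
) -> List[Dict[str, list]]:
    """Split rows into pages; inside each page, store values column by column."""
    pages = []
    for start in range(0, len(rows), rows_per_page):
        chunk = rows[start:start + rows_per_page]
        # One minipage (a list of values) per column, all within the same page.
        page = {
            name: [row[i] for row in chunk]
            for i, name in enumerate(column_names)
        }
        pages.append(page)
    return pages


def read_row(pages: List[Dict[str, list]], rows_per_page: int, row_id: int) -> tuple:
    """Rebuild a whole row by touching exactly one page."""
    page = pages[row_id // rows_per_page]
    offset = row_id % rows_per_page
    return tuple(minipage[offset] for minipage in page.values())


def scan_column(pages: List[Dict[str, list]], name: str) -> Iterator:
    """Scan one column, reading only that column's minipage in each page."""
    for page in pages:
        yield from page[name]


# Hypothetical sample data.
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]
pages = build_pax_pages(rows, ["id", "name", "amount"], rows_per_page=2)
```

`read_row` needs only one page to reassemble a row, which is the row-store-like property. `scan_column` walks values of a single type stored contiguously within each page, which is where the compression and cache-efficiency benefits come from.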

Further confusing things, Peter Boncz of VectorWise told me VectorWise can support “any hybrid” of columnar storage and PAX.

Bottom line: The distinction between row- and column-stores isn’t going to go away any time soon, but it is at least beginning to blur a bit.

Comments

11 Responses to “PAX Analytica? Row- and column-stores begin to come together”

  1. VectorWise, Ingres, and MonetDB | DBMS2 -- DataBase Management System Services on August 4th, 2009 6:43 am

    [...] VectorWise, the product, will be an open-source columnar analytic DBMS. (But that’s not quite true. Pending productization, it’s more accurate to call the VectorWise technology a row/column hybrid.) [...]

  2. The future of the database is… plaid? — Too much information on September 2nd, 2009 9:44 am

    [...] Curt Monash recently noted there are a couple of approaches emerging to hybrid row/column [...]

  3. Oracle Exadata Hybrid Columnar Compression | DBMS2 -- DataBase Management System Services on September 3rd, 2009 5:33 am

    [...] sounds a whole lot like PAX. Specifically, in Oracle’s case I would guess “hybrid columnar compression” [...]

  4. This and that | DBMS2 -- DataBase Management System Services on December 29th, 2009 5:15 am

    [...] Vertica offers a post on its 3.5 release, with a riff on the popular theme “We’ve fixed some weaknesses in our prior versions that we didn’t previously say we had.” More important, Vertica is pretty clear on the virtues of its hybrid columnar architecture. [...]

  5. Ingres VectorWise technical highlights | DBMS2 -- DataBase Management System Services on July 23rd, 2010 8:35 pm

    [...] VectorWise 1.0 is pretty purely columnar. There’s a bit of PAX, but it’s mainly automagic/under the covers. The one user-controlled exception I understood [...]

  6. Aster Data nCluster Version 4.6 | DBMS 2 : DataBase Management System Services on September 15th, 2010 3:09 am

    [...] main thing in Aster Data nCluster Version 4.6 is Aster’s version of hybrid row-column store technology. Technical highlights, if I’m getting it right, [...]

  7. Mike Stonebraker on “real column stores” | DBMS 2 : DataBase Management System Services on January 12th, 2011 9:43 am

    [...] seems not to be met by any of the vendors cited — including Vertica, which introduced Vertica FlexStore in mid-2009.  And while I’m at it — Aster Data nCluster definitely meets criterion [...]

  8. Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same : DBMS 2 : DataBase Management System Services on February 8th, 2012 12:17 pm

    [...] to praise Greenplum for true hybrid row/columnar data management, a feature shared by Teradata and Vertica, among others, but not by Oracle, DB2, or [...]

  9. Joel Wittenmyer on April 11th, 2012 9:03 am

    We had some talking heads pitching Exadata to us yesterday. I say talking heads because they demonstrated a definite lack of knowledge of certain of their subjects, such as RAC. Anyway, they described the HCC method as storing ‘like’ columns together in a block. So within a bulk load (which, I think, gives us an idea of just what the scope of a Compression Unit is…) they can persist columns that are integers into a block together, columns that are strings into another block together, etc. I’m guessing that they determine the optimal compression algorithm for each block (or set of blocks that hold the same datatype) based on the datatypes in those blocks.

  10. Columnar compression vs. column storage | DBMS 2 : DataBase Management System Services on May 26th, 2013 4:41 pm

    [...] Specifically, if data in a relational table is grouped together according to what row it’s in, then the database manager is called “row-based” or a “row store.” If it’s grouped together according to what column it’s in, then the database management system is called “columnar” or a “column store.” Increasingly, row-based and columnar storage are being hybridized. [...]

  11. Impala and Parquet | DBMS 2 : DataBase Management System Services on June 23rd, 2013 11:36 pm

    [...] execution engine such as Impala — can refer to. Within these big blocks, Parquet is PAX-like; i.e., it stores entire rows in the same big block, but does so a column at a time. However, [...]
