If Mike Stonebraker is to be believed, the era of columnar data stores is upon us.
Whether or not you buy completely into Mike’s claims, there certainly are cool ideas in his latest columnar offering, from startup Vertica Systems. The Vertica corporate site offers little detail, but Mike tells me that the product’s architecture closely resembles that of C-Store, which is described in this November 2005 paper.
The core ideas behind Vertica’s product are as follows.
- Data warehouse queries only need to retrieve the data in certain columns from disk. Therefore, storing data in columns reduces I/O.
- But a pure column store is hard to update in real time, and data warehouses need real-time updates (both for “real-time” uses and simply for error correction). Hence, there is a small (1 gigabyte or so) conventional row store to receive updates; it lives in main memory, and hence is super-fast, and its contents are periodically bulk-moved to the column store. (That’s not how the paper says C-Store was architected, but it seems to be one of the things that got changed for the commercial Vertica implementation.)
- Timestamps are used for inserts and deletes; data is never changed in place, so an update amounts to a delete plus an insert. (Without that kind of approach, the update strategy in the previous point couldn’t be viable.) A big benefit to these timestamps is that you can assure integrity via “snapshot isolation”; i.e., by a virtual rollback to a recent point in time. Thus, Vertica can get away without any kind of locks or, for that matter, transaction/redo logs. Row-oriented Netezza uses a similar logless, lockless approach.
- Columnar data stores lend themselves to aggressive compression. After all, most sophisticated compression techniques depend upon deltas (or lack of delta!) vs. other values in the same column. And compression works a lot better when the column itself is sorted. Vertica’s compression is carried straight through into query processing. One benefit: it allows for greater use of the on-processor Level 2 cache. (Efficient use of Level 2 cache gets mentioned to me a lot these days …)
- Data is stored in overlapping projection views, each of which is sorted by at least one of the columns in the view. Presorting obviously helps with query performance. Of course, this redundancy carries a penalty at load or update time. But the same is true of conventional RDBMS’s indices and, yes, materialized views.
- Data is partitioned “horizontally,” in a shared-nothing environment. I.e., different “rows” go to different nodes. Queries are resolved on each node, and the result sets are combined centrally, with no attempt to ship intermediate results from node to node. Despite the experience of other shared-nothing data warehouse vendors that this approach leads to bottlenecks, Mike is confident it works fine in Vertica’s case.
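To make the first bullet concrete, here’s a toy sketch (in Python, with made-up table and column names; nothing here reflects Vertica’s actual storage format) of why a column layout cuts I/O for a typical warehouse query:

```python
# Hypothetical sketch of columnar vs. row storage. A row store must read every
# record whole; a column store reads only the arrays the query touches.

rows = [
    {"order_id": i, "customer": "c%02d" % (i % 100),
     "region": "east" if i % 2 else "west",
     "amount": float(i), "shipped": i % 3 == 0}
    for i in range(1000)
]

# Column store: one array per column. A query touching 2 of 5 columns
# scans 2 arrays instead of all 1000 five-field records.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# "SELECT SUM(amount) WHERE region = 'east'" touches only two columns:
total = sum(a for a, reg in zip(columns["amount"], columns["region"])
            if reg == "east")
```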
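The small-row-store-feeding-the-column-store idea from the second bullet can be sketched as follows; the class, the flush threshold, and the trigger-on-insert policy are all my own illustrative inventions, not Vertica’s actual tuple mover:

```python
# Toy sketch of a hybrid store: fast row-oriented inserts in memory,
# periodically bulk-moved into a read-optimized columnar form.

class HybridStore:
    def __init__(self, flush_threshold=3):
        self.write_buffer = []      # in-memory, row-oriented: cheap inserts
        self.column_store = {}      # read-optimized: one list per column
        self.flush_threshold = flush_threshold

    def insert(self, row):
        self.write_buffer.append(row)
        if len(self.write_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The periodic bulk move: rewrite buffered rows into columnar form.
        for row in self.write_buffer:
            for col, val in row.items():
                self.column_store.setdefault(col, []).append(val)
        self.write_buffer.clear()

store = HybridStore()
for i in range(7):
    store.insert({"id": i, "amount": i * 10})
# Six rows have been flushed into columns; one still sits in the write buffer.
```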
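The timestamp/snapshot-isolation bullet can also be sketched in a few lines. This is a generic illustration of timestamp-based visibility, not Vertica’s implementation; the class and field names are hypothetical:

```python
# Sketch of snapshot isolation via insert/delete timestamps. Rows are never
# modified in place; a delete just stamps a row, so readers need no locks.

INF = float("inf")

class VersionedTable:
    def __init__(self):
        self.rows = []    # [insert_ts, delete_ts, payload]
        self.clock = 0

    def insert(self, payload):
        self.clock += 1
        self.rows.append([self.clock, INF, payload])

    def delete(self, key):
        self.clock += 1
        for row in self.rows:
            if row[2]["key"] == key and row[1] == INF:
                row[1] = self.clock   # mark deleted; the bytes stay put

    def snapshot(self, ts):
        # A reader "as of" ts sees rows inserted at or before ts and
        # not yet deleted as of ts -- a virtual rollback to that moment.
        return [r[2] for r in self.rows if r[0] <= ts < r[1]]

t = VersionedTable()
t.insert({"key": "a", "qty": 1})   # ts 1
t.insert({"key": "b", "qty": 2})   # ts 2
t.delete("a")                      # ts 3
# A query as of ts 2 still sees row "a"; a query at ts 3 does not.
```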
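On the compression bullet: run-length encoding is the textbook example of why sorted columns compress so well, and of how a query can operate on the compressed form without ever decompressing it. A minimal sketch (my example, not Vertica’s actual encoding scheme):

```python
# Run-length encode a sorted column, then answer an aggregate straight from
# the runs -- the "compression carried into query processing" idea.

from itertools import groupby

def rle_encode(sorted_values):
    return [(v, len(list(g))) for v, g in groupby(sorted_values)]

column = sorted(["east"] * 4000 + ["west"] * 6000)
runs = rle_encode(column)        # two tuples stand in for 10,000 values

# A COUNT ... GROUP BY over this column never touches the raw values:
counts = {value: length for value, length in runs}
```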
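The overlapping-projections bullet boils down to: keep redundant copies of the data, each presorted on a different column, and let the query pick the copy whose sort order turns its predicate into a cheap range scan. A hypothetical sketch (column choices and the `range_scan` helper are mine):

```python
# Two redundant "projections" of the same rows, each sorted differently.
# Loads pay for the redundancy; queries cash it in via binary search.

import bisect

data = [("c%03d" % (i % 50), i % 7, float(i)) for i in range(1000)]

projections = {
    "customer": sorted(data, key=lambda r: r[0]),
    "weekday": sorted(data, key=lambda r: r[1]),
}

def range_scan(sort_col, col_index, lo, hi):
    # Use the projection presorted on the predicate's column.
    rows = projections[sort_col]
    keys = [r[col_index] for r in rows]
    return rows[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]

# An equality predicate on weekday uses the weekday-sorted projection:
mondays = range_scan("weekday", 1, 1, 1)
```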
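Finally, the shared-nothing bullet: rows scatter across nodes, each node resolves the query on its own slice, and a coordinator combines the partial results, with no intermediate shipping between nodes. A toy single-process sketch, with lists standing in for nodes (my simplification, obviously not a distributed system):

```python
# Hash-partition rows across "nodes", aggregate locally, combine centrally.

N_NODES = 4
nodes = [[] for _ in range(N_NODES)]

for order_id in range(10000):
    row = {"order_id": order_id,
           "region": ["east", "west"][order_id % 2],
           "amount": 1.0}
    nodes[hash(order_id) % N_NODES].append(row)   # scatter

def local_query(rows):
    # Each node answers "SUM(amount) GROUP BY region" on its slice alone.
    out = {}
    for r in rows:
        out[r["region"]] = out.get(r["region"], 0.0) + r["amount"]
    return out

# The coordinator merges partial results -- no node-to-node traffic.
combined = {}
for partial in map(local_query, nodes):
    for region, subtotal in partial.items():
        combined[region] = combined.get(region, 0.0) + subtotal
```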
Obviously, my post title was exaggerated; nobody, including Mike, thinks row-oriented data stores are obsolete for OLTP. But what about data warehousing? Will an approach like Vertica’s eventually win versus, say, the shared-nothing row-oriented RDBMS leaders (that would be some combination of IBM, Teradata, Netezza, and DATAllegro, depending on what you mean by “leader”)? Well, apparently Vertica has a bunch of tests going on, at database sizes from the low 100s of gigabytes to the low 10s of terabytes. And of course they have those great-looking benchmark results, for which they swear they tuned competitors’ products with passionate care.
If I have to make an early guess, I’d say that the success of columnar systems will depend in no small part on what kind of data warehouse applications we’re talking about. Referencing a taxonomy I previously posted:
- Pinpoint data lookup doesn’t seem like a great fit for columnar systems. Indeed, traditional rows-and-B-trees would seem to be best.
- Constrained query and reporting would seem to be a sweet spot, even though it’s a sweet spot for some of the best competition as well.
- Cube-filling calculations involve big intermediate result sets. I’m not sure that’s a great fit for columnar systems.
- Hardcore tabular data crunching would seem in many cases to be another sweet spot, again against a lot of competition, at least in some of its sub-categories.
- Text and media search are best done by specialized systems that, at least in the case of text, wind up being quasi-columnar. The same goes for other specialty areas. Systems like Vertica’s have nothing to offer directly to these applications. However, it might be possible for Vertica to integrate with them fairly quickly, given that they’re starting from vaguely similar philosophical roots.