Vertica Systems

Analysis of columnar data warehouse DBMS vendor Vertica Systems. Related subjects include:

July 2, 2009

Notes on columnar/TPC-H compression

I was chatting with Omer Trajman of Vertica, and he said that a 70% compression figure for ParAccel’s recent TPC-H filing sounded about right.*  When I noted that seemed kind of low, Omer pointed out that TPC-H data is pseudo-random, while real-life data has much more correlation among the values in different columns. (E.g., in retail, a customer is likely to consistently shop at the same stores and to put similar items into his shopping basket.)

*Omer was involved in Vertica’s TPC-H-data-based load speed benchmark, and is Vertica’s representative to the TPC.

But why does this matter? After all, Vertica compresses one column at a time (unlike, say, Clearpace).  Well, the reason is that Vertica — like other column stores — wants to store different columns in the same row order, for obvious benefits in both reading and writing.  So, for example, if all the rows that include Gotham City are grouped sequentially, then all the rows mentioning Bruce Wayne are likely to be near each other as well, while none of the rows that mention Clark Kent will be mixed in.

And when a set of consecutive entries has low cardinality, it’s easier to get high levels of compression.
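To make the row-order point concrete, here is a minimal sketch, not Vertica's actual encoding. It assumes a made-up two-column table in which each person belongs to one city (the `HOME_CITY` mapping, the extra names, and the block size of 100 are all hypothetical). It sorts the rows on the city column only, then measures how the person column changes: the number of runs a run-length encoder would store, and the average number of distinct values per block of consecutive entries.

```python
import random
from itertools import groupby

# Hypothetical data: each person belongs to one city, so the two columns correlate.
HOME_CITY = {
    "Bruce Wayne": "Gotham City",
    "Selina Kyle": "Gotham City",
    "Clark Kent": "Metropolis",
    "Lois Lane": "Metropolis",
}

def count_runs(column):
    """Number of runs of identical consecutive values (what run-length encoding stores)."""
    return sum(1 for _ in groupby(column))

def mean_block_cardinality(column, block_size=100):
    """Average count of distinct values per block of consecutive entries."""
    blocks = [column[i:i + block_size] for i in range(0, len(column), block_size)]
    return sum(len(set(block)) for block in blocks) / len(blocks)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
names = list(HOME_CITY)
rows = [(HOME_CITY[name], name) for name in (rng.choice(names) for _ in range(1000))]

# Sort on the city column only; the person column simply follows the same row order.
rows_by_city = sorted(rows, key=lambda row: row[0])

for label, ordering in (("arrival order", rows), ("sorted by city", rows_by_city)):
    cities = [city for city, _ in ordering]
    persons = [person for _, person in ordering]
    print(label, count_runs(cities), count_runs(persons), mean_block_cardinality(persons))
```

Even though the person column is never sorted directly, grouping the Gotham City rows keeps Clark Kent out of those blocks, so each block of the person column sees fewer distinct values and fewer runs than it does in arrival order.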

June 25, 2009

My current customer list among the analytic DBMS specialists

(This is an updated version of an August, 2008 post.)

One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the analytic/data warehouse DBMS/appliance specialists in the group. They are:

All of those are Monash Advantage members.

If you care about all this, you may also be interested in the rest of my standards and disclosures.

June 22, 2009

H-Store is now VoltDB

I’ve always honored more of an NDA about the H-Store project and its commercialization than I really felt obligated to, given how freely information was being bandied about to others. I’m still doing so. :)

But I think I’ll at least say that the H-Store project is now named VoltDB.  The VoltDB website names two individuals — Mike Stonebraker and Andy Palmer — both of whom are founders of Vertica. Job listings on the site are for field engineer and trainer, but not developer, so that suggests something about the project’s/product’s maturity level.

If you have an extreme OLTP need, you should talk to VoltDB. If you don’t have access to Mike or Andy directly, I can hook you up with a key VoltDB marketing/outreach guy. Price may not be as much of a barrier as you’d initially fear.

If anybody from VoltDB wants to be less cloak-and-daggery and say more in the comment thread, I’d be pleased.

And yes — an open-secret working name for H-Store/VoltDB was, for a while, “Horizontica.”

June 8, 2009

Per-terabyte pricing

Software-only DBMS vendors sometimes price per terabyte of user data.  Vertica’s list price is $100K/TB. Greenplum’s list price is $70K/TB. In practice, both offer substantial discounts, especially at higher volumes.  In both cases, this means raw data, uncompressed, without counting indexes or temp space.

Client experience teaches me that this definition is easy to forget, so let me reemphasize the key point:

Per-terabyte pricing is based on a calculated figure.  Per-terabyte pricing is not based on the current disk space used by your database when managed by the DBMS you are replacing.

There’s at least one important difference in how Vertica and Greenplum calculate database size.  No matter how many times you copy the data, Vertica only charges you for it once.* But if you spin out data marts and recopy data into them (as Greenplum rightly encourages you to do), Greenplum wants to be paid for each copy.  Similarly, Vertica charges only for deployment, and not for test or development; I didn’t remember to ask what Greenplum’s policies are in those regards. (Edit: Greenplum says in a comment below that it doesn’t charge for test or development data either.)

*That policy is a great fit with Vertica’s performance recommendation that you should store columns in different sort orders, perhaps an average of two copies per column.
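The difference between the two counting policies is easy to see with a little arithmetic. The sketch below uses the list prices quoted above and ignores the discounts both vendors offer; the 10 TB database size, the copy count, and the `license_cost` helper are hypothetical, chosen only for illustration.

```python
VERTICA_LIST_PRICE_PER_TB = 100_000    # dollars per TB of raw user data, as quoted above
GREENPLUM_LIST_PRICE_PER_TB = 70_000   # dollars per TB of raw user data, as quoted above

def license_cost(raw_user_tb, price_per_tb, copies=1, charge_per_copy=False):
    """List-price cost based on raw, uncompressed user data (no indexes or temp space).

    If charge_per_copy is False, extra copies of the data are not billed.
    """
    billable_tb = raw_user_tb * (copies if charge_per_copy else 1)
    return billable_tb * price_per_tb

raw_tb = 10  # hypothetical: 10 TB of raw user data
copies = 2   # hypothetical: the data is stored twice

charged_once = license_cost(raw_tb, VERTICA_LIST_PRICE_PER_TB, copies, charge_per_copy=False)
charged_per_copy = license_cost(raw_tb, GREENPLUM_LIST_PRICE_PER_TB, copies, charge_per_copy=True)

print(f"charged once:     ${charged_once:,}")
print(f"charged per copy: ${charged_per_copy:,}")
```

Under these assumptions the lower per-terabyte list price ends up costing more once the data is copied, which is why it pays to ask exactly how each vendor counts terabytes.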

April 25, 2009

Vertica pricing and customer metrics

Since last fall, Vertica’s stated pricing has been “$100K per terabyte of user data.” Vertica hastens to point out that unlike, for example, appliance vendors or Sybase, it only charges for deployment licenses; development and test are free (although of course you have to Bring Your Own hardware). Over the past few weeks, I’ve gotten other pricing comments from Vertica to the effect that:

I didn’t press my luck and ask exactly what “average” means in this context.

As for customers, metrics I got include:

Read more

April 14, 2009

There always seems to be a fire drill around MapReduce news

Last August I flew out to see my new clients at Greenplum. They told me they planned to roll out MapReduce in a few weeks, and asked for my help in publicizing it. From their offices I went to dinner with non-clients Aster Data, who told me they’d gotten wind of a Greenplum MapReduce announcement and planned to come out ahead of it. A couple of hours later, Aster signed up as a client. In something of a pickle — but not one of my own making — I knocked heads, and persuaded both vendors to announce MapReduce at the same time, namely the following Monday. Lots of publicity ensued for both vendors, and everybody was reasonably satisfied.

Read more

April 14, 2009

Stonebraker, DeWitt, et al. compare MapReduce to DBMS

Along with five other coauthors (the lead author seems to be Andy Pavlo), famous MapReduce non-fans Mike Stonebraker and David DeWitt have posted a SIGMOD 2009 paper called “A Comparison of Approaches to Large-Scale Data Analysis.” The heart of the paper is benchmarks of Hadoop, Vertica, and “DBMS-X” on identical clusters of 100 low-end nodes, across a series of tests including (if I understood correctly):

Read more

February 23, 2009

The questionable benefits of terabyte-scale data warehouse virtualization

Vertica is virtualizing via VMware, and has suggested a few operational benefits to doing so that might or might not offset VMware’s computational overhead. But on the whole, it seems virtualization’s major benefits don’t apply to large-database MPP data warehousing.

Read more

February 23, 2009

Vertica Virtualizes Via VMware

(In other news, the sixth sick sheik’s sixth sheep is sick … but I digress.)

It seems that every analytic DBMS vendor feels compelled to issue at least one press release the week of winter TDWI. Vertica’s grand revelation this year is that you can use Vertica with VMware.* Of course, VMware working the way it does, you in fact have always been able to use Vertica with VMware. But now things are slightly improved, because Vertica has built install packages you can download, and has been working out recommended configuration settings as well.

*Edit: The actual press release is up now.

Read more

February 4, 2009

Draft slides on how to select an analytic DBMS

I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night.  Anybody see something I’m overlooking, or just plain got wrong?

Edit: The slides have now been finalized.

February 2, 2009

One vendor’s trash is another’s treasure

A few months ago, CEO Mayank Bawa of Aster Data commented to me on his surprise at how “profound” the relationship was between design choices in one aspect of a data warehouse DBMS and choices in other parts. The word choice in that was all Mayank, but the underlying thought is one I’ve long shared, and that I’m certain architects of many analytic DBMS share as well.

For that matter, the observation is no doubt true in many other product categories as well. But in the analytic database management arena, where there are literally 10-20+ competitors with different, non-stupid approaches, it seems most particularly valid. Here are some examples of what I mean.

Read more

January 26, 2009

New England Database Day this Friday January 30

Dan Weinreb, to whose opinions I usually give great weight, spoke very favorably of last year’s New England Database Day conference.  Well, this year’s is taking place on Friday.  It’s at MIT and it’s free, with easy registration.  A list of papers is here.

It’s pretty obvious who’s running the show. Sam Madden’s name is given as a contact; elsewhere it’s referred to as being organized by Madden and Mike Stonebraker.  Of the six identified papers, 2-3 look like the subjects or people could be taken straight from Vertica’s Database Column blog.  But that hardly means the event will be one long Vertica commercial.  For example, the other papers include one from Netezza and one on Flash memory data access methods.

I really doubt I’ll make it to Cambridge in time for the 9:00 am opening remarks ;), but I’ll try to swing by later on.

January 12, 2009

Database SaaS gains a little visibility

Way back in the 1970s, a huge fraction of analytic database management was done via timesharing, specifically in connection with the RAMIS and FOCUS business-intelligence-precursor fourth-generation languages.  (Both were written by Gerry Cohen, who built his company Information Builders around the latter one.)  The market for remote-computing business intelligence has never wholly gone away since. Indeed, it’s being revived now, via everything from the analytics part of Salesforce.com to the service category I call data mart outsourcing.

Less successful to date are efforts in the area of pure database software-as-a-service.  It seems that if somebody is going for SaaS anyway, they usually want a more complete, integrated offering. The most noteworthy exceptions I can think of to this general rule are Kognitio and Vertica, and they only have a handful of database SaaS customers each. To wit: Read more

January 12, 2009

Gartner’s 2008 data warehouse database management system Magic Quadrant is out

Gartner’s annual Magic Quadrant for data warehouse DBMS is out.  Thankfully, vendors don’t seem to be taking it as seriously as usual, so I didn’t immediately hear about it.  (I finally noticed it in a Greenplum pay-per-click ad.)  Links to Gartner MQs tend to come and go, but as of now here are two working links to the 2008 Gartner Data Warehouse Database Management System MQ.  My posts on the 2007 and 2006 MQs have also been updated with working links. Read more

January 3, 2009

More from Vertica on data warehouse load speeds

Last month, when Vertica released its “benchmark” of data warehouse load speeds, I didn’t realize it had previously released some actual customer-experience load rates as well.  In a July, 2008 white paper that seems thankfully free of any registration requirements, Vertica cited four examples:

Read more

December 20, 2008

More grist for the column vs. row mill

Daniel Abadi and Sam Madden are at it again, following up on their blog posts of six months ago arguing for the general superiority of column stores over row stores (for analytic query processing).  The gist is to recite a number of bases for superiority, beyond the two standard ones of less I/O and better compression, and seems to be based largely on Section 5 of a SIGMOD paper they wrote with Nabil Hachem.

A big part of their argument is that if you carry the processing of columnar and/or compressed data all the way through in memory, you get lots of advantages, especially because everything’s smaller and hence fits better into Level 2 cache. There also is some kind of join algorithm enhancement, which seems to be based on noticing when the result wound up falling into a range according to some dimension, and perhaps using dictionary encoding in a way that will help induce such an outcome.

The main enemy here is row-store vendors who say, in effect, “Oh, it’s easy to shoehorn almost all the benefits of a column-store into a row-based system.”  They also take a swipe — for being insufficiently purely columnar — at unnamed columnar Vertica competitors, described in terms that seemingly apply directly to ParAccel.

December 2, 2008

Data warehouse load speeds in the spotlight

Syncsort and Vertica combined to devise and run a benchmark in which a data warehouse got loaded at 5 ½ terabytes per hour, which is several times faster than the figures used in any other vendors’ similar press releases in the past. Takeaways include:

The latter is unsurprising. Back in February, I wrote at length about how Vertica makes rapid columnar updates. I don’t have a lot of subsequent new detail, but it made sense then and now.

Read more

November 18, 2008

Silly website tricks

Vertica’s marketing is usually good-to-outstanding, but they made a funny misstep this time. If you go to the Vertica home page, you’ll see seasonal art suggesting that their product is a turkey and/or that it’s terrified it’s about to get the ax.

Live by the pun, die by the pun.

October 15, 2008

Vertica offers some more numbers

Eric Lai interviewed Dave Menninger of Vertica.  Highlights included:

September 24, 2008

Vertica finally spells out its compression claims

Omer Trajman of Vertica put up a must-read blog post spelling out detailed compression numbers, based on actual field experience (which I’d guess is from a combination of production systems and POCs):

It’s clear what Omer means by most of those categories from reading the post, but I’m a little fuzzy on what “Consumer Data” or “Marketing Analytics” comprise in his taxonomy. Anyhow, Omer’s post is a huge improvement over my recent one — based on a conversation with Omer :) — which featured some far less accurate or complete compression numbers.

Omer goes on to claim that trickle-feed data is harder for rival systems to compress than it is for Vertica, and generally to claim that Vertica’s compression is typically severalfold better than that of competitive row-based systems.

September 22, 2008

Database compression is heavily affected by the kind of data

I’ve written often of how different kinds or brands of data warehouse DBMS get very different compression figures. But I haven’t focused enough on how much compression figures can vary among different kinds of data. This was really brought home to me when Vertica told me that web analytics/clickstream data can often be compressed 60X in Vertica, while at the other extreme — some kind of floating point data, whose details I forget for now — they could only do 2.5X. Edit: Vertica has now posted much more accurate versions of those numbers. Infobright’s 30X compression reference at TradeDoubler seems to be for a clickstream-type app. Greenplum’s customer getting 7.5X — high for a row-based system — is managing clickstream data and related stuff. Bottom line:

When evaluating compression ratios — especially large ones — it is wise to inquire about the nature of the data.
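A rough way to see this effect for yourself is to run a general-purpose compressor over two very different kinds of data. The sketch below uses Python's `zlib`, which is not what any of these DBMS use internally, on synthetic data I made up for the purpose: a clickstream-like log drawn from 20 hypothetical page URLs, and a block of random double-precision floats. The absolute ratios are not comparable to the vendor figures above; only the gap between them is the point.

```python
import random
import struct
import zlib

def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical clickstream: many requests spread over only a few distinct pages.
pages = [f"/product/{n}" for n in range(20)]
clickstream = "\n".join(f"GET {rng.choice(pages)} 200" for _ in range(10_000)).encode()

# Hypothetical measurements: random doubles, whose low-order bits are essentially noise.
measurements = struct.pack("<10000d", *(rng.random() for _ in range(10_000)))

clickstream_ratio = compression_ratio(clickstream)
float_ratio = compression_ratio(measurements)

print(f"clickstream-like text: {clickstream_ratio:.1f}X")
print(f"random floating point: {float_ratio:.1f}X")
```

The repetitive log shrinks many times over, while the random floats barely compress at all, so a single headline ratio says little until you know which kind of data produced it.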

September 22, 2008

Web analytics — clickstream and network event data

It should surprise nobody that web analytics, and specifically clickstream data, is one of the biggest areas for high-end data warehousing. For example:

Read more

September 6, 2008

SANs vs. DAS in MPP data warehousing

Generally speaking:

But if you think about it, those facts don’t exactly add up.

Read more

September 5, 2008

Dividing the data warehousing work among MPP nodes

I talk with lots of vendors of MPP data warehouse DBMS. I’ve now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives.

Read more

August 26, 2008

Vertica’s paying customer count

In a recent Computerworld article, Andy Ellicott of Vertica was cited as saying Vertica has 50 paying customers total. That’s very much on par with Greenplum’s figure, leaving aside any questions of deal size. (Greenplum runs a number of databases much larger than Vertica’s biggest. However, I believe Greenplum also charges a lot less per terabyte of user data.)

Previous Vertica paying customer count figures include:
