May 8th, 2008 Curt Monash
Another TDWI conference approaches. Not coincidentally, I had another Vertica briefing. Primary subjects included some embargoed stuff, plus (at my instigation) outsourced data marts. But I also had the opportunity to follow up on a couple of points from February’s briefing, namely:
Vertica has about 35 paying customers. That doesn’t sound like a lot more than they had a quarter ago, but first quarters can be slow.
Vertica’s list price is $150K/terabyte of user data. That sounds very high versus the competition. On the other hand, if you do the math versus what they told me a few months ago — average initial selling price $250K or less, multi-terabyte sites — it’s obvious that discounting is rampant, so I wouldn’t actually assume that Vertica is a high-priced alternative.
Vertica does stress several reasons for thinking their TCO is competitive. First, with all that compression and performance, they think their hardware costs are very modest. Second, with the self-tuning, they think their DBA costs are modest too. Finally, they charge only for deployed data; the software that stores copies of data for development and test is free.
Posted in Analytics and analytic technologies, Columnar architectures, Data warehousing, Database compression, Vertica Systems | 4 Comments »
May 8th, 2008 Curt Monash
Call me slow on the uptake if you like, but it’s finally dawned on me that outsourced data marts are a nontrivial segment of the analytics business. For example:
- I was just briefed by Vertica, and got the impression that data mart outsourcers may be Vertica’s #3 vertical market, after financial services and telecom. Certainly it seems like they are Vertica’s #3 market if you bundle together data mart outsourcers and more conventional OEMs.
- When Netezza started out, a bunch of its early customers were credit data-based analytics outsourcers like Acxiom.
- After nagging DATAllegro for a production reference, I finally got a good one — TEOCO. TEOCO specializes in figuring out whether inter-carrier telcom bills are correct. While there’s certainly a transactional invoice-processing aspect to this, the business seems to hinge mainly around doing calculations to figure out correct charges.
- I was talking with Pervasive about Pervasive Datarush, a beta product that lets you do super-fast analytics on data even if you never load it into a DBMS in the first place. I challenged them for use cases. One user turns out to be an insurance claims rule-checking outsourcer.
- One of Infobright’s references is a French CRM analytics outsourcer, 1024 Degres.
- 1010data has built up a client base of 50-60, including a number of financial and retail blue-chippers, with a soup-to-nuts BI/analysis/columnar database stack.
- I haven’t heard much about Verix in a while, but their niche was combining internal sales figures with external point-of-sale/prescription data to assess retail (especially pharma) microtrends.
To a first approximation, here’s what I think is going on. Read the rest of this entry »
Posted in 1010data, Analytics and analytic technologies, Business intelligence, Cloud computing, Data warehousing, Infobright and Brighthouse, Netezza, Pervasive Software, SaaS, Specific users, TEOCO, Vertica Systems | 1 Comment »
March 6th, 2008 Curt Monash
The relational DBMS industry is filled with startups. In some way or other, most of them are based on or make use of the open source project PostgreSQL. (Not all, of course; exceptions include DATAllegro and Infobright, which are based on Ingres and MySQL respectively.) But how they use PostgreSQL varies greatly. Read the rest of this entry »
Posted in EnterpriseDB and Postgres Plus, Greenplum, Open source RDBMS, PostgreSQL, Relational database management systems, Vertica Systems | 9 Comments »
February 8th, 2008 Curt Monash
Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.
I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zlodnik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:
- Can columnar DBMS do operational BI?
- Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL)?
- Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Columnar architectures, Data warehousing, Database theory and practice, EII, ETL, and/or EAI, Michael Stonebraker, ParAccel, Sybase, Vertica Systems | No Comments »
February 7th, 2008 Curt Monash
I chatted with Andy Ellicott and Mike Stonebraker of Vertica today. Some of the content is embargoed until February 19 (for TDWI), but here are some highlights of the rest.
- Vertica now is “approaching” 50 paid customers, up from 15 or so in early November. (Compared to most of Vertica’s fellow data warehouse specialists, that’s a lot.) Many — perhaps most — of these customers are hedge funds or telcos.
- Vertica’s typical lag from sale to deployment is about one quarter.
- Vertica’s typical initial selling price is $250K. Or maybe it’s $100-150K. The Vertica guys are generally pretty forthcoming, but pricing is an exception. Whatever they charge, it’s strictly per terabyte of user data. They think they are competitive with other software vendors, and cheaper, all-in, than appliance vendors.
- One subject on which they’re totally non-forthcoming (lawyers’ orders) is the recent patent lawsuit filed by Sybase. They wouldn’t even say whether they thought it was bogus because they didn’t infringe, or whether they thought it was bogus because the patent shouldn’t have been granted.
- Average Vertica database size is a little under 10 terabytes of user data, with many examples in the 15-20 Tb range. Lots of customers plan to expand to 50-100 Tb.
- Vertica claims sustainable load speeds of 3-5 megabytes/sec/node, irrespective of database size. Data is sucked into RAM uncompressed, then written out a gig/node at a time, compressed. Gigabyte chunks are then merged on disk, which is superfast as it doesn’t involve sorting. (30 megabytes/second.) Mike insists this doesn’t compromise compression.
We also addressed the subject of Vertica’s schema assumptions, but I’ll leave that to another post.
Please sign up for our feed!
Posted in Analytics and analytic technologies, Data warehousing, Michael Stonebraker, Relational database management systems, Sybase, Vertica Systems | 4 Comments »
December 14th, 2007 Curt Monash
There are at least 16 different vendors offering appliances and/or software that do database management primarily for analytic purposes.* That’s a lot to keep up with,. So I’ve thrown together a little overview of the analytic data management landscape, liberally salted with links to information about specific vendors, products, or technical issues. In some ways, this is a companion piece to my prior post about data warehouse appliance myths and realities.
*And that’s just the tabular/alphanumeric guys. Add in text search and you run the total a lot higher.
Numerous data warehouse specialists offer traditional row-based relational DBMS architectures, but optimize them for analytic workloads. These include Teradata, Netezza, DATAllegro, Greenplum, Dataupia, and SAS. All of those except SAS are wholly or primarily vendors of MPP/shared-nothing data warehouse appliances. EDIT: See the comment thread for a correction re Kognitio.
Numerous data warehouse specialists offer column-based relational DBMS architectures. These include Sybase (with the Sybase IQ product, originally from Expressway), Vertica, ParAccel, Infobright, Kognitio (formerly White Cross), and Sand. Read the rest of this entry »
Posted in Analytics and analytic technologies, Cognos and Applix TM1, DATAllegro, Data warehouse appliances, Data warehousing, Dataupia, Greenplum, IBM and DB2, Kognitio and WX2, Netezza, Oracle, ParAccel, Relational database management systems, SAS Institute, Sybase, Teradata, Vertica Systems | 10 Comments »
December 7th, 2007 Curt Monash
In 1993, Ted Codd introduced the term OLAP (OnLine Analytic Processing) to describe data management that wasn’t optimized for OLTP (OnLine Transaction Processing). Later in the 1990s, Henry Morris of IDC introduced the term analytic applications to describe apps that weren’t transactional. Since then, no better word than “analytic” has emerged to cover the broad class of IT apps and technologies that aren’t focused on transactional processing.
In the latest incarnation, analytic appliances are coming to the fore. Read the rest of this entry »
Posted in Analytics and analytic technologies, Data warehouse appliances, Netezza, Relational database management systems, Vertica Systems | No Comments »
November 7th, 2007 Curt Monash
Vertica quietly announced an appliance bundling deal with HP and Red Hat today. That got me quickly onto the phone with Vertica’s Andy Ellicott, to discuss a few different subjects. Most interesting was the part about Vertica’s customer base, highlights of which included:
- Vertica’s claim to have “50” customers includes a bunch of unpaid licenses, many of them in academia.
- Vertica has about 15 paying customers.
- Based on conversations with mutual prospects, Vertica believes that’s more customers than DATAllegro has. (Of course, each DATAllegro sale is bigger than one of Vertica’s. Even so, I hope Vertica is wrong in its estimate, since DATAllegro told me its customer count was “double digit” quite a while ago.)
- Most Vertica customers manage over 1 terabyte of user data. A couple have bought licenses showing they intend to manage 20 terabytes or so.
- Vertica’s biggest customer/application category – existing customers and sales pipelines alike – is call detail records for telecommunications companies. (Other data warehouse specialists also have activity in the CDR area.). Major applications are billing assurance (getting the inter-carrier charges right) and marketing analysis. Call center uses are still in the future.
- Vertica’s other big market to date is investment research/tick history. Surely not coincidentally, this is a big area of focus for Mike Stonebraker, evidently at both companies for which he’s CTO. (The other, of course, is StreamBase.)
-
Runners-up in market activity are clickstream analysis and general consumer analytics. These seem to be present in Vertica’s pipeline more than in the actual customer base.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business Objects, DATAllegro, Data warehouse appliances, Data warehousing, HP and Neoview, RDF and graphs, Relational database management systems, Vertica Systems | No Comments »
October 23rd, 2007 Curt Monash
One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures vs. cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.
Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:
- Almost everybody (that Vertica sees) wants stars and snowflakes, so that’s what Vertica optimizes for.
- Replicating small dimension tables across nodes is great for performance.
- Even so, Vertica is broadening its support for more general schemas as well.
Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:
Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Columnar architectures, Data warehousing, Database theory and practice, Relational database management systems, Vertica Systems | 1 Comment »
October 23rd, 2007 Curt Monash
Vertica has been quietly selling product for three quarters and has about 50 customers.
Andy Ellicott of Vertica pointed me to the above Richard Hackathorn quote. Sadly, he asked me not to name and shame another analyst who foolishly said Vertica hadn’t “launched” yet.
But then, I understand. I’m also not going to identify the client who gave me fits by insisting on believing that nonsense, even in the face of the well-known facts that Vertica has shipping product, paying customers, and so on.
Posted in Columnar architectures, Data warehousing, Relational database management systems, Vertica Systems | No Comments »
October 19th, 2007 Curt Monash
It’s early autumn, the leaves are turning in New England, and Gartner has issued another Magic Quadrant for data warehouse DBMS. The big winners vs. last year are Greenplum and, secondarily, Sybase. Teradata continues to lead. Oracle has also leapfrogged IBM, and there are various other minor adjustments as well, among repeat mentionees Netezza, DATAllegro, Sand, Kognitio, and MySQL. HP isn’t on the radar yet; ditto Vertica. Read the rest of this entry »
Posted in Analytics and analytic technologies, DATAllegro, Data warehouse appliances, Data warehousing, Greenplum, HP and Neoview, IBM and DB2, Kognitio and WX2, MySQL, Netezza, Oracle, Relational database management systems, Sybase, Teradata, Vertica Systems | 6 Comments »
September 19th, 2007 Curt Monash
I was chatting with Stuart Frost this evening (DATAllegro’s CEO). As usual, I grilled him about customer counts; as usual, he was evasive, but expressed general ebullience about the pace of business; also as usual, he was charming and helpful on other subjects.
In particular, we talked about the Vertica story, and he offered some interesting pushback. Part was blindingly obvious — Vertica’s not in the marketplace yet, when they are the product won’t be mature, and so on. Part was the also obvious “we can do most of that ourselves” line of argument, some of which I’ve summarized in a comment here. But he made two other interesting points as well. Read the rest of this entry »
Posted in Columnar architectures, DATAllegro, Data warehouse appliances, Data warehousing, Database theory and practice, Relational database management systems, Vertica Systems | 1 Comment »
September 18th, 2007 Curt Monash
Back in March, I suggested that compression was a central and compelling aspect of Vertica’s story. Well, in their new blog, the Vertica guys now strongly reinforce that impression.
I recommend those two Database Column posts (by Sam Madden) highly. I’ve rarely seen such a clear, detailed presentation of a company’s technical argument. My own thoughts on the subject boil down to:
- In principle, all the technology (and hence all the technological advantages) they’re talking about could be turned into features of one of the indexing options of a row-oriented RDBMS. But in practice, there’s no indication that this will happen any time soon.
-
Release 1 of the Vertica product will surely have many rough edges.
- Some startups are surprisingly ignorant of the issue involved in building a successful, industrial-strength DBMS. But a company that has both Mike Stonebraker and Jerry Held seriously involved has a big advantage. They may make other kinds of errors, but they won’t make many ignorant ones.
Technorati Tags: Vertica, database compression, columnar
Posted in Columnar architectures, Data warehousing, Database compression, Database theory and practice, Michael Stonebraker, Relational database management systems, Vertica Systems | 4 Comments »
September 6th, 2007 Curt Monash
I’ve written a considerable amount about Vertica and/or the opinions of Mike Stonebraker. Now the Vertica guys have their own blog, which they pledge will not just be a rehash of Vertica marketing pitches — notwithstanding the Vertica-related wordplay in the blog’s name.*
*Those guys are good at wordplay.
Posted in Columnar architectures, Humor, Vertica Systems | No Comments »
June 15th, 2007 Curt Monash
When Mike Stonebraker and I discussed RDF yesterday, he quickly turned to suggesting fast ways of implementing it over an RDBMS. Then, quite characteristically, he sent over a paper that allegedly covered them, but actually was about closely related schemes instead.
Edit: The paper has a new, stable URL. Hat tip to Daniel Abadi.
All minor confusion aside, here’s the story. At its core, an RDF database is one huge three-column table storing subject-property-object triples. In the naive implementation, you then have to join this table to itself repeatedly. Materialized views are a good start, but they only take you so far. Read the rest of this entry »
Posted in Columnar architectures, Data warehousing, Database compression, Database theory and practice, Hierarchies, networks, graphs, and trees, RDF and graphs, Relational database management systems, Vertica Systems | No Comments »
June 14th, 2007 Curt Monash
The word from Vertica is that the product will go GA in the fall, and that they’ll have blow-out benchmarks to exhibit.
I find this very credible. Indeed, the above may even be something of an understatement.
Vertica’s product surely has some drawbacks, which will become more apparent when the product is more available for examination. So I don’t expect row-based appliance innovators Netezza and DATAllegro to just dry up and blow away. On the other hand, not every data warehousing product is going to live long and prosper, and I’d rate Vertica’s chances higher than those of several competitors that are actually already in GA.
Want to continue getting great research about DBMS, analytics, data integration, and other technologies related to data management? Then get a FREE subscription, by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available.
Posted in Columnar architectures, DATAllegro, Data warehousing, Netezza, Vertica Systems | 2 Comments »
March 24th, 2007 Curt Monash
In my opinion, the key part of Mike Stonebraker’s fascinating note on data compression was (emphasis mine):
The standard wisdom in most row stores is to use block compression. Hence, a storage block is compressed using a single technique (say Lempel-Ziv or dictionary). The technique chosen then compresses all the attributes in all the columns which occur on the block. In contrast, Vertica compresses a storage block that only contains one attribute. Hence, it can use a different compression scheme for each attribute. Obviously a compression scheme that is type-specific will beat an implementation that is “one size fits all”.
It is possible for a row store to use a type-specific compression scheme. However, if there are 50 attributes in a record, then it must remember the state for 50 type-specific implementations, and complexity increases significantly.
In addition, all row stores we are familiar with decompress each storage block on access, so that the query executor processes uncompressed tuples. In contrast, the Vertica executor processes compressed tuples. This results in better L2 cache locality, less main memory copying and generally much better performance.
Of course, any row store implementation can rewrite their executor to run on compressed data. However, this is a rewrite – and a lot of work.
Read the rest of this entry »
Posted in Columnar architectures, Data warehousing, Database compression, Sybase, Vertica Systems | 6 Comments »
March 24th, 2007 Curt Monash
The following is by Mike Stonebraker, CTO of Vertica Systems, copyright 2007, as part of our ongoing discussion of data compression. My comments are in a separate post.
Row Store Compression versus Column Store Compression
I Introduction
There are three aspects of space requirements, which we discuss in this short note, namely:
structural space requirements
index space requirements
attribute space requirements.
Read the rest of this entry »
Posted in Data warehousing, Database compression, Database theory and practice, Michael Stonebraker, Vertica Systems | 3 Comments »
March 21st, 2007 Curt Monash
We have lively discussions going on columnar data stores vs. vertically partitioned row stores. Part is visible in the comment thread to a recent post. Other parts come in private comments from Stuart Frost of DATAllegro and Mike Stonebraker of Vertica et al.
To me, the most interesting part of what the Vertica guys are saying is twofold. One is that data compression just works better in column stores than row stores, perhaps by a factor of 3, because “the next thing in storage is the same data type, rather than a different one.” Frankly, although Mike has said this a couple of times, I haven’t understood yet why row stores can’t be smart enough to compress just as well. Yes, it’s a little harder than it would be in a columnar system; but I don’t see why the challenge would be insuperable.
The second part is even cooler, namely the claim that column stores allow the processors to operate directly on compressed data. But once again, I don’t see why row stores can’t do that too. For example, when you join via bitmapped indices, exactly what you’re doing is operating on highly-compressed data.
Want to continue getting great research about DBMS, analytics, and other technologies related to data management? Then subscribe to our feed, by RSS/Atom or e-mail! We recommend taking the integrated feed for all our blogs, but blog-specific ones are also easily available.
Posted in Columnar architectures, DATAllegro, Data warehouse appliances, Data warehousing, Database compression, Relational database management systems, Vertica Systems | 1 Comment »
March 19th, 2007 Curt Monash
Stuart Frost of DATAllegro offered an interesting counter today to columnar DBMS architectures — vertical partitioning. In particular, he told me of a 120 terabyte (growing soon to 250 terabytes) call data record database, in which a few key columns were separated out. Read the rest of this entry »
Posted in Columnar architectures, DATAllegro, Data warehouse appliances, Data warehousing, Kognitio and WX2, Relational database management systems, Vertica Systems | 9 Comments »
March 16th, 2007 Curt Monash
IBM sent over a bunch of success stories recently, with DB2’s new aggressive compression prominently mentioned. Mike Stonebraker made a big point of Vertica’s compression when last we talked; other column-oriented data warehouse/mart software vendors (e.g. Kognitio, SAP, Sybase) get strong compression benefits as well. Other data warehouse/mart specialists are doing a lot with compression too, although some of that is governed by please-don’t-say-anything-good-about-us NDA agreements.
Compression is important for at least three reasons:
- It saves disk space, which is a major cost issue in data warehousing.
- It saves I/O, which is the major performance issue in data warehousing.
- In well-designed systems, it can actually make on-chip execution faster, because the gains in memory speed and movement can exceed the cost of actually packing/unpacking the data. (Or so I’m told; I haven’t aggressively investigated that claim.)
When evaluating data warehouse/mart software, take a look at the vendor’s compression story. It’s important stuff.
EDIT: DATAllegro claims in a note to me that they get 3-4x storage savings via compression. They also make the observation that fewer disks ==> fewer disk failures, and spin that — as it were
— into a claim of greater reliability.
Posted in DATAllegro, Data warehouse appliances, Data warehousing, Database compression, IBM and DB2, Relational database management systems, SAP, BI Accelerator, and MaxDB, Vertica Systems | 2 Comments »
March 6th, 2007 Curt Monash
I haven’t been as clear as I could have been in explaining why I think MPP/shared-nothing beats SMP/shared-everything. The answer is in a short white paper, currently bottlenecked at the sponsor’s end of the process. Here’s an excerpt from the latest draft:
There are two ways to make more powerful computers:
1. Use more powerful parts – processors, disk drives, etc.
2. Just use more parts of the same power.
Of the two, the more-parts strategy much more cost-effective. Smaller* parts are much more economical, since the bigger the part, the harder and more costly it is to avoid defects, in manufacturing and initial design alike. Consequently, all high-end computers rely on some kind of parallel processing.
*As measured in terms of capacity, transistor count, etc., not physical size.
Read the rest of this entry »
Posted in DATAllegro, Data warehouse appliances, Data warehousing, Database theory and practice, Microsoft and SQL*Server, Netezza, Oracle, Relational database management systems, Teradata, Vertica Systems | 6 Comments »
January 31st, 2007 Curt Monash
… unless you think that is inherently an oxymoron. I thought I was doing well catching and expanding on a clever pop culture reference. But the folks at columnar DBMS start-up Vertica Systems may have topped that with their slogan
The tables have turned
Ouch.
Posted in Columnar architectures, Humor, Vertica Systems | No Comments »
January 22nd, 2007 Curt Monash
If Mike Stonebraker is to be believed, the era of columnar data stores is upon us.
Whether or not you buy completely into Mike’s claims, there certainly are cool ideas in his latest columnar offering, from startup Vertica Systems. The Vertica corporate site offers little detail, but Mike tells me that the product’s architecture closely resembles that of C-Store, which is described in this November, 2005 paper.
The core ideas behind Vertica’s product are as follows. Read the rest of this entry »
Posted in Columnar architectures, Data warehousing, Database compression, Database theory and practice, Kognitio and WX2, Memory-centric data management, Netezza, Products and vendors, Relational database management systems, Vertica Systems | 15 Comments »