Vertica Systems
Analysis of columnar data warehouse DBMS vendor Vertica Systems. Related subjects include:
Vertica slaughters Sybase in patent litigation
Back in August, 2008, I pooh-poohed Sybase’s patent lawsuit against Vertica. Filed in the notoriously patent-holder-friendly East Texas courts, the suit basically claimed patent rights over the whole idea of a columnar RDBMS. It was pretty clear that this suit was meant to be a model for claims against other columnar RDBMS vendors as well, should they ever achieve material marketplace success.
If a recent Vertica press release is to be believed, Sybase got clobbered. The meat is:
… Sybase has admitted that under the claim construction order issued by the Court on November 9, 2009, “Vertica does not infringe Claims 1-15 of U.S. Patent No. 5,794,229.” Sybase further acknowledged that because the Court ruled that all the remaining claims in the patent (claims 16-24) were invalid, “Sybase cannot prevail on those claims.”
For those counting along at home — the patent only has 24 claims in total.
I have no idea whether Sybase can still cobble together grounds for appeal, or claims under some other patent. But for now, this sounds like a total victory for Vertica.
Edit: I’ve now seen a PDF of a filing suggesting the grounds under which Sybase will appeal. Basically, it alleges that the judge erred in defining a “page” of data too narrowly. Note that if Sybase prevails on appeal on that point, Vertica has a bunch of other defenses that haven’t been litigated yet. It further seems that Sybase may have recently filed another patent case against Vertica, in a different venue, based on a different patent.
One annoying blog troll excepted, is anybody surprised at this outcome?
| Categories: Columnar database management, Data warehousing, Sybase, Vertica Systems | 4 Comments |
This and that
I have various subjects backed up that I don’t really want to write about at traditional blog-post length. Here are a few of them. Read more
| Categories: Analytic technologies, Columnar database management, Complex event processing (CEP), Mark Logic, Native XML, Open source, Oracle, Theory and architecture, Vertica Systems | 2 Comments |
How 30+ enterprises are using Hadoop
MapReduce is definitely gaining traction, especially but by no means only in the form of Hadoop. In the aftermath of Hadoop World, Jeff Hammerbacher of Cloudera walked me quickly through 25 customers he pulled from Cloudera’s files. Facts and metrics ranged widely, of course:
- Some are in heavy production with Hadoop, and closely engaged with Cloudera. Others are active Hadoop users but are very secretive. Yet others signed up for initial Hadoop training last week.
- Some have Hadoop clusters in the thousands of nodes. Many have Hadoop clusters in the 50-100 node range. Others are just prototyping Hadoop use. And one seems to be “OEMing” a small Hadoop cluster in each piece of equipment sold.
- Many export data from Hadoop to a relational DBMS; many others just leave it in HDFS (Hadoop Distributed File System), e.g. with Hive as the query language, or in exactly one case Jaql.
- Some are household names, in web businesses or otherwise. Others seem to be pretty obscure.
- Industries include financial services, telecom (Asia only, and quite new), bioinformatics (and other research), intelligence, and lots of web and/or advertising/media.
- Application areas mentioned — and these overlap in some cases — include:
- Log and/or clickstream analysis of various kinds
- Marketing analytics
- Machine learning and/or sophisticated data mining
- Image processing
- Processing of XML messages
- Web crawling and/or text processing
- General archiving, including of relational/tabular data, e.g. for compliance
Oracle and Vertica on compression and other physical data layout features
In my recent post on Exadata pricing, I highlighted the importance of Oracle’s compression figures to the discussion, and the uncertainty about same. This led to a Twitter discussion featuring Greg Rahn* of Oracle and Dave Menninger and Omer Trajman of Vertica. I also followed up with Omer on the phone. Read more
| Categories: Columnar database management, Data models and architecture, Data warehousing, Database compression, Oracle, Theory and architecture, Vertica Systems | 13 Comments |
FlexStore and the rest of Vertica 3.5
Today, Vertica is announcing its 3.5 release, timed in line with a TDWI conference. Vertica 3.5 is scheduled to go into beta test in mid-August and be released to general availability in early October. Vertica 3.5 highlights include:
- Vertica/MapReduce integration, which I’m covering in a separate post
- A new storage architecture called Vertica FlexStore, which seems to boil down essentially to three things:
- A sort of row/column hybridization — Vertica would probably prefer to call it something like a column clustering feature — that I’m also covering in a separate post.
- The beginnings of a multi-temperature capability, somewhat akin to Teradata Virtual Storage.
- Enhancements to Vertica’s WOS (Write-Optimized Store, the in-memory part of Vertica that first receives updates). I don’t understand WOS architecture well enough to write about that yet.
- Load-balancing, to route queries evenly among Vertica nodes — probably just round-robin — rather than having them just be processed by whichever node happens to receive them.
PAX Analytica? Row- and column-stores begin to come together
Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness. Their case generally runs along the lines:
- Analytic queries commonly return only a fraction of all possible columns.
- Only returning the columns needed
- Saves I/O
- Saves cache space
- Reduces processing
- Facilitates compression
- Presumably all those row-based MPP vendors just went row-based because they had a fine row-based DBMS (usually but not always PostgreSQL) to build on.
Pushbacks to this argument from row-based vendors include:
- Yes, but it’s harder to update a column store
- Yes, but there are more steps to retrieving a bunch of columns than there are to retrieving the same information from row stores
| Categories: Analytic technologies, Columnar database management, Data warehousing, Theory and architecture, VectorWise, Vertica Systems | 4 Comments |
Vertica’s version of MapReduce integration
I talked with Omer Trajman of Vertica Monday night about Vertica’s MapReduce integration, part of its Vertica 3.5 release. Highlights included:
- By “integrating Vertica and MapReduce,” Vertica means “integrating Vertica and Hadoop.”
- Vertica’s Hadoop integration is based on Cloudera’s DBInputFormat.
- Omer called out for me several features of Vertica’s Hadoop integration that didn’t just come from Cloudera, namely:
- Cloudera’s DBInputFormat assumes the database runs on a single computer, or a single head node of an MPP system. Vertica’s technology, however, runs on peer parallel nodes with no head, and so Vertica adapted the DBInputFormat technology accordingly.
- Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and ones who don’t.
- Vertica lets you do Reduce functions (or Map functions, if you don’t push them down to the database) on a separate cluster than you run the database software. Vertica asserts that its customers and prospects all want to do this. Right here is the big difference between Vertica’s MapReduce integration and Aster’s or Greenplum’s. (Aster would also say that Vertica’s weaker MapReduce/SQL programming integration is a big difference as well.)
- Indeed, Vertica lets you Reduce into a different DBMS than Vertica, if you choose.
- Vertica gives you flexibility on the size of the Map and Reduce clusters. Omer agreed with me when I said there were some limits on how fast one can add or subtract nodes in a Vertica grid, because there’s data redistribution involved. But one can add/change/delete Hadoop clusters extremely quickly.
Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically: Read more
The Boston Globe had an article on VoltDB
The Boston Globe article has more detail than Vertica and VoltDB have ever OKed me to put out, and some business details they’ve never given me.
| Categories: In-memory DBMS, Memory-centric data management, OLTP, Vertica Systems, VoltDB and H-Store | Leave a Comment |
Vertica customer notes
Dave Menninger of Vertica called to discuss NDA product futures, as vendors tend to do in the weeks before a TDWI conference. So we also talked a bit about the Vertica customer base. That’s listed as 86 at the end of Q2, up from 74 in Q1. That’s pretty small growth compared with Q1, which Dave didn’t fully explain. But then, off the top of his head, he was recalling Q1 numbers as being lower than that 74, so maybe there’s a reporting glitch in the loop somewhere.
Vertica’s two biggest customer segments are telecommunications and financial services, and Dave drew an interesting distinction between what the two groups care about. Telecom companies care about data warehouses that are big and 24/7 reliable, but don’t do particularly complex analytics. Financial services — by which he presumably means mainly proprietary traders — are most focused on complex and competitively innovative analytics.
Also mentioned in various contexts were web-based outfits such as data mart outsourcers, social networkers, and open-source software providers.
Vertica also offers customer win stories in other segments, but most actual discussion about what Vertica does revolves around the application areas mentioned above, just as it has been in the past.
Similar (not necessarily identical) generalizations would be true of many other analytic DBMS vendors.
| Categories: Analytic technologies, Application areas, Data warehousing, Investment research and trading, Market share, Telecommunications, Vertica Systems, Web analytics | 17 Comments |
While I’m venting about benchmarks
Late last year, Vertica made hoo-hah about what it called a world-record data warehouse load speed benchmark. I wrote at the time that this showed Vertica wasn’t painfully slow at loading, always a concern with column stores. But otherwise I mocked the idea that there was something useful to be learned from the whole exercise.
Well, guess what? In a throwaway line in a comment on Daniel Abadi’s blog, Barry Zane of ParAccel pointed out
we posted a load rate of almost 9TB/hour, which is, of course record breaking on its own
Quite right.
I hope the nonsense stops there, but I’m not optimistic …
