Patent nonsense in the data warehouse DBMS market
There are two recent patent lawsuits in the data warehouse DBMS market. In one, Sybase is suing Vertica. In another, an individual named Cary Jardin (techie founder of XPrime, a sort of predecessor company to ParAccel) is suing DATAllegro. Naturally, there’s press coverage of the DATAllegro case, due in part to its surely non-coincidental timing right after the Microsoft acquisition was announced and in part to a vigorous PR campaign around it. And the Sybase case so excited a troll who calls himself Bill Walters that he posted identical references to it on about 12 different threads in this blog, as well as to a variety of Vertica-related articles in the online trade press. But I think it’s very unlikely that any of these cases will turn out to matter much. Read more
Categories: Columnar database management, Data warehousing, Database compression, DATAllegro, Sybase, Vertica Systems | 7 Comments |
Compare/contrast of Vertica, ParAccel, and Exasol
I talked with Exasol today (at 5:00 am!) and of course want to blog about it. For clarity, I’d like to start by comparing/contrasting the fundamental data structures at Vertica, ParAccel, and Exasol. And it feels like that should be a separate post. So here goes.
- Exasol, Vertica, and ParAccel all store data in columnar formats.
- Exasol, Vertica, and ParAccel all compress data heavily.
- Exasol, Vertica, and ParAccel all operate on in-memory data in compressed formats, though perhaps to varying extents.
- ParAccel and Exasol write data to what amounts to the in-memory part of their basic data structures; the data then gets persisted to disk. Vertica, however, has a separate in-memory data structure to accept data and write it to disk. (A toy sketch of this write-path difference follows the list.)
- Vertica is a disk-centric system that doesn’t rely on there being a lot of RAM.
- ParAccel can be described that way too; however, in some cases (including on the TPC-H benchmarks), ParAccel recommends loading all your data into RAM for maximum performance.
- Exasol is totally optimized for the assumption that queries will be run against data that has already been loaded into RAM.
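To make the write-path difference concrete, here is a toy Python sketch. It is my illustration only, not any vendor’s actual code or design: new rows land in a small row-oriented in-memory buffer, and a flush step rewrites them into a dictionary-compressed, column-oriented main store that queries can scan without decompressing values row by row.

```python
# Toy sketch of a write-optimized buffer feeding a compressed column store.
# Illustrative only -- not how Vertica, ParAccel, or Exasol actually implement
# their storage layers.

class ColumnStoreSketch:
    def __init__(self, column_names):
        self.column_names = column_names
        self.write_buffer = []                                # row-oriented, uncompressed, in memory
        self.main_store = {name: {"dict": [], "codes": []}    # dictionary-encoded columns
                           for name in column_names}

    def insert(self, row):
        """New data first lands in the small in-memory buffer."""
        self.write_buffer.append(row)

    def flush(self):
        """Periodically rewrite buffered rows into the compressed column store
        (loosely analogous to persisting buffered data to the main structure)."""
        for row in self.write_buffer:
            for name, value in zip(self.column_names, row):
                col = self.main_store[name]
                if value not in col["dict"]:
                    col["dict"].append(value)
                col["codes"].append(col["dict"].index(value))  # store a small code, not the value
        self.write_buffer.clear()

    def count_where(self, column, value):
        """Filter directly on compressed codes -- no per-row decompression."""
        col = self.main_store[column]
        if value not in col["dict"]:
            return 0
        code = col["dict"].index(value)
        return sum(1 for c in col["codes"] if c == code)


store = ColumnStoreSketch(["region", "amount"])
store.insert(("EMEA", 100))
store.insert(("APAC", 250))
store.insert(("EMEA", 75))
store.flush()
print(store.count_where("region", "EMEA"))   # 2
```

The only point of the sketch is the shape of the write path: fresh inserts wait in the buffer until the next flush, while queries run against the compressed columnar representation.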
Beyond the above, I plan to discuss in a separate post how Exasol does MPP shared-nothing software-only columnar data warehouse database management differently than Vertica and ParAccel do shared-nothing software-only columnar data warehouse database management. 🙂
Categories: Columnar database management, Data warehousing, Database compression, Exasol, ParAccel, Vertica Systems | 12 Comments |
EnterpriseDB update
I had lunch today with CTO Bob Zurek of EnterpriseDB, who turns out to live in almost the same town I do (they technically separated in 1783, but share a high school today). DBMS-related highlights included:
- EnterpriseDB thinks PostgreSQL training and certification are a big deal for increasing PostgreSQL adoption.
- EnterpriseDB’s business focus right now (at least, one of them) is moving developers from interest to download to deployment and payment — i.e., the standard funnel for open source and open-source-inspired products.
- EnterpriseDB finds it important to be a good PostgreSQL community citizen. This makes a lot of sense, as EnterpriseDB doesn’t control the core PostgreSQL engine, even if it does employ some of the core PostgreSQL developers.
- But “open source” is not the same as “free”.
- I got the impression that the GridSQL technology EnterpriseDB acquired is being used to go after general read-mostly, horizontally-scaling applications (i.e., MySQL’s sweet spot). I did not get the impression, by way of contrast, that EnterpriseDB is out to play catch-up — e.g., with Greenplum — in MPP data warehousing.
- Bob pointed out that something like “Vacuum” to clean up the database periodically is needed in an MVCC (MultiVersion Concurrency Control) engine; a minimal illustration follows this list. He thinks PostgreSQL’s autovacuum is good but not ideal.
- Bob draws this as yet another two-dimensional positioning graph, but in essence he thinks PostgreSQL and Postgres Plus are well-suited for a large space that’s above MySQL and below Oracle. I don’t think he really contradicted Kee Kwan’s opinion that there are good times to use PostgreSQL and good times to use MySQL.
- I was wrong when I previously said EnterpriseDB now offers MySQL portability. It just offers MySQL migration.
- The Elastra/EnterpriseDB cloud offering isn’t generally available yet.
- Stay tuned for developments in replication/high availability.
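To make the vacuum point in the list above concrete, here is a minimal MVCC sketch in Python. It is my own illustration under simplified assumptions, not PostgreSQL’s or EnterpriseDB’s actual design: updates append new row versions rather than overwriting in place, so dead versions accumulate until a vacuum-like pass reclaims the ones no longer visible to any running transaction.

```python
# Minimal MVCC sketch: updates create new row versions instead of overwriting,
# so old versions pile up until a vacuum-like pass reclaims those that no
# running transaction can still see. Illustrative only; PostgreSQL's real
# machinery (snapshots, xmin/xmax, autovacuum) is far more involved.

class MVCCTable:
    def __init__(self):
        self.versions = []      # list of (key, value, created_txid, deleted_txid)
        self.next_txid = 1

    def begin(self):
        txid = self.next_txid
        self.next_txid += 1
        return txid

    def update(self, txid, key, value):
        # Mark the previously visible version as deleted and append a new one.
        for i, (k, v, created, deleted) in enumerate(self.versions):
            if k == key and deleted is None:
                self.versions[i] = (k, v, created, txid)
        self.versions.append((key, value, txid, None))

    def vacuum(self, oldest_active_txid):
        """Drop versions deleted before the oldest transaction anyone still needs."""
        before = len(self.versions)
        self.versions = [v for v in self.versions
                         if v[3] is None or v[3] >= oldest_active_txid]
        return before - len(self.versions)


t = MVCCTable()
tx1 = t.begin()
t.update(tx1, "acct_42", 100)
tx2 = t.begin()
t.update(tx2, "acct_42", 90)            # the old version is kept for older snapshots
print(len(t.versions))                  # 2 versions of the same logical row
print(t.vacuum(oldest_active_txid=3))   # 1 dead version reclaimed
print(len(t.versions))                  # 1
```

Without that reclamation step the table bloats, which is exactly why an autovacuum daemon — and how well it is tuned — matters.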
Categories: EnterpriseDB and Postgres Plus, Mid-range, Open source, PostgreSQL | 1 Comment |
Netezza update
In my usual dual role, I called Phil Francisco of Netezza to lay some post-Microsoft/DATAllegro consulting on him late on a Friday night — and then took the opportunity of being on the phone with him to get a general Netezza update. Netezza’s July quarter just ended, so they’re still in their quiet period, and I didn’t press him for a lot of numerical detail. More generally, I didn’t find out much that wasn’t already covered in my May Netezza update. But notwithstanding all those disclaimers, it was still a pretty interesting chat. Read more
Categories: Data warehouse appliances, Data warehousing, Greenplum, Netezza, Sybase | 3 Comments |
Database compression coming to the fore
I’ve posted extensively about data-warehouse-focused DBMS’ compression, which can be a major part of their value proposition. Most notable, perhaps, is a short paper Mike Stonebraker wrote for this blog — before he and his fellow researchers started their own blog — on column-stores’ advantages in compression over row stores. Compression has long been a big part of the DATAllegro story, while Netezza got into the compression game just recently. Part of Teradata’s pricing disadvantage may stem from weak compression results. And so on.
Well, the general-purpose DBMS vendors are working busily at compression too. Microsoft SQL Server 2008 exploits compression in several ways (basic data storage, replication/log shipping, backup). And Oracle offers compression too, as per this extensive writeup by Don Burleson.
If I had to sum up what we do and don’t know about database compression, I guess I’d start with this:
- Columnar DBMS really do get substantially better compression than row-based database systems. The most likely reasons are:
- More elements of a column fit into a single block, so all compression schemes work better.
- More compression schemes wind up getting used (e.g., delta compression as well as the token/dictionary compression that row-based systems use too); a toy sketch of both appears at the end of this post.
- Data-warehouse-focused row stores seem to do better at compression than general-purpose DBMS. The most likely reasons are some combination of:
- They’re trying harder.
- They use larger block sizes.
- Notwithstanding these reasonable-sounding generalities, there’s a lot of variation in compression success among otherwise comparable products.
Compression is one of the most important features a database management system can have, since it creates large savings in storage and sometimes non-trivial gains in performance as well. Hence, it should be a key item in any DBMS purchase decision.
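To illustrate the two compression schemes mentioned in the list above, here is a toy Python sketch of my own; the column values and the encodings are illustrative assumptions, not any particular vendor’s implementation.

```python
# Toy illustrations of token/dictionary encoding and delta encoding applied to
# single columns. Real products layer many more techniques on top; this only
# shows why low-cardinality and sorted columns compress so well when stored
# column by column.

def dictionary_encode(values):
    """Token/dictionary compression: store each distinct value once,
    plus a small integer code per row."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def delta_encode(sorted_ints):
    """Delta compression: store the first value, then small differences.
    Works well on sorted or slowly-changing columns (IDs, dates, timestamps)."""
    return [sorted_ints[0]] + [b - a for a, b in zip(sorted_ints, sorted_ints[1:])]

states = ["MA", "CA", "MA", "MA", "NY", "CA"] * 1000    # low-cardinality column
dictionary, codes = dictionary_encode(states)
print(len(dictionary), "distinct values for", len(codes), "rows")   # 3 for 6000

order_ids = list(range(1_000_000, 1_006_000))           # dense, sorted column
print(max(delta_encode(order_ids)[1:]))                 # every delta is just 1
```

Because a columnar block holds nothing but values from one column, both schemes get long runs of similar data to work on, which is the intuition behind the compression advantage claimed above.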
Some Elastra numbers
GigaOm reports that Elastra just raised $12 million, and that it has 40 paying customers, up from 13 around the time of Elastra’s March launch.
Categories: Cloud computing, Elastra | Leave a Comment |
Column stores vs. vertically-partitioned row stores
Daniel Abadi and Sam Madden followed up their post on column stores vs. fully-indexed row stores with one about column stores vs. vertically-partitioned row stores. Once again, the apparently reasonable way to set up the row-store database backfired badly.* Read more
Extensive QlikView coverage from a big fan and reseller
David Raab is a reseller and great fan of QlikTech’s QlikView. His recent lengthy post about the product (I hesitate to call it “detailed” only because he rightly complains that QlikTech is in fact stingy with technical detail) is positive enough to have been recommended by the company itself. Specifically, it was cited in the comment thread to my recent post on QlikTech, where David himself also addressed some of my questions.
But of course, no technology is perfect, not even one as great as David thinks QlikView is. Read more
Daniel Abadi and Sam Madden on column stores vs. indexed row stores
Daniel Abadi and Sam Madden — for whom I have the highest regard after our discussions regarding H-Store — wrote a blog post on Vertica’s behalf, arguing that column stores are far superior to fully-indexed row stores for not-very-selective queries. They link to a SIGMOD paper backing their argument up, provide some diagrams, and generally make a detailed case. As best I understand, here are some highlights: Read more
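For a rough feel of their argument, here is my own back-of-envelope arithmetic in Python; the hardware and sizing numbers are illustrative assumptions of mine, not figures from Abadi, Madden, or the SIGMOD paper.

```python
# Back-of-envelope arithmetic only -- all numbers below are illustrative
# assumptions. For an unselective predicate, an unclustered index degenerates
# into roughly one random read per matching row, while a column store reads
# one compressed column sequentially.

rows           = 100_000_000
selectivity    = 0.10                  # "not very selective": 10% of rows match
row_width      = 200                   # bytes per row in the row store
col_width      = 4                     # bytes per value in the relevant column
compression    = 3.0                   # assumed columnar compression ratio
random_read_ms = 5.0                   # cost of one random disk read
seq_mb_per_sec = 100.0                 # sequential scan bandwidth

index_plan_sec  = rows * selectivity * random_read_ms / 1000.0
row_scan_sec    = rows * row_width / (seq_mb_per_sec * 1_000_000)
column_scan_sec = rows * col_width / compression / (seq_mb_per_sec * 1_000_000)

print(f"fully-indexed row store plan: ~{index_plan_sec:,.0f} s")   # tens of thousands of seconds
print(f"row-store full table scan:    ~{row_scan_sec:,.0f} s")     # a few hundred seconds
print(f"column scan (compressed):     ~{column_scan_sec:,.0f} s")  # a second or two
```

Under these assumptions the optimizer should ignore the indexes entirely for such a query, and even then the column scan wins by a couple of orders of magnitude, which is the shape of the argument as I understand it.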
Categories: Columnar database management, Vertica Systems | 8 Comments |
QlikTech/QlikView update
I talked with Anthony Deighton of memory-centric BI vendor QlikTech for an hour and a half this afternoon. QlikTech is quite the success story, with disclosed 2007 revenue of $80 million, up 80% year over year, and confidential year-to-date 2008 figures that do not disappoint as a follow-on. And a look at QlikTech’s QlikView product makes it easy to understand how this success might have come about.
Let me start by reviewing QlikTech’s technology, as best I understand it.