Analysis of VectorWise and its columnar analytic DBMS efforts. Related subjects include:
Ingres forgot to prebrief me on the VectorWise announcement, and despite valiant efforts hasn’t succeeded in connecting with me since they realized the lapse. Meanwhile, I took a look at the VectorWise press release, and found the quotes to be somewhat amusing.
Column-store proponents are prone to argue, in effect, that the only reason to implement an analytic DBMS with row-based storage is laziness. Their case generally runs along the lines:
- Analytic queries commonly return only a fraction of all possible columns.
- Only returning the columns needed
- Saves I/O
- Saves cache space
- Reduces processing
- Facilitates compression
- Presumably all those row-based MPP vendors just went row-based because they had a fine row-based DBMS (usually but not always PostgreSQL) to build on.
Pushbacks to this argument from row-based vendors include:
- Yes, but it’s harder to update a column store
- Yes, but there are more steps to retrieving a bunch of columns than there are to retrieving the same information from row stores
|Categories: Analytic technologies, Columnar database management, Data warehousing, Theory and architecture, VectorWise, Vertica Systems||11 Comments|
I talked with Omer Trajman of Vertica Monday night about Vertica’s MapReduce integration, part of its Vertica 3.5 release. Highlights included:
- By “integrating Vertica and MapReduce,” Vertica means “integrating Vertica and Hadoop.”
- Vertica’s Hadoop integration is based on Cloudera’s DBInputFormat.
- Omer called out for me several features of Vertica’s Hadoop integration that didn’t just come from Cloudera, namely:
- Cloudera’s DBInputFormat assumes the database runs on a single computer, or a single head node of an MPP system. Vertica’s technology, however, runs on peer parallel nodes with no head, and so Vertica adapted the DBInputFormat technology accordingly.
- Vertica lets you push down Map functions to the database. Omer reports a roughly even division among users and prospects between those who want to do this and ones who don’t.
- Vertica lets you do Reduce functions (or Map functions, if you don’t push them down to the database) on a separate cluster than you run the database software. Vertica asserts that its customers and prospects all want to do this. Right here is the big difference between Vertica’s MapReduce integration and Aster’s or Greenplum’s. (Aster would also say that Vertica’s weaker MapReduce/SQL programming integration is a big difference as well.)
- Indeed, Vertica lets you Reduce into a different DBMS than Vertica, if you choose.
- Vertica gives you flexibility on the size of the Map and Reduce clusters. Omer agreed with me when I said there were some limits on how fast one can add or subtract nodes in a Vertica grid, because there’s data redistribution involved. But one can add/change/delete Hadoop clusters extremely quickly.
Apparently, the use cases for Vertica/Hadoop integration to date lie in algorithmic trading and two kinds of web analytics. Specifically: Read more
|Categories: Analytic technologies, Cloudera, Columnar database management, Data warehousing, Hadoop, Investment research and trading, MapReduce, Parallelization, Theory and architecture, VectorWise, Vertica Systems, Web analytics||5 Comments|
I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn’t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic Daniel Abadi. Basic facts that you may already know include:
- VectorWise, the product, will be an open-source columnar analytic DBMS. (But that’s not quite true. Pending productization, it’s more accurate to call the VectorWise technology a row/column hybrid.)
- VectorWise is due to be introduced in 2010. (Peter Boncz said that to me more clearly than I’ve seen in other coverage.)
- VectorWise and Ingres have a deal in which Ingres will at least be the exclusive seller of the VectorWise technology, and hopefully will buy the whole company.
- Notwithstanding that it was once named something like “MonetDB,” VectorWise actually is not the same thing as MonetDB, another open source columnar analytic DBMS from the same research group.
- The MonetDB and VectorWise research groups consist in large part of academics in Holland, specifically at CWI (Centrum voor Wiskunde en Informatica). But Ingres has a research group working on the project too. (Right now there are about seven “highly experienced” people each on the VectorWise and Ingres sides, although at least the VectorWise folks aren’t all full-time. More are being added.)
- Ingres and VectorWise haven’t agreed exactly how VectorWise and Ingres Classic will play together in the Ingres product line. (All of the obvious possibilities are still on the table.)
- VectorWise is shared-everything, just as Ingres is. But plans — still tentative — are afoot to integrate VectorWise with MapReduce in Daniel Abadi’s HadoopDB project.
|Categories: Actian and Ingres, Analytic technologies, Columnar database management, Data warehousing, Database compression, MonetDB, Open source, Theory and architecture, VectorWise||12 Comments|