August 4, 2009

VectorWise, Ingres, and MonetDB

I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn’t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic Daniel Abadi. Basic facts that you may already know include:

The MonetDB project is led by Martin Kersten, with whom I chatted at SIGMOD in June (standing up and not taking notes, so I may have some details wrong). I get the impression, based on that conversation, my VectorWise call, and other data:

I further get the impression that VectorWise was actually Marcin Zukowski’s Ph.D project, with Peter Boncz being his advisor. VectorWise also boasts another Peter Boncz student, who wrote about updating column stores.

As one might expect from the name, VectorWise does vector processing. I.e., the hard part of Marcin’s work was developing vectorized algorithms for one SQL operation after another. Vectorization, pipelining, and FPGAs might all seem to go together — XtremeData certainly seems to think so — but the VectorWise folks preferred to develop for Intel CPUs anyway, for pretty much the usual reasons. Another major theme is trying to get the right things into CPU cache, because in their opinion RAM cache is just sooooo painfully slow.
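
To make the vector-at-a-time idea concrete, here is a minimal sketch in Python. It is not VectorWise code. The function names, the batch size of 1024, and the example query (sum of qty * price where qty exceeds a threshold) are all assumptions chosen for illustration. The point it shows is structural: each primitive runs a tight loop over a small batch of column values, sized to stay in CPU cache, instead of the engine interpreting the whole query plan once per row.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical batch size; the idea is that one batch of each column fits in CPU cache.
VECTOR_SIZE = 1024


def scan(qty: Sequence[int], price: Sequence[int],
         vector_size: int = VECTOR_SIZE) -> Iterator[Tuple[Sequence[int], Sequence[int]]]:
    """Yield aligned slices (vectors) of the two columns."""
    for start in range(0, len(qty), vector_size):
        yield qty[start:start + vector_size], price[start:start + vector_size]


def select_greater_than(values: Sequence[int], threshold: int) -> List[int]:
    """Primitive: positions within the vector where value > threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def multiply_selected(a: Sequence[int], b: Sequence[int],
                      selection: Sequence[int]) -> List[int]:
    """Primitive: element-wise product, restricted to the selected positions."""
    return [a[i] * b[i] for i in selection]


def vectorized_sum_product(qty: Sequence[int], price: Sequence[int],
                           threshold: int, vector_size: int = VECTOR_SIZE) -> int:
    """Vector-at-a-time: each operator processes a whole batch per call."""
    total = 0
    for qty_vec, price_vec in scan(qty, price, vector_size):
        selection = select_greater_than(qty_vec, threshold)
        total += sum(multiply_selected(qty_vec, price_vec, selection))
    return total


def tuple_at_a_time_sum_product(qty: Sequence[int], price: Sequence[int],
                                threshold: int) -> int:
    """Row-at-a-time baseline: filter, multiply, and add one row at a time."""
    total = 0
    for q, p in zip(qty, price):
        if q > threshold:
            total += q * p
    return total
```

Both functions return the same answer; only the shape of the work differs. In a compiled engine the per-batch primitives are simple loops over arrays, which is the form that compilers and CPUs handle well. Pure Python will not show that speedup, so treat this as an illustration of the control flow only.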

Our discussion of VectorWise’s compression was interesting. Highlights included:

Other notes include:

Comments

12 Responses to “VectorWise, Ingres, and MonetDB”

  1. Vertica’s version of MapReduce integration | DBMS2 -- DataBase Management System Services on August 4th, 2009 6:29 am

    […] VectorWise guys also told me they are looking forward to seeing how the two projects work together. […]

  2. Daniel Lemire on August 4th, 2009 8:54 am

    “the advantages of operating on compressed data are only significant if the database stores columns in multiple sort orders each.”

    If your table has few dimensions, this makes no sense. But for high dimensional tables, it rings true. Indeed, columnar compression often comes through run-length encoding (RLE), after sorting (lexicographically). Yet, only the first few columns (in sorting order) will end up compressible by RLE after sorting them.

    See for example:

    Daniel Lemire, Owen Kaser, Kamel Aouiche, Sorting improves word-aligned bitmap indexes. Data & Knowledge Engineering (to appear).
    http://arxiv.org/abs/0901.3751
    http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them

    This suggests that they are not relying much on RLE. It might be that vector processing does not work well in conjunction with RLE?

  3. Marcin Zukowski on August 4th, 2009 11:31 am

    Hi Curt,

    Thank you for a nice writeup on VectorWise. While generally correct, here are some clarifications:

    – the VectorWise technology belongs fully to our company (no academic institution, including CWI, can control it)

    – the MonetDB open-source system originated from the PhD research of Peter Boncz under supervision of Martin Kersten, while the VectorWise database engine is a technology generation later and came out of my own PhD (not MSc) research, supervised in turn by Peter Boncz. Other CWI group members also have significant contributions to both projects.

    – we do hope to make VectorWise technology available as early as possible, and 2010 is very possible, but please do not treat it as an official plan

    – as for the string compression, we use something called PDICT, which is a new – outlier resistant – form of dictionary encoding.

    – like you wrote, the main thing about the compression methods in VectorWise is that they are much faster than existing methods. As for the performance, we take a few “CPU cycles” (not “steps”) for one element. Links to publications with more technical info can be found on: http://www.vectorwise.com/index_js.php?page=company_origins

    – the place to visit for more info on the Ingres VectorWise project is http://www.ingres.com/vectorwise

    Best regards,
    Marcin Zukowski

  4. Curt Monash on August 4th, 2009 11:58 am

    Thanks, Marcin!

    I edited in two corrections (Ph.D, CPU cycles).

    Best,

    CAM

  5. Marcin Zukowski on August 4th, 2009 3:36 pm

    @Daniel

    One thing to note is that the opinion that working on compressed data is mostly useful for the major ordering columns refers only to RLE compression. As you write, in cases with large domain cardinality RLE won’t do much for non-sorted data.

    Still, other forms of compression can be used and data compressed with those can be analyzed without decompressing, see e.g. http://scholar.google.com/scholar?q=%22The+Implementation+and+Performance+of+Compressed+Databases.%22

    m.

  6. Edward on August 4th, 2009 10:03 pm

    There’s a 2008 talk by Peter Boncz about MonetDB/X100 project that illustrates principles that seem to be used by VectorWise’s DBMS:

    http://www.youtube.com/watch?v=yrLd-3lnZ58

    Cool stuff,
    E.

  7. Do hash tables work in constant time? on August 18th, 2009 10:15 am

    […] Am I being pedantic? Does the time required to multiply integers on modern machine depend on the size of the integers? It certainly does if you are using vectorization. And vectorization is used in commercial databases! […]

  8. HadoopDB | DBMS2 -- DataBase Management System Services on September 19th, 2009 8:05 pm

    […] where X=2. Column-store guru Abadi has repeatedly signaled his intention to try out HadoopDB with VectorWise at the nodes instead. (Recall that VectorWise is shared-everything.) It will be interesting to see […]

  9. Martin Kersten on issues in scientific data management | DBMS2 -- DataBase Management System Services on October 3rd, 2009 6:33 am

    […] Martin Kersten emailed a response to my post on issues in scientific data management. With his permission, I’ve lightly edited it, and am posting it below. […]

  10. Ingres VectorWise technical highlights | DBMS2 -- DataBase Management System Services on June 11th, 2010 7:28 am

    […] caught up with me for a regrettably brief call. Peter gave me the strong impression that what I’d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights […]

  11. Actian Vector Hadoop Edition | DBMS 2 : DataBase Management System Services on September 30th, 2014 1:50 am

    […] Peter Boncz isn’t exactly an Actian employee. Rather, he’s the professor who supervised Marcin Zukowski’s PhD thesis that became Vectorwise, and I chatted with Peter by Skype while he was at home in Amsterdam. I believe his assurances that […]

  12. Snowflake Computing | DBMS 2 : DataBase Management System Services on October 22nd, 2014 4:45 am

    […] 2 techie founders out of Oracle, plus Marcin Zukowski. […]
