May 27, 2013

IBM BLU

I had a good chat with IBM about IBM BLU, aka BLU Accelerator or Acceleration. BLU basics start:

And yes — that means Oracle is now the only major relational DBMS vendor left without a true columnar story.

BLU’s maturity and scalability basics start:

BLU technical highlights include:

Like any other columnar RDBMS or RDBMS option, BLU has a compression story:

IBM said all the compression algorithms were order-preserving, and hence range predicates can be executed on compressed data. Unfortunately, I neglected to ask how “approximate Huffman coding” could preserve order.

As with any columnar system, one has to wonder about how BLU writes data. IBM observed:

And so IBM doesn’t think load latency is much of a problem for BLU.

But that’s about all I figured out about how BLU writes data. So IBM kindly followed up with a lengthy email, plus permission to copy it — lightly edited — below:

Getting new data into/out of the database (in memory and on disk) is one of BLU’s strengths. The methods we support include:

We support LOAD (replace or append), SQL INSERT, UPDATE and DELETE,  as well as three SQL based utilities INGEST, IMPORT and EXPORT (not to mention BACKUP and RESTORE, which are also ways of getting data in and out).

As in previous versions of DB2, the SQL-based operations (INSERT, UPDATE, DELETE, INGEST, IMPORT) are in-memory first and hardened to disk asynchronously. LOAD operations are bulk operations to disk (though they tend to be CPU bound due to the cost of parsing and formatting data during load). As you suspected, the UPDATE operations are not in place.

Because the syntax and semantics of getting data into the database remain unchanged (an ongoing theme for us), it means that ELT tools like IBM Data Stage and partner products work without modification.

Distinguishing characteristics:

  • Unlike other columnar vendors that have tried to achieve performance when inserting new data using delta areas of the table — doing fast insert into a delta area (usually a delta tree or a row-based staging table) and then asynchronously moving the row data or delta data into the columnar area — we consider that a weak strategy since it means the data has to be loaded twice (once into the staging area and then again into the mainline table). Our approach is to add data directly against the main table and use bulk transformation (i.e. thousands of records at a time) to amortize any latency that would normally come from columnar processing. We believe that by bulk processing the data we largely eliminate the overhead inherent in columnar processing, and also entirely avoid the dual processing that other companies are suffering from.
  • We’ve invented a new columnar-specific logging method for BLU Acceleration. Externally it looks exactly like DB2’s traditionally log based transaction recovery (for crash and rollforward recovery), but the format of the bytes within the log is organized in columnar fashion, so we log buffers of column data rather than logging rows. This, in combination with XOR logging (only logging the XOR delta bits) results in great log space reduction. As a result the recovery logging is actually much smaller than row based logging in many cases. Heck, even we were happily surprised by it.  Again, like our storage story where BLU Accelerated tables can co-exist in the same storage and buffer pool as traditional row tables, the BLU Accelerated data is logged to the same log files as row table operations. The integration is completely seamless.

Finally, last and probably also least:

Comments

20 Responses to “IBM BLU”

  1. M-A-O-L » IBM BLU on May 27th, 2013 5:02 am

    [...] Watch out for DB2′s new columnar option. IBM BLU [...]

  2. Mark Callaghan on May 27th, 2013 10:13 am

    Is Vertica is effective at smaller insert batches when you are willing to pay the price of writing twice — once to the row store and once to the column store?

    What is the overhead for updates? If they are not in place what is done to remember that the old version has been replaced? I assume this imposes a cost on queries.

  3. Anton on May 27th, 2013 12:45 pm

    It took IBM quite a while to come up with a SIMD based processing, considering that VectorWise used it as far ago as 2003.
    Probably in a few years we might see GPU-based processing in IBM database.
    Would be interesting to see more details about compression used in BLU and also about operating on compressed data.

  4. Curt Monash on May 27th, 2013 6:12 pm

    Mark,

    Usually, the main cost for append-only (as opposed to update-in-place) is a background compaction process. Other advantages of append-only can be snapshot isolation or time travel. cf. MVCC

    I screwed up by not questioning IBM about all this in detail. No real excuse; most of the factors I complain about in http://www.dbms2.com/2012/10/06/analyzing-big-companies-is-hard/ weren’t really present in this case.

  5. Keshav on May 28th, 2013 3:32 pm

    Curt,

    The paper Constant-Time Query processing by Raman et al (http://bit.ly/16nc6cC) has details on how the order is preserved and how the equality and range predicates are done on compressed data in Blink engine. IBM Smart analytics Optimizer (ISAO) and Informix warehouse accelerator used this technique (pre-cursors to DB2 BLU). The data is partitioned via frequency partitioning technique and encoded with Huffman encoding. In each cell, each column uses fixed length codes. During encoding, the bit-codes are assigned in same order as the original values — this preserves the order.

    I’m sure detailed DB2 BLU papers will be forth coming. Previous work can be found in papers at http://www.informatik.uni-trier.de/~ley/pers/hd/l/Lohman:Guy_M=.html and http://www.informatik.uni-trier.de/~ley/pers/hd/r/Raman:Vijayshankar.html

  6. Curt Monash on May 28th, 2013 4:04 pm

    Thanks, Keshav — I’ll take a look!

  7. aaron on May 28th, 2013 5:20 pm

    Can you clarify the last remark? col db insert or updates are logical shifts of bit vectors (though the vendors optimize by breaking the vectors into chunks, usually in lists or tree structure, and pad the chunks to allow some changes).

    This means that col db inserts (or updates – which are handled generally as delete+insert) asymptotically approach the cost of touching the entire column data structures very quickly for even small batches. So multiple singleton or small batches are slow in most col dbs, and many vendors try to hide the pain by staging the data to a front-end DB (usually in-memory or row store) then merging periodically. Is this what you mean by append-only?

  8. Curt Monash on May 29th, 2013 6:41 am

    I’m simply drawing the distinction between append-only and update-in-place.

    Of course, for many analytic DBMS use cases, there aren’t a lot of updates anyway, just inserts. For those, the distinction is somewhat academic. Still, in practice there’s a correlation between append-only and other characteristics that are still interesting in those use cases. (Think MVCC.)

  9. aaron on May 29th, 2013 10:37 am

    I’m clearly missing the point here. There are 2 common strategies for MVCC: marking changes in-place (e.g., DB2 analytic accelerator) and keeping a change vector (e.g., Oracle).

    The first is awkward in a col DB because you need to add lots of detail about when the change occurred, so the latter seems the cheaper approach for col DB (keep new stuff to the side and merge in periodically in bigger batches.)

    I’m not clear on what append means in a col DB, since the data has to be kept in row order to match fields in that row together. I can’t see how append is distinct from insert (or for that matter delete) in small changes to col DB.

    My guess is that append was used in presentation-speak to mean keeping changes on the side like other col db vendors.

  10. Curt Monash on May 29th, 2013 11:19 am

    The way I use “append-only” is that you just keep adding records. If there’s an update, it changes some sort of flag, but the old version of the record isn’t overwritten until some kind of compaction process comes into play.

    A large fraction of newer database managers are append-only, analytic and short-request alike. MongoDB is a notable exception.

  11. aaron on May 29th, 2013 12:45 pm

    Thanks – that was the source of the confusion. I can’t reconcile column dbms and append-only, because of the cognitive dissonance. Append only seems a row store (or record store) construct, since col stores need ordering.

    Are there any append-only column stores you know of?

  12. Sandeep on May 30th, 2013 9:19 pm

    > ”BLU” doesn’t stand for anything, except insofar as it’s a reference to IBM’s favorite color. It’s just a code name that stuck.

    IIRC, it stands for “BLink Ultra”, an improved version of BLink, yet another internal codename.

  13. aaron on May 31st, 2013 11:21 am
  14. IBM speeds up DB2 10.5, remolds it as a Hadoop killer on June 16th, 2013 6:24 pm

    [...] researcher Curt Monash, of Monash Research, has noted that with IBM’s DB2 10.5 release, Oracle now is “now a usually vital relational DBMS [...]

  15. IBM re-moulds DB2 10.5 as a Hadoop killer | Information & Technology News, Analysis, Insights, Middle East on June 17th, 2013 1:01 am

    [...] analyst Curt Monash, of Monash Research, has noted that with IBM’s DB2 10.5 release, Oracle now is “now the only major relational DBMS [...]

  16. Kenton Andersen on June 17th, 2013 11:57 am

    RE aaron on May 29th, 2013 12:45 pm

    Vertica is an append-only column store with respect to disk files. Inserts or updates can be directed to disk or default to filling memory first. Tuples in memory are periodically flushed to new disk files in column-oriented structures. These are not “on the side”; they are the same format as all other disk files.

    Re “col stores need ordering”, this is implementation dependent. The relational model does not assume ordering of tuples. In Vertica’s case, each disk file has its own ordering; there is no total ordering of a table.

    To keep the number of files from becoming too large, Vertica has a background “mergeout” process that creates new, larger files. Then the older files are deleted. So there is a bit of a double-write penalty, but it does not delay the insert processes.

  17. Thoughts on in-memory columnar add-ons | DBMS 2 : DataBase Management System Services on September 23rd, 2013 9:25 am

    [...] the devil is in the details. See, for example, my May post on IBM’s version, called BLU, outlining all the engineering IBM did around that [...]

  18. Joel Wittenmyer on October 28th, 2013 1:37 pm

    Curt,
    You stated that “…Oracle is now the only major relational DBMS vendor left without a true columnar story.” What functionality does SQL Server offer, even now, that one might classify as a ‘true columnar story’?

  19. Curt Monash on October 29th, 2013 12:26 am

    At the time I wrote that, SQL Server’s columnar-option future had already been disclosed — I think in more detail than Oracle’s has been even now.

  20. Joel Wittenmyer on October 29th, 2013 9:40 am

    Thanks, Curt. I know it has been a while since you wrote this, and you’d probably rather attend to the newer posts :), so I appreciate that you did respond.

    I’m still struggling to find a SQL Server functionality that one might term ‘true columnar’. The best I can do is their column store indexes. I want to be sure to include SQL Server in the POCs we are doing for columnar technology, if they belong there, but I can’t find a reason to include them at this point.

    Thanks for doing DBMS2. I’ve been following for about 5 years.

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.