The Sybase IQ folks had been rather slow about briefing me, at least with respect to crunch. They finally fixed that in February. Since then, I’ve been slow about posting based on those briefings. But what with Sybase being acquired by SAP, Sybase having an analyst meeting this week, and other reasons – well, this seems like a good time to post about Sybase IQ.
For starters, Sybase IQ is not just a bitmapped system, but it’s also not all that closely akin to C-Store or Vertica. In particular,
- Sybase IQ stores data in columns – like, for example, Vertica.
- Sybase IQ relies on indexes to retrieve data – unlike, for example, Vertica, in which the column pretty much is the index.
- However, columns themselves can be used as indexes in the usual Vertica-like way.
- Most of Sybase IQ’s indexes are bitmaps, or a lot like bitmaps, à la the original IQ product.
- Some of Sybase IQ’s indexes are not at all like bitmaps, but more like B-trees.
- In general, Sybase recommends that you put multiple indexes on each column because, what the heck, each one of them is pretty small. (In particular, the bitmap-like indexes are highly compressible.) Together, indexes tend to take up <10% of Sybase IQ storage space.
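To make the “columns plus bitmap indexes” idea concrete, here’s a toy sketch in Python. This is my illustration with made-up data, and says nothing about IQ’s actual data structures; each distinct value in a low-cardinality column gets its own bit vector, and a WHERE clause turns into bitwise AND/OR:

```python
# Toy bitmap index for a low-cardinality column (illustration only,
# not Sybase IQ internals). A Python int serves as a bit vector:
# bit i set => row i has that value.

def build_bitmap_index(column):
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap, n_rows):
    return [i for i in range(n_rows) if bitmap & (1 << i)]

region = ["east", "west", "east", "north", "west", "east"]
idx = build_bitmap_index(region)

# WHERE region = 'east'
east = matching_rows(idx["east"], len(region))  # -> [0, 2, 5]

# WHERE region = 'east' OR region = 'west': a single bitwise OR
either = matching_rows(idx["east"] | idx["west"], len(region))
```

The key property, as the bullets note, is that such per-value bit vectors are tiny and highly compressible, so keeping several indexes per column is cheap.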
Sybase IQ is not immune to Sybase’s confusing choices in version numbering. Thus:
- Sybase IQ Version 15.2 will be announced and released soon.
- Sybase IQ Version 15.1 was a set of “binary replacements” rather than an “upgrade release” for Sybase IQ Version 15.0.
- Sybase IQ Version 15.0 was launched in February 2009 and released for general availability some time thereafter.
- The prior version of Sybase IQ was 12.7.
- GA isn’t always GA, and some language localizations and so on weren’t ready for a while. Consequently, a lot of Sybase IQ sales continued to be of Version 12.7 even in the second half of 2009.
Now let’s get down to some technical particulars.
Sybase IQ columns are always stored in RowID order. However, RowIDs are logical and not physical, and hence take up little disk space. A small amount of per-page metadata lets you find the specific cell you want. (Cells are commonly fixed-width, in which case finding the cell of choice is a simple calculation.) So RowIDs are not much of an I/O overhead issue, although I’m not sure at what point they get unpacked and start needing to be carried around as the data travels through silicon.
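That “simple calculation” for fixed-width cells is just integer division and remainder. Here’s a sketch, with an invented page size and cell width rather than IQ’s actual layout:

```python
# Locating a cell by RowID in a fixed-width column store (illustrative
# arithmetic only; the page size and cell width are invented, not IQ's).

PAGE_SIZE_BYTES = 128 * 1024   # assume 128 KB pages
CELL_WIDTH_BYTES = 4           # e.g., a 32-bit integer column

CELLS_PER_PAGE = PAGE_SIZE_BYTES // CELL_WIDTH_BYTES  # 32768

def locate_cell(row_id):
    """Map a logical RowID to (page number, byte offset within page)."""
    page = row_id // CELLS_PER_PAGE
    offset = (row_id % CELLS_PER_PAGE) * CELL_WIDTH_BYTES
    return page, offset

print(locate_cell(70000))  # -> (2, 17856)
```

Because the mapping is pure arithmetic, nothing like a per-row pointer needs to be stored on disk, which is why RowIDs cost so little I/O.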
Sybase IQ has 9 or so kinds of indexes. The choice of index has a lot to do with cardinality. In the extreme low-cardinality case, a simple bitmap might do. With intermediate cardinality, you might go to a modified kind of bitmap – e.g., if there are 2^16 possible values, you can represent a value in 16 bits, and bitmap operations are approximately 16 times as costly as if the number of possible values were only 2^1. For very high cardinality, there’s a B-tree-like index called “High Group”.
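That intermediate-cardinality scheme, encoding each value in k bits and keeping one bitmap per bit position, resembles what the literature calls a bit-sliced index. A toy Python sketch (my illustration, not IQ’s format) shows why an equality test costs roughly k bitmap operations:

```python
# Toy bit-sliced index: encode each value in K bits and keep one
# bitmap (Python int as bit vector) per bit position. An equality
# lookup then touches K bitmaps, which is why operations cost
# roughly K times as much as in the 2-value case.
# Illustration only, not Sybase IQ's actual on-disk format.

K = 4  # 2^4 = 16 possible values in this toy; IQ's example was 2^16

def build_bit_slices(column):
    slices = [0] * K
    for row_id, value in enumerate(column):
        for bit in range(K):
            if (value >> bit) & 1:
                slices[bit] |= 1 << row_id
    return slices

def equals(slices, value, n_rows):
    """Bitmap of rows where column == value, via K bitwise ops."""
    all_rows = (1 << n_rows) - 1
    result = all_rows
    for bit in range(K):
        if (value >> bit) & 1:
            result &= slices[bit]
        else:
            result &= all_rows & ~slices[bit]
    return result

col = [3, 7, 3, 12, 0, 3]
slices = build_bit_slices(col)
hits = equals(slices, 3, len(col))
print([i for i in range(len(col)) if hits & (1 << i)])  # -> [0, 2, 5]
```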
Note: Surely every Sybase index name, at some time, made sense to at least one engineer.
Sybase IQ’s execution engine does seem to rely quite a bit on bitmaps. E.g., intermediate query results are stored as bitmaps, which helps them play nicely with each other and with many of the indexes. Sybase claims that Sybase IQ’s bitmap orientation often makes WHERE clauses execute very quickly. Sybase IQ reoptimizes queries after WHERE clauses are evaluated. Complex expressions are, when possible, evaluated once per unique value, not once per row.
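The “once per unique value” trick can be sketched as simple memoization over a column’s distinct values (toy code, not IQ’s engine):

```python
# "Once per unique value" expression evaluation (illustrative sketch,
# not IQ code): when a column has few distinct values, run the costly
# expression once per distinct value, then fan results out per row.

import math

def eval_per_unique(column, expr):
    cache = {}
    out = []
    for v in column:
        if v not in cache:
            cache[v] = expr(v)   # costly expression runs once per value
        out.append(cache[v])
    return out

prices = [10.0, 20.0, 10.0, 10.0, 20.0]
results = eval_per_unique(prices, lambda p: math.sqrt(p) * 1.07)
# Five rows out, but sqrt() actually executed only twice.
```

On a billion-row column with a few thousand distinct values, the savings from this approach are obviously enormous.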
Speaking of unique values – Sybase IQ’s compression story doesn’t currently match that of some other columnar products, but it seems to stack up pretty well against row-based systems. In particular:
- Sybase says IQ compression is most commonly 50-70%.
- Sybase further says that, in most cases, compression falls into the range 40-85%.
- Page-level LZ compression is decompressed upon read (duh).
- Dictionary/token compression may be decompressed later. For example, GROUP BYs are commonly done on tokens, and JOINs sometimes are.
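Here’s a toy sketch of dictionary/token compression, including a GROUP BY computed directly on the tokens with decoding deferred to the very end (my illustration, not IQ internals):

```python
# Dictionary (token) compression, plus a GROUP BY done directly on
# tokens without decompressing (illustrative sketch, not IQ internals).

from collections import Counter

def dictionary_encode(column):
    dictionary, tokens, positions = [], [], {}
    for v in column:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        tokens.append(positions[v])
    return dictionary, tokens

cities = ["NYC", "Boston", "NYC", "NYC", "Boston"]
dictionary, tokens = dictionary_encode(cities)
# tokens -> [0, 1, 0, 0, 1]; small ints compress far better than strings

# GROUP BY city, COUNT(*): group on the tokens, decode only at the end
counts = Counter(tokens)
result = {dictionary[tok]: n for tok, n in counts.items()}
# -> {"NYC": 3, "Boston": 2}
```

Operating on tokens rather than decoded values is what lets GROUP BYs (and sometimes JOINs) run without paying the decompression cost up front.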
Sybase IQ boasts pipelining, in two senses. First, IQ tries to read pages for multiple queries at the same time. Second, Sybase IQ tries to prefetch pages into cache before they’re needed. Sybase points out that these prefetched pages have the WHERE clauses already executed, and that no extra baggage is being dragged into cache that doesn’t need to be there.
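The prefetching idea, overlapping page reads with query processing, can be sketched with a background reader thread and a bounded buffer. This is a generic illustration, not IQ’s actual architecture:

```python
# Toy prefetch pipeline (sketch, not IQ's engine): a background thread
# reads pages ahead into a bounded queue while the consumer processes
# the current page, overlapping I/O with computation.

import queue, threading, time

def read_page(page_no):
    time.sleep(0.001)           # stand-in for disk I/O latency
    return f"page-{page_no}"

def prefetcher(page_numbers, buf):
    for p in page_numbers:
        buf.put(read_page(p))   # blocks when the cache buffer is full
    buf.put(None)               # end-of-stream sentinel

buf = queue.Queue(maxsize=4)    # bounded "cache" of prefetched pages
threading.Thread(target=prefetcher, args=(range(10), buf)).start()

processed = []
while (page := buf.get()) is not None:
    processed.append(page)      # process while further reads proceed
```

The bounded queue is the analogue of Sybase’s point about not dragging extra baggage into cache: only pages that survived the WHERE clause, and only a few of them at a time, occupy buffer space.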
Highlights of Sybase IQ’s update and load story include:
- Sybase IQ is optimized for large bulk loads. No surprise there.
- Sybase IQ has several options for microbatching and/or trickle feeds.
- The coolest is Sybase RAP.
- More generally, microbatching is based on Change Data Capture. Sybase has various ETL/replication technologies, creating a confusing array of options in that regard.
- Sybase says that one customer is microbatching 1000s of rows with 1 minute latency.
- There’s something about snapshotting and hence loads not interfering with queries. I’m not clear on the details.
- Assuming you have enough parallelism, you can dedicate some nodes to queries while others are dedicated to load. (Recall that Sybase IQ is shared-disk.)
I’ve lost track a little bit as to which “advanced analytics” functionality is in Sybase IQ 15.1, which will be in 15.2, and what’s a future beyond that; that’s a great excuse for me to leave it all out of what has already become a rather long post. Anyhow, except perhaps for the future stuff and/or some time series functionality, none of it seems terribly advanced. Sybase IQ does have two stored procedure languages, namely the ones for Sybase ASE (T-SQL) and for Sybase Anywhere or Adaptive Server Anywhere or whatever it’s called this week (Watcom SQL, which Sybase asserts is similar to the ANSI SQL stored procedure language).
Similarly, I’ll leave a lot of other stuff out as well, and for now stop here.
- I haven’t repeated every detail here from my August 2009 technical post about Sybase IQ.
- And here’s more about Sybase IQ, including some Sybase IQ 15.2 features, some market penetration info, and a slide deck.