This is part of a three-post series on Kudu, a new data storage system from Cloudera.
- Part 1 (this post) is an overview of Kudu technology.
- Part 2 is a lengthy dive into how Kudu writes and reads data.
- Part 3 is a brief speculation as to Kudu’s eventual market significance.
Cloudera is introducing a new open source project, Kudu,* which from Cloudera’s standpoint is meant to eventually become the single best underpinning for analytics on the Hadoop stack. I’ve spent multiple hours discussing Kudu with Cloudera, mainly with Todd Lipcon. Any errors are of course entirely mine.
*Like the impala, the kudu is a kind of antelope. I knew that, because I enjoy word games. What I didn’t know — and which is germane to the naming choice — is that the kudu has stripes.
- Kudu is an alternative to HDFS (the Hadoop Distributed File System) or to HBase.
- Kudu is meant to be the underpinning for Impala, Spark and other analytic frameworks or engines.
- Kudu is not meant for OLTP (OnLine Transaction Processing), at least in any foreseeable release. For example:
- Kudu doesn’t support multi-row transactions.
- There are no active efforts to front-end Kudu with an engine that is fast at single-row queries.
- Kudu is essentially columnar, except for transitory in-memory stores.
- Kudu’s core design points are that it should:
- Accept data very quickly.
- Immediately make that data available for analytics.
- More specifically, Kudu is meant to accept, along with slower forms of input:
- Lots of fast random writes, e.g. of web interactions.
- Streams, viewed as a succession of inserts.
- Updates and inserts alike.
- The core “real-time” use cases for which Kudu is designed are, unsurprisingly:
- Low-latency business intelligence.
- Predictive model scoring.
- Kudu is designed to work fine with spinning disk, and indeed has been tested to date mainly on disk-only nodes. Even so, Kudu’s architecture is optimized for the assumption that there will be at least some flash on the node.
- Kudu is designed primarily to support relational/SQL processing. However, Kudu also has a nested-data roadmap, which of course starts with supporting the analogous capabilities in Impala.
Also, it might help clarify Kudu’s status and positioning if I add:
- Kudu is in its early days — heading out to open source and beta now, with maturity still quite a way off. Many obviously important features haven’t been added yet.
- Kudu is expected to be run with a replication factor (tunable, usually =3). Replication is via the Raft protocol.
- Kudu and HDFS can run on the same nodes. If they do, they are almost entirely separate from each other, with the main exception being some primitive workload management to help them share resources.
- Permanent advantages of older alternatives over Kudu are expected to include:
- Legacy. Older, tuned systems may work better over some HDFS formats than over Kudu.
- Pure batch updates. Preparing data for immediate access has overhead.
- Ultra-high update volumes. Kudu doesn’t have a roadmap to completely catch up in write speeds with NoSQL or in-memory SQL DBMS.
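Since the post mentions that Kudu replicates tablets via the Raft protocol with a tunable factor (usually 3), here is a minimal sketch of the majority-quorum idea at Raft's core: a write commits once a strict majority of replicas acknowledge it. This is an illustration of the quorum arithmetic only, not of Kudu's actual API or of full Raft (leader election, log repair, etc. are omitted).

```python
# Toy majority-quorum replication, the core idea behind Raft-based
# replication as described in the post. Names are illustrative.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    log: List[str] = field(default_factory=list)
    up: bool = True  # whether this node can accept writes right now


def replicate(entry: str, replicas: List[Replica]) -> bool:
    """Append an entry to each live replica; commit on majority ack."""
    acks = 0
    for r in replicas:
        if r.up:
            r.log.append(entry)
            acks += 1
    majority = len(replicas) // 2 + 1  # e.g. 2 of 3
    return acks >= majority


# With replication factor 3, a write survives one node being down...
assert replicate("insert row 42", [Replica(), Replica(), Replica(up=False)])
# ...but not two.
assert not replicate("insert row 42", [Replica(up=False), Replica(up=False), Replica()])
```

The point of the majority rule is that any two majorities overlap, so a committed write can always be recovered after a node failure.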
Kudu’s data organization story starts:
- Storage is right on the server (this is of course also the usual case for HDFS).
- On any one server, Kudu data is broken up into a number of “tablets”, typically 10-100 tablets per node.
- Inserts arrive into something called a MemRowSet and are soon flushed to something called a DiskRowSet. Much as in Vertica:
- MemRowSets are managed by an in-memory row store.
- DiskRowSets are managed by a persistent column store.*
- In essence, queries are internally federated between the in-memory and persistent stores.
- Each DiskRowSet contains a separate file for each column in the table.
- DiskRowSets are tunable in size. 32 MB currently seems like the optimal figure.
- The default page size is 256K, but it can be dropped as low as 4K.
- DiskRowSets feature columnar compression, with a variety of standard techniques.
- All compression choices are specific to a particular DiskRowSet.
- In the case of dictionary/token compression, so is the dictionary.
- Thus, data is decompressed before being operated on by a query processor.
- Also, selected columns or an entire DiskRowSet can be block-compressed.
- Tables and DiskRowSets do not expose any kind of RowID. Rather, tables have primary keys in the usual RDBMS way.
- Kudu can partition data in the three usual ways: randomly, by range or by hash.
- Kudu does not (yet) have a slick and well-tested way to broadcast-replicate a small table across all nodes.
*I presume there are a few ways in which Kudu’s efficiency or overhead seem more row-store-like than columnar. Still, Kudu seems to meet the basic requirements to be called a columnar system.
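The write path described above — row-oriented MemRowSets flushed into columnar, dictionary-compressed DiskRowSets, with queries federated across both — can be sketched in a few dozen lines. Everything here is a toy model under my own assumptions; the class and method names are illustrative, not Kudu's actual API.

```python
# Toy model of the MemRowSet -> DiskRowSet write path described above.
# All names are illustrative, not Kudu's actual API.

class MemRowSet:
    """In-memory row store: each insert is kept as a whole row."""
    def __init__(self):
        self.rows = []

    def insert(self, row: dict):
        self.rows.append(row)


class DiskRowSet:
    """Columnar store: one value list per column, dictionary-encoded.
    The dictionary is private to this DiskRowSet, as the post notes."""
    def __init__(self, rows, columns):
        self.dictionaries = {}  # per-column decode tables, per DiskRowSet
        self.columns = {}
        for col in columns:
            encode, codes = {}, []
            for row in rows:
                codes.append(encode.setdefault(row[col], len(encode)))
            self.dictionaries[col] = {v: k for k, v in encode.items()}
            self.columns[col] = codes

    def scan(self, col):
        # Decompress (decode) before handing values to the query layer.
        decode = self.dictionaries[col]
        return [decode[c] for c in self.columns[col]]


def scan_column(col, memrowset, diskrowsets):
    """A query is internally federated across both kinds of store."""
    values = []
    for drs in diskrowsets:
        values.extend(drs.scan(col))
    values.extend(row[col] for row in memrowset.rows)
    return values


mrs = MemRowSet()
for city in ["NYC", "SF", "NYC"]:
    mrs.insert({"city": city})

# Flush: the MemRowSet's rows become a columnar DiskRowSet,
# and new inserts accumulate in a fresh MemRowSet.
drs = DiskRowSet(mrs.rows, columns=["city"])
mrs = MemRowSet()
mrs.insert({"city": "LA"})

assert drs.columns["city"] == [0, 1, 0]  # dictionary-encoded column
assert scan_column("city", mrs, [drs]) == ["NYC", "SF", "NYC", "LA"]
```

Note how the repeated "NYC" is stored once in the dictionary and twice as a small code — the essence of per-DiskRowSet dictionary compression.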
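As for the three partitioning options mentioned above, the two deterministic ones — hash and range on the primary key — amount to simple routing functions. The sketch below is my own illustration of the idea, not Kudu's partitioning API; tablet counts and split points are hypothetical.

```python
# Toy illustration of hash and range partitioning on a primary key.
# Names and parameters are illustrative, not Kudu's actual API.
import hashlib

NUM_TABLETS = 4  # hypothetical tablet count


def hash_partition(primary_key: str, num_tablets: int = NUM_TABLETS) -> int:
    """Hash partitioning: a stable hash of the key picks the tablet."""
    digest = hashlib.md5(primary_key.encode()).hexdigest()
    return int(digest, 16) % num_tablets


def range_partition(primary_key: str, split_points) -> int:
    """Range partitioning: sorted split points define tablet boundaries."""
    for i, split in enumerate(split_points):
        if primary_key < split:
            return i
    return len(split_points)


# The same key always routes to the same tablet.
assert hash_partition("user_42") == hash_partition("user_42")
assert 0 <= hash_partition("user_42") < NUM_TABLETS

# With one split point "m": keys below it go to tablet 0, the rest to tablet 1.
assert range_partition("alice", ["m"]) == 0
assert range_partition("zoe", ["m"]) == 1
```

Range partitioning keeps key-adjacent rows on the same tablet (good for range scans); hash partitioning spreads hot key ranges evenly (good for high random-write rates).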