June 16, 2017

Generally available Kudu

I talked with Cloudera about Kudu in early May. Besides giving me a lot of information about Kudu, Cloudera also helped confirm some trends I’m seeing elsewhere, including:

Now let’s talk about Kudu itself. As I discussed at length in September 2015, Kudu is:

Kudu’s adoption and roll-out story starts:

Early Kudu interest is focused on 2-3 kinds of use case. The biggest is the kind of “data warehousing” highlighted above. Cloudera characterizes the others by the kinds of data stored, specifically the overlapping categories of time series — including financial trading — and machine-generated data. A lot of early Kudu use is with Spark, even ahead of (or in conjunction with) Impala. A small amount has no relational front-end at all.

Other notes on Kudu include:

And finally, the Cloudera folks woke me up to some issues around streaming data ingest. If you stream data in, there will be retries resulting in duplicate delivery. So your system needs to deal with those one way or another. Kudu’s way is:

Comments

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.