January 25, 2016

Kafka and more

In a companion post introducing Kafka, I observed that Kafka at its core is remarkably simple. Confluent offers a marchitecture diagram that illustrates what else is on offer, about which I’ll note:

Kafka offers little in the way of analytic data transformation and the like. Hence, it’s commonly used with companion products.

January 25, 2016

Kafka and Confluent

For starters:

At its core Kafka is very simple:

So it seems fair to say:


January 22, 2016

Cloudera in the cloud(s)

Cloudera released Version 2 of Cloudera Director, which is a companion product to Cloudera Manager focused specifically on the cloud. This led to a discussion about — you guessed it! — Cloudera and the cloud.

Making Cloudera run in the cloud has three major aspects:

Features new in this week’s release of Cloudera Director include:

I.e., we’re talking about some pretty basic/checklist kinds of things. Cloudera Director evidently works on Amazon AWS and Google GCP, and is planned for Windows Azure, VMware and OpenStack.

As for porting, let me start by noting:

January 14, 2016

BI and quasi-DBMS

I’m on two overlapping posting kicks, namely “lessons from the past” and “stuff I keep saying so might as well also write down”. My recent piece on Oracle as the new IBM is an example of both themes. In this post, another example, I’d like to memorialize some points I keep making about business intelligence and other analytics. In particular:

Similarly, BI has often been tied to data integration/ETL (Extract/Transform/Load) functionality.* But I won’t address that subject further at this time.

*In the Hadoop/Spark era, that’s even truer of other analytics than it is of BI.

My top historical examples include:


November 19, 2015

The questionably named Cloudera Navigator Optimizer

I have only mixed success at getting my clients to reach out to me for messaging advice when they’re introducing something new. Cloudera Navigator Optimizer, which is being announced along with Cloudera 5.5, is one of my failures in that respect; I heard about it for the first time Tuesday afternoon. I hate the name. I hate some of the slides I saw. But I do like one part of the messaging, namely the statement that this is about “refactoring” queries.

All messaging quibbles aside, I think the Cloudera Navigator Optimizer story is actually pretty interesting, and perhaps not just to users of SQL-on-Hadoop technologies such as Hive (which I guess I’d put in that category for simplicity) or Impala. As I understand Cloudera Navigator Optimizer:


September 10, 2015

MongoDB update

One pleasure in talking with my clients at MongoDB is that few things are under NDA. So let’s start with some numbers:

Also >530 staff, and I think that number is a little out of date.

MongoDB lacks many capabilities RDBMS users take for granted. MongoDB 3.2, which I gather is slated for early November, narrows that gap, but only by a little. Features include:

There’s also a closed-source database introspection tool coming, currently codenamed MongoDB Scout.

August 3, 2015

Data messes

A lot of what I hear and talk about boils down to “data is a mess”. Below is a very partial list of examples.

To a first approximation, one would expect operational data to be rather clean. After all, it drives and/or records business transactions. So if something goes awry, the result can be lost money, disappointed customers, or worse, and those are outcomes to be strenuously avoided. Up to a point, that’s indeed true, at least at businesses large enough to be properly automated. (Unlike, for example — :) — mine.)

Even so, operational data has some canonical problems. First, it could be inaccurate; somebody can just misspell or otherwise botch an entry. Further, there are multiple ways data can be unreachable, typically because it’s:

Inconsistency can take multiple forms, including:

July 7, 2015

Zoomdata and the Vs

Let’s start with some terminology biases:

So when my clients at Zoomdata told me that they’re in the business of providing “the fastest visual analytics for big data”, I understood their choice, but rolled my eyes anyway. And then I immediately started to check how their strategy actually plays against the “big data” Vs.

It turns out that:

*The HDFS/S3 aspect seems to be a major part of Zoomdata’s current story.

Core aspects of Zoomdata’s technical strategy include:

June 10, 2015

Hadoop generalities

Occasionally I talk with an astute reporter — there are still a few left :) — and get led toward angles I hadn’t considered before, or at least hadn’t written up. A blog post may then ensue. This is one such post.

There is a group of questions going around that includes:

To a first approximation, my responses are:

June 8, 2015

Teradata will support Presto

At the highest level:

Now let’s make that all a little more precise.

Regarding Presto (and I got most of this from Teradata):

Daniel Abadi said that Presto satisfies what he sees as some core architectural requirements for a modern parallel analytic RDBMS project:
