Analysis of data integration products and technologies, especially ones related to data warehousing, such as ELT (Extract/Transform/Load). Related subjects include:

March 15, 2015

BI for NoSQL — some very early comments

Over the past couple years, there have been various quick comments and vague press releases about “BI for NoSQL”. I’ve had trouble, however, imagining what it could amount to that was particularly interesting, with my confusion boiling down to “Just what are you aggregating over what?” Recently I raised the subject with a few leading NoSQL companies. The result is that my confusion was expanded. :) Here’s the small amount that I have actually figured out.

As I noted in a recent post about data models, many databases — in particular SQL and NoSQL ones — can be viewed as collections of <name, value> pairs.

Consequently, a NoSQL database can often be viewed as a table or a collection of tables, except that:

That’s all straightforward to deal with if you’re willing to write scripts to extract the NoSQL data and transform or aggregate it as needed. But things get tricky when you try to insist on some kind of point-and-click. And by the way, that last comment pertains to BI and ETL (Extract/Transform/Load) alike. Indeed, multiple people I talked with on this subject conflated BI and ETL, and they were probably right to do so.

Read more

March 5, 2015

Cask and CDAP

For starters:


So far as I can tell:

Read more

February 28, 2015

Databricks and Spark update

I chatted last night with Ion Stoica, CEO of my client Databricks, for an update both on his company and Spark. Databricks’ actual business is Databricks Cloud, about which I can say:

I do not expect all of the above to remain true as Databricks Cloud matures.

Ion also said that Databricks is over 50 people, and has moved its office from Berkeley to San Francisco. He also offered some Spark numbers, such as: Read more

February 18, 2015

Greenplum is being open sourced

While I don’t find the Open Data Platform thing very significant, an associated piece of news seems cooler — Pivotal is open sourcing a bunch of software, with Greenplum as the crown jewel. Notes on that start:

Greenplum, let us recall, is a pretty decent MPP (Massively Parallel Processing) analytic RDBMS. Various aspects of it were oversold at various times, and I’ve never heard that they actually licked concurrency. But Greenplum has long had good SQL coverage and petabyte-scale deployments and a columnar option and some in-database analytics and so on; i.e., it’s legit. When somebody asks me about open source analytic RDBMS to consider, I expect Greenplum to consistently be on the short list.

Further, the low-cost alternatives for analytic RDBMS are adding up. Read more

January 19, 2015

Where the innovation is

I hoped to write a reasonable overview of current- to medium-term future IT innovation. Yeah, right. :) But if we abandon any hope that this post could be comprehensive, I can at least say:

1. Back in 2011, I ranted against the term Big Data, but expressed more fondness for the V words — Volume, Velocity, Variety and Variability. That said, when it comes to data management and movement, solutions to the V problems have generally been sketched out.

2. Even so, there’s much room for innovation around data movement and management. I’d start with:

3. As I suggested last year, data transformation is an important area for innovation.  Read more

October 5, 2014

Streaming for Hadoop

The genesis of this post is that:

Of course, we should hardly assume that what the Hadoop distro vendors favor will be the be-all and end-all of streaming. But they are likely to at least be influential players in the area.

In the parts of the problem that Cloudera emphasizes, the main tasks that need to be addressed are: Read more

September 28, 2014

Some stuff on my mind, September 28, 2014

1. I wish I had some good, practical ideas about how to make a political difference around privacy and surveillance. Nothing else we discuss here is remotely as important. I presumably can contribute an opinion piece to, more or less, the technology publication(s) of my choice; that can have a small bit of impact. But I’d love to do better than that. Ideas, anybody?

2. A few thoughts on cloud, colocation, etc.:

3. As for the analytic DBMS industry: Read more

August 14, 2014

“Freeing business analysts from IT”

Many of the companies I talk with boast of freeing business analysts from reliance on IT. This, to put it mildly, is not a unique value proposition. As I wrote in 2012, when I went on a history of analytics posting kick,

  • Most interesting analytic software has been adopted first and foremost at the departmental level.
  • People seem to be forgetting that fact.

In particular, I would argue that the following analytic technologies started and prospered largely through departmental adoption:

  • Fourth-generation languages (the analytically-focused ones, which in fact started out being consumed on a remote/time-sharing basis)
  • Electronic spreadsheets
  • 1990s-era business intelligence
  • Dashboards
  • Fancy-visualization business intelligence
  • Planning/budgeting
  • Predictive analytics
  • Text analytics
  • Rules engines

What brings me back to the topic is conversations I had this week with Paxata and Metanautix. The Paxata story starts:

Metanautix seems to aspire to a more complete full-analytic-stack-without-IT kind of story, but clearly sees the data preparation part as a big part of its value.

If there’s anything new about such stories, it has to be on the transformation side; BI tools have been helping with data extraction since — well, since the dawn of BI. Read more

July 20, 2014

Data integration as a business opportunity

A significant fraction of IT professional services industry revenue comes from data integration. But as a software business, data integration has been more problematic. Informatica, the largest independent data integration software vendor, does $1 billion in revenue. INFA’s enterprise value (market capitalization after adjusting for cash and debt) is $3 billion, which puts it way short of other category leaders such as VMware, and even sits behind Tableau.* When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”

*If you believe that Splunk is a data integration company, that changes these observations only a little.

On the other hand, several successful software categories have, at particular points in their history, been focused on data integration. One of the major benefits of 1990s business intelligence was “Combines data from multiple sources on the same screen” and, in some cases, even “Joins data from multiple sources in a single view”. The last few years before application servers were commoditized, data integration was one of their chief benefits. Data warehousing and Hadoop both of course have a “collect all your data in one place” part to their stories — which I call data mustering — and Hadoop is a data transformation tool as well.

Read more

March 23, 2014

DBMS2 revisited

The name of this blog comes from an August, 2005 column. 8 1/2 years later, that analysis holds up pretty well. Indeed, I’d keep the first two precepts exactly as I proposed back then:

I’d also keep the general sense of the third precept, namely appropriately-capable data integration, but for that one the specifics do need some serious rework.

For starters, let me say: Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.