December 28, 2010

Evolving definitions and technology categories for 2011

It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. If you’d like to contribute thoughts on these subjects, now might be a really good time.

1.  Not many terms I coin get marketing traction, but machine-generated data has grown some legs. Clients (Infobright, Cloudera) and non-clients alike have adopted it. I need to follow up with a more official description/definition of the concept. The Wikipedia article on the subject doesn’t get the job done yet. (Edit: Here’s my take on defining machine-generated data. Be sure to read through to Daniel Abadi’s response.)

2.  Merv Adrian is going to Gartner Group. Expect great improvement in Gartner’s DBMS coverage, in areas beyond the straightforward “This is what users say they are doing” Gartner already excels at. That said, Merv is probably not starting at Gartner soon enough to help make the 2010 analytic DBMS Magic Quadrant any better than the Gartner 2009 data warehouse database management system magic quadrant, the Gartner 2008 data warehouse database management system magic quadrant, and so on.

In particular, Merv has a good understanding of trends and technology in analytic DBMS and related markets. Judging by his Twitter stream, James Kobielus at Forrester if anything overrates the shift to general “analytic platforms.” And I of course am expected to help define the “analytic platform”/“advanced analytics”/whatever category. Taking all those analyst efforts together, it’s reasonable to expect a lot more market awareness — and also market confusion — around these areas.

3.  All that plugs into a larger project I was working on before my family issues came crashing in. The enterprise data warehouse is a myth, and that’s just the first reason that the old EDW vs. data mart bifurcation is grossly inadequate for understanding analytic data management choices. So I’m working on some ideas to categorize types of data warehouse/mart/whatever according to what kind of data you have and how you use that data. Multiple industry players (OK, vendors) have offered interesting and useful feedback in this process, although I’m still waiting for Teradata and IBM. (Edit: My bad. Teradata actually had sent a helpful response some time ago.)

In connection with that effort, the last outline I did back in October of analytic data use styles read:

The data warehouse/mart categories weren’t in exact one-to-one correlation to those use styles, but the connection was of course pretty close.

*I’ve really struggled with terminology in the area of data exploration (over-used already)/discovery analytics (sounds weird)/research analytics (caused confusion when I tried it). Investigative analytics is my latest try.

4.  And finally — like most people, I find the terms unstructured or semi-structured data to be misleading, for at least two reasons:

So I’ve been playing for a couple of years with the thought of introducing the term polystructured data. This is not a finished concept, because there are at least three different things I could mean by it:

It may take a while to find, but I think there’s a pony in there somewhere.

Edit: Here’s the definition of poly-structured database I eventually came up with.


6 Responses to “Evolving definitions and technology categories for 2011”

  1. Rob Klopp on December 29th, 2010 11:36 am

    I would like to suggest a stronger differentiation between the process, workload, and requirements to build a model and the process, workload, and requirements to score data using a model.

    A model can be built manually using BI… i.e. my model to select the “top” customers is based on a sliding scale I created that weighs current recurring revenue and tenure.

    Or a model can be built using very sophisticated algorithms.

    In either case scoring then is the application of a more-or-less complex SQL statement to rate/score the base.

    Vendors want to claim advanced analytic capabilities if they score in-database. The differentiation I suggest would help clarify questions around who can actually execute complex algorithms and who can only score.
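To make the build/score distinction in the comment above concrete, here is a minimal sketch, assuming a hand-built "top customer" model of the kind Rob describes. The table name, column names, weights, and sample rows are all hypothetical. The "build" step is simply choosing the two weights by hand; the "score" step is one SQL statement applied to the customer base, run here against an in-memory SQLite database.

```python
import sqlite3

# Hypothetical weights for a hand-built model. Choosing these is the
# "build" step; a sophisticated algorithm could produce them instead.
REVENUE_WEIGHT = 0.7
TENURE_WEIGHT = 0.3


def score_customers(conn):
    """The "score" step: apply the model to the base as one SQL statement."""
    sql = """
        SELECT name,
               :rw * recurring_revenue / 1000.0 + :tw * tenure_years AS score
        FROM customers
        ORDER BY score DESC
    """
    params = {"rw": REVENUE_WEIGHT, "tw": TENURE_WEIGHT}
    return conn.execute(sql, params).fetchall()


# Illustrative data only; not taken from the post or the comment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, recurring_revenue REAL, tenure_years REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Acme", 12000.0, 2.0),
        ("Globex", 5000.0, 10.0),
        ("Initech", 8000.0, 1.0),
    ],
)

ranked = score_customers(conn)
for name, score in ranked:
    print(name, round(score, 2))
```

However the weights were arrived at, the scoring query looks the same, which is why in-database scoring alone says little about whether a system can execute the model-building algorithms themselves.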

  2. M-A-O-L » 2011 Predictions on December 30th, 2010 4:47 am

    [...] Curt Monash of DBMS2 [...]

  3. Curt Monash on December 31st, 2010 3:32 am

    Hi Rob,

    I have several posts up on that point, and definitely plan to have more in the future.

    See e.g.

  4. Steve P on January 2nd, 2011 1:25 pm


    The terms structured and unstructured and the siloed thinking around them are amongst the biggest culprits for constraining the value derived from information today.

    The purest definition for structured I have found is one where you know and can easily manipulate the schema. If you don’t meet those conditions then it is unstructured and the question is – how much effort does it take to figure out the schema? A lot of what we call “unstructured” can have an easy-to-discover schema…and for some of it, discovery is nearly impossible.

    Worth noting is that if someone has 3 SAP instances with 3 different schemas and they want to look at data across those…it’s unstructured because the consolidated schema isn’t known (until a new one is created). Not that different from documents with a relatively easily discoverable schema.

    In any case….this is a great area for you to create frameworks and add value to the industry.


  5. Examples and definition of machine-generated data | DBMS 2 : DataBase Management System Services on March 1st, 2011 2:50 am

    [...] Recently and somewhat belatedly, I added a somewhat obvious point — if we don’t keep all or even most of our machine-generated data, then what we keep is likely to be in some way massaged, extracted, or derived. The purpose of this post is to address a second oversight — giving a hopefully clear definition of what I actually mean by “machine-generated data.”  [...]

  6. Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same : DBMS 2 : DataBase Management System Services on February 9th, 2012 4:18 am

    [...] Merv Adrian is now at Gartner. [...]
