January 8, 2012

Big data terminology and positioning

Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions:

given that

But the conflation should stop there.

*Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.

When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2×2 matrix of possibilities. For want of better alternatives, my suggestions are:

Notes on all this include:

Comments

10 Responses to “Big data terminology and positioning”

  1. OK, So just what is “Big Data”? | Big Data Yum Cha on January 9th, 2012 6:56 am

    […] make things more complex than they need to be. Then in response others (eg Monash Research on Jan 8 here) dialled it back to just two dimensions (volume and […]

  2. Tasso Argyros on January 9th, 2012 11:12 am

    Curt, great piece as always with good math-proof-like concept breakdown.

    One other dimension I wanted to add is Analytics, which we see as key to defining big data properly. This is important in two separate ways:

    a) You need more than SQL to analyze big data. MapReduce is key here. This is something that a lot of “Big Data” vendors don’t really have, and they will often tell you that you don’t need it (because they can’t provide it natively).

    b) If you look at the most successful applications of Big Data, it is all about discovery and investigative analytics (http://www.dbms2.com/2011/03/03/investigative-analytics/). It’s about quickly collecting and analyzing different multi-structured data sources (e.g. customer interaction data, mobile data, web data) and discovering where the gold is hidden without breaking the bank with too much process or people. It also means a “fast-fail” approach where if a combination of data + analytics doesn’t produce what you hypothesize, you quickly move on to explore other opportunities. This is very different from the old-school waterfall BI model where a well-defined question (e.g. “how many sales do I have by product and region”) is converted to Data, ETL and BI reports through a well-defined process.

    I would argue that a system or deployment is not Big Data if it doesn’t have both beyond-SQL capabilities and an investigative/discovery approach on how it performs its Analytics.

    Thanks,

    Tasso Argyros
    co-president, Teradata Aster

  3. Curt Monash on January 9th, 2012 4:11 pm

    Tasso,

    Those are great observations about “dealing with”, “managing”, or “exploiting” Big Data — but what do they have to do with “defining” the term?? :)

    Best,

    CAM

  4. Tasso Argyros on January 9th, 2012 4:55 pm

    Curt, if you believe that Big Data is only about the data, you’re right. If you think that the analytics are a critical component of “Big Data”, they ought to be in the definition.

    E.g. I can put multi-structured data in almost any relational database, no problem. It can all be stored in BLOBs. But if I can’t analyze it properly (say, with MapReduce), is it really Big Data? I think not.

    Another way to look at this is that I was just elaborating on what “fits well” means in your definition of Big Data. It’s again not about the storage features but the analytical capabilities.

    Best,
    Tasso

  5. Curt Monash on January 9th, 2012 6:38 pm

    Tasso,

    I think analytics are something one does to data. I don’t think they’re a “component” of the data.

    More directly, I’ve noted multiple times in the past that Aster makes a case for getting multi-structured data into a relational database sooner than non-Aster alternatives would seem to suggest, and that the difference is based on analytics. But I don’t think that’s a good enough reason to make the terminological mess even worse than it already is.

  6. Brian Andersen on January 9th, 2012 8:27 pm

    I agree that the problem of poly-structured data is not new at all, and I think it has little to do with “Big Data”. It’s a problem for even the smallest databases. In a relational model you have to design the database schema with your “use cases” in mind, and if you decide you want to store some other data, well, then you have to rework the db.

    May I suggest calling it “Hairy Data” instead?

  7. Notes on the Oracle Big Data Appliance : DBMS 2 : DataBase Management System Services on January 10th, 2012 8:33 pm

    […] is really Oracle’s multi-structured big data appliance. Oracle’s relational big data appliance is Exadata, which has been out for years and has […]

  8. Historical notes on analytics — terminology | Software Memories on January 17th, 2012 3:05 am

    […] Big data (analytics) — I just discussed that mess a week ago. […]

  9. Introduction to Deep Information Sciences and DeepDB | DBMS 2 : DataBase Management System Services on April 21st, 2013 3:07 am

    […] the same extent DeepDB is. However, if we’re interpreting “big data” to include multi-structured data support — well, only half or so of the NewSQL products and companies I know of share […]

  10. Confluence: Performance Technologies on May 14th, 2014 11:02 am

    Big Data…

      The technology team is investigating the potenti…
