March 24, 2013

Essential features of exploration/discovery BI

If I had my way, the business intelligence part of investigative analytics — i.e., the class of business intelligence tools exemplified by QlikView and Tableau — would continue to be called “data exploration”. “Exploration” is what’s actually going on, and the term also carries connotations of the “fun” that users report having with the products. By way of contrast, I don’t know what “data discovery” means; the problem these tools solve is that the data has been insufficiently explored, not that it hasn’t been discovered at all. Still, “data discovery” seems to be the term that’s winning.

Confusingly, the Teradata Aster library of functions is now called “Discovery” as well, although thankfully without the “data” modifier. Further marketing uses of the term “discovery” will surely follow.

Enough terminology. What sets exploration/discovery business intelligence tools apart? I think these products have two essential kinds of feature:

- Query modification.
- Query result revisualization.*

Here’s what I mean.

*I’d wanted to call this re-presentation. But that would have been … pun-ishing. 🙂

The canonical form of query modification is: select some points on a chart, e.g. by lassoing them, and drill into the underlying data, so that your on-screen selection becomes the filter for a new, narrower query.

That capability is much more useful in systems that allow you to change how the data is visualized, both before you make the selection (so you can see what’s worth lassoing) and after it (so you can make sense of the narrower result set).

Other forms of query modification, such as faceted drill-down or parameterization, don’t depend as heavily on flexible revisualization. Perhaps not coincidentally, they’ve been around longer in some form or other than have the QlikView/Tableau/Spotfire kinds of interfaces. But at today’s leading edge, query modification and query result revisualization are joined at the hip.
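To make that concrete, here is a minimal sketch, in illustrative Python, of how a lassoed chart region could become filter predicates on the underlying query. The field names and the predicate-building helper are hypothetical; this is not QlikView’s, Tableau’s, or Spotfire’s actual API.

```python
# Hypothetical sketch of lasso-driven query modification; not any
# vendor's actual implementation.

def lasso_to_predicate(x_field, y_field, selected_points):
    """Turn the bounding box of lassoed (x, y) points into a SQL filter."""
    xs = [x for x, _ in selected_points]
    ys = [y for _, y in selected_points]
    return (f"{x_field} BETWEEN {min(xs)} AND {max(xs)} "
            f"AND {y_field} BETWEEN {min(ys)} AND {max(ys)}")

# The user circles three points on a discount-vs-sales scatterplot ...
lasso = [(10, 200), (12, 340), (15, 310)]

# ... and the tool narrows the query accordingly, with no SQL typed.
query = ("SELECT region, SUM(sales) FROM orders WHERE "
         + lasso_to_predicate("discount", "sales", lasso)
         + " GROUP BY region")
print(query)
```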

What else is important for these tools? Above all, speed; the whole exploratory experience depends on responses coming back fast enough to keep you in the flow of thought.

Please note that speed is a necessary condition for exploratory BI, not a sufficient one; a limited UI that responds really fast is still a limited UI.

As for how the speed is achieved — three consistent themes are columnar storage, compression, and RAM. Beyond that, the details vary significantly from product to product, and I won’t try to generalize at this time.
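By way of rough illustration, here is a toy Python sketch of why those three themes help; the data and the run-length encoder are made up for the example, and no product works exactly this way.

```python
# Toy contrast of row vs. columnar layouts, plus simple compression.
# Everything lives in RAM, as with the in-memory BI engines discussed here.

rows = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "East", "product": "B", "sales": 150},
    {"region": "West", "product": "A", "sales": 120},
]

# Row store: an aggregate touches every field of every row.
total_rows = sum(r["sales"] for r in rows)

# Column store: the same aggregate scans one tightly packed array.
columns = {
    "region": ["East", "East", "West"],
    "product": ["A", "B", "A"],
    "sales": [100, 150, 120],
}
total_columns = sum(columns["sales"])

# Run-length encoding: repeated values in a low-cardinality column
# collapse to (value, count) pairs, shrinking the RAM footprint.
def rle(values):
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

assert total_rows == total_columns == 370
print(rle(columns["region"]))  # [['East', 2], ['West', 1]]
```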

8 Responses to “Essential features of exploration/discovery BI”

  1. Adam Ferrari on March 25th, 2013 9:40 am

    Hey Curt – very much agree with your list of essential characteristics of data discovery/exploration, but one that I would add is “lightweight data modeling.” Discovery tools tend to work by loading the data first and then elaborating the “model” (if it can be called that) incrementally. Traditional BI tools tend to work by creating the (dimensional) model first, and then populating the data and presentation content from it. Discovery is “data first” versus traditional BI’s “model first.” Wayne Eckerson had a nice discussion of this idea in his blog last week: http://www.b-eye-network.com/blogs/eckerson/archives/2013/03/a_guide_for_bi.php. He calls the distinction “bottom up” versus “top down” BI.

  2. Curt Monash on March 26th, 2013 2:10 am

    Thanks, Adam; Wayne’s distinction has some merit. But I would dispute your assertion that exploratory BI tools are tied to lazy data modeling. Platfora assumes star schemas. QlikView assumes star/snowflake. And anything that isn’t true of those tools is not, in my opinion, true of “exploratory BI”.

  3. Platfora at the time of first GA | DBMS 2 : DataBase Management System Services on March 26th, 2013 6:50 am

    […] exploratory business intelligence against Hadoop-based data. As per last weekend’s post about exploratory BI, a key requirement is speed; and so far as I can tell, any technological innovation Platfora offers […]

  4. Adam Ferrari on March 26th, 2013 2:44 pm

    I think maybe my original comment gave the impression that by “lazy modeling” I meant “not star / snowflake schemas” (or maybe my association with Endeca creates that impression). That was actually not my intention at all. I really meant my comment to be independent of the structural expectations of the modeling environment, and more about how modeling fits into the overall application development lifecycle encouraged by the tools. Traditional BI tools tend to be oriented towards models that are built up front, and that beyond the table and join structure also contain lots of valuable metadata such as dimension hierarchy relationships, time and ordering information, etc. This is actually a good thing – it makes these models naturally suited to serving as long-lived knowledge repositories over high-value enterprise data. But in data discovery use cases, there’s more of an orientation towards getting the data in first with little or no up front metadata configuration (even in the case of tools where the model is star/snowflake-style) and quickly moving on to creating dashboard views and interacting with the data. This closely relates to why the data exploration capabilities of discovery tools are so important – on new and unfamiliar data quickly dropped into the environment, it’s invaluable to have power tools for filtering and viewing the data such as the chart lassoing you describe.

    So, no disagreement that any feature not descriptive of broad swaths of the data discovery landscape doesn’t belong on the “essential features” list. But I stand by my argument that lightweight modeling should be on the list, and seems to be broadly represented by major tools claiming to enable “data discovery” or “data exploration,” even the ones based on traditional table plus joins data structures.

  5. Jay Jakosky on March 26th, 2013 6:24 pm

    I completely agree with Adam. You have to look at the activity surrounding the product to appreciate how “lightweight data modeling” is a major contributor to success. It directly improves speed of deployment and time to ROI. Even more powerful, though, is that individuals can act independently of IT, with a gentler learning curve, and rapidly build a proof of concept. I have many customers where non-developers do it all because of this.

    There is an important third feature that separates data discovery. I call it negative space. Tableau, Spotfire and QlikView all have in-memory engines. This means the data is no longer separate from the display. So you see what didn’t happen as easily as what did. There is no need to modify the query.

    I think this dovetails with a possible 4th feature. This has been a part of QlikView for several years and revolutionized how we build solutions. Tableau just released it in version 8 as well. Spotfire has something similar. It’s the idea of simultaneous “sets”. You could think of these as parallel queries, but it wasn’t realistic to implement this without in-memory engines that sit so closely to the display. Is this necessary to be discovery? Maybe not, but it definitely separates the best from the rest.
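To make Jay’s “negative space” and simultaneous “sets” ideas concrete, here is a minimal in-memory sketch; the data and field names are hypothetical, and this is not how QlikView, Tableau, or Spotfire actually implement either feature.

```python
# Hypothetical in-memory sketch of "negative space" and parallel "sets".

orders = [
    {"customer": "Acme", "quarter": "Q1", "sales": 500},
    {"customer": "Acme", "quarter": "Q2", "sales": 0},
    {"customer": "Biz",  "quarter": "Q1", "sales": 300},
]
all_customers = {"Acme", "Biz", "Cogs"}  # full dimension, held in RAM

# Positive space: who bought something in Q1?
q1_buyers = {o["customer"] for o in orders
             if o["quarter"] == "Q1" and o["sales"] > 0}

# Negative space: who didn't? Because the data sits next to the display,
# the complement falls out of set arithmetic; no new query is issued.
q1_non_buyers = all_customers - q1_buyers

# Simultaneous "sets": two selections evaluated side by side.
totals = {q: sum(o["sales"] for o in orders if o["quarter"] == q)
          for q in ("Q1", "Q2")}

print(q1_non_buyers)  # {'Cogs'}
print(totals)         # {'Q1': 800, 'Q2': 0}
```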

  6. Jay Jakosky on March 26th, 2013 6:35 pm

    And what I’m saying in my first paragraph is that THIS is real discovery. It’s like the story of the man searching for keys under the streetlamp because that’s where the light is. Unless you get to point your light where you please, who cares? As standalone products, QlikView, Spotfire and Tableau support this. And their product roadmaps support it to varying degrees from the server as well.

  7. Curt Monash on March 26th, 2013 10:06 pm

    I posted about the BI/visualization tools themselves, and you guys have been making good points about the query and ETL engines. This raises the question as to whether those need to be tightly coupled to the BI, or whether a separation can be drawn.

    Platfora’s case — see pingback above — illustrates why the answer might seem to be “yes”. It doesn’t seem realistic to have there be no concept of “building a data mart”. A more attainable goal would be to make “building a data mart” be easy/self-service, after whatever initial and general assist IT gives to get things to that state. That, in turn, raises the question as to whether IT’s preparation is the delay/bureaucracy we were trying to avoid in the first place.

    Platfora would argue that it isn’t, really — if you have tight coupling. And the same goes for the other exploratory BI vendors named above. Analytic RDBMS vendors who push a version of “data mart spin-out” might argue you don’t need the tight coupling. I think that for the intermediate-long term, the jury is still out.

    But in the short term, the BI vendors do seem to influence how easy it is to set up a useful data mart.

  8. Jay Jakosky on March 27th, 2013 2:17 am

    Curt, I think your common feature set is actually describing a visual query tool backed by a high-speed data store. It’s a form of exploration, but it’s way off the mark. You need to recognize the emergent properties that make these tools valuable and exploratory.

    Here’s an example of exploration in the “negative space” that I mentioned in my earlier reply. One of my favorite milestones that comes in every implementation is the appearance of unexpected data relationships. The customer discovers that their data, and the processes that generated that data, are wrong. Not simply a matter of data entry error, but a hidden business problem, and usually quite a few. This can be an overwhelming time for some customers, but most are eager to see more. We have had BI tools for years that only display the results as queried – the positive space – and hide everything else. Is it true exploration if I have to know where to look?

    By the way, exploration is the process, and discovery is the outcome. Whether it’s called data discovery or business discovery or something else, I can guarantee novel discovery.

    Do the query engine and visualization layer need to be tightly coupled? Again, I think without tight coupling there’s just not much of value here. You need to be able to stay in the flow of thought, so we know that the total render time must be single-digit seconds (less than 3 is our rule). In that time, you need to calculate the results and the negative space. This would be a set of queries for the mins, maxes, or distinct values in all fields, depending on the discovery tool you’re using. Then we get to the concept of “sets” that I referred to. So now you have a doubling or tripling of the queries that you started with. You referred to RDBMS vendors that would say you can explore without tight integration. Sure, you have two options: a completely different and simplified experience, or simulating the full discovery experience by saturating the RDBMS with dozens of simultaneous queries and expecting them all to return with speed.
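As a rough sketch of the fan-out Jay describes, the per-field scaffolding (mins, maxes, distinct values) can come out of one pass over in-memory data, whereas a decoupled design would translate into one round-trip query per field. The table and field names below are hypothetical.

```python
# Hypothetical sketch: profiling every field in a single in-memory pass.

table = [
    {"region": "East", "discount": 5,  "sales": 100},
    {"region": "West", "discount": 10, "sales": 150},
    {"region": "East", "discount": 0,  "sales": 120},
]

profile = {}
for row in table:
    for field, value in row.items():
        p = profile.setdefault(field,
                               {"min": value, "max": value, "distinct": set()})
        p["min"] = min(p["min"], value)
        p["max"] = max(p["max"], value)
        p["distinct"].add(value)

# A decoupled design would instead fire one query per field, e.g.
#   SELECT MIN(discount), MAX(discount) FROM t;
#   SELECT DISTINCT region FROM t;
# and then multiply that again for each simultaneous "set".

print(profile["discount"])  # e.g. {'min': 0, 'max': 10, 'distinct': {0, 5, 10}}
```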

    And then there’s the coupling with ETL and “lightweight data modeling”. I think there is a separation to be made here, which I hadn’t considered in my earlier replies. The distinction is between exploration/discovery of the “data” and of the “business”. It’s as much an issue of the company culture as of the technical capabilities, so it’s not fair to expect it of all products, particularly those that target large enterprises. But there’s a lot to be discussed on this topic. One of my most common metaphors is shining a light. That’s how I feel about these discovery products. Democratizing discovery – making it possible for people with different perspectives to shine lights throughout the company – has huge impacts on data quality, process improvement, compliance, exception resolution, and much more.
