March 16, 2012

Juggling analytic databases

I’d like to survey a few related ideas:

Here goes.

The idea of an analytic data store separate from your transactional one has been around since before the relational era. Its approximate evolution was:

In the past, large DBMS vendors liked to argue that enterprises should have a single analytic data store — commonly known as an enterprise data warehouse (EDW) — but that theory holds ever less water. A sample of my writing on that subject includes:

Recently, the big vendors have capitulated. In particular:

* Teradata also uses the term “ADW”, for Active Data Warehouse, which in essence means “Low latency! High concurrency! Rah rah rah!”

**Calling that “Smart Consolidation” is like naming a swinger club “Smart Fidelity”. But terminology aside, I endorse the idea.

***Teradata definitely expects its Hortonworks relationship to ascend beyond the Barney level; Tasso Argyros gave enough NDA details to be convincing about that. But it’s not there yet.

So data marts should often be managed by different technology than your core IDW. But even if you want to use the same technology, there are good reasons to have separate data marts, including the desire to manage:

In each case, the point is that:

I call this data mart spin-out, and am no longer sure where I first picked up that term. Oliver Ratzesberger popularized the concept when he was at eBay, and then Greenplum ran with it.

More precisely, Greenplum ran with it from a marketing standpoint. Delivery of what eventually became Chorus was more like a crawl.

Data mart spin-out can be either physical, in which case there’s real data (re)copying going on, or virtual, in which case the whole thing is being done as a trick in the core DBMS software, especially its workload management subsystem. Virtual spin-out is faster, more flexible, and less costly, all else being equal. But it does lead to a more complex mixed-workload scenario, which you’re relying on your workload management technology to sort out.
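To make the physical/virtual distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under loud assumptions, not how any vendor does it: the table and mart names are hypothetical, and a plain SQL view stands in for virtual spin-out. SQLite has no workload management subsystem at all, so the mixed-workload side of the tradeoff is not modeled.

```python
import sqlite3

# Hypothetical "core" warehouse: a single fact table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO core_sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0)],
)

# Physical spin-out: the mart gets its own copy of the rows.
conn.execute(
    "CREATE TABLE mart_physical AS "
    "SELECT * FROM core_sales WHERE region = 'east'"
)

# Virtual spin-out (simplified stand-in): the mart is only a view, so
# nothing is copied and every query runs against the core table.
conn.execute(
    "CREATE VIEW mart_virtual AS "
    "SELECT * FROM core_sales WHERE region = 'east'"
)

# New data arrives in the core after both marts were created.
conn.execute("INSERT INTO core_sales VALUES ('east', 50.0)")


def total(mart_name):
    """Sum the amount column of one of the hypothetical marts above."""
    return conn.execute(f"SELECT SUM(amount) FROM {mart_name}").fetchone()[0]


physical_total = total("mart_physical")  # copy is stale until refreshed
virtual_total = total("mart_virtual")    # view sees the new row at once
print(physical_total, virtual_total)
```

The physical mart needs a recopy to pick up the new row, which is the cost side of physical spin-out. The virtual mart stays current for free, but each of its queries lands on the core system, which is where workload management would have to earn its keep.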

Anyhow:

So where is this all going? Mark Beyer of Gartner came up with the term “Logical Data Warehouse” three years ago, and evidently has been trying to refine its definition ever since. Forrester Research has been known to mention similar-sounding ideas. At this point, Gartner still seems to be trying to recreate the EDW fallacy at a higher level of abstraction, which is going to work even less well than EDWs did.

Informatica, which one might think would be the biggest fan of the idea, doesn’t seem to have embraced it yet. But then, the whole thing sounds somewhat like Oracle’s 1990s Project Sedona, which was one of the bigger fiascos in software history, and certainly was the greatest failure of Informatica CEO Sohaib Abbasi’s distinguished career.

My own opinion is:

Of course, one can retreat to saying “OK, but how about partly-universal, in line with the quasi-EDWs many enterprises have”? On that basis, I think some of the ideas of the “Logical Data Warehouse” will hold up, for example the ones that amount to glorified MDM (Master Data Management), and probably some of the ILM (Information Lifecycle Management) ones as well. The kind of low-level “Let’s build a mini-Facebook to keep track of and talk about our data stores” collaboration that Oliver open-sourced on his way out of eBay — and that seems to be part of Greenplum Chorus too — could also succeed.

But if you’re looking for some kind of logical/virtual Grand Data Unification – well, that won’t work any better than any other Grand Data Unification idea has over the past 40 years.

Comments

11 Responses to “Juggling analytic databases”

  1. Brian Andersen on March 16th, 2012 3:14 pm

    “But if you’re looking for some kind of logical/virtual Grand Data Unification – well, that won’t work any better than any other Grand Data Unification idea has over the past 40 years”

    That whole thing, and then you just punt? We’re just stuck with duct tape and bubble gum from now on? I think it won’t work until it does, and then these will look like the dark ages. Like when you had to program for every type of computer in a different language, and there were dozens of types of computers.

  2. Curt Monash on March 16th, 2012 3:36 pm

    People still use a multitude of languages to get their programming work done.

  3. Paul Johnson on March 19th, 2012 10:22 am

    Oliver’s ‘virtual’ data marts on Teradata at a previous employer are very much physical and not logical/virtual, a point discussed at some length about 6 months ago on the Teradata masters mailing list.

    The concept of a ‘virtual data mart’, consisting of views, has been around at Teradata for 10 years or more.

    Anyhow, irrespective of the technologies or architectures in play, the aim should remain the same – to give users what they want/need at an acceptable cost to all involved.

    For some this may mean a single ‘EDW’ style platform is sufficient, for others it may mean a core data warehouse with delivery data marts on a different technology.

    There are many platform/architecture choices available, which I personally view as ‘a good thing’…so long as they are used appropriately.

  4. Dan Graham on March 19th, 2012 3:40 pm

    Since the 1990s at AT&T and WalMart, Teradata has known customers needed multiple “central data warehouses”. Since 2007, we have been designing machines to fit specific workloads, some of which are your 8 kinds of analytic DB. Nevertheless, we still favor consolidation and integration of data into the fewest number of systems: they provide more value than marts and in the long run are cheaper to own.

    Oliver’s data marts inside the Teradata box are now a product called “Data Lab”. These are marts inside the big box and can be joined to the production EDW data. Cool! And every mart has an expiration date on it. So it’s a great sandbox for power users aimed at “agile” analytics. We owe you a demo.

    Great blog, especially the last couple paragraphs and the rah rah rah.

  5. Alfredo on March 20th, 2012 12:45 pm

    We’ve been into this problem for a while eh! :)
    Great discussion though.

    At this moment we run Greenplum with data marts. It let us “unify” more data access types than the previous version of our DW, which used other database technology. However, it is now becoming impossible to analyze certain levels of information (event level on the internet) with Greenplum, and Hadoop comes into the picture for that kind of analytics. EMC has a solution for that, but I do not think that is the way to go in this case, since the needs here are fuzzier and require more “elasticity”. What is your opinion about a Virtual DW combining housed DW and possibly Hadoop on the cloud? Crazy? Not?
    Regards!

  6. John M. Wildenthal on March 20th, 2012 12:58 pm

    The ParAccel link after the physical vs. virtual discussion goes to a news article about a NEO approach – http://www.theregister.co.uk/2012/03/15/asteroid_near_miss/ – is this just happening to me, or to others, too?

  7. Curt Monash on March 20th, 2012 4:34 pm

    John,

    That should be http://www.dbms2.com/2011/02/03/paraccel-padb-technical-notes/ Fixing now. Thanks!

  8. Curt Monash on March 20th, 2012 4:42 pm

    Alfredo,

    That seems like a lot longer discussion than can be kicked off with a fuzzy couple-sentence question. :)

    I can’t even tell yet if your real problem is performance/scaling/efficiency, or if it’s achieving tight integration among analytic techniques that your current architecture keeps a bit separate from each other.

  9. Data(base) virtualization — a terminological mess | DBMS 2 : DataBase Management System Services on January 5th, 2013 12:50 pm

    [...] Logical data warehouse would seem to be a related concept. [...]

  10. Comments on Gartner’s 2012 Magic Quadrant for Data Warehouse Database Management Systems — concepts | DBMS 2 : DataBase Management System Services on February 5th, 2013 8:25 am

    [...] of this makes sense. But Gartner has been talking about the “logical data warehouse” for a long time without ever seeming to firm up what it is, as evidenced for example by some dueling summaries of [...]

  11. Notes on Teradata systems | DBMS 2 : DataBase Management System Services on April 15th, 2013 2:53 am

    [...] March, 2012 post on various vendors’ admissions that multiple analytic database systems are needed. Categories: Data integration and middleware, Data warehouse appliances, Data warehousing, [...]
