July 5, 2011

Eight kinds of analytic database (Part 1)

Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.

Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. 

Enterprise data warehouse (Full or partial)

The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. Full EDWs are pipedreams. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you’re going to stress concurrency and/or operational use cases.

Traditional data mart

Whether or not you have something like an enterprise data warehouse, it’s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation.  Some investigative analytics may be in the mix as well.

Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them — e.g. Sybase IQ and Vertica — have excellent track records in concurrent usage as well. Ted Codd pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.

Investigative data mart — agile

Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they’re differentiated by database size.

If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) — and if that data is “single-subject” and fairly homogenous — your watchwords should be “cheap”, “easy”, and “fast”. You don’t need to invest in much hardware, in expensive software, in much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).

*If you have dozens or even hundreds of analysts hitting the same database, you’re probably back to the more concurrency-oriented scenarios outlined above.

Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don’t rule out Netezza’s lowest-end products (even if they’d really rather sell you something bigger). Or, if you’re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.

Investigative data mart — big

But if you’re looking at tens of terabytes of relational data, or even more, you really do have a “big data” problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.

Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression — e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.

Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full analytic platforms, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.

Continued in Part 2, where we cover some of the more difficult use cases.

Comments

6 Responses to “Eight kinds of analytic database (Part 1)”

  1. Eight kinds of analytic database (Part 2) | DBMS 2 : DataBase Management System Services on July 5th, 2011 3:18 am

    [...] Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data [...]

  2. Sybase IQ soundbites | DBMS 2 : DataBase Management System Services on July 7th, 2011 11:27 am

    [...] Sybase IQ is the classic choice for what I call traditional data marts. [...]

  3. Big Data » Blog Archive BigData Analytics Produkte - Kategorisierung durch Kurt Monash on August 2nd, 2011 5:01 am

    [...] Monash schlägt folgende 8 Kategorien für Big Data Analytics Produkte vor: [...]

  4. 分析型数据库的分类 | Alex的个人Blog on October 14th, 2011 6:29 am

    [...]   参考资料 dbms2 做了主要的分类 http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/ [...]

  5. Key questions when selecting an analytic RDBMS | DBMS 2 : DataBase Management System Services on February 10th, 2013 6:47 pm

    [...] Eight kinds of analytic database (July 2011, 2-part post) [...]

  6. Teradata SQL-H, using HCatalog | DBMS 2 : DataBase Management System Services on January 31st, 2014 9:13 am

    [...] could be a kind of investigative workflow. First you play around with the Hadoop data via SQL-H. Then when you think you’re [...]

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.