July 5, 2011

Eight kinds of analytic database (Part 1)

Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.

Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning.

Enterprise data warehouse (Full or partial)

Kinds of data likely to be included: All, but especially operational
Likely use styles: All
Canonical example: Central EDW for a big enterprise
Stresses: Concurrency, reliability, workload management

The enterprise data warehouse (EDW) ideal says that you copy all your data into one place, and drive all decision-making from there. Full EDWs are pipedreams. Still, a partial EDW makes sense for most large enterprises, and many indeed already have one. The first product lines to consider for classical EDWs are Teradata, DB2, Exadata, and maybe Microsoft SQL Server, especially if you’re going to stress concurrency and/or operational use cases.

Traditional data mart

Kinds of data likely to be included: All
Likely use styles: Business intelligence, budgeting/consolidation, investigative
Examples: Reporting servers, planning/consolidation servers, anything MOLAP, etc.
Stresses: Performance, concurrency, TCO

Whether or not you have something like an enterprise data warehouse, it’s common to have lighter-weight data marts as well. A traditional data mart might drive reports and dashboards. Or it might be specialized for budgeting, planning, and/or consolidation. Some investigative analytics may be in the mix as well.

Any DBMS that can support an EDW can also support a data mart, but it may not be the most cost-effective way to do so. Columnar DBMS might have more attractive performance and TCO (Total Cost of Ownership); the same goes for Netezza. Some of them — e.g. Sybase IQ and Vertica — have excellent track records in concurrent usage as well. Ted Codd pushed what amounts to MOLAP (Multidimensional OnLine Analytic Processing) systems for these use cases. But relational DBMS commonly do a better job, which is one reason most major MOLAP products have wound up at RDBMS companies.

Investigative data mart — agile

Kinds of data likely to be included: All, especially customer-centric
Likely use styles: Investigative
Canonical example: A few analysts getting a few TB to examine
Stresses: Ease of setup/load, ease of admin, price/performance

Besides the traditional data mart, there are at least two other kinds. Both are focused on investigative analytics, but they’re differentiated by database size.

If you have just a few analysts,* looking at no more than a few terabytes of data (perhaps even just some gigabytes) — and if that data is “single-subject” and fairly homogenous — your watchwords should be “cheap”, “easy”, and “fast”. You don’t need to invest in much hardware, in expensive software, in much administrative effort (the analysts can be their own DBAs), nor should you endure much set-up time. Just grab a product, grab some data, and start running queries (or extracts into the statistical tool of your choice).

*If you have dozens or even hundreds of analysts hitting the same database, you’re probably back to the more concurrency-oriented scenarios outlined above.

Infobright is often cost-effective among columnar analytic DBMS. Other vendors might cut you a price break as well. If you have multiple terabytes of data, don’t rule out Netezza’s lowest-end products (even if they’d really rather sell you something bigger). Or, if you’re in the sub-terabyte range, maybe you can get by with an in-memory BI tool such as QlikView, and not do anything special on the DBMS side at all.

Investigative data mart — big

Kinds of data likely to be included: All, especially customer-centric, logs, financial trade, scientific
Likely use styles: Investigative
Canonical example: Single-subject 20 TB – 20 PB relational database
Stresses: Performance, scale-out, analytic functionality

But if you’re looking at tens of terabytes of relational data, or even more, you really do have a “big data” problem. Performance and scalability are major challenges, usually best addressed by MPP (Massively Parallel Processing) systems, such as Netezza, Vertica, Aster Data, ParAccel, Teradata, or Greenplum. Performance POCs (Proofs Of Concept) are a big part of the buying process. Vendor price negotiations are crucial too.

Actually, in the low tens of terabytes you might be able to get away with a shared-disk system that has excellent compression — e.g., columnar products like Sybase IQ, Infobright, or SAND, rather than just Vertica and ParAccel.

Assuming you have affordable, scalable query performance, the competitive differentiator can switch to additional analytic functionality. Aster, Netezza, ParAccel, Vertica, and Greenplum either offer full analytic platforms, or seem to be on the path to doing so. Teradata, which now owns Aster Data, offers substantial built-in analytic capability in its traditional products as well, and the same goes for Sybase IQ.

Continued in Part 2, where we cover some of the more difficult use cases.

Categories: Analytic technologies, Aster Data, Benchmarks and POCs, Business intelligence, Buying processes, Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Database diversity, Exadata, Greenplum, IBM and DB2, Infobright, Investment research and trading, Log analysis, Microsoft and SQL*Server, MOLAP, Netezza, OLTP, Oracle, ParAccel, Parallelization, Petabyte-scale data management, Predictive modeling and advanced analytics, Pricing, QlikTech and QlikView, SAND Technology, Scientific research, Sybase, Teradata, Vertica Systems, Web analytics, Workload management

Subscribe to our complete feed!

Comments

7 Responses to “Eight kinds of analytic database (Part 1)”

Eight kinds of analytic database (Part 2) | DBMS 2 : DataBase Management System Services on July 5th, 2011 3:18 am

[…] Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data […]
Sybase IQ soundbites | DBMS 2 : DataBase Management System Services on July 7th, 2011 11:27 am

[…] Sybase IQ is the classic choice for what I call traditional data marts. […]
Big Data » Blog Archive BigData Analytics Produkte - Kategorisierung durch Kurt Monash on August 2nd, 2011 5:01 am

[…] Monash schlägt folgende 8 Kategorien für Big Data Analytics Produkte vor: […]
分析型数据库的分类 | Alex的个人Blog on October 14th, 2011 6:29 am

[…] 参考资料 dbms2 做了主要的分类 http://www.dbms2.com/2011/07/05/eight-kinds-of-analytic-database-part-1/ […]
Key questions when selecting an analytic RDBMS | DBMS 2 : DataBase Management System Services on February 10th, 2013 6:47 pm

[…] Eight kinds of analytic database (July 2011, 2-part post) […]
Teradata SQL-H, using HCatalog | DBMS 2 : DataBase Management System Services on January 31st, 2014 9:13 am

[…] could be a kind of investigative workflow. First you play around with the Hadoop data via SQL-H. Then when you think you’re […]
Which analytic technology problems are important to solve for whom? | DBMS 2 : DataBase Management System Services on April 12th, 2015 11:54 pm

[…] Eight kinds of analytic database (July, 2011) […]

Leave a Reply

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Eight kinds of analytic database (Part 1)

Comments

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin