In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear.
Big bit bucket
- Kinds of data likely to be included: Logs, other technical/external
- Likely use styles: Staging/ETL, investigative
- Canonical example: Log files in a Hadoop cluster
- Stresses: TCO, scale-out, transform/big-query performance, ETL functionality
With the explosion of machine-generated data has come the need for a place to put it all, sometimes called the big bit bucket. This is like the investigative data mart use case, but bigger and more poly-structured. In some cases it is focused on data staging and transformation; but it can also be used for analysis in place.
The list of candidate technologies to run your bit bucket starts with Hadoop and Splunk.
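To make the staging/transform role concrete, here is a minimal sketch of the kind of parse-and-aggregate step a bit bucket hosts. The log format and field names are hypothetical; in a real deployment, logic like this would run as a Hadoop MapReduce (or Hive/Pig) job over files in HDFS rather than as a single-process script.

```python
import re
from collections import Counter

# Hypothetical Apache-style access-log lines; a real bit bucket would hold
# terabytes of these and run the same transform as a distributed job.
RAW_LOGS = [
    '10.0.0.1 - - [01/Aug/2011:10:00:00] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [01/Aug/2011:10:00:01] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.1 - - [01/Aug/2011:10:00:02] "POST /login HTTP/1.1" 200 230',
]

LINE_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\w+) (\S+) [^"]*" (\d{3}) (\d+)$'
)

def parse(line):
    """Transform one raw log line into a structured record (the ETL step)."""
    m = LINE_RE.match(line)
    if not m:
        return None  # malformed lines are routine in machine-generated data
    ip, ts, method, path, status, size = m.groups()
    return {"ip": ip, "ts": ts, "method": method, "path": path,
            "status": int(status), "bytes": int(size)}

records = [r for r in (parse(line) for line in RAW_LOGS) if r]
status_counts = Counter(r["status"] for r in records)  # analysis in place
print(status_counts)  # Counter({200: 2, 404: 1})
```

The same records could either be loaded onward into a relational analytic DBMS (the staging role) or queried where they sit (analysis in place).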
Archival data store
- Kinds of data likely to be included: Operational, CDR (call detail record), security log
- Likely use styles: Archival, reporting (for compliance), possibly also investigative
- Examples: Any long-term detailed historical store
- Stresses: TCO, compression, scale-out, performance (if multi-use)
Analytic DBMS vendors have been insulting each other with the claim “that’s just an archival data store,” dating back at least to the first time Greenplum was deployed on an underpowered Sun Thumper system. Perhaps only Rainstor truly embraces the archival positioning, and I’ve become pretty dubious about their technical claims and their company alike.
Still, there’s a legitimate need for data stores (especially relational analytic DBMSs) that:
- Store data cheaply, with high rates of compression.
- Have decent performance if you do want to query the data.
- May have archiving/compliance-specific features as well.
Along with Rainstor, SAND and SenSage have at least partially targeted that use case. In addition, appliance vendors such as Teradata and Netezza try to keep an archive-oriented version in their product lineups.
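The "store data cheaply" requirement comes down largely to compression ratio. As a rough illustration of why repetitive detail records (CDRs, logs) compress so well, here is a sketch using synthetic records and ordinary zlib as a stand-in for whatever columnar or archival codec a real product applies; the field names and the 5x threshold are illustrative, and specialized vendors claim far higher ratios.

```python
import json
import zlib

# Synthetic call-detail-style records: highly repetitive, like most
# machine-generated archival data, so even generic compression bites hard.
records = [
    {"caller": f"+1555000{i % 50:04d}", "callee": "+15551234567",
     "duration_s": 60 + (i % 30), "cell_id": i % 10}
    for i in range(10_000)
]

raw = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode()
packed = zlib.compress(raw, level=9)

ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes (~{ratio:.0f}x)")
```

Column-oriented storage, dictionary encoding, and delta encoding on sorted data are the usual ways archival-focused products push well past what a generic byte-stream compressor achieves.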
Outsourced data mart
- Kinds of data likely to be included: All
- Likely use styles: Traditional BI, investigative analytics, staging/ETL
- Examples: Advertising tracking, SaaS CRM
- Stresses: Performance, TCO, reliability, concurrency
Much of what happens in analytic database management can also be outsourced. Some applications that run via SaaS (Software as a Service) are analytic. I’ve had three different clients whose main business is picking marketing targets in various vertical segments; others who wanted to add analytics to what were historically OLTP applications; and yet others who just offered online business intelligence. Also, if your fundamental business is gathering data and reselling it to a variety of user organizations, that’s an analytic data management challenge. The possibilities expand from there.
Data outsourcers are in the IT business, and so their IT development is — hopefully! — more serious and less politically encumbered than at many conventional enterprises. Thus, legacy systems and master data management issues are commonly less prevalent, or at least more aggressively disposed of. The same, up to a point, goes for vendor politics.* Multitenancy is commonly an issue, as is running in the cloud.
*Even so, there’s often That Guy who doesn’t want to migrate away from Oracle, no matter what.
Vertica gets the nod in a number of these cases; it’s cloud-friendly, and often the problem is naturally columnar. Other columnar products can be good choices too, with added brownie points for Infobright if the shop is MySQL-oriented anyway. Running Netezza or other appliances makes sense mainly if you’re pretty sure you want to keep operating your own data centers, but some data outsourcers are just fine with that assumption.
Operational analytic(s) server
- Kinds of data likely to be included: Customer-centric, log, financial trade
- Likely use styles: Advanced operational analytics
  - Lower latency: Web or call-center personalization, anti-fraud
  - Higher latency: Customer profiling, Basel 3 risk analysis
- Stresses: Performance, reliability, analytic functionality, perhaps concurrency
Even with eight different choices, I need a “catch-all” category; this is it.
Suppose you want to do reasonably sophisticated analytics, then use the results in operations. This is the classical challenge in integrating short-request and analytic processing. There are multiple ways to tackle it, embodying different trade-offs among cost, convenience, and analytic accuracy. If the platform on which you want to run your investigative analytics also has the reliability and concurrency appropriate for mission-critical operations, you’re set. Otherwise, you may want to pipe derived data into a more “industrial-strength” DBMS, ideally the one that runs your operational apps anyway.
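One minimal version of the "pipe derived data" pattern: the analytic side precomputes per-customer aggregates in batch, and the operational app serves them with a cheap keyed lookup rather than a big query. The table name and the scoring rule below are made up for illustration, and SQLite stands in for the industrial-strength operational DBMS.

```python
import sqlite3

# SQLite stands in for the operational DBMS; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customer_scores (customer_id TEXT PRIMARY KEY, churn_risk REAL)"
)

# Batch side: the analytic platform derives a score (here, a toy rule on
# support-ticket counts) and pipes the results into the operational store.
tickets = {"c1": 42, "c2": 3, "c3": 17}
derived = [(cid, min(1.0, n / 40)) for cid, n in tickets.items()]
db.executemany("INSERT INTO customer_scores VALUES (?, ?)", derived)
db.commit()

# Request side: a call-center or web app does a keyed lookup at interaction time.
def churn_risk(customer_id):
    row = db.execute(
        "SELECT churn_risk FROM customer_scores WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else 0.0

print(churn_risk("c1"))  # 1.0 (42/40, capped)
print(churn_risk("c2"))  # 0.075
```

The trade-off is exactly the one named above: the served score is only as fresh as the last batch run, in exchange for operational-grade latency and reliability.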
Another option is to integrate a limited amount of analytics immediately into your short-request processing system. For example, as bad as they are at the kinds of queries that require joins, NoSQL systems are often fast at simple aggregations. As MapReduce/NoSQL integrations mature, that option may not require pumping the data anywhere else for deeper analytics; even if it does, at least you’re starting out with the data in a convenient bit bucket.
Streaming/CEP-centric architectures could come into play as well. And it goes on from there. The possibilities in this last category are just too varied to generalize about.
So did I get them all? Or are there yet other analytic data management use cases that don’t fit into my eight categories?