July 8, 2012

Database diversity revisited

From time to time, I try to step back and build a little taxonomy for the variety in database technology. One effort was 4 1/2 years ago, in a pre-planned exchange with Mike Stonebraker (his side, alas, has since been taken down). A year ago I spelled out eight kinds of analytic database.

The angle I’ll take this time is to say that every sufficiently large enterprise needs to be cognizant of at least 7 kinds of database challenge. General notes on that include:

The Big Seven database challenges that almost any enterprise faces are:

Persistent OLTP (OnLine Transaction Processing) database management. If you’re an enterprise of any size, you surely have this need. Most commonly, this need is met by a row-based relational DBMS — Oracle, IBM DB2, Microsoft SQL Server, Sybase ASE, MySQL, PostgreSQL, Progress OpenEdge et al. However,

Website and network backing. When we look specifically at websites, the situation shifts somewhat. These can combine aspects of:

What’s more, it can be unwise to combine true OLTP and user interaction tracking in a single relational database. For one horrific example, consider the September 2010 Chase outage.

Similar considerations can apply for other systems that ingest machine-generated data, e.g. from social games or sensor networks.

In-memory cache or DBMS. It’s increasingly hard to think of a major OLTP system or web property that goes straight to persistent storage, without an intermediate in-memory layer. Or, if you do have one, it’s because you picked your persistent data store primarily for how well it functions when the whole working set is in RAM. I touched on some of those points in a general memory-centric data management survey last April. Beyond that, I need to learn more about caching grids of various kinds.
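To make the "intermediate in-memory layer" idea concrete, here is a minimal sketch of a read-through, write-through cache sitting in front of a persistent store. Everything in it is an assumption chosen for illustration: the `accounts` table, the `ReadThroughCache` class, and the use of SQLite as a stand-in for whatever persistent DBMS is actually in play. It says nothing about how any particular caching product works.

```python
import sqlite3


def make_store():
    """Create a stand-in 'persistent' store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)"
    )
    return conn


class ReadThroughCache:
    """Hypothetical in-memory layer in front of a persistent store."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}   # in-memory copy of recently used rows
        self.hits = 0
        self.misses = 0

    def get(self, account_id):
        # Serve from memory when possible.
        if account_id in self.cache:
            self.hits += 1
            return self.cache[account_id]
        # Otherwise fall through to the persistent store and remember the result.
        self.misses += 1
        row = self.conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[account_id] = row[0]
        return row[0]

    def put(self, account_id, balance):
        # Write-through: persist first, then update the in-memory copy.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO accounts (id, balance) VALUES (?, ?)",
                (account_id, balance),
            )
        self.cache[account_id] = balance
```

A real deployment would also need eviction, expiry, and some answer to invalidation across multiple application servers, none of which this sketch attempts.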

Analytic support. Whether you’re focused on event monitoring, trend monitoring, or flat-out investigative analytics, there’s a lot of analysis to be done, and a lot of data stores optimized for helping you do it. Those are, of course, a major subject of this blog. Overview posts include:

One point not emphasized in those posts — sometimes you have a really specialized analytic need that gets you looking at a corresponding DBMS, such as a graph store or maybe SciDB.

True document management. People started recording business information in document format over 5000 years ago. They never stopped. If nothing else, enterprises at least need search engines. Or they can manage their documents via systems that have other merits as well; indeed, I’ve sent more than one client in the direction of MarkLogic.

Embedded database management. Enterprises operate many systems that feature internal database management — e.g. email, computer-aided engineering of various kinds, security appliances, or most things that generate logs. Often, you can just forget about the underlying data management, figuring the system supplier has it covered. On the other hand, perhaps you should stop and think — do you want access to that data as part of your general computing environment? If so, then perhaps you should get more involved in managing or extracting it.

And of course, you may be in the business of developing on top of embedded DBMSs yourself. Those can take many forms. Generally, when I write about them, I focus on the kind of DBMS (e.g. in-memory or mid-range) rather than obsessing about whether a particular product happens to be sold more often through OEM than through direct channels.

Finally, there’s data integration, among your own databases (of which there are many), but also with external ones. I have some catching up to do on the various flavors of classical ETL (Extract/Transform/Load), so I’m talking with vendors again, including Informatica — but not Talend, which seems reluctant to let me talk with somebody technical, and also not the secrecy-obsessed Ab Initio folks. I probably should circle back to SnapLogic, and of course to my neglected clients at Syncsort. As for Hadoop-related data integration, I’m still figuring that out too. Several people I respect seem excited about HCatalog, and I’m pursuing that further.

One opinion I hold in data integration is that it’s increasingly important to stream updates to your analytic data store as soon as they come in, due to the general desire for low-latency analytics. I see this as something that can and sometimes should be done with the same replication technologies used for high availability, disaster recovery, and so on. More advanced ETL capabilities often aren’t needed; instead, ELT suffices.
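As a rough illustration of the ELT point, the sketch below lands each incoming change in the analytic store untransformed, as soon as it arrives, and leaves the transformation to SQL run inside that store. The `raw_orders` table, the shape of the change records, and the function names are all hypothetical, and SQLite merely stands in for an analytic DBMS; a real replication feed would of course look different.

```python
import sqlite3


def make_analytic_store():
    """Create a stand-in analytic store with a raw landing table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    return conn


def load_change(conn, change):
    """Extract/Load: append one replicated change as-is, with no transformation."""
    with conn:
        conn.execute(
            "INSERT INTO raw_orders VALUES (?, ?, ?)",
            (change["order_id"], change["customer"], change["amount_cents"]),
        )


def revenue_by_customer(conn):
    """Transform: aggregate inside the target store, at query time."""
    rows = conn.execute(
        "SELECT customer, SUM(amount_cents) FROM raw_orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    return dict(rows)
```

Because each change is committed individually, a query issued right after `load_change` already sees it, which is the low-latency property the paragraph above is after. Heavier cleansing or cross-source matching is exactly where fuller ETL tooling would still earn its keep.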

Overall, I think enterprises could wind up with diversity in data integration rivaling what they have in database management itself. Candidates include:

Stay tuned for further research.


