I’ve been posting a lot recently about the diverse database technologies used to support data warehousing. With the marketplace supporting such a broad range of architectures, it seems clear that a lot of those architectures actually deserve to thrive, presumable each in a different kind of usage scenario. So in this post I’ll take a pass at dividing up use cases for data warehouses, and suggesting which kinds of data warehouse management technologies might do the best job of supporting them. To start with, I’ve divided things into a number of buckets:
- Pinpoint data lookup
- Constrained query and reporting
- Cube-filling calculations
- Hardcore tabular data crunching
- Text and media search
- Specialty areas, such as relationship analytics
In some uses, a data warehouse or mart is just a glorified operational data store for pinpoint data lookup, such as collecting all the information about a single customer. Due to unfortunate (if often unavoidable) legacy design choices, data is invariably scattered amongst a number of silos. A data warehouse can be needed* to consolidate that, even for the most transactional of uses. Almost any kind of relational DBMS can be a good choice as the underlying technology. The stereotypical application of this kind would be call center support — or a personalization-heavy e-commerce application — but examples can be found in almost every area of business. “Operational BI” often falls into this area. Lightweight scoring (based on data mining models) is often involved in these apps, although if that gets to be a major processing burden the scoring can usually be batched up in advance instead.
*Well, MDM — Master Data Management — can in some cases be an alternative, as can other middleware technologies, but that’s a whole other discussion.
Data warehouses and marts are also used to support huge amounts of other constrained query and reporting. True scheduled reporting, dashboards, fairly predictable ad hoc queries – i.e., classical BI — these are the bread and butter of traditional warehouse/mart-oriented data warehouse DBMS. General-purpose RDBMS often do a good job with these work loads; columnar and star-oriented systems are tuned for them; and any other data warehouse technology has to take them strongly into account as well. Stereotypical “applications” here would be pretty much anything that a standard BI reporting, query, and/or dashboard tool would do.
Ever since E. F. Codd popularized it, the phrase OnLine Analytic Processing has flopped around in meaning, as have variant terms (OLAP, MOLAP, ROLAP, HOLAP, etc.). The use of actual hypercube-centric technologies has ebbed and flowed a bit. But throughout, there’s been a well-defined and ever-growing need for planning, budgeting, allocation, and/or consolidation apps that require lots and lots of cube-filling calculations. These can be supported either by specialized cube-oriented data stores, or by the same kinds of systems (often with a star schema or something similar) used to support constrained query and reporting.
Finally, there is hardcore data crunching. Data mining fits that bill, but so does heavy SQL-only data exploration (aka “The Query That Ate Pittsburgh”). This is where a small number of expert users extract value from massive data stores. Scheduled reporting can also fit into this category at aggressive enterprises. Here is where the high-end data warehouse vendors – e.g., Teradata, IBM (mainframe DB2), and the data warehouse appliance startups – really shine. At smaller enterprises, other kinds of data stores also suffice. I have a careful list (two versions of the same list, actually) of data mining app categories over on the Monash Report. It’s a good start on a list of apps for this whole category.
But actually – that wasn’t the final category. While we’ve pretty much covered relational and other tabular warehousing, there’s also the whole huge category of text and media search. An enterprise text index is, in its own way, a data warehouse. And then there are a variety of specialty categories, such as relationship analytics.
As complicated as this taxonomy is becoming, it still understates the case. A large enterprise will typically have demanding apps in most, if not all, these categories. And so there’s a constant tension between consolidating various kinds of analytic function (and thus the needed data) into one data store, vs. offloading certain parts to specialty products. At sufficiently small enterprises, it may truly make sense to run everything, OLTP and analytics alike, on a single database. At larger enterprises, however, it is typically best to have a variety of marts and specialty warehouses. Exactly how those should be mixed and matched is a much tougher subject than can be adequately covered in a single article.