March 26, 2007

White paper — Index-Light MPP Data Warehousing

Many of my thoughts on data warehouse DBMS and appliances have been collected in a white paper, sponsored by DATAllegro. As in a couple of other white papers — collected here — I coined a phrase to describe the core concept: Index-light. MPP row-oriented data warehouse DBMSs certainly have indices, which are occasionally even used. But the approaches to database design that are supported or make sense to use are simply different for DATAllegro, Netezza (the most extreme example of all) or Teradata than for Oracle or Microsoft. And the differences are all in the direction of less indexing.

Here’s an excerpt from the paper. Please pardon the formatting; it reads better in the actual PDF.

Different DBMS are best at different tasks.

A single relational database management system (RDBMS) can perform a broad variety of duties. It may even do them all pretty well. But for some uses, a special-purpose product can greatly outperform general-purpose systems. Complex data warehousing is such a task.

Index-light MPP appliances excel at data warehousing.

For most data warehouses, market-leading general-purpose RDBMS are good enough. But for complex queries against multi-terabyte data warehouses, index-light MPP data warehouse appliances are a much more efficient option. Offered by DATAllegro, Netezza, Teradata (if you use the term “appliance” a bit loosely), and IBM (if you use the term “appliance” very loosely), these systems beat their index-heavy SMP counterparts on several major criteria:

  • Performance
  • Price/performance
  • Consistency of performance
  • Administration costs

Much of this superiority stems from three factors.

The index-light MPP (Massively Parallel Processing) appliance story hinges on three technical factors:

1. Shared-nothing MPP. Loosely-coupled systems are significantly cheaper than tightly-coupled ones, for the same level of raw component performance.

2. Reduced use of indices. By minimizing redundant references to information, index-light systems can store up to 7X less data than index-heavy ones. This produces enormous savings both in hardware and in administrative costs.

3. Avoidance of random disk reads. Disk rotation speeds have only improved 12.5-fold in the past 50 years, making random disk lookup the greatest constraint on conventional RDBMS performance. Index-light systems largely evade this bottleneck.
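
To make the third factor concrete, here is a rough back-of-the-envelope sketch in Python. The excerpt does not say which rotation speeds underlie the 12.5-fold figure; the sketch assumes 1,200 RPM for an early drive and 15,000 RPM for a fast current one, which is one pair of values that gives that ratio. The 100 MB/s sequential transfer rate is likewise an illustrative assumption, not a number from the paper, and seek time is ignored entirely.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the wanted sector to come around: half a revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def random_read_seconds(num_reads: int, rpm: float) -> float:
    """Time spent on rotational latency alone for num_reads random reads."""
    return num_reads * avg_rotational_latency_ms(rpm) / 1000.0


def sequential_scan_seconds(num_bytes: float, mb_per_second: float) -> float:
    """Time to stream num_bytes off disk at a steady sequential rate."""
    return num_bytes / (mb_per_second * 1_000_000.0)


# Assumed values, not taken from the paper (see the paragraph above).
OLD_RPM = 1_200.0
NEW_RPM = 15_000.0
ASSUMED_SCAN_MB_PER_S = 100.0

if __name__ == "__main__":
    print("rotation speedup:", NEW_RPM / OLD_RPM)
    print("avg latency then (ms):", avg_rotational_latency_ms(OLD_RPM))
    print("avg latency now (ms):", avg_rotational_latency_ms(NEW_RPM))

    # One million random reads versus streaming 200 GB from the same drive.
    print("1M random reads (s):", random_read_seconds(1_000_000, NEW_RPM))
    print("200 GB scan (s):",
          sequential_scan_seconds(200e9, ASSUMED_SCAN_MB_PER_S))
```

Under these assumptions, a million random reads cost about as much time in rotational latency as streaming 200 GB sequentially, which is the kind of trade-off that makes full table or partition scans attractive.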

DATAllegro offers a prime example.

DATAllegro offers what may be the archetype of the index-light MPP appliance strategy. A typical system contains multiple standard servers, each responsible for 6-12 standard disk drives, for a total installation in the tens of terabytes. (Indeed, as of DATAllegro V3, the servers and storage units are just standard Dell and EMC products respectively.) Data generally comes off the disks in full table or partition scans, in 24-megabyte blocks, but you can use the functionality of Ingres if you want to. And the whole thing is a lot faster and cheaper than conventional index-heavy alternatives.

Comments

4 Responses to “White paper — Index-Light MPP Data Warehousing”

  1. DBMS2 — DataBase Management System Services»Blog Archive » Another short white paper on MPP data warehouse appliances on May 10th, 2007 12:34 pm

    [...] up on an earlier piece, DATAllegro has sponsored a second white paper on MPP data warehouse appliances. This one focuses [...]

  2. DBMS2 — DataBase Management System Services » Blog Archive » Notes from the Netezza user conference on April 25th, 2008 12:07 am

    [...] Yes, Netezza streams data off of disk rather than doing a lot of random seeks. But DATAllegro does the same thing, without recourse to FPGAs. That doesn’t really have much to do with complex event processing [...]

  3. MapReduce for data mining? Maybe for variable-schema analytics. | DBMS2 -- DataBase Management System Services on August 25th, 2008 3:52 am

    [...] that the data warehouse appliance vendors have ALREADY disrupted the market he’s focusing on. Index-light row-based and columnar systems are both super fast at data mining [...]

  4. The disk rotation speed bottleneck | DBMS2 -- DataBase Management System Services on January 31st, 2010 6:02 pm

    [...] been referring to the disk (rotation) speed bottleneck for years, but I don’t really have a clean link for it. Let me fix that right [...]
