September 28th, 2006 Curt Monash
One of the least understood aspects of data warehouse technology is what may be called the
Expansion Ratio = (Total disk space used, except for mirroring) / (Size of the base database).
This is similar to the explosion ratio in the OLAP Report’s justly famous analysis of database explosion, but I’m using my own term because I don’t want to be tied to their precise definition, nor to their technical focus. Expansion Ratios are hotly debated, with some figures being:
- Teradata claims an Expansion Ratio of 8-9X for Oracle, 6X for DB2 (open systems version), and 2.5X for Teradata. The underlying source is data warehouses Teradata has replaced, so the sample may be biased toward competitors’ out-of-control warehouses.
- An anonymous appliance vendor exec said to me off the top of his head that Oracle has 6-8X Expansion Ratios.
- Oracle’s TPC-H submissions in the largest size range (10 terabytes) have 9.7-10.5X Expansion Ratios, if I’m reading the TPC reports correctly.
- Oracle cites a survey of 8 customers with 10-60 TB database size in which the Expansion Ratio works out to 1.6X. (More on this anomalous result below.)
I don’t have actual figures from Netezza and DATallegro, but I imagine they’d come out lower than 2X, possibly well below.
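To make the definition concrete, here is a minimal Python sketch of the calculation. The function name, the `mirror_copies` parameter, and the component sizes are hypothetical, chosen only for illustration; they are not figures from any vendor.

```python
def expansion_ratio(raw_disk_used_tb: float, base_db_size_tb: float,
                    mirror_copies: int = 1) -> float:
    """(Total disk space used, except for mirroring) / (size of the base database)."""
    if base_db_size_tb <= 0:
        raise ValueError("base database size must be positive")
    if mirror_copies < 1:
        raise ValueError("mirror_copies must be at least 1")
    # Divide out mirroring first, so only "real" space (tables, indexes,
    # aggregates, temp space, etc.) counts toward the ratio.
    return (raw_disk_used_tb / mirror_copies) / base_db_size_tb


# Hypothetical warehouse, sizes in terabytes (illustrative only).
base_tb = 10.0
space_tb = {"base tables": 10.0, "indexes": 12.0, "aggregates": 6.0, "temp space": 4.0}
unmirrored_tb = sum(space_tb.values())   # 32.0 TB
raw_tb = unmirrored_tb * 2               # everything stored twice by mirroring

print(f"Expansion Ratio: {expansion_ratio(raw_tb, base_tb, mirror_copies=2):.1f}X")  # 3.2X
```

Forgetting to divide out the mirror copies would double the figure, which is one reason quoted ratios are hard to compare unless everyone agrees on what is counted.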
Posted in DATAllegro, Data warehouse appliances, Data warehousing, Database compression, IBM and DB2, Netezza, Oracle, Relational database management systems, Teradata | 4 Comments »
September 27th, 2006 Curt Monash
I talked at length with Bill Blake and Doug Johnson of Netezza today. (Bill is the very person I previously complained about having had my access to cut off.) One takeaway was a clarification of their approach to transactions, which sounds even cooler than I first thought. It’s actually not a new idea; they just timestamp rows with CreateIDs and DeleteIDs, then exploit those to the hilt. In fact, it seems like this approach would be interesting in OLTP as well, although I’m not aware of it being used in any of the more successful OLTP DBMS. (Yes, this is an open invitation to fans of less-established DBMS products to tell me of their virtues, preferably in a flame-free manner.)
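Netezza’s exact visibility rules aren’t spelled out here, so the following is only a generic sketch of how CreateID/DeleteID row stamps can drive multiversion reads. The names (`Row`, `is_visible`, `scan`) and the "committed as of the reader’s snapshot" rule are assumptions for illustration. An update becomes a delete stamp on the old row plus a newly inserted row, so a reader can decide what it sees from the stamps alone, without taking locks.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Row:
    values: dict
    create_id: int                   # transaction that inserted this row version
    delete_id: Optional[int] = None  # transaction that deleted it, if any


def _takes_effect(txn_id: int, reader_id: int, committed_at_snapshot: Set[int]) -> bool:
    # A stamp counts if it is the reader's own work or was committed in its snapshot.
    return txn_id == reader_id or txn_id in committed_at_snapshot


def is_visible(row: Row, reader_id: int, committed_at_snapshot: Set[int]) -> bool:
    if not _takes_effect(row.create_id, reader_id, committed_at_snapshot):
        return False  # inserted by a transaction the reader must not see yet
    return row.delete_id is None or not _takes_effect(
        row.delete_id, reader_id, committed_at_snapshot)


# Hypothetical table: txn 1 loaded two rows; txn 3 updated row 1 (qty 5 -> 7).
table = [
    Row({"id": 1, "qty": 5}, create_id=1, delete_id=3),  # old version
    Row({"id": 1, "qty": 7}, create_id=3),               # new version
    Row({"id": 2, "qty": 9}, create_id=1),
]


def scan(reader_id: int, committed_at_snapshot: Set[int]) -> list:
    return [r.values for r in table if is_visible(r, reader_id, committed_at_snapshot)]


print(scan(4, {1}))     # txn 3 uncommitted: [{'id': 1, 'qty': 5}, {'id': 2, 'qty': 9}]
print(scan(3, {1}))     # txn 3 sees its own update: [{'id': 1, 'qty': 7}, {'id': 2, 'qty': 9}]
print(scan(5, {1, 3}))  # after txn 3 commits: [{'id': 1, 'qty': 7}, {'id': 2, 'qty': 9}]
```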
Posted in Data warehouse appliances, Netezza, Relational database management systems | 2 Comments »
September 27th, 2006 Curt Monash
Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:
- Shared-everything architecture
- End-to-end solution story
- OLTP industrial-strengthness carried over to data warehousing
In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle10g.
Posted in DATAllegro, Data warehouse appliances, EII, ETL, and/or EAI, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Relational database management systems, Teradata | 1 Comment »
September 24th, 2006 Curt Monash
The very name of this blog comes from the kind of “horses for courses” data store strategy implied by my recent post on different kinds of data warehouse uses. A number of other commentators have recently made similar points, although they may not agree with every detail. For example, William McKnight pretty much makes the pure DBMS2 argument, pointing out that a partially virtual warehouse is often superior to a fully centralized physical one. And Andy Hayler of Kalido says pretty much the same thing, although he strongly calls out his difference in emphasis from William’s view.
A tip of the hat to Mark Rittman for pointing me to those two and others.
Posted in Data warehouse appliances, Database theory and practice, EII, ETL, and/or EAI, Relational database management systems | No Comments »
September 24th, 2006 Curt Monash
I’ve been posting a lot recently about the diverse database technologies used to support data warehousing. With the marketplace supporting such a broad range of architectures, it seems clear that a lot of those architectures actually deserve to thrive, presumably each in a different kind of usage scenario. So in this post I’ll take a pass at dividing up use cases for data warehouses, and suggesting which kinds of data warehouse management technologies might do the best job of supporting them. To start with, I’ve divided things into a number of buckets:
- Pinpoint data lookup
- Constrained query and reporting
- Cube-filling calculations
- Hardcore tabular data crunching
- Text and media search
- Specialty areas, such as relationship analytics
Posted in DATAllegro, Data warehouse appliances, Data warehousing, IBM and DB2, MOLAP, Netezza, Relational database management systems, Teradata | 1 Comment »
September 22nd, 2006 Curt Monash
EDIT: Now they seem to be working again, with no action on my part and no known software updates through the whole process. Go figure. I do not know WordPress well enough to guess just exactly what had to have been broken and then fixed at my hosting provider to have caused these effects.
As of this writing, my blogs (DBMS2, the Monash Report, Text Technologies, and Software Memories) are all working in Firefox, and the top page of each is working in IE, but the rest of the pages/links are NOT working in IE. (But www.monash.com, a non-WordPress site on the same host, is still working through IE.) Naturally, I’m addressing this problem as fast as I can. I imagine the fix will involve some sort of a reinstall and/or theme change, which could alter the blogs’ look-and-feel, maybe not for the better (especially at first). I apologize for the inconvenience!
Posted in About this blog | No Comments »
September 22nd, 2006 Curt Monash
The last person I spoke with at the Netezza conference on Tuesday was a customer/presenter that the company had picked out for me. One thing he said baffled me — he claimed that Netezza was a real appliance vendor, but DATallegro wasn’t, presumably due to administrability issues. Now, it wasn’t clear to me that he’d ever evaluated DATallegro, so I didn’t take this too seriously, but still the exchange brought into focus the great differences between data warehouse products in the area of administration. For example:
- Netezza has no indices at all. And no caches. And the hardware is preconfigured. This all makes administration pretty simple.
- DATallegro has almost no indices, and also has preconfigured hardware. But it has some partitioning, optionally.
- Teradata also has preconfigured hardware. It does have indices, but rather simple ones. Plus it has join indices. And it has a few more configuration options in other areas (e.g., block size) than the other appliance vendors. (Yes, I count Teradata among the appliances.)
- If you go through all the fuss of installing SAP’s applications and BI technology anyway, the incremental administration of just SAP BI Accelerator is pretty light.
- Oracle and IBM have mammothly complex indexing options, but have put large amounts of work into tools to lessen the resulting administrative burden.
- IBM offers preconfigured hardware units to simplify some installation issues.
- Come to think of it, I don’t really know how hard it is to administer columnar systems (e.g., Sybase IQ).
Posted in DATAllegro, Data warehouse appliances, Data warehousing, Greenplum, IBM and DB2, Netezza, Oracle, Relational database management systems, SAP, BI Accelerator, and MaxDB, Teradata | 2 Comments »
September 20th, 2006 Curt Monash
I wrote about SAP’s BI Accelerator quite a bit in my white paper on memory-centric data management, but otherwise I seem not to have posted much about it here. In essence, it’s a product that’s all RAM-based, and generally geared for multi-hundred-gigabyte data marts. The basic design is a compression-heavy column-based architecture, evolved from SAP’s text-indexing technology TREX. Like data warehouse appliances, it eschews indexing, relying instead on blazingly fast table scans.
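SAP’s internals aren’t detailed here, so as a rough illustration of why a compressed column store can do without indexes, here is a minimal Python sketch of dictionary encoding plus a full-column scan. Dictionary encoding is a common column-store compression technique assumed for illustration; it is not a description of how BIA or TREX actually works, and the function names and data are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Compress a column by replacing each value with an integer code
    into a sorted dictionary of the distinct values."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in column]


def scan_equals(dictionary: List[str], codes: List[int], target: str) -> List[int]:
    """Index-free equality predicate: look the literal up once in the
    dictionary, then compare small integers across the entire column.
    Returns the positions (row numbers) that match."""
    i = bisect_left(dictionary, target)
    if i == len(dictionary) or dictionary[i] != target:
        return []  # literal absent from the dictionary, so no row can match
    return [row for row, code in enumerate(codes) if code == i]


# Hypothetical two-column table, stored column by column.
region = ["EMEA", "APAC", "EMEA", "AMER", "EMEA"]
revenue = [100, 250, 75, 300, 50]

region_dict, region_codes = dictionary_encode(region)
matches = scan_equals(region_dict, region_codes, "EMEA")
print(region_dict, region_codes)          # ['AMER', 'APAC', 'EMEA'] [2, 1, 2, 0, 2]
print(sum(revenue[r] for r in matches))   # 225
```

Only the columns a query names are touched, and the scan compares compact integer codes rather than strings, which is the general reason such designs can rely on brute-force scans held in RAM.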
I asked Lothar Schubert of SAP how BIA was doing in the market in its early going. This was his response:
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Data warehouse appliances, Data warehousing, Database compression, Memory-centric data management, Relational database management systems, SAP, BI Accelerator, and MaxDB | 5 Comments »
September 20th, 2006 Curt Monash
Sometimes, when one talks to a company about a close competitor, what one hears may not be 100% strictly accurate. Yesterday, I more than once heard claims that sounded oddly like “DATallegro has to open source whatever software it develops.” Today, DATallegro CEO Stuart Frost clarified as follows:
• DATallegro has no (little?) legal obligation to open source anything. Even the version of Ingres they use is not the GPL one.
• They do give a few enhancements back to Ingres (via open source?) rather than maintain them themselves.
• The whole MPP technology is proprietary, in every sense of “proprietary.” (For example, they use a whole different optimizer than Ingres’s. I’ve forgotten whether the Ingres optimizer is also left in place.)
Posted in DATAllegro, Data warehouse appliances, Ingres, Memory-centric data management, Open source RDBMS, Relational database management systems | 1 Comment »
September 20th, 2006 Curt Monash
Todd Walter and Randy Lea of Teradata gave generously of their time today, ducking out of their user conference, and shared their take on issues we’ve been discussing here recently. Overall, Teradata’s response to the data warehouse appliance guys is essentially: “Well, those may be fine for specific queries, or for data marts, but in true blended enterprise data warehouse workloads we’re superior, including in performance.”
Specific takeaways included:
Read the rest of this entry »
Posted in DATAllegro, Data warehouse appliances, Netezza, Relational database management systems, Teradata | 4 Comments »
September 20th, 2006 Curt Monash
I talked with Teradata today, and they called me on my use of the term “sequential.” Basically, if there’s any head movement for disk seeks, some computer science researchers wouldn’t call it “sequential.” I didn’t know that; I was just familiar with the less precise usage of the term in some vendors’ marketing and discussions.* OK, I’ll make up a new, more precise term instead. How about “coarse-grained”?
*And so we have another instance of Monash’s First Law of Commercial Semantics: Bad jargon drives out good.
Posted in Database theory and practice, Teradata | 8 Comments »
September 20th, 2006 Curt Monash
There’s another cool-sounding part to the Netezza story, which straddles their chips and their software: The FPGA takes over the work of assuring database consistency. If the system attempts to read and write a record at the same time, the FPGA keeps things straight. This eliminates the need for locks — at least if you don’t care about transactional integrity — and some of the reason for logs. (I guess that in lieu of any kind of rollback/rollforward they just rely on failover to mirrored disks.)
This isn’t exactly the way one would want to do OLTP, and in general my head is shaking as I write this — but it sure seems to suffice for some rather demanding data warehouse users.
Posted in Data warehouse appliances, Database theory and practice, Netezza, Relational database management systems | 2 Comments »
September 20th, 2006 Curt Monash
In addition to its software story, Netezza of course has a rather unique chip story. Where other vendors might have standard disk controllers and high-powered microprocessors, Netezza respectively has an FPGA (Field-Programmable Gate Array) and a lesser microprocessor (PowerPC). Netezza claims that two major advantages of these choices are:
- 5X throughput/performance improvement
- Much lower heat and power consumption.
The main function of the FPGA, other than generically getting data on and off disk, is to restrict and project tables (i.e., apply single-table WHERE clauses and drop unneeded columns). Netezza claims that their FPGAs can perform these operations on the streaming data at least as quickly as an expensive, hot, power-hungry top-end microprocessor would, and indeed faster. The key word is “streaming”, which they contrast to the microprocessor’s need to get the data in and then back out of RAM (cache or otherwise).
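As a software analogy only (the FPGA does this in hardware, and nothing below reflects Netezza’s design), here is a minimal Python sketch of restrict-and-project applied to rows as they stream past. A generator examines each row once and passes along just the qualifying rows’ requested columns, without buffering the table. The function names and sample rows are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence

Row = Dict[str, object]


def restrict_project(rows: Iterable[Row],
                     predicate: Callable[[Row], bool],
                     columns: Sequence[str]) -> Iterator[Row]:
    """Restrict (single-table WHERE) and project (keep only the named columns)
    one row at a time as the rows stream past; nothing is buffered."""
    for row in rows:
        if predicate(row):
            yield {c: row[c] for c in columns}


def rows_from_disk() -> Iterator[Row]:
    """Hypothetical stand-in for rows arriving off disk."""
    yield {"order_id": 1, "region": "EMEA", "amount": 120.0, "notes": "rush"}
    yield {"order_id": 2, "region": "APAC", "amount": 80.0, "notes": ""}
    yield {"order_id": 3, "region": "AMER", "amount": 310.0, "notes": "gift"}


# Roughly: SELECT order_id, amount FROM orders WHERE amount > 100
result = list(restrict_project(rows_from_disk(),
                               lambda r: r["amount"] > 100,
                               ["order_id", "amount"]))
print(result)  # [{'order_id': 1, 'amount': 120.0}, {'order_id': 3, 'amount': 310.0}]
```

The point of the analogy is the data reduction: whatever sits downstream (in Netezza’s case, the PowerPC) only ever sees the filtered, narrowed rows.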
I’ll be interested to see whether somebody can muster a ringing refutation to Netezza’s claims.
Posted in DATAllegro, Data warehouse appliances, Netezza, Relational database management systems | 10 Comments »
September 20th, 2006 Curt Monash
For various reasons, I’m not going to try to give a comprehensive overview of the Netezza story. But I’d like to highlight four points that illustrate a lot of the difference between Netezza’s architecture and that of more conventional data warehousing DBMS.
Read the rest of this entry »
Posted in DATAllegro, Data warehouse appliances, Netezza, Relational database management systems | 3 Comments »
September 20th, 2006 Curt Monash
Over the past year, Netezza has exhibited the squirreliest question-dodging behavior I’ve seen from a DBMS vendor since – actually, since Sybase tried to conceal the System 10 fiasco in 1993-5. To its credit, however, Netezza finally decided to open the kimono. Specifically, they invited me to their user conference, which I attended today, and indeed were quite helpful in FINALLY getting my questions addressed, and in offering more access as needed.
Posted in Data warehouse appliances, Netezza, Relational database management systems | 1 Comment »
September 19th, 2006 Curt Monash
A lot of evidence is pointing to a major paradigm shift in data warehouse RDBMS, along the lines of:
Old way: Assume I/O is random; lower total execution time by improving selectivity and thus lowering the amount of I/O.
New way: Drive the amount of random I/O to near zero, and do as much sequential I/O as necessary to achieve this goal.
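To see why the trade-off can flip, here is a back-of-envelope cost model in Python. Every parameter (a single disk, 5 ms per random seek, 100 MB/s sequential transfer, 100 million 200-byte rows) is an assumption for illustration, and the model ignores index traversal, caching, and parallelism; the function names are hypothetical.

```python
def index_plan_seconds(rows_fetched: int, seek_ms: float) -> float:
    """Old way: roughly one random disk seek per qualifying row
    (ignores index traversal, caching, and clustering)."""
    return rows_fetched * seek_ms / 1000.0


def scan_plan_seconds(table_bytes: float, scan_mb_per_s: float) -> float:
    """New way: read the whole table at streaming bandwidth."""
    return table_bytes / (scan_mb_per_s * 1_000_000)


def breakeven_selectivity(total_rows: int, row_bytes: int,
                          seek_ms: float, scan_mb_per_s: float) -> float:
    """Fraction of rows a query must touch before the full scan wins."""
    scan = scan_plan_seconds(total_rows * row_bytes, scan_mb_per_s)
    return scan / index_plan_seconds(total_rows, seek_ms)


# Assumed, illustrative parameters for a single disk.
ROWS, ROW_BYTES, SEEK_MS, MB_PER_S = 100_000_000, 200, 5.0, 100.0

print(scan_plan_seconds(ROWS * ROW_BYTES, MB_PER_S))                        # 200.0
print(f"{breakeven_selectivity(ROWS, ROW_BYTES, SEEK_MS, MB_PER_S):.4%}")   # 0.0400%
```

Under these assumptions, a query touching more than about 0.04% of the rows (40,000 of 100 million) is already cheaper to answer with a scan, which is the intuition behind the new way.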
Examples include:
Read the rest of this entry »
Posted in DATAllegro, Data warehouse appliances, Database theory and practice, Memory-centric data management, Relational database management systems, SAP, BI Accelerator, and MaxDB, TransRelational | 3 Comments »
September 6th, 2006 Curt Monash
I use Akismet as a spam-catcher. On the whole it’s good, but it has one annoying deficiency — you can only review the 150 most recent suspected spam. This time around, however, I had 766 suspected spam. If you had a valid comment in the 616 I couldn’t review, I’m sorry. Please be so kind as to resubmit it.
Thanks,
CAM
EDIT: The attack continues. Today I deleted 245 real or imagined spam. A couple of days ago it was 135, all real.
Posted in About this blog | 1 Comment »