Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
The core of the Vertica story still seems to be compression
Back in March, I suggested that compression was a central and compelling aspect of Vertica’s story. Well, in their new blog, the Vertica guys now strongly reinforce that impression.
I recommend those two Database Column posts (by Sam Madden) highly. I’ve rarely seen such a clear, detailed presentation of a company’s technical argument. My own thoughts on the subject boil down to:
- In principle, all the technology (and hence all the technological advantages) they’re talking about could be turned into features of one of the indexing options of a row-oriented RDBMS. But in practice, there’s no indication that this will happen any time soon.
- Release 1 of the Vertica product will surely have many rough edges.
- Some startups are surprisingly ignorant of the issues involved in building a successful, industrial-strength DBMS. But a company that has both Mike Stonebraker and Jerry Held seriously involved has a big advantage. They may make other kinds of errors, but they won’t make many ignorant ones.
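As a toy illustration of the compression argument (my example, not Vertica’s actual implementation): when a column is stored sorted, runs of repeated values collapse into (value, count) pairs via run-length encoding, which is one reason column stores compress so well.

```python
# Toy run-length encoding of a sorted column -- an illustration of the
# compression idea, not Vertica's actual on-disk format.
from itertools import groupby

def rle_encode(column):
    """Collapse a sequence into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# A sorted "state" column with heavy repetition, as a fact table might have.
column = ["CA"] * 5 + ["MA"] * 3 + ["NY"] * 4
encoded = rle_encode(column)
print(encoded)  # [('CA', 5), ('MA', 3), ('NY', 4)]
assert rle_decode(encoded) == column
```

Twelve stored values become three pairs; the effect only grows with realistic run lengths, and a row store interleaving many columns per row can’t exploit it as directly.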
Categories: Columnar database management, Data warehousing, Database compression, Michael Stonebraker, Theory and architecture, Vertica Systems | 5 Comments |
Three bold assertions by Mike Stonebraker
In the first “meat” — i.e., other than housekeeping — post on the new Database Column blog, Mike Stonebraker makes three core claims:
1. Different DBMS should be used for different purposes. I am in violent agreement with that point, which is indeed a major theme of this blog.
2. Vertica’s software is 50X faster than anything non-columnar and 10X faster than anything columnar. Now, some of these stats surely come from the syndrome of comparing the future release of your product, as tuned by the world’s greatest experts on it who also hope to get rich on their stock options in your company, vs. some well-established production release of your competitors’ products, tuned to an unknown level of excellence,* with the whole thing running test queries that you, in your impartial wisdom, deem representative of user needs. Or something like that … Read more
Categories: Benchmarks and POCs, Columnar database management, Data warehousing, Database diversity, Michael Stonebraker, OLTP, Theory and architecture, TransRelational | 3 Comments |
Philip Howard likes Calpont — again
The ratio of Philip Howard plaudits about Calpont to shipping products from Calpont has now doubled. Yet it also has remained the same: it is a quotient whose denominator is zero, and doubling the numerator changes nothing. Last time around, he seemed to like their hardware strategy. This time around, he seems to like their lack of a hardware strategy. Be that as it may, the previously discussed nature of Calpont’s website hasn’t changed: one page, content-free, and misleading even so.
Oh, and it appears he broke the embargo on ParAccel. Bad Philip. Spank him, Kim.
Categories: Calpont, Data warehouse appliances, Data warehousing, Emulation, transparency, portability | 1 Comment |
Big stuff coming from DATAllegro
In the literal sense, that is. While the details on what I wrote about this a few weeks ago* are still embargoed, I’m at liberty to drop a few more hints.
*Please also see DATAllegro CEO Stuart Frost’s two comments added today to that thread.
DATAllegro systems these days basically consist of Dell servers talking to EMC disk arrays, with Cisco InfiniBand to provide fast inter-server communication without significant CPU load. Well, if you decrease the number of Dell servers per EMC box, and increase the number of disks per EMC box, you can slash your per-terabyte price (possibly at the cost of lowering performance).
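The arithmetic behind that tradeoff is simple enough to sketch. All the prices and component counts below are invented for illustration; they are not DATAllegro’s actual configurations.

```python
# Hypothetical numbers only -- invented to illustrate the server/disk
# ratio tradeoff, not DATAllegro's actual pricing or configurations.
def per_tb_price(num_servers, server_price, num_disks, disk_tb, disk_price):
    """Total system price divided by raw capacity in terabytes."""
    total_price = num_servers * server_price + num_disks * disk_price
    total_tb = num_disks * disk_tb
    return total_price / total_tb

# Baseline: many servers per storage array.
baseline = per_tb_price(num_servers=8, server_price=10_000,
                        num_disks=16, disk_tb=0.5, disk_price=1_000)

# Fewer servers, more disks per array: much cheaper per terabyte,
# but less CPU per terabyte, so scan-heavy queries may slow down.
dense = per_tb_price(num_servers=2, server_price=10_000,
                     num_disks=48, disk_tb=0.5, disk_price=1_000)

print(round(baseline), round(dense))  # 12000 2833
```

Even with made-up numbers, the shape of the result is the point: shifting the server-to-disk ratio moves dollars per terabyte far more than it moves total system price.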
Read more
Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro | Leave a Comment |
Dataupia – low-end data warehouse appliances
It’s unfortunate that Dataupia has concepts like “Utopia” and “Satori” in its marketing, as those serve to obscure what the company really offers – data warehouse appliances designed for the market’s low end. Indeed, it seems that they’re currently very low-end, because they were just rolled out in May and are correspondingly immature.
Basic aspects include:
- Type 1 appliances, which most other data warehouse appliance vendors (Teradata excepted) have moved away from. And there actually seems to be very little special about the hardware design to take advantage of the proprietary opportunity.
- Apparently limited redistribution of intermediate query result sets – i.e., the “fat head” architecture most competitors have moved away from. But it’s not pure fat-head; there’s some data redistribution.
- General lack of partnerships with the obvious software players (but they’re working on that).
- Low price point ($19,500 per 2-terabyte module).
Beyond price, Dataupia’s one big positive differentiation vs. alternative products is that you don’t write SQL directly to a Dataupia appliance. Rather, you talk to it through the federation capability in your big-brand DBMS, such as Oracle or SQL Server. Benefits of this approach include: Read more
Categories: Data warehouse appliances, Data warehousing, Dataupia, Emulation, transparency, portability | 3 Comments |
DATAllegro heads for the high end
DATAllegro CEO Stuart Frost called in for a prebriefing/feedback/consulting session. (I love advising my DBMS vendor clients on how to beat each other’s brains in. This was even more fun in the 1990s, when combat was generally more aggressive. Those were also the days when somebody would change jobs to an arch-rival and immediately explain how everything they’d told me before was utterly false …)
While I had Stuart on the phone, I did manage to extract some stuff I’m at liberty to use immediately. Here are the highlights: Read more
Categories: Data warehouse appliances, Data warehousing, Database compression, DATAllegro, Greenplum, Netezza, Teradata | 4 Comments |
Fast RDF in specialty relational databases
When Mike Stonebraker and I discussed RDF yesterday, he quickly turned to suggesting fast ways of implementing it over an RDBMS. Then, quite characteristically, he sent over a paper that allegedly covered them, but actually was about closely related schemes instead. 🙂 Edit: The paper has a new, stable URL. Hat tip to Daniel Abadi.
All minor confusion aside, here’s the story. At its core, an RDF database is one huge three-column table storing subject-property-object triples. In the naive implementation, you then have to join this table to itself repeatedly. Materialized views are a good start, but they only take you so far. Read more
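The naive layout described above is easy to sketch. Here is a minimal version using SQLite (schema and data invented for illustration): one triples table, with each hop in a query adding another self-join.

```python
# Sketch of the naive RDF-over-relational layout: one big
# triples(subject, property, object) table, queried via self-joins.
# The schema and sample data are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice", "worksFor",  "acme"),
    ("bob",   "worksFor",  "acme"),
    ("acme",  "locatedIn", "boston"),
])

# "Who works for a company located in Boston?" takes one self-join;
# each additional hop in the graph query adds another.
rows = conn.execute("""
    SELECT t1.subject
    FROM triples t1
    JOIN triples t2 ON t1.object = t2.subject
    WHERE t1.property = 'worksFor'
      AND t2.property = 'locatedIn'
      AND t2.object   = 'boston'
    ORDER BY t1.subject
""").fetchall()
print(rows)  # [('alice',), ('bob',)]
```

A query that traverses n properties needs roughly n-way self-joins of the same huge table, which is exactly why materialized views (and the related schemes in the paper) matter so much here.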
Categories: Columnar database management, Data models and architecture, Data warehousing, Database compression, RDF and graphs, Theory and architecture, Vertica Systems | 1 Comment |
Bracing for Vertica
The word from Vertica is that the product will go GA in the fall, and that they’ll have blow-out benchmarks to exhibit.
I find this very credible. Indeed, the above may even be something of an understatement.
Vertica’s product surely has some drawbacks, which will become more apparent when the product is more available for examination. So I don’t expect row-based appliance innovators Netezza and DATAllegro to just dry up and blow away. On the other hand, not every data warehousing product is going to live long and prosper, and I’d rate Vertica’s chances higher than those of several competitors that are actually already in GA.
Categories: Columnar database management, Data warehousing, DATAllegro, Netezza, Vertica Systems | 2 Comments |
Large DB2 data warehouses on Linux (and AIX)
I was consulting recently to a client that needs to build really big relational data warehouses, and also is attracted to native XML. Naturally, I suggested they consider DB2. They immediately shot back that they were Linux-based, and didn’t think DB2 ran (or ran well) on Linux. Since IBM often leads with AIX-based offerings in its marketing and customer success stories, that wasn’t a ridiculous opinion. On the other hand, it also was very far from what I believed.
So I fired some questions at IBM, Read more
Categories: Data warehousing, IBM and DB2 | Leave a Comment |
Another short white paper on MPP data warehouse appliances
Following up on an earlier piece, DATAllegro has sponsored a second white paper on MPP data warehouse appliances. This one focuses specifically on DATAllegro’s move from Type 1 to Type 2 (i.e., virtual) appliances, via its new V3 product line. The basic tradeoffs of this move include:
- Superior hardware reliability
- Hardware lock-in shifted from DATAllegro to Dell, EMC, and Cisco
- Loss of specialized encryption acceleration
- Possibly a loss of some other performance optimization as well
- Better time-to-market in exploiting general Moore’s Law performance speedups
Actually, I didn’t make that last point explicitly in the paper, but it quite possibly trumps any performance disadvantages from the switch. And Moore’s Law itself certainly far outweighs any other performance-affecting factor.
Categories: Data warehouse appliances, Data warehousing, DATAllegro | Leave a Comment |