Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
Is Teradata bringing out a low-end data warehouse appliance?
Edit: This post is superseded by our analysis of the new Teradata 2500 data warehouse appliance.
One of Teradata’s competitors believes they got an accurate leak about a new low-end Teradata appliance. Teradata is neither confirming nor denying. I believe the leak.
I’m not going to give product or pricing details, which in any case could be subject to change before a final product release. But the general idea is:
- Commodity Dell servers.
- Some of the higher-end software stripped out.
- Limit on the number of nodes, leading to a database size limit somewhere in the tens of terabytes.
It will be interesting to see whether Teradata can come out with something that’s closely competitive in price, performance, and administrative ease with what the newer data warehouse appliance vendors offer, yet upgrades cleanly to full-featured Teradata systems for those who choose to pursue that path.
Categories: Data warehouse appliances, Data warehousing, Teradata
Will Brighthouse become the MySQL data warehouse of choice?
As I’ve previously noted:
- Infobright is about to make more noise about its MySQL-based data warehouse software, Brighthouse.
- Brighthouse has some very interesting technical features.
- A Sun/Infobright partnership would make a lot of sense.
Talking with Infobright today, I was again struck by how close their relationship is with MySQL (the company). Stay tuned.
Categories: Analytic technologies, Data warehousing, Infobright, MySQL
Infobright is gearing up for a press push
There’s another TDWI conference coming up, so it’s time for data warehouse-related press rollouts. Infobright (one of my many clients in this area) will be doing one of them, and ran an early version by me. Customer announcements, vendor partnerships, and so on are still being finalized, but in any case Infobright has 7 revenue-recognized customers and a bunch more that are sold and in the implementation cycle. There’s a Release 3 of Brighthouse coming up. As one would expect, Release 3’s major claims to fame are a general round of added features (including some that elicit a “You didn’t have that already?” reaction), plus huge performance improvements on some queries (namely the ones that were the biggest bottlenecks in Brighthouse Release 2).
On that level, it’s all standard stuff, as is Infobright’s core pitch: ease, simplicity, low cost, etc., and the benefits of same. But drilling down, there are some rather unusual technical claims.
Categories: Analytic technologies, Data warehousing, Infobright
Things could get interesting for Infobright
Of the many new specialty data warehouse DBMSs and appliances, Infobright’s Brighthouse is the only leading one based on MySQL. Now that Sun is acquiring MySQL, I expect Sun and Infobright to have some interesting conversations. Conversely, I wouldn’t be optimistic about any partnering discussions Infobright might have with, say, HP.
Among Sun’s current partners, the one most directly competitive with any future Infobright partnership is ParAccel.
Categories: Analytic technologies, Data warehousing, Infobright, MySQL, Open source, ParAccel
Flash-based data warehousing is getting ever closer
EMC is rolling out solid-state drives later this quarter. The press release mentions the word “terabyte”, so this is for non-trivial systems. And by the way, 100,000 write/erase cycles before something wears out works out to several cycles per hour over a multi-year service life, so that’s a non-problem for data warehousing.
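The “several per hour” figure depends on how long the drive is expected to last. As a back-of-envelope check, here is a minimal Python sketch, assuming a 100,000-cycle rating spread evenly across cells by wear leveling and hypothetical service lives of three and five years (neither lifetime comes from EMC):

```python
# Back-of-envelope flash endurance check. Illustrative assumptions, not EMC
# specs: a 100,000-cycle rating, perfect wear leveling, 3- or 5-year lives.

RATED_CYCLES = 100_000
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def cycles_per_hour(rated_cycles: int, service_years: float) -> float:
    """Average write/erase cycles per hour that would use up the rating
    exactly at the end of the assumed service life."""
    return rated_cycles / (service_years * HOURS_PER_YEAR)


if __name__ == "__main__":
    for years in (3, 5):
        rate = cycles_per_hour(RATED_CYCLES, years)
        print(f"{years}-year life: {rate:.1f} cycles/hour")
    # prints "3-year life: 3.8 cycles/hour" then "5-year life: 2.3 cycles/hour"
```

At those rates, a warehouse block that gets rewritten even a few times a day stays comfortably inside the endurance budget.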
ParAccel and SAP already offer RAM-based appliances. I suspect we’ll see appliances based on solid-state drives before long. I also wouldn’t be shocked if a non-appliance vendor such as Oracle suddenly jumped into this area, trying to use it as a way to leapfrog the appliance vendors.
Categories: Data warehouse appliances, Data warehousing
Netezza targets 1 petabyte
Netezza is promising petabyte-scale appliances later this year, up from 100 terabytes. That’s user data (I checked), and assumes 2-3X compression, or a little less than they think is actually likely. I.e., they’re describing their capacity in the same kinds of terms other responsible vendors do. They haven’t actually built and tested any 1 petabyte systems internally yet, but they’ve gone over 100 terabytes.
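Because the capacity figure is user data and assumes 2-3X compression, the compressed data actually stored is correspondingly smaller. A minimal sketch of that arithmetic, using a hypothetical helper name and ignoring mirroring, temp space, and other overhead that a real appliance also has to provision:

```python
# Hypothetical sizing helper: user-data capacity -> compressed data stored.
# Ignores mirroring, temp/spill space, and other real-world overhead.


def stored_tb(user_tb: float, compression_ratio: float) -> float:
    """Terabytes of compressed data stored for `user_tb` of uncompressed user data."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return user_tb / compression_ratio


if __name__ == "__main__":
    user_tb = 1000.0  # 1 petabyte of user data, in decimal terabytes
    for ratio in (2.0, 3.0):
        print(f"{ratio:.0f}X compression: {stored_tb(user_tb, ratio):.0f} TB stored")
    # prints "2X compression: 500 TB stored" then "3X compression: 333 TB stored"
```

So a 1 petabyte user-data claim at 2-3X compression corresponds to roughly 333-500 terabytes of compressed data, before overhead.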
Basically, this leaves Netezza’s high-end capability about 10X below Teradata’s. On the other hand, it should leave them capable of handling pretty much every Teradata database in existence.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Petabyte-scale data management, Teradata
A quick survey of data warehouse management technology
There are at least 16 different vendors offering appliances and/or software that do database management primarily for analytic purposes.* That’s a lot to keep up with. So I’ve thrown together a little overview of the analytic data management landscape, liberally salted with links to information about specific vendors, products, or technical issues. In some ways, this is a companion piece to my prior post about data warehouse appliance myths and realities.
*And that’s just the tabular/alphanumeric guys. Add in text search and you run the total a lot higher.
Numerous data warehouse specialists offer traditional row-based relational DBMS architectures, but optimize them for analytic workloads. These include Teradata, Netezza, DATAllegro, Greenplum, Dataupia, and SAS. All of those except SAS are wholly or primarily vendors of MPP/shared-nothing data warehouse appliances. EDIT: See the comment thread for a correction re Kognitio.
Numerous other specialists offer column-based relational DBMS architectures. These include Sybase (with the Sybase IQ product, originally from Expressway), Vertica, ParAccel, Infobright, Kognitio (formerly White Cross), and Sand.
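To make the row-versus-column distinction concrete, here is a toy Python illustration (not any vendor’s storage engine). It counts how many stored values a one-column SUM has to touch under each layout; “values read” is a stand-in for I/O, whereas real systems read whole blocks and usually compress columns as well.

```python
# Toy row-store vs column-store comparison (not any vendor's engine).
# "Values read" stands in for I/O; real systems read whole blocks.

# (order_id, region, amount)
ROWS = [
    (1, "east", 120),
    (2, "west", 75),
    (3, "east", 300),
    (4, "north", 40),
]
AMOUNT = 2  # position of the amount column


def row_store_sum(rows, col_index):
    """Row layout: whole rows are read, even though only one field is needed."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row[col_index]
    return total, values_read


def to_columns(rows):
    """Pivot rows into one list per column, as a column store would keep them."""
    return [list(col) for col in zip(*rows)]


def column_store_sum(columns, col_index):
    """Column layout: only the referenced column is read."""
    col = columns[col_index]
    return sum(col), len(col)


if __name__ == "__main__":
    print("row store:   ", row_store_sum(ROWS, AMOUNT))                # (535, 12)
    print("column store:", column_store_sum(to_columns(ROWS), AMOUNT))  # (535, 4)
```

Both layouts return the same answer; the column layout just touches a third as many values here, and the gap widens as tables get wider.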
Netezza rolls out its compression story
The proximate cause for today’s flurry of Netezza-related posts is that the company has finally rolled out its compression story. In a nutshell, Netezza has developed its own version of columnar delta compression, slated to ship in May 2008. It compresses 2-5X, with the factor sometimes going up into double digits. Netezza estimates this produces a 2-3X improvement in overall performance, with the core marketing claim being that performance will “double” from compression alone.
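Netezza hasn’t spelled out the internals of its scheme, so the following is only a generic sketch of delta encoding on one sorted integer column, the basic idea the name suggests. The fixed-width bit packing and the 32-bit raw width are assumptions for illustration:

```python
from itertools import accumulate

# Generic delta encoding of one sorted integer column. NOT Netezza's actual
# scheme; fixed-width packing and a 32-bit raw width are assumptions.
RAW_BITS = 32


def delta_encode(column):
    """Keep the first value as a base, then store gaps between neighbors."""
    if not column:
        return []
    return [column[0]] + [b - a for a, b in zip(column, column[1:])]


def delta_decode(encoded):
    """Running sums of base + deltas rebuild the original column."""
    return list(accumulate(encoded))


def compressed_bits(column):
    """Bits for a RAW_BITS base plus deltas packed at one shared width
    (at least 1 bit each)."""
    if not column:
        return 0
    deltas = delta_encode(column)[1:]
    if any(d < 0 for d in deltas):
        raise ValueError("column must be sorted in ascending order")
    width = max((d.bit_length() for d in deltas), default=0) or 1
    return RAW_BITS + width * len(deltas)


if __name__ == "__main__":
    # Hypothetical sorted date column, stored as day numbers.
    dates = [730120, 730120, 730121, 730121, 730121, 730122, 730125, 730125]
    assert delta_decode(delta_encode(dates)) == dates  # lossless round trip
    raw = RAW_BITS * len(dates)
    packed = compressed_bits(dates)
    print(f"raw {raw} bits, delta-packed {packed} bits, {raw / packed:.1f}X")
    # prints "raw 256 bits, delta-packed 46 bits, 5.6X"
```

The toy column compresses well because it is sorted and full of repeats; real-world ratios depend heavily on data distribution and sort order.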
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Database compression, Netezza, Theory and architecture
Netezza is finally opening the kimono
I’ve bashed Netezza repeatedly for secrecy and obscurity about its technology and technical plans. Well, they’re getting a lot better. The latest post on a Netezza company blog, by marketing exec Phil Francisco, lays out their story clearly and concisely. And it’s backed up by a white paper that does more of the same. In particular, Page 11 of that white paper spells out possible future directions for enhancement, such as better compression, encryption, join filtering, and Netezza Developer Network stuff.
Data warehouse appliances – fact and fiction
Borrowing the “Fact or fiction?” meme from the sports world:
- Data warehouse appliances have to have specialized hardware. Fiction. Indeed, most contenders other than Teradata and Netezza (for example, DATAllegro, Vertica, ParAccel, Greenplum, and Infobright) offer Type 2 appliances, which are built from commodity hardware. (Dataupia is another exception.)
- Specialized hardware is a dead end for data warehouse appliances. Fiction. If it were easy for Teradata to replace its specialized switch technology, it would have done so a decade ago. And Netezza’s strategy has a lot of appeal.
- Data warehouse appliances are nothing new, and failed long ago. Fiction, but only because of Teradata. 1980s appliance pioneer Britton-Lee didn’t do so well (it was actually bought by Teradata). IBM and ICL (Britain’s national-champion hardware company) had content-addressable data store technology that went nowhere.
- Since data warehouse appliances failed long ago, they’ll fail now too. Fiction. Shared-nothing MPP is a fundamental advantage of appliances. So are various index-light strategies. Data warehouse appliances are here to stay.
- Data warehouse appliances only make sense if your main database management system can’t handle the job. Fiction. There are dozens of data warehouse appliances managing under 5 terabytes of user data, if not under 1 terabyte. True, some of them are legacy installations, dating back to when Oracle couldn’t handle that much data well itself. But new ones are still going in. Even if Oracle or Microsoft SQL Server can do the job, a data warehouse appliance is often a far superior — cheaper, easier to deploy and keep running, and/or better performing — alternative.
- Data warehouse appliances are just for data marts. For your full enterprise data warehouse, use a conventional DBMS. Part fact, part fiction. It depends on the appliance, and on the complexity of your needs. Teradata systems can do pretty much everything. Netezza and DATAllegro, two of the oldest data warehouse appliance startups, have worked hard on their concurrency issues and now can support fairly large user or reporting loads. They also can handle reasonable volumes of transactional or trickle-feed updates, and probably can support full EDW requirements for decent-sized organizations. Even so, there are some warehouse use cases for which they’re ill-suited. Newer appliance vendors are more limited yet.
- Analytic appliances are just renamed data warehouse appliances. Fact, even if misleading. Netezza is using the term “analytic appliance” to highlight additional things one can do on its boxes beyond answering queries. But those are still operations on a data mart or data warehouse.
- Teradata is the leading data warehouse appliance vendor. More fact than fiction. Some observers say that Teradata systems aren’t data warehouse appliances. But I think they are. Competitors may be superior to Teradata in one or another characteristic trait of appliances (e.g., speed of installation), but it’s hard to define “appliances” in an objective way that excludes Teradata.
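The claim that shared-nothing MPP is a fundamental advantage can be illustrated with a toy Python model: rows are hash-distributed across nodes, each node aggregates only its own slice, and a coordinator merges the small partial results. Threads stand in for separate servers here; the node count, the use of CRC32 as the distribution hash, and the sample data are all assumptions, and real appliances also have to handle data redistribution for joins, networking, and node failures.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy shared-nothing MPP model: each "node" owns a disjoint slice of the rows
# and aggregates it without touching any other node's data. Threads stand in
# for servers; CRC32 distribution and 4 nodes are illustrative assumptions.


def distribute(rows, num_nodes):
    """Hash-partition (key, amount) rows so each key lives on exactly one node."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[zlib.crc32(key.encode()) % num_nodes].append((key, amount))
    return nodes


def local_aggregate(local_rows):
    """Work done on one node, using only that node's own rows."""
    totals = defaultdict(int)
    for key, amount in local_rows:
        totals[key] += amount
    return dict(totals)


def mpp_group_sum(rows, num_nodes=4):
    """Scatter the rows, aggregate on every node in parallel, merge partials."""
    slices = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, slices))
    result = defaultdict(int)
    for partial in partials:  # small results, cheap for the coordinator
        for key, total in partial.items():
            result[key] += total
    return dict(result)


if __name__ == "__main__":
    sales = [("east", 120), ("west", 75), ("east", 300), ("north", 40)]
    print(mpp_group_sum(sales))
```

CRC32 is used rather than Python’s built-in `hash()` because string hashing is randomized per process, and a distribution key needs to map to the same node every time.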
If you liked this post, you might also like one on text mining fact and fiction.