Data warehousing
Analysis of issues in data warehousing, with extensive coverage of database management systems and data warehouse appliances that are optimized to query large volumes of data.
VectorWise, Ingres, and MonetDB
I talked with Peter Boncz and Marcin Zukowski of VectorWise last Wednesday, but didn’t get around to writing about VectorWise immediately. Since then, VectorWise and its partner Ingres have gotten considerable coverage, especially from an enthusiastic Daniel Abadi. Basic facts that you may already know include:
- VectorWise, the product, will be an open-source columnar analytic DBMS. (But that’s not quite true. Pending productization, it’s more accurate to call the VectorWise technology a row/column hybrid.)
- VectorWise is due to be introduced in 2010. (Peter Boncz said that to me more clearly than I’ve seen in other coverage.)
- VectorWise and Ingres have a deal in which Ingres will at least be the exclusive seller of the VectorWise technology, and hopefully will buy the whole company.
- Notwithstanding that it was once named something like “MonetDB,” VectorWise actually is not the same thing as MonetDB, another open source columnar analytic DBMS from the same research group.
- The MonetDB and VectorWise research groups consist in large part of academics in Holland, specifically at CWI (Centrum voor Wiskunde en Informatica). But Ingres has a research group working on the project too. (Right now there are about seven “highly experienced” people each on the VectorWise and Ingres sides, although at least the VectorWise folks aren’t all full-time. More are being added.)
- Ingres and VectorWise haven’t agreed exactly how VectorWise and Ingres Classic will play together in the Ingres product line. (All of the obvious possibilities are still on the table.)
- VectorWise is shared-everything, just as Ingres is. But plans — still tentative — are afoot to integrate VectorWise with MapReduce in Daniel Abadi’s HadoopDB project.
Categories: Actian and Ingres, Analytic technologies, Columnar database management, Data warehousing, Database compression, MonetDB, Open source, Theory and architecture, VectorWise | 12 Comments |
Teradata 13 focuses on advanced analytic performance
Last October I wrote about the Teradata 13 release of Teradata’s database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability)*. So far as I can tell, there were two main points of emphasis for Teradata 13:
- Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs.
- UDFs (User Defined Functions), especially but not only in the areas of data mining and geospatial analysis.
To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there of course are some enhancements in simple query performance and in analytic functionality as well. Read more
“The Netezza price point”
Over the past couple of years, quite a few data warehouse appliance or DBMS vendors have talked to me directly in terms of “Netezza’s price point,” or some similar phrase. Some have indicated that they’re right around the Netezza price point, but think their products are superior to Netezza’s. Others have stressed the large gap between their price and Netezza’s. But one way or the other, “Netezza’s price” has been an industry metric.
One reason everybody talks about the “Netezza (list) price” is that it hasn’t been changing much, seemingly staying stable at $50-60K/terabyte for a long time. And thus Teradata’s 2550 and Oracle’s larger-disk Exadata configuration — both priced more or less in the same range — have clearly been price-competitive with Netezza since their respective introductions.
That just changed. Netezza is cutting its pricing to the $20K/terabyte range imminently, with further cuts to come. So where does that leave competitors?
- The Teradata 1550 is in the Netezza price range (still a little below, actually).
- Oracle basically has nothing price-competitive with Netezza.
- Microsoft has stated it plans to introduce Madison below the old DATAllegro price points. Conceivably, that could be competitive with Netezza’s new pricing, although I haven’t checked how much it now costs simply to buy a lot of SQL Server licenses. (That presumably would be a lower bound on Madison’s price, and apart from hardware it might be the whole price, since Microsoft likes to create large product bundles.)
- XtremeData just launched in the new Netezza price range.
- Troubled Dataupia is hard to judge. While on the surface Dataupia’s prices sound very low, you can’t use a Dataupia box unless you also have a brand-name DBMS (license and hardware) alongside it. That obviously affects total cost significantly.
- Kickfire seems unaffected, as it doesn’t and most likely won’t compete with Netezza (different database size ranges).
- For the most part, software-only vendors are free to adapt or not as they choose. Hardware prices generally don’t need to be over $10K/terabyte, and in some cases could be a lot less. So the question is how far they’re willing to discount their software.
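The last point reduces to simple subtraction. Here is a minimal sketch, using only the rough figures quoted above (about $20K/terabyte for Netezza's new pricing, and a hardware ceiling of about $10K/terabyte). The function name and the idea of exactly matching the appliance price are assumptions made for illustration, not anything a vendor has stated.

```python
# Illustrative only: the dollar figures are the rough ones quoted in the post;
# the function name and the "match the appliance price" framing are assumptions.

def software_budget_per_tb(target_price_per_tb: float,
                           hardware_cost_per_tb: float) -> float:
    """Return what is left for software per terabyte if a software-only
    vendor wants hardware + software to match a target appliance price."""
    return max(target_price_per_tb - hardware_cost_per_tb, 0.0)


NEW_NETEZZA_PRICE = 20_000  # roughly $20K/terabyte
HARDWARE_CEILING = 10_000   # hardware need not exceed about $10K/terabyte

headroom = software_budget_per_tb(NEW_NETEZZA_PRICE, HARDWARE_CEILING)
print(f"Software headroom: ${headroom:,.0f}/TB")
```

If hardware costs less than the ceiling, the headroom grows by the same amount, which is why the open question is mostly about software discounting.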
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Dataupia, Exadata, Kickfire, Oracle, Pricing, Teradata, XtremeData | 14 Comments |
Netezza’s worldwide show-and-tell
In this economy, conference attendance is way down. Accordingly, a number of vendors have reevaluated whether it makes sense to have a traditional big-bang user conference, or whether it might make more sense to do a tour, bringing their message to multiple geographical areas. Netezza has opted for the latter course, something I’ve been well aware of for two reasons:
- Planning for the conferences and for Netezza’s product roll-out is of course coordinated, and product roll-out is something I advise my clients on.
- Netezza engaged me to speak at six different versions of the event (i.e., America and Europe, but not the Far East). There’s still time to contribute suggestions about my talk here.
Apparently, I’ll be talking late morning each time. My dates are:
- September 2, Boston
- September 9, Washington, DC
- September 15, Milan
- September 17, London
- September 24, San Francisco
- September 29, Chicago
The brand name of the events is Enzee Universe. Locations, registration information, and other particulars may be found on the Enzee Universe website.
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Presentations | 2 Comments |
Netezza is changing its hardware architecture and slashing prices accordingly
Netezza is about to make its biggest product announcement in years. In particular:
- Netezza is cutting prices to under $20K/terabyte of user data, with even lower numbers promised for the near future.
- Netezza is replacing its PowerPC chips with Intel-based IBM blades.
- There will be substantial changes in how data flows between the various parts of a Netezza node.
- Netezza claims this will all produce an immediate 10-15X increase in price-performance, based on a 3X cut in price/terabyte and a 3-5X improvement in mixed workload performance. (Edit: Netezza now agrees that it shouldn’t have phrased things that way.)
Allow me to explain. Read more
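As a quick check on the arithmetic in the last bullet, here is a minimal sketch that assumes the combined figure is simply the price cut multiplied by the performance improvement. That multiplication is an assumption about how the claim was built, and the function name is hypothetical.

```python
# Illustrative only: assumes price-performance gain = price cut x speedup.
# The 3X and 3-5X inputs are the figures quoted in the bullet above.

def price_performance_gain(price_cut_factor: float,
                           performance_gain: float) -> float:
    """Combine a price reduction factor and a speedup into one multiplier."""
    return price_cut_factor * performance_gain


low = price_performance_gain(3, 3)   # 3X cheaper, 3X faster
high = price_performance_gain(3, 5)  # 3X cheaper, 5X faster
print(f"Implied range: {low:g}X to {high:g}X")
```

Under that assumption the implied range is 9X to 15X, so the quoted "10-15X" looks like a rounding of the low end.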
Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Netezza, Pricing, Theory and architecture | 35 Comments |
XtremeData announces its DBx data warehouse appliance
XtremeData is announcing its DBx data warehouse appliance today. Highlights include: Read more
Categories: Benchmarks and POCs, Data warehouse appliances, Data warehousing, Pricing, XtremeData | 34 Comments |
Netezza on concurrency and workload management
I visited Netezza Friday for what was mainly an NDA meeting. But while I was there I asked where Netezza stood on concurrency, workload management, and rapid data mart spin-out. Netezza’s claims in those regards turned out to be surprisingly strong.
In the biggest surprise, Netezza claimed at least one customer had >5,000 simultaneous users, and a second had >4,000. Both are household names. Other unspecified Netezza customers apparently also have >1,000 simultaneous users. Read more
Categories: Data warehouse appliances, Data warehousing, Netezza, Teradata, Theory and architecture, Workload management | 13 Comments |
Vertica customer notes
Dave Menninger of Vertica called to discuss NDA product futures, as vendors tend to do in the weeks before a TDWI conference. So we also talked a bit about the Vertica customer base. That’s listed as 86 at the end of Q2, up from 74 in Q1. That’s pretty small growth compared with Q1, which Dave didn’t fully explain. But then, off the top of his head, he was recalling Q1 numbers as being lower than that 74, so maybe there’s a reporting glitch in the loop somewhere.
Vertica’s two biggest customer segments are telecommunications and financial services, and Dave drew an interesting distinction between what the two groups care about. Telecom companies care about data warehouses that are big and 24/7 reliable, but don’t do particularly complex analytics. Financial services — by which he presumably means mainly proprietary traders — are most focused on complex and competitively innovative analytics.
Also mentioned in various contexts were web-based outfits such as data mart outsourcers, social networkers, and open-source software providers.
Vertica also offers customer win stories in other segments, but most actual discussion about what Vertica does revolves around the application areas mentioned above, just as it has in the past.
Similar (not necessarily identical) generalizations would be true of many other analytic DBMS vendors.
Update on Microsoft’s Madison and Fast Track data warehouse products
I chatted with Stuart Frost of Microsoft yesterday. Stuart is and remains GM of Microsoft’s data warehouse product unit, covering roughly $1 billion of revenue. While rumors of Stuart’s departure from Microsoft are clearly exaggerated, it does seem that his role is more one of coordination than actual management.
Microsoft Madison availability remains scheduled for H1 2010. Nothing new there. Tangible progress includes a few customer commitments of various sorts, including one outright planned purchase (due to some internal customer considerations around using up a budget). At the moment various Microsoft Madison technology “previews” are going on, which seem to amount to proofs-of-concept, that:
- Start with actual customer data (some from Microsoft, some from outside)
- Generate larger synthesized data sets based on those (database size seems to be 10-100 TB)
- Run in Microsoft data centers or “technology centers”, rather than on customer premises.
The basic Microsoft Madison product distribution strategy seems to be: Read more
Oracle cites Exadata wins
A couple of weeks ago, Oracle put out a press release about Exadata wins. Highlights include:
- 20 names of actual customers.
- One quote citing a competitive win (over Netezza)
- One quote citing a ~50X speedup of one query “without manual tuning”
- One quote citing consistent 10-72X query performance speedups
- One quote citing a speedup from “days” to “minutes”
Unless I missed it, none of the quotes implied Exadata was actually in production, and none compared hardware between the old/slow/production and Exadata/fast/test systems.