Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
How to tell Teradata’s product lines apart
Once Netezza hit the market, Teradata had a classic “disruptive” price problem – it offered a high end product, at a high price, sporting lots of features that not all customers needed or were willing to pay for. Teradata has at times slashed prices in competitive situations, but there are obvious risks to that, especially when a customer already has a number of other Teradata systems for which it paid closer to full price.
This year, Teradata has introduced a range of products that flesh out its competitive lineup. There now are three mainstream Teradata offerings, plus two with more specialized applicability. Teradata no longer has to sell Cadillacs to customers on Corolla budgets.
But how do we tell the five Teradata product lines apart? The names are confusing, both in their hardware-vendor product numbers and their data-warehousing-dogma product names, especially since in real life Teradata products’ capabilities overlap. Indeed, Teradata executives freely admit that the Teradata Data Mart Appliance 551 can run smaller data warehouses, while the Teradata Data Warehouse Appliance 2550 is positioned in large part at what Teradata quite reasonably calls data marts.
When one looks past the difficulties of naming, Teradata’s product lineup begins to make more sense. Let’s start by considering the three main Teradata products.
| Categories: Data warehouse appliances, Data warehousing, Netezza, Pricing, Teradata | 9 Comments |
Introduction to Kickfire
I’ve spent a few hours visiting or otherwise talking with my new clients at Kickfire recently, so I think I have a better feel for their story. A few details are still missing, however, either because I didn’t get around to asking about them, or because an unexplained accident corrupted my notes (and I wasn’t even using Office 2007). Highlights include:
| Categories: Columnar database management, Data warehouse appliances, Data warehousing, Kickfire, MySQL, Theory and architecture | Leave a Comment |
Quick guide to Teradata’s announcements this week
The Teradata Partners (i.e., user) conference is this week. So there have been lots of press releases, some presentations, lots of meetings, and so on. A lot of Teradata’s messaging is in flux, as it moves fairly rapidly to correct what I believe have been some deficiencies in the past. One confusing result is that there was very little prebriefing about the actual announcement details, and we’re all scrambling to figure out what’s up.
Teradata does a good job of collecting its press releases at one URL. So without linking to most of them individually, let me jump in to an overview of Teradata news this week (whether or not in actual press release format): Read more
| Categories: Data warehouse appliances, Data warehousing, Teradata | 9 Comments |
A data warehouse pricing complication: Software vs. appliances
Juan Loaiza of Oracle disagrees with a number of my opinions. We plan to talk about some of that when I visit on Thursday, after Teradata Partners.
But I’d like to throw one of his ideas out there right now. Juan contends that comparisons of Oracle Exadata pricing are apt to be misleading because, among other reasons, Oracle licenses can be reused on other hardware in ways that appliance software cannot. (The same reasoning would of course apply to almost everybody else except Teradata and Netezza.) Read more
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle, Pricing | 2 Comments |
Advance sound bites on the Microsoft/DATAllegro announcement
Microsoft said they’d prebrief me on at least the DATAllegro part of tomorrow’s SQL Server announcements, but that didn’t happen (at least as of 9 pm Eastern time Sunday night). An embargoed press release did just arrive, but it’s so concise and high-level as to contain almost nothing of interest.
So I might as well post sound bites in advance. Here goes:
- With the DATAllegro acquisition, Microsoft leapfrogged Oracle. But with Exadata, Oracle leapfrogged Microsoft back. Exadata is actually shipping.
- There’s no assurance that the first DATAllegro/Microsoft release will inherit SQL Server’s level of concurrency. After all, DATAllegro/Ingres wasn’t as concurrent as plain Ingres.
- Porting DATAllegro from Ingres to SQL Server is likely to be straightforward. If they screw up it will be because they tried to do too much else at the same time, not because the basic port failed.
- Porting DATAllegro from Linux to Windows should also be OK. DATAllegro doesn’t stress the operating system in the areas where Windows remains weak.
- Earlier this year, DATAllegro had exactly one customer known to be in production, but I’ve spoken with that one. It’s TEOCO, which has a multi-hundred terabyte DATAllegro installation. TEOCO is a very price-oriented buyer.
- DATAllegro reports that two more customers are in production with large systems now. Neither of those is believed by industry sources to be especially in love with DATAllegro. Otherwise, nobody seems able and willing to identify other DATAllegro customers.
I’m going to be pretty busy Monday anyway. Linda is having a bit of oral surgery. And if I get back from that in time, I have calls set up with a couple of clients.
| Categories: DATAllegro, Data warehouse appliances, Data warehousing, Microsoft and SQL*Server | 2 Comments |
History, focus, and technology of HP Neoview
On the basis of market impact to date, HP Neoview is just another data warehouse market participant – a dozen sales or so, a few systems in production, some evidence that it can handle 100 TB+ workloads, and so on. But HP’s BI Group CTO Greg Battas thinks Neoview is destined for greater things, because:
| Categories: Data warehouse appliances, Data warehousing, HP and Neoview | 4 Comments |
HP Neoview in the market to date
I evidently got HP’s attention with a recent post in which I questioned its stance on the relative positioning of the Exadata-based HP Oracle data warehouse appliance and the HP Neoview data warehouse appliance. A conversation with Greg Battas and John Miller (respectively CTO and CMO of HP’s BI group) quickly ensued. Mainly we talked about Neoview product goals and architecture. But before I get to that in a separate post, here are some Neoview market-presence highlights, so far as I’ve been able to figure them out:
| Categories: Data warehouse appliances, Data warehousing, HP and Neoview | 1 Comment |
Automatic redistribution of data warehouse data
In a recent Oracle Exadata FAQ, Kevin Closson writes:
Q. [...] don’t some of the DW vendors split the data up in a shared nothing method. Thus when the data has to be repartitioned it gets expensive. Whereas here you just add another cell and ASM goes to work in the background. (depending upon the ASM power level you set.)
A. All the DW Appliance vendors implement shared-nothing so, yes, the data is chopped up into physical partitions. If you add hardware to increase performance of queries against your current dataset the data will have to be reloaded into the new partitioning scheme. As has always been the case with ASM, adding new disks-and therefore Exadata Storage Server cells-will cause the existing data to be redistributed automatically over all (including the new) drives. This ASM data redistribution is an online function.
Hmm. That sounds much like the story I’ve heard from various other data warehousing DBMS vendors as well.
Rather than try to speak for them, however, I’ll just post this and see whether they choose to add anything to the comment thread.
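How much data has to move when a node is added depends heavily on the partitioning scheme. As a rough illustration only, here is a toy sketch that assumes rows are assigned to nodes by a naive `key % node_count` rule; that rule, the function names, and the node counts are all made up for the example and do not describe any vendor's actual implementation.

```python
def node_for(key: int, num_nodes: int) -> int:
    """Toy placement rule (an assumption for illustration): modulo partitioning."""
    return key % num_nodes


def fraction_moved(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of rows whose home node changes when the node count changes."""
    moved = sum(
        1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes)
    )
    return moved / len(keys)


# Hypothetical example: 100,000 rows with integer keys, growing from 4 to 5 nodes.
keys = range(100_000)
frac = fraction_moved(keys, old_nodes=4, new_nodes=5)
print(f"{frac:.0%} of rows change nodes")
```

Under this naive rule most rows change nodes, whereas an evenly balanced 5-node layout only needs about one fifth of the rows to land on the new node. Whether a given product does that movement online or requires a reload is a separate question the sketch does not address.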
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle | 7 Comments |
Greenplum pricing
Edit: Actually, this post is completely incorrect. The $20K/terabyte is for software only. So far, my attempts to get Greenplum to estimate hardware costs have been unsuccessful.
Greenplum’s Scott Yara was recently quoted citing a $20K/terabyte figure for Greenplum pricing. That naturally raises the question:
Greenplum charges around $20K/terabyte of what?
| Categories: Data warehouse appliances, Data warehousing, Greenplum, Pricing | 3 Comments |
Oracle Database Machine and Exadata pricing: Part 2
My Oracle Database Machine and Exadata pricing spreadsheet has been updated. Specifically:
- The first page has been modestly altered to accommodate more chargeable software options, as per the discussion below.
- Accordingly, my new estimate for HP Oracle Database Machine list price is $5,546,000. Per-terabyte prices (user data) are $60K and $198K for the two configurations.
- There’s a whole new second page, for Exadata configurations smaller than a full Oracle Database Machine. Most of the work on that was done by Bence Arató of BI Consulting (Hungary), who graciously gave me permission to post it.
- The lowest per-terabyte Exadata price estimates are about 20% lower than for the full Oracle Database Machine. The difference is due mainly to eliminating Real Application Clusters for a single-node SMP machine, and secondarily to rounding down slightly on server hardware capacity. But these are rough estimates, as neither Bence nor I is a hardware pricing guy.
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle, Pricing | 8 Comments |
Eric Lai on Oracle Exadata, and some addenda
Eric Lai offers a detailed FAQ on Oracle Exadata, including a good selection of links and quotes. I’d like to offer a few comments in response: Read more
| Categories: Data warehouse appliances, Data warehousing, Exadata, Greenplum, Netezza, Oracle, Pricing | 4 Comments |
Exadata and Oracle Database Machine parallelization clarified
Some kind Oracle development managers have reached out and helped me better understand where Oracle does or doesn’t stand in query and analytic parallelization. This post supersedes prior discussions of the subject over the past week. Read more
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle, Parallelization | 10 Comments |
Oracle Database Machine performance and compression
Greg Rahn was kind enough to recite in his blog what Oracle has disclosed about the first Exadata testers. I don’t track hardware model details, so I don’t know how the testers’ respective current hardware environments compare to that of the Oracle Database Machine.
Each of the customers cited below received “half” an Oracle Database Machine. As I previously noted, an Oracle Database Machine holds either 14.0 or 46.2 terabytes of uncompressed data. This suggests the 220 TB customer listed below — LGR Telecommunications — got compression of a little under 10:1 for a CDR (Call Detail Record) database. By comparison, Vertica claims 8:1 compression on CDRs.
Greg also writes of POS (Point Of Sale) data being used for the demo. If you do the arithmetic on the throughput figures (13.5 vs. a little over 3), compression was a little under 4.5:1. I don’t know what other vendors claim for POS compression.
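The arithmetic behind both ratios can be sketched as follows. This is only an illustration: it assumes the 220 TB figure is the uncompressed data size and that the half machine in question was half of the larger 46.2 TB configuration, neither of which is stated explicitly above. The function name is made up for the example.

```python
def compression_ratio(uncompressed: float, compressed: float) -> float:
    """Ratio of uncompressed size (or rate) to compressed size (or rate)."""
    return uncompressed / compressed


# Assumption: "half" an Oracle Database Machine in the larger configuration.
half_machine_tb = 46.2 / 2

# CDR database: 220 TB stored on the assumed half machine.
cdr_ratio = compression_ratio(220, half_machine_tb)

# POS demo: throughput figures of 13.5 vs. "a little over 3". Using exactly 3
# gives an upper bound, so the real ratio is a little under this value.
pos_upper_bound = compression_ratio(13.5, 3.0)

print(f"CDR compression: about {cdr_ratio:.1f}:1")
print(f"POS compression: a little under {pos_upper_bound:.1f}:1")
```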
Here are the details Greg posted about the four most open Oracle Database Machine tests: Read more
| Categories: Data warehouse appliances, Data warehousing, Database compression, Exadata, Oracle, Telecommunications | 8 Comments |
Oracle Exadata list pricing
The figures in this post have now been updated. There’s a new spreadsheet at that link as well.
I’ve been trying to figure out how much Oracle Exadata actually costs. My first cut comes up with prices of $58-190K/TB (user data), based on a total system price of $5,322,000, and user data figures of 28 and 92.4 TB for the two available sizes of disk drive. But of course there are a lot of uncertainties in these figures. You can use this spreadsheet (Edit: That’s the old one) to see where the final numbers come from, and to modify the estimates as you see fit. Read more
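For anyone who wants to redo the division without opening the spreadsheet, here is a minimal sketch. It uses only the total price and user-data capacities quoted above; the function and variable names are made up for the example, and it ignores everything else the spreadsheet models.

```python
def price_per_tb(total_price_usd: float, user_data_tb: float) -> float:
    """List price per terabyte of user data."""
    return total_price_usd / user_data_tb


# First-cut figures quoted in the post.
FIRST_CUT_PRICE_USD = 5_322_000
USER_DATA_TB = (92.4, 28.0)  # one capacity per available disk drive size

for tb in USER_DATA_TB:
    per_tb = price_per_tb(FIRST_CUT_PRICE_USD, tb)
    print(f"{tb} TB of user data: ${per_tb / 1000:.0f}K/TB")
```

Swapping in the updated $5,546,000 estimate from the Part 2 post gives roughly $60K and $198K per terabyte for the same two capacities.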
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle, Pricing | 10 Comments |
Oracle Exadata Smart Scan Join Processing
Oracle has put up an Exadata white paper (hat tip to Kevin Closson’s Exadata FAQ). There’s a section on Smart Scan Join Processing. Sounds exciting, huh? It reads, in its entirety:
Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom Filters, which are a very efficient probabilistic method to determine whether a row is a member of the desired result set.
Jeez. That almost sounds as if Exadata is an immature, Release 1 data warehouse appliance!
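For readers who haven't met Bloom filters, here is a minimal generic sketch of the idea: build a small bit array from the lookup table's join keys, then use it to discard most non-matching rows of the big table before the real join. This is a textbook-style illustration, not Oracle's implementation; the class, the sizes, and the sample rows are all made up for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str):
        # Derive several bit positions per key by salting a SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical small lookup table: the join keys that survive its predicates.
lookup_keys = ["store_17", "store_42"]
bloom = BloomFilter()
for key in lookup_keys:
    bloom.add(key)

# Hypothetical large fact table rows: (join key, sale amount).
fact_rows = [("store_17", 9.99), ("store_03", 4.50), ("store_42", 20.00)]

# Rows that pass are only candidates; the actual join still checks them exactly.
candidates = [row for row in fact_rows if bloom.might_contain(row[0])]
print(candidates)
```

The filter can let a non-matching row through occasionally, but it never drops a matching one, which is why it is safe to apply before the exact join.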
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle | 14 Comments |
So what does Oracle Exadata mean for HP Neoview?
That HP is committed to selling a lot of data warehouse hardware — and probably data warehouse appliances in particular — seems obvious, for reasons including:
- HP bought a big BI/data warehousing consulting operation in Knightsbridge.
- HP has put considerable effort into its data warehouse appliance Neoview.
- HP CEO Mark Hurd comes from data warehouse appliance vendor Teradata.
- Data warehousing is where the big bucks are.
But Oracle Exadata could produce those appliance sales. So where does HP Neoview fit in?
I was told by an investor today that HP’s investor relations department is saying Oracle Exadata is a Netezza competitor, while Neoview is more in the Teradata market. That’s laughable. Read more
| Categories: Data warehouse appliances, Data warehousing, Exadata, HP and Neoview, Netezza, Teradata | 14 Comments |
Oracle Exadata and Oracle data warehouse appliance sound bites
In addition to my previously posted thoughts on the Oracle Exadata/data warehouse appliance announcement, let me offer some more concise observations.
- Microsoft had leapfrogged Oracle with its DATAllegro acquisition. Now Oracle’s back in the game.
- But Oracle Exadata Release 1 is hardly going to put Teradata, Netezza, or Greenplum out of business.
- After long denying it, Oracle has finally admitted that putting more than 10 TB on Oracle had been an extremely painful thing to do.
- Oracle’s idea of splitting database processing between a couple of types of server is a smart one, and is consistent with what multiple other vendors are doing.
- Medium-long term, the Exadata technical strategy could work very well. Exadata storage management addresses some of the problems with shared-everything; Oracle RAC addresses others; and it may not take many releases before Oracle gets query parallelization right as well. Edit: This point is superseded by my updated take on Oracle query parallelization.
- Now Oracle and Microsoft are both supporting Infiniband for high end data warehousing.
- Oracle’s Exadata-based appliance doesn’t have the out-of-the-box simplicity that other appliances and analytic DBMS do.
- Licensing details aren’t yet clear, but Oracle Exadata’s list price probably won’t be terribly appealing either. Of course, nobody in their right mind pays Oracle list prices anyway.
- New web-based businesses have no reason to buy the Oracle data warehouse appliance. Exadata makes sense only for established enterprises.
Contradicting all that potential goodness, Oracle has been making ringing anti-shared-nothing statements, such as the silly:
There are “speed-of-light issues” associated with … scale-out-style grids
That mindset doesn’t augur well for Oracle to ever be a fully competitive high-end data warehouse DBMS vendor.
| Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle | 5 Comments |
Oracle announcements next week, data warehouse appliance, 11g R2 or otherwise
Eric Lai and Chris Kanaracus put up an article on Oracle’s announcements next week. Much of the speculation revolved around generic grid/clustering, with more detail than I posted yesterday. Most interesting to me was the last section of the article, which sounds as if it could be talking about the same thing Luke Lonergan referred to in a comment thread when he said:
Oracle is about to unveil a secret project that uses HP DL185 servers as storage devices with some predicate pushdowns to implement a data warehouse “appliance”.
| Categories: Data warehouse appliances, Data warehousing, Oracle | 1 Comment |
Netezza overseas
22% of Netezza’s revenue comes from outside the US, at least if we use last quarter’s figures as a guide. At first blush, that doesn’t sound like much. Indeed, percentage-wise it surely lags behind Teradata, Greenplum (which has sold a lot in Asia/Pacific under Netezza’s former head of that region), and a few smaller competitors headquartered outside the US. But a few conversations I had today suggest a rosier view. Read more
| Categories: Data warehouse appliances, Data warehousing, Greenplum, Kognitio, Market share, Netezza, Teradata | Leave a Comment |
Netezza application areas
I’m at the Netezza “Enzee” user conference in Orlando. So one or more Netezza posts are in order.
One theme of the brief analyst meeting was Netezza’s increasing business focus on vertical markets. In particular, Netezza is hiring managers for a range of vertical markets. The commercial ones cited (at various levels of maturity) included: Read more
| Categories: Application areas, Data warehouse appliances, Data warehousing, Market share, Netezza, Telecommunications | Leave a Comment |
Teradata sound bites
In connection with Teradata’s attempt to get into the Netezza news cycle with an appliance product announcement, I’ve whipped up a few Teradata-related sound bites suitable for quoting.
- Teradata has been in the data warehouse appliance business since 1984. I’m glad they’re finally admitting it.
- Teradata’s users love them. The users’ bosses, who sign the checks, aren’t as thrilled. Price competition is a big issue for Teradata.
- Teradata pricing has caused some real resistance, and even anger. Price is the big reason some startups are growing so much faster than Teradata. Ease of installation is sometimes a second factor.
- Teradata isn’t going to win many price-per-terabyte shootouts. (Note: I mean price per terabyte of user data.)
- The 5-10X+ performance advantage isn’t as crazy as it sounds, at least for some use cases. Teradata does still get a lot of business, and wins some price/performance shootouts to get it.
- Many Teradata customers are buying newer analytic DBMS as well. But they aren’t throwing out Teradata. Most stories of Teradata replacements are misunderstandings.
- The analytic DBMS startups all still do most of their business supporting data marts. If you have a high-concurrency workload, you usually need more mature technology. That’s where Teradata shines.
- That said, the very largest data warehouses are usually really data marts. High-concurrency BI is usually run against somewhat smaller databases.
- The upper limit for data warehouse sizes is skyrocketing. In 18 months, we’re seeing the largest known production systems go from under 1 petabyte of user data to multiple petabytes.
- Teradata has more competition than it used to for the very largest databases, which are now being found in relatively young web companies even more than in old-line telcos, retailers, or banks.
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Teradata | Leave a Comment |
Teradata decides to compete head-on as a data warehouse appliance vendor
In a press release today that is surely timed to impinge on the Netezza user conference news cycle, Teradata has come out swinging. Highlights include:
- Teradata, which long avoided the “appliance” term, now says it sells both “data warehouse appliances” and “data mart appliances.” Indeed, it claims to have “invented the original appliance” — which is pretty close to being true.*
- Teradata claims its “new appliance easily delivers up to 5 to 10 times performance improvement over competitors’ appliances,” at $119,000 per terabyte US list price.
- Teradata claims a 150% faster “scan rate” than competitors. Teradata is surely thinking of Netezza when saying that.
- Teradata claims 10X performance improvement on “selected queries” vs. the “competition.”
- Teradata thinks its geospatial data management capability is better than competitors’, and that this is an important indicator of Teradata’s general overall greater sophistication.
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, GIS and geospatial, Netezza, Teradata | 3 Comments |
Some Netezza customer metrics
From Netezza’s conference call for its July, 2008 Q1, with figures as of the end of that quarter:
- There are now 191 Netezza customers.
- 18 of those were new.
- 78% of Netezza’s business was in North America and 22% was international.
- Netezza operates in 10 countries.
- “The top 4 vertical markets represented approximately 75% of our business, with those markets being telcos, retail, financial services, and the analytic service provider segment.”
- One analytic service provider was greater than 10% of revenue for the quarter, and is expected to keep buying a lot in subsequent quarters. Also, one analytic service provider standardized on Netezza. I’m guessing that’s the same customer.
- “We ended the quarter with 45 [quota] carrying teams made up of a sales rep and a systems engineer and our plan is to continue to hire direct sales teams at the pace of 3 to 5 per quarter every quarter. These direct reps accounted for 85% of the business while the indirect activity was 15% this quarter.”
| Categories: Application areas, Data mart outsourcing, Data warehouse appliances, Data warehousing, Market share, Netezza, Telecommunications | 1 Comment |
Teradata’s major vertical markets in 2007
From a May, 2008 earnings conference call transcript:
- telecommunication, media and entertainment industry is 28%;
- financial services is 24%;
- retail is 19% of our revenues last year;
- manufacturing 9%;
- government 7%;
- travel and transportation 6%;
- and healthcare 5%.
| Categories: Application areas, Data warehouse appliances, Data warehousing, Telecommunications, Teradata | Leave a Comment |
Teradata/Netezza/Tesco kerfuffle
Netezza evidently put out a press release bragging of a competitive replacement of Teradata at UK retailing giant Tesco. That press release can no longer be found on Netezza’s site, but it lives on elsewhere. Meanwhile, Teradata has put out a press release in which Tesco is quoted emphatically contradicting what it is quoted as saying in the Netezza press release. While I haven’t discussed this with Netezza, my guess is that somebody there got a little overenthusiastic in advance of their user conference next week and thought they’d gotten a permission they really hadn’t.
Beyond that, I’d note that the Netezza quote made reference to around 25 heavy analytical users, while the Teradata quote talked of 8000 people across more than 2000 suppliers.
