Data warehouse appliances
Analysis of data warehouse appliances – i.e., of hardware/software bundles optimized for fast query and analysis of large volumes of (usually) relational data. Related subjects include:
- Data warehousing
- Parallelization
- Netezza
- DATAllegro
- Teradata
- Kickfire
- (in The Monash Report) Computing appliances in multiple domains
Netezza Skimmer
As I previously complained, last week wasn’t a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this morning, while I asked for and got a Friday afternoon briefing, I kept it quick and basic.
That said, highlights of my Netezza Skimmer briefing included:
- In essence, Netezza Skimmer is 1/3 of Netezza’s previously smallest appliance, for 1/3 the price.
- I.e., Netezza Skimmer has 1 S-blade and 9 disks, vs. 3 S-blades and 24 disks on the Netezza TwinFin 3.
- With 1 disk reserved as a hot spare, that boils down to a 1:1:1 ratio among CPU cores, FPGA cores, and 1-terabyte disks on Netezza skimmer. The same could pretty much be said of Netezza TwinFin, the occasional hot-spare disk notwithstanding.
- Netezza Skimmer costs $125K.
- With 2.8 or so TB of space for user data before compression, that’s right in line with the Netezza price point of slightly <$20K/terabyte of user data.
- That assumes Netezza’s usual 2.25X compression. I forgot to ask when 4X compression was actually being shipped.
- I forgot to ask, but it seems obvious that Netezza Skimmer uses identical or substantially similar components to Netezza TwinFin’s.
- Netezza Skimmer is 7 rack units high.
- In place of the SMP hosts on TwinFin Systems, Netezza Skimmer has a host blade.
- Netezza (specifically Phil Francisco) mentioned that when Kalido uses Netezza Skimmer for its appliance, there will be an additional host computer, but when it uses TwinFin for the same software, the built-in host will suffice. (Even so, I suspect it might be too strong to say that Skimmer’s built-in host computer is underpowered.)
- Netezza also suggested that more appliance OEMs are coming down the pike specifically focused on the affordable Skimmer.
| Categories: Data mart outsourcing, Data warehouse appliances, Data warehousing, Netezza, Pricing | 1 Comment |
Two cornerstones of Oracle’s database hardware strategy
After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn’t make its storage-tier software available to run on anything than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle’s best DBMS offering (at least in theory), that means Oracle’s best database offering only runs on specific Oracle-sold hardware platforms. Read more
Comments on a fabricated press release quote
My clients at Kickfire put out a press release last week quoting me as saying things I neither said nor believe. The press release is about a “Queen For A Day” kind of contest announced way back in April, in which users were invited to submit stories of their data warehouse problems, with the biggest sob stories winning free Kickfire appliances. The fabricated “quote” reads: Read more
| Categories: About this blog, Data warehouse appliances, Data warehousing, Kickfire, Market share, Sybase | 2 Comments |
Boston Big Data Summit keynote outline
Last month, Bob Zurek asked me to give a talk on “Big Data”, where “big” is anything from a few terabytes on up, then moderate a panel on cloud computing. We agreed that I could talk just from notes, without slides. So, since I have them typed up, I’m posting them below.
Teradata hardware strategy and tactics
In my opinion, the most important takeaways about Teradata’s hardware strategy from the Teradata Partners conference last week are:
- Teradata’s future lies in solid-state memory. That’s in line with what Carson Schmidt told me six months ago.
- To Teradata’s surprise, the solid-state future is imminent. Teradata is 6-9 months further along with solid-state drives (SSD) than it thought a year ago it would be at this point.
- Short-term, Teradata is going to increase the number of appliance kinds it sells. I didn’t actually get details on anything but the new SSD-based Blurr, but it seems there will be others as well.
- Teradata’s eventual future is to mix and match parts (especially different kinds of storage) in a more modular product line. Teradata Virtual Storage is of pretty limited value otherwise. I probably believe Teradata will go modular more emphatically than Teradata itself does, because I think doing so will meet users needs more effectively than if Teradata relies strictly on fixed appliance configurations.
In addition, some non-SSD componentry tidbits from Carson Schmidt include:
- Teradata really likes Intel’s Nehalem CPUs, with special reference to multi-threading, QuickPath interconnect, and integrated memory controller. Obviously, Nehalem-based Teradata boxes should be expected in the not too distant future.
- Teradata really likes Nehalem’s successor Westmere too, and expects to be pretty fast to market with it (faster than with Nehalem) because Nehalem and Westmere are plug-compatible in motherboards.
- Teradata will go to 10-gigabit Ethernet for external connectivity on all its equipment, which should improve load performance.
- Teradata will also go to 10-gigabit Ethernet to play the Bynet role on appliances. Tests are indicating this improves query performance.
- What’s more, Teradata believes there will be no practical scale-out limitations with 10-gigabit Ethernet.
- Teradata hasn’t decided yet what to do about 2.5” SFF (Small Form Factor) disk drives, but is leaning favorably. Benefits would include lower power consumption and smaller cabinets.
- Also on Carson’s list of “exciting” future technologies is SAS 2.0, which at 6 gigabits/second doubles the I/O bandwidth of SAS 1.0.
- Carson is even excited about removing universal power supplies from the cabinets, increasing space for other components.
- Teradata picked Intel’s Host Bus Adapters for 10-gigabit Ethernet. The switch supplier hasn’t been determined yet.
Let’s get back now to SSDs, because over the next few years they’re the potential game-changer. Read more
| Categories: Data warehouse appliances, Data warehousing, Solid-state memory, Storage, Teradata, Uncategorized | 10 Comments |
Reports of perfectly-balanced hardware configurations are greatly exaggerated
Data warehouse appliance and software appliance vendors like to claim that they’ve worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes:
- Teradata ascribes considerable importance to a Virtual Storage technology whose main purpose is to allow mixing of heterogeneous storage devices in a single system. And the discussion rarely suggests that these parts will be in a rigid fixed relationship.
- Netezza — as Teradata keeps reminding me — often sells boxes with the expectation that they won’t be filled with data, so as to increase spindle count and hence performance.
- Oracle/Sun have dropped some comments about Exadata being more flexibly configured going forward.
- Kickfire’s new “high-end” appliance lets you attach fairly arbitrary amounts of external storage.
- And of course, software-only analytic DBMS vendors run their software in all sorts of hardware and storage environments.
What’s more, the claim never made a lot of sense anyway. With the rarest of exceptions, even a single data warehouse’s workload will contain different queries that strain different parts of the system in different ratios. Calculating the “ideal” hardware configuration for that single workload would be forbiddingly difficult. And even if one could calculate it, it almost surely would be different than another user’s “ideal” configuration. How a single hardware configuration can be “ideally balanced” for a broad class of use cases boggles the imagination.
| Categories: Data warehouse appliances, Data warehousing, Exadata, Kickfire, Netezza, Oracle, Teradata | 5 Comments |
This week at the Teradata Partners user conference
Teradata tells me that its press embargoes are ending at 9:00 this morning. Here are some highlights of what’s going on, although names, dates, and details will have to await conversations and press releases this week.
- Teradata is productizing “private cloud,” under names including “Teradata Enterprise Analytics Cloud,” “Teradata Agile Analytics Cloud,” and “Teradata Elastic Mart Builder.” I.e., Teradata hopes to leapfrog Greenplum in its “Enterprise Data Cloud” strategy. This is only fair, in that Greenplum lifted the idea from Teradata and eBay in the first place. It also provides major support for what I think is an extremely sensible trend. Give or take issues of who announces and ships what a couple months before or after a competitor, my early thinking is that the main differences between Greenplum and Teradata in this regard will be:
- Virtual as opposed to just physical data marts, based on robust workload management software. (Advantage: Teradata)
- Pricing, deployment options. (Advantage: Greenplum)
- Features that don’t directly relate to enterprise/private cloud. (Advantage: Either, often Teradata.)
- Teradata is generally strengthening its data movement technology, e.g. for making various appliances work in sync. I’m not too clear yet on the details of that. I think this is what Teradata’s phrase “ecosystem management” refers to.
- Teradata is (pre-)announcing – at least as a statement of direction — an appliance based on solid-state drives (SSDs). I’ve thought for a while that Teradata was a leader in thinking through the issues around solid-state memory in data warehousing, so it makes sense that they’re among the leaders in actually coming to market as well. I plan to say more after meeting with, e.g., Carson Schmidt.
- Teradata has achieved a 300%ish speed-up in geospatial processing. I gather this is largely a byproduct of the parallel analytics work Teradata did around strengthening its SAS integration. However, there don’t seem to be a lot of Teradata geospatial users yet.
- Teradata Express, Teradata’s free Windows-based crippleware, is being ported to Amazon EC2 and VMware as well. Presumably to avoid cannibalizing Teradata product sales, there are quite a few limitations on Teradata Express, including system capacity, database size, and “no production use.”
- Teradata continues to extend its optimizations to handle queries issued by business intelligence tools. Previously, the focus of what Teradata discussed in this regard was query rewrite. But soon automatic recommendation and creation of Aggregate Join Indexes – i.e.., materialized views – will be included as well.
Greenplum customer notes
In a briefing about a forthcoming product announcement, Greenplum threw in a slide saying:
- Greenplum is getting 12-15 new (paying) customers per quarter, all of whom it fondly refers to as “Tier 1″ enterprises.
- Greenplum will hit the 100+ customer mark this quarter (thus joining Vertica and Infobright).
- <10% of Greenplum business is now “influenced” by Sun hardware.
I asked Ben Werther to unpack that last claim for me. He quickly noted that it wasn’t his slide, but rather had been put together by colleagues. That said:
- As of the past quarter or two, <10% of Greenplum’s sales activity is on Sun, which works out to maybe one sale per quarter and at most a small number of sales cycles. (That’s down from from 50%+ not that long ago.)
- Most Greenplum business is now on HP or Dell equipment. Some is on IBM. There are some interesting sales cycles on Cisco’s new UCS (Unified Computing System) blades, but no closed deals yet. EMC seems to be part of the Cisco story.
No doubt part of the reason for the move away from Sun equipment is the impending Oracle acquisition. Another may be that the Greenplum/Sun appliance is somewhat underpowered. E.g., without particularly high levels of compression, eBay puts over 60 terabytes of data on each Greenplum node, which probably isn’t ideal from the standpoint of query performance.
Greenplum also says that 50% or so of sales are subscription-priced, rather than perpetual-licensed. I don’t have a sense for how long that’s been going on. (Edit: Ben Werther tells me this has been true for over a year.)
| Categories: Data warehouse appliances, Data warehousing, Greenplum, Market share, Pricing | 1 Comment |
Kickfire capacity and pricing
Kickfire’s marketing communication efforts are still a work in progress. Kickfire did finally relax its secrecy about FPGA-vs.-custom-silicon – not coincidentally during Netezza’s recent publicity cycle. That wise choice helped Kickfire get some favorable attention recently for its technical and market strategy, e.g. from Daniel Abadi, Merv Adrian and, kicking things off — as it were — me. Weeks after a recent Kickfire product release, there’s finally a fairly accurate data sheet up, although there’s still one self-defeatingly misleading line I’ll comment on below. Pricing is a whole other area of confusion, although it seems that current list prices have been inadvertently* leaked in Merv’s post linked above, with only one inaccuracy that I can detect.**
*I gather from the company that they forgot to tell Merv pricing was NDA.
** Merv cited a price as “starting” that I believe to be top-of-the-line. No criticism of Merv is implied in that; Kickfire has not been very clear in communicating hard numbers.
All that said, if one takes Kickfire’s marketing statements literally, Kickfire list pricing is around $20-50K per terabyte for a few small, fixed, high-performance configurations. That’s all-in, for plug-and-play appliances. What’s more, that range is based on the actual published user data capacity numbers for various Kickfire models, which I think are low for several reasons:
- Kickfire doesn’t officially admit that its model with 14.4 terabytes of disk can manage more than 6 terabytes of data, even though it clearly can.
- Actually, those 14.4 terabytes of disk can be increased or lowered as you choose.
- The basic compression figures implied in those calculations seem conservative.
- Compression figures are a lot more conservative yet, in that Kickfire assumes you’ll have a lot of actual indexes on your data. I’m not sure that’s necessary for most workloads.
| Categories: Columnar database management, Data warehouse appliances, Data warehousing, Database compression, Kickfire, Pricing | 3 Comments |
Oracle Exadata 2 capacity pricing
Summary of Oracle Exadata 2 capacity pricing
Analyzing Oracle Exadata pricing is always harder than one would first think. But I’ve finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:
- If we believe Oracle’s claims of 10X compression, Exadata 2 costs more per terabyte of user data than Netezza TwinFin — $22-26K/TB vs. TwinFin’s <$20K — but less than the Teradata 2550.
- These figures are highly sensitive to assumptions about Oracle’s hybrid columnar compression.
- Similarly, if Netezza or Teradata were to significantly upgrade their own compression, the price comparison would look quite different.
- Options such as Data Mining or Oracle Spatial add 12% or so each to Exadata’s total system price.
Longer version
When Oracle introduced Exadata last year it was, well, expensive. Exadata 2 has now been announced, and it is significantly cheaper than Exadata 1 per terabyte of user data, based on:
- Similar overall pricing
- Twice the disk capacity
- Better compression
