After months of leaks, Teradata has unveiled its new lines of data warehouse appliances, raising its total number of appliance lines either from 1 to 3 (my view) or from 0 to 2 (what you believe if you think Teradata wasn’t previously an appliance vendor). Most significant is the new Teradata 2500 series, meant to compete directly with the smaller data warehouse specialists. Highlights include:
- An oddly precise estimated capacity of “6.12 terabytes”/node (user data). This estimate assumes 30% compression, which is low by industry standards and surely explains part of the price umbrella the Teradata 2500 is leaving for other vendors.
- $125K/TB of user data. Obviously, list pricing and actual pricing aren’t the same thing, and many vendors don’t even bother to disclose official price lists. But the Teradata 2500 seems more expensive than most smaller-vendor alternatives.
- Scalability up to 24 nodes (>140 TB).
- Full Teradata application-facing functionality. Some of Teradata’s rivals are still working on getting all of their certifications with tier-1 and tier-2 business intelligence tools. Teradata has a rich application ecosystem.
- Performance that will remain controversial until customer-benchmark trends clearly emerge.
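For reference, the headline figures above multiply out as follows. The per-node capacity and per-terabyte list price are Teradata’s published numbers; the totals are just my own back-of-the-envelope arithmetic, sketched here:

```python
# Back-of-the-envelope arithmetic on Teradata's published 2500-series figures.
# Inputs are from the announcement; the derived totals are simple multiplication.

TB_PER_NODE = 6.12           # estimated user data per node (assumes 30% compression)
MAX_NODES = 24               # maximum configuration
LIST_PRICE_PER_TB = 125_000  # list price, $/TB of user data

max_capacity_tb = TB_PER_NODE * MAX_NODES
max_config_list_price = max_capacity_tb * LIST_PRICE_PER_TB

print(f"Max capacity: {max_capacity_tb:.2f} TB")            # 146.88 TB -- i.e., ">140 TB"
print(f"List price, max config: ${max_config_list_price:,.0f}")  # $18,360,000
```

So a maxed-out 2500 configuration works out to just under 147 TB of user data, and a bit over $18 million at list.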
The Teradata 2500 is coming out of the chute with two customers – a new-customer retailer buying a single cabinet (i.e., 6.12 TB), and an existing customer for whom fewer details seem available. So far as I can tell, the sales force has had the product since late January, although the first leaks I got incorrectly suggested the system would only scale to a limited number of nodes.
Other products in the announcement included:
- The Teradata 5550, a routine annual upgrade to the Teradata 5500.
- The Teradata 550. This is a low-end, single-server SMP box introduced 9 or so months ago, originally meant for application development and testing. But some customers have been using it for deployment, and Teradata is now officially acknowledging that. It only scales to 2-3 TB of user data.
The Teradata 2500’s performance should be below the Teradata 5550’s for three reasons:
- More disk per node.
- Less CPU per node (2 cores vs. 4).
- The removal of some “workload management” performance features found in the 5500 series.
The same considerations apply to a comparison between the Teradata 2500 and the older Teradata 5000, but in that case they’re offset by a year of Moore’s Law benefit.
Teradata’s performance claims for the 2500, in essence, are:
- The 2500 is focused on decision-support applications, where all that workload-management stuff doesn’t matter as much.
- Although we can do additional things well that our competitors can’t, we also rival them in performance in their sweet spot, namely sequential/table-scan-oriented decision support.
- In fact, we beat them on lots of customer benchmarks.
- By the way, even the simplified workload management capability gives good concurrency when compared with what the little guys offer.
Teradata competitors’ stories are along the lines of:
- We clobber Teradata in customer benchmarks.
- Now they’re offering a system a lot slower than the ones we already beat.
DATAllegro offers a detailed critique of the Teradata 2500 based on pre-release information, both on functionality and the numbers. (E.g., they argue that 6.12 TB of user data counted the Teradata way isn’t as much as it sounds like; I’m checking on that.)
So what does this all mean? If the Teradata 2500 were as aggressively priced as I originally thought (my bad – I simply mistook their per-terabyte prices for absolute figures), this announcement would be a huge event. As matters stand – well, DBMS and other enterprise vendors’ “crippled” products don’t have a stellar history. I wouldn’t be surprised if, a year from now, we saw an upgraded Teradata 2500 series with more aggressive pricing and features.
Alternatively: In the initial release, Teradata has chosen not to have any interoperability between the 5500, 2500, and 550 series. I think that should and perhaps will change, with the 55xx and 25xx working together in a hub/spoke manner. Otherwise, missing-features arguments like the one DATAllegro makes will be too compelling. For that matter, I wouldn’t be surprised if Teradata bought a smaller rival, in which case heterogeneous hub/spoke synchronization would be a really good idea as soon as they could implement it.
If hub/spoke integration is one feature I’d recommend Teradata get cracking on, the other – and even bigger – one is compression. All CPU/disk trade-offs notwithstanding, better compression is an obvious and big price/performance win.