Teradata
Analysis of data warehousing giant Teradata. Related subjects include:
My current customer list among the analytic DBMS specialists
(This is an updated version of an August, 2008 post.)
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the analytic/data warehouse DBMS/appliance specialists in the group. They are:
- Aster Data
- Greenplum
- Infobright
- Kickfire
- Kognitio
- Microsoft
- Netezza (my biggest client this year, probably, because of all the Enzee Universe appearances)
- Sybase
- Teradata
- Vertica
- Attivio, which may or may not be construed as being in the analytic DBMS business
- Clearpace, ditto
All of those are Monash Advantage members.
If you care about all this, you may also be interested in the rest of my standards and disclosures.
| Categories: About this blog, Aster Data, Data warehousing, Greenplum, Infobright, Kickfire, Microsoft and SQL*Server, Netezza, Sybase, Teradata, Vertica Systems | 2 Comments |
Netezza Q1 earnings call transcript
I finally read the Netezza Q1 earnings call transcript, put out by Seeking Alpha. Highlights included:
- Netezza got 14 new-name accounts and 21 follow-on deals. Average sale in both groups was right around $1 million.
- The economy is tough, deals are slipping, and nobody knows for sure what will happen.
- Netezza’s main head-to-head competitors are Oracle and Teradata. Netezza claims good but not perfect win rates against each, but concedes that those vendors (especially Oracle) of course get other deals Netezza never sees.
- Netezza characterizes Teradata as offering its multiple product lines, trying to upsell many customers from cheaper to more expensive product lines, and being selectively aggressive about pricing. None of this is surprising to me.
- 80% of Netezza’s Q1 revenue, and perhaps even a higher fraction of new-name accounts, was in four vertical markets: “Digital media,” telecom, government, and financial services.
- Some time over the next few months, Netezza will give at least some more clarity about future products.
One tip for the Netezza folks, by the way, from this former stock analyst — you should never use the word “certainly” about a deal you haven’t closed yet. “Almost surely” could be OK, but “certainly” — well, it certainly was not the thing to say.
The future of data marts
Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept — mixing mine and Greenplum’s together — include:
- Data marts aren’t just for performance (or price/performance). They also exist to give individual analysts or small teams control of their analytic destiny.
- Thus, it would be really cool if business users could have their own analytic “sandboxes” — virtual or physical analytic databases that they can manipulate without breaking anything else.
- In any case, business users want to analyze data when they want to analyze it. It is often unwise to ask business users to postpone analysis until after an enterprise data model can be extended to fully incorporate the new data they want to look at.
- Whether or not you agree with that, it’s an empirical fact that enterprises have many legacy data marts (or even, especially due to M&A, multiple legacy data warehouses). Similarly, it’s an empirical fact that many business users have the clout to order up new data marts as well.
- Consolidating data marts onto one common technological platform has important benefits.
In essence, Greenplum is pitching the story:
- Thesis: Enterprise Data Warehouses (EDWs)
- Antithesis: Data Warehouse Appliances
- Synthesis: Greenplum’s Enterprise Data Cloud vision
When put that starkly, it’s overstated, not least because
Specialized Analytic DBMS != Data Warehouse Appliance
But basically it makes sense, for two main reasons:
- Analysis is performed on all sorts of novel data, from sources far beyond an enterprise’s core transactions. This data neither has to fit nor particularly benefits from being tightly fitted into the core enterprise data model. Requiring it to do so is just an unnecessary and painful bureaucratic delay.
- On the other hand, consolidation can be a good idea even when systems don’t particularly interoperate. Data marts, which commonly do in part interoperate with central data stores, have all the more reason to be consolidated onto a central technology platform/stack.
Teradata Developer Exchange (DevX) begins to emerge
Every vendor needs developer-facing web resources, and Teradata turns out to have been working on a new umbrella site for them. It’s called Teradata Developer Exchange, or DevX for short. Teradata DevX seems to be in a low-volume beta now, with a press release/bigger roll-out coming next week or so. Major elements are about what one would expect:
- Articles
- Blogs
- Downloads
- Surprisingly, so far as I can tell, no forums
If you’re a Teradata user, you absolutely should check out Teradata DevX. If you just research Teradata (my situation), there are some aspects that might be of interest anyway. In particular, I found Teradata’s downloads instructive, most particularly those in the area of extensibility. Mainly, these are UDFs (User-Defined Functions), in areas such as:
- Compression
- Geospatial data
- Imitating Oracle or DB2 UDFs (as migration aids)
Also of potential interest is a custom-portlet framework for Teradata’s management tool Viewpoint. A straightforward use would be to plunk some Viewpoint data into a more general system management dashboard. A yet cooler use — and I couldn’t get a clear sense of whether anybody’s ever done this yet — would be to offer end users some insight as to how long their queries are apt to run.
| Categories: Database compression, Emulation, transparency, portability, GIS and geospatial, Teradata | 2 Comments |
eBay’s two enormous data warehouses
A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I’ve already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn’t like) and the astonishingly great difference between high- and low-end disk drives (to which eBay clued me in). Now I’m finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.
Metrics on eBay’s main Teradata data warehouse include:
- >2 petabytes of user data
- 10s of 1000s of users
- Millions of queries per day
- 72 nodes
- >140 GB/sec of I/O, or 2 GB/node/sec, or maybe that’s a peak when the workload is scan-heavy
- 100s of production databases being fed in
Metrics on eBay’s Greenplum data warehouse (or, if you like, data mart) include:
- 6 1/2 petabytes of user data
- 17 trillion records
- 150 billion new records/day, which seems to suggest an ingest rate well over 50 terabytes/day
- 96 nodes
- 200 MB/node/sec of I/O (that’s the order of magnitude difference that triggered my post on disk drives)
- 4.5 petabytes of storage
- 70% compression
- A small number of concurrent users
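A quick back-of-the-envelope check of the figures in the two lists above, as a sketch only. The variable names are mine, decimal units (1 TB = 10^12 bytes) are an assumption, and the bytes-per-record figure is merely what the "well over 50 terabytes/day" estimate would imply, not a number eBay supplied.

```python
# Back-of-the-envelope arithmetic on the eBay figures quoted above.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes).

GB = 1e9
TB = 1e12

# Teradata system: >140 GB/sec of aggregate I/O across 72 nodes.
teradata_nodes = 72
teradata_io_total = 140 * GB
teradata_io_per_node = teradata_io_total / teradata_nodes

# Greenplum system: 200 MB/node/sec across 96 nodes.
greenplum_nodes = 96
greenplum_io_per_node = 0.2 * GB
greenplum_io_total = greenplum_io_per_node * greenplum_nodes

# Ingest: 150 billion records/day at more than 50 TB/day implies an
# average record size above this threshold.
records_per_day = 150e9
implied_bytes_per_record = 50 * TB / records_per_day

per_node_ratio = teradata_io_per_node / greenplum_io_per_node

print(f"Teradata I/O per node:   {teradata_io_per_node / GB:.2f} GB/sec")
print(f"Greenplum aggregate I/O: {greenplum_io_total / GB:.1f} GB/sec")
print(f"Per-node I/O ratio:      {per_node_ratio:.1f}x")
print(f"Implied record size:     >{implied_bytes_per_record:.0f} bytes")
```

The per-node ratio comes out a little under 10, which is consistent with the "order of magnitude" remark, and 140 GB/sec over 72 nodes is indeed just under 2 GB/node/sec.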
| Categories: Analytic technologies, Data warehouse appliances, Data warehousing, Greenplum, Petabyte-scale data management, Teradata, Web analytics, eBay | 20 Comments |
Data warehouse storage options — cheap, expensive, or solid-state disk drives
This is a long post, so I’m going to recap the highlights up front. In the opinion of somebody I have high regard for, namely Carson Schmidt of Teradata:
- There’s currently a huge — one order of magnitude — performance difference between cheap and expensive disks for data warehousing workloads.
- New disk generations coming soon will have best-of-both-worlds aspects, combining high-end performance with lower-end cost and power consumption.
- Solid-state drives will likely add one or two orders of magnitude to performance a few years down the road. Echoing the most famous logjam in VC history — namely the 60+ hard disk companies that got venture funding in the 1980s — 20+ companies are vying to cash in.
In other news, Carson likes 10 Gigabit Ethernet, dislikes InfiniBand, and is “ecstatic” about Intel’s Nehalem, which will be the basis for Teradata’s next generation of servers.
| Categories: Data warehouse appliances, Data warehousing, Solid-state memory, Storage, Teradata, eBay | 11 Comments |
The SAP/Teradata deal explained
When I first saw the press release about the latest SAP/Teradata deal, I thought it sounded very Barney. But it turns out there’s a little bit of substance, as well. Amazingly, SAP BW doesn’t really run on Teradata right now. This deal will fix that. The time frame seems to be that SAP-BW-on-Teradata will ship with SAP BW 7.2 whenever that goes out. (First half of 2010?) Early adopters may be able to get their hands on it as early as Q3 2009.
Note: It surely would be more precise to insert “NetWeaver” a few times into that paragraph.
Just to be clear — I still don’t see this as a big deal. It doesn’t portend any grand SAP/Teradata joint mission to smite Oracle, IBM, and/or Microsoft. Nor is it a telling first step toward an SAP/Teradata merger. It just removes a particular competitive disadvantage Teradata had vs. Oracle et al., from which Teradata’s smaller specialist competitors still suffer. And it offers SAP BW customers another high-quality DBMS option.
| Categories: Business intelligence, Data warehousing, SAP AG, Teradata | Leave a Comment |
eBay thinks MPP DBMS clobber MapReduce
I talked with Oliver Ratzesberger and his team at eBay last week, who I already knew to be MapReduce non-fans. This time I added more detail.
Oliver believes that, on the whole, MapReduce is 6-8X slower than native functionality in an MPP DBMS, and hence should only be used sporadically. This view is based in part on simulations eBay ran of the Terasort benchmark. On 72 Teradata nodes or 96 lower-powered nodes running another (currently unnamed, as per yet another of my PR fire drills) MPP DBMS, a simulation of Terasort executed in 78 and 120 secs respectively, which is very comparable to the times Google and Yahoo got on 1000 nodes or more.
And by the way, if you use many fewer nodes, you also consume much less floor space or electric power.
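To put the floor-space and power point in rough numbers, one can compare node-seconds, i.e. nodes multiplied by elapsed time. This is only a sketch: the 1000-node entry is hypothetical and assumes, per the "very comparable" remark, a run time equal to the Teradata figure. The actual Google and Yahoo timings aren't stated here.

```python
# Rough node-seconds comparison for the Terasort simulations described above.
# Node-seconds = number of nodes * elapsed seconds; lower means less hardware
# time consumed for the same sort.

def node_seconds(nodes: int, elapsed_secs: float) -> float:
    return nodes * elapsed_secs

runs = {
    "72 Teradata nodes": node_seconds(72, 78),
    "96 lower-powered MPP nodes": node_seconds(96, 120),
    # Hypothetical: a 1000-node cluster finishing in a comparable 78 secs.
    # The real Google/Yahoo timings are not given in the post.
    "1000-node cluster (assumed 78 secs)": node_seconds(1000, 78),
}

baseline = runs["72 Teradata nodes"]
for name, ns in runs.items():
    print(f"{name}: {ns:,.0f} node-secs ({ns / baseline:.1f}x the Teradata run)")
```

Under that assumption, the 1000-node run burns more than ten times the node-seconds of either MPP DBMS run, which is the point about floor space and electricity.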
| Categories: Analytic technologies, Hadoop, MapReduce, Parallelization, Teradata, eBay | 9 Comments |
Somebody is spreading Teradata acquisition rumors again
A mass email from Tom Coffing was forwarded to me today that starts:
I have heard from reliable sources that both HP and SAP have purchased more than 5% of Teradata stock. My sources tell me that both companies appear to be positioning themselves for a bid.
I got my version of the same email from Coffing yesterday with a different introduction but otherwise the same substance (he’s pushing a new product of his). It also had a different From address.
Possible explanations include but are not limited to:
- Coffing knows something (seems unlikely, but I haven’t actually checked www.sec.gov to confirm or disconfirm)
- Coffing thinks he knows something
- Coffing just made this up (I hope not)
- There’s an April Fool’s Day prank going on (not by me — after my bizarre March, I’m recusing myself from April Fool’s pranks this year)
| Categories: Data warehousing, HP and Neoview, SAP AG, Teradata | 4 Comments |
Data warehousing business trends
I’ve talked with a whole lot of vendors recently, some here at TDWI, as well as users, fellow analysts, and so on. Repeated themes include:
| Categories: Analytic technologies, Application areas, Data mart outsourcing, Data warehousing, Microsoft and SQL*Server, MySQL, Oracle, Teradata, eBay | Leave a Comment |
Draft slides on how to select an analytic DBMS
I need to finalize an already-too-long slide deck on how to select an analytic DBMS by late Thursday night. Anybody see something I’m overlooking, or just plain got wrong?
Edit: The slides have now been finalized.
Gartner’s 2008 data warehouse database management system Magic Quadrant is out
Gartner’s annual Magic Quadrant for data warehouse DBMS is out. Thankfully, vendors don’t seem to be taking it as seriously as usual, so I didn’t immediately hear about it. (I finally noticed it in a Greenplum pay-per-click ad.) Links to Gartner MQs tend to come and go, but as of now here are two working links to the 2008 Gartner Data Warehouse Database Management System MQ. My posts on the 2007 and 2006 MQs have also been updated with working links. Read more
The Teradata Accelerate program
An article in Intelligent Enterprise clued me in that Teradata has announced the Teradata Accelerate program. A little poking around revealed a press release in which — lo and behold — I am quoted,* to wit:
“The Teradata Accelerate program is a great idea. There’s no safer choice than Teradata technology plus Teradata consulting, bundled in a fixed-cost offering,” said Curt Monash, president of Monash Research. “The Teradata Purpose Built Platform Family members are optimized for a broad range of business intelligence and analytic uses.”
| Categories: Data warehousing, Pricing, Teradata | Leave a Comment |
High-performance analytics
For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics – i.e., beyond straightforward query — is becoming increasingly important. And I’ve written about some of them at length. For example:
- MapReduce – controversial or in some cases even disappointing though it may be – has a lot of use cases.
- It’s early days, but Netezza and Teradata (and others) are beefing up their geospatial analytic capabilities.
- Memory-centric analytics is in the spotlight.
Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?
Another area that’s come up which I haven’t blogged about so much is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including:
| Categories: Analytic technologies, Aster Data, Data warehousing, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Netezza, Oracle, Parallelization, SAS Institute, Teradata | 5 Comments |
Carson Schmidt of Teradata on SSDs
Carson Schmidt is, in essence, Teradata’s VP of product development for everything other than applications and database software. For example, he oversees Teradata’s hardware, storage, and switching technology. So when Teradata Chief Development Officer Scott Gnau didn’t have answers at his fingertips to some questions about SSDs (Solid-State Drives), he bucked me over to Carson. A very interesting discussion about SSDs (and other subjects) ensued.
Highlights included: Read more
| Categories: Data warehousing, Solid-state memory, Storage, Teradata | 1 Comment |
How to tell Teradata’s product lines apart
Once Netezza hit the market, Teradata had a classic “disruptive” price problem – it offered a high end product, at a high price, sporting lots of features that not all customers needed or were willing to pay for. Teradata has at times slashed prices in competitive situations, but there are obvious risks to that, especially when a customer already has a number of other Teradata systems for which it paid closer to full price.
This year, Teradata has introduced a range of products that flesh out its competitive lineup. There now are three mainstream Teradata offerings, plus two with more specialized applicability. Teradata no longer has to sell Cadillacs to customers on Corolla budgets.
But how do we tell the five Teradata product lines apart? The names are confusing, both in their hardware-vendor product numbers and their data-warehousing-dogma product names, especially since in real life Teradata products’ capabilities overlap. Indeed, Teradata executives freely admit that the Teradata Data Mart Appliance 551 can run smaller data warehouses, while the Teradata Data Warehouse Appliance 2550 is positioned in large part at what Teradata quite reasonably calls data marts.
When one looks past the difficulties of naming, Teradata’s product lineup begins to make more sense. Let’s start by considering the three main Teradata products.
| Categories: Data warehouse appliances, Data warehousing, Netezza, Pricing, Teradata | 11 Comments |
Teradata’s Petabyte Power Players
As previously hinted, Teradata has now announced 4 of the 5 members of its “Petabyte Power Players” club. These are enterprises with 1+ petabyte of data on Teradata equipment. As is commonly the case when Teradata discusses such figures, there’s some confusion as to how they’re actually counting. But as best I can tell, Teradata is counting: Read more
| Categories: Data warehousing, Market share, Petabyte-scale data management, Specific users, Teradata, eBay | 7 Comments |
Teradata Virtual Storage
One of the big features of Teradata 13.0, announced this week (Edit: and to be shipped some time in 2009), is Teradata Virtual Storage, which sounds pretty cool. So far as I can tell, Teradata Virtual Storage has two major aspects, namely: Read more
| Categories: Data warehousing, Solid-state memory, Storage, Teradata | 2 Comments |
Teradata Geospatial, and datatype extensibility in general
As part of its 13.0 release this week, Teradata is productizing its geospatial datatype, which previously was just a downloadable library. (Edit: More precisely, Teradata announced 13.0, which will actually be shipped some time in 2009.) What Teradata Geospatial now amounts to is:
- User-defined functions (UDF) written by Teradata (this is the part that existed before).
- (Possibly new) Enhanced implementations of the Teradata geospatial UDFs, for better performance.
- (Definitely new) Optimizer awareness of the Teradata geospatial UDFs.
Teradata also intends in the future to implement actual geospatial indexing; candidates include R-trees and tessellation.
Hearing this was a good wake-up call for me, because in the past I’ve conflated two issues on datatype extensibility, namely:
- Whether the query executor uses a special access method (i.e., index type) for the datatype
- Whether the optimizer is aware of the datatype.
But as Teradata just pointed out, those two issues can indeed be separated from each other.
| Categories: Data types, Data warehousing, GIS and geospatial, Teradata | 1 Comment |
Quick guide to Teradata’s announcements this week
The Teradata Partners (i.e., user) conference is this week. So there have been lots of press releases, some presentations, lots of meetings, and so on. A lot of Teradata’s messaging is in flux, as it moves fairly rapidly to correct what I believe have been some deficiencies in the past. One confusing result is that there was very little prebriefing about the actual announcement details, and we’re all scrambling to figure out what’s up.
Teradata does a good job of collecting its press releases at one URL. So without linking to most of them individually, let me jump in to an overview of Teradata news this week (whether or not in actual press release format): Read more
| Categories: Data warehouse appliances, Data warehousing, Teradata | 9 Comments |
Patrick Walravens’ SAP/Teradata speculation doesn’t make much sense
A persistent analyst named Patrick Walravens keeps speculating about an SAP acquisition of Teradata. So far as I can tell, Walravens is the sole source of this rumor, evidently because he actually thinks the combination would make some kind of business sense.
An example of the “logic” behind this theory is:
Mr. Walravens’s latest evidence pointing to such a move stems from the expected departure of a SAP executive who had been running the company’s NetWeaver software line, which includes a data warehouse package.
At a guess, Walravens is saying that Teradata’s products and SAP’s BI Accelerator somehow substitute for each other in the marketplace. If you believe that comparison, I’d like to sell you a railroad locomotive made by Jaguar. Read more
| Categories: Data warehousing, SAP AG, Teradata | 4 Comments |
Netezza and Teradata on analytic geospatial data management
Geospatial data management is one of the flavors of the month:
- Last week, Teradata claimed it has the most sophisticated analytic geospatial data management capability.
- Also last week, Netezza’s newly acquired Netezza Spatial technology attracted a lot of attention.
- This week, Oracle called attention to its geospatial capabilities.
So I asked Netezza and Teradata what this geospatial analytics stuff is all about.
| Categories: Analytic technologies, Data warehousing, GIS and geospatial, Netezza, Teradata | 3 Comments |
So what does Oracle Exadata mean for HP Neoview?
That HP is committed to selling a lot of data warehouse hardware — and probably data warehouse appliances in particular — seems obvious, for reasons including:
- HP bought a big BI/data warehousing consulting operation in Knightsbridge.
- HP has put considerable effort into its data warehouse appliance Neoview.
- HP CEO Mark Hurd comes from data warehouse appliance vendor Teradata.
- Data warehousing is where the big bucks are.
But Oracle Exadata could produce those appliance sales. So where does HP Neoview fit in?
I was told by an investor today that HP’s investor relations department is saying Oracle Exadata is a Netezza competitor, while Neoview is more in the Teradata market. That’s laughable. Read more
| Categories: Data warehouse appliances, Data warehousing, Exadata, HP and Neoview, Netezza, Teradata | 16 Comments |
Netezza overseas
22% of Netezza’s revenue comes from outside the US, at least if we use last quarter’s figures as a guide. At first blush, that doesn’t sound like much. Indeed, percentage-wise it surely lags behind Teradata, Greenplum (which has sold a lot in Asia/Pacific under Netezza’s former head of that region), and a few smaller competitors headquartered outside the US. But a few conversations I had today suggest a rosier view. Read more
| Categories: Data warehouse appliances, Data warehousing, Greenplum, Kognitio, Market share, Netezza, Teradata | Leave a Comment |
Teradata sound bites
In connection with Teradata’s attempt to get into the Netezza news cycle with an appliance product announcement, I’ve whipped up a few Teradata-related sound bites suitable for quoting.
- Teradata has been in the data warehouse appliance business since 1984. I’m glad they’re finally admitting it.
- Teradata’s users love them. The users’ bosses, who sign the checks, aren’t as thrilled. Price competition is a big issue for Teradata.
- Teradata pricing has caused some real resistance, and even anger. Price is the big reason some startups are growing so much faster than Teradata. Ease of installation is sometimes a second factor.
- Teradata isn’t going to win many price-per-terabyte shootouts. (Note: I mean price per terabyte of user data.)
- The 5-10X+ performance advantage isn’t as crazy as it sounds, at least for some use cases. Teradata does still get a lot of business, and wins some price/performance shootouts to get it.
- Many Teradata customers are buying newer analytic DBMS as well. But they aren’t throwing out Teradata. Most stories of Teradata replacements are misunderstandings.
- The analytic DBMS startups all still do most of their business supporting data marts. If you have a high-concurrency workload, you usually need more mature technology. That’s where Teradata shines.
- That said, the very largest data warehouses are usually really data marts. High-concurrency BI is usually run against somewhat smaller databases.
- The upper limit for data warehouse sizes is skyrocketing. In 18 months, we’re seeing the largest known production systems go from under 1 petabyte of user data to multiple petabytes.
- Teradata has more competition than it used to for the very largest databases, which are now being found in relatively young web companies even more than in old-line telcos, retailers, or banks.
