eBay doesn’t love MapReduce
The first time I ever heard from Oliver Ratzesberger of eBay, the subject line of his email mentioned MapReduce. That was early this year. Subsequently, however, eBay seems to have become a MapReduce non-fan. The reason is simple: eBay’s parallel efficiency tests show that MapReduce leaves most processors idle most of the time. The specific figure they mentioned was parallel efficiency of 18%.
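For readers who want to see what an 18% figure implies, here is a minimal sketch of one common definition of parallel efficiency: speedup divided by the number of workers. The post does not say how eBay actually measured it, so the formula choice, the function name, and all of the timings below are assumptions picked only to reproduce the 18% number.

```python
def parallel_efficiency(serial_seconds: float, parallel_seconds: float, workers: int) -> float:
    """Return speedup / workers, i.e. T_serial / (workers * T_parallel)."""
    if workers <= 0 or parallel_seconds <= 0:
        raise ValueError("workers and parallel_seconds must be positive")
    return serial_seconds / (workers * parallel_seconds)


# Hypothetical timings (not from eBay): a job that takes 1,800 s on one
# processor and 100 s on 100 processors is only 18x faster, not 100x.
efficiency = parallel_efficiency(serial_seconds=1800.0, parallel_seconds=100.0, workers=100)
print(f"Parallel efficiency: {efficiency:.0%}")
```

Under this definition, 18% efficiency means the cluster does the useful work of roughly 18 fully busy processors out of every 100, which fits the "idle most of the time" description.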
Categories: eBay, MapReduce, Parallelization | 7 Comments |
Teradata’s Petabyte Power Players
As previously hinted, Teradata has now announced 4 of the 5 members of its “Petabyte Power Players” club. These are enterprises with 1+ petabyte of data on Teradata equipment. As is commonly the case when Teradata discusses such figures, there’s some confusion as to how they’re actually counting. But as best I can tell, Teradata is counting: Read more
Categories: Data warehousing, eBay, Market share and customer counts, Petabyte-scale data management, Specific users, Teradata | 11 Comments |
Vertica offers some more numbers
Eric Lai interviewed Dave Menninger of Vertica. Highlights included:
- $20 million in trailing revenue. Removing a single multi-million-dollar deal from the list, that’s a few hundred thousand dollars each for 50ish customers. At $100K or so per terabyte, that’s an average of several terabytes of user data each, or more depending on what you assume about discounting.
- Dave used a figure of $100K per terabyte of user data, down from the $150K Vertica has previously used.
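The back-of-the-envelope arithmetic in the first bullet can be sketched as follows. The $20 million, the roughly 50 customers, and the $100K per terabyte come from the post; the size of the "multi-million-dollar deal" is not stated, so the $5 million below is purely an assumption, as is the 50% discount used to illustrate the discounting caveat. The sketch also assumes all revenue is priced per terabyte of user data.

```python
TRAILING_REVENUE = 20_000_000   # from the post
CUSTOMERS = 50                  # "50ish" in the post
LIST_PRICE_PER_TB = 100_000     # from the post
LARGEST_DEAL = 5_000_000        # assumption: the post only says "multi-million"


def implied_terabytes(revenue: float, price_per_tb: float, discount: float = 0.0) -> float:
    """Terabytes of user data implied by revenue at a (possibly discounted) per-TB price."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return revenue / (price_per_tb * (1.0 - discount))


revenue_per_customer = (TRAILING_REVENUE - LARGEST_DEAL) / CUSTOMERS
tb_at_list = implied_terabytes(revenue_per_customer, LIST_PRICE_PER_TB)
tb_discounted = implied_terabytes(revenue_per_customer, LIST_PRICE_PER_TB, discount=0.5)

print(f"Revenue per customer: ${revenue_per_customer:,.0f}")
print(f"Implied user data at list price: {tb_at_list:.1f} TB")
print(f"Implied user data at a 50% discount: {tb_discounted:.1f} TB")
```

Changing the assumed deal size or discount moves the result, but it stays in the "several terabytes each" range for any plausible inputs.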
Categories: Data warehousing, Market share and customer counts, Pricing, Vertica Systems | 10 Comments |
Teradata Virtual Storage
One of the big features of Teradata 13.0, announced this week (Edit: and to be shipped some time in 2009), is Teradata Virtual Storage, which sounds pretty cool. So far as I can tell, Teradata Virtual Storage has two major aspects, namely: Read more
Categories: Data warehousing, Solid-state memory, Storage, Teradata | 3 Comments |
Teradata Geospatial, and datatype extensibility in general
As part of its 13.0 release this week, Teradata is productizing its geospatial datatype, which previously was just a downloadable library. (Edit: More precisely, Teradata announced 13.0, which will actually be shipped some time in 2009.) What Teradata Geospatial now amounts to is:
- User-defined functions (UDF) written by Teradata (this is the part that existed before).
- (Possibly new) Enhanced implementations of the Teradata geospatial UDFs, for better performance.
- (Definitely new) Optimizer awareness of the Teradata geospatial UDFs.
Teradata also intends in the future to implement actual geospatial indexing; candidates include R-trees and tessellation.
Hearing this was a good wake-up call for me, because in the past I’ve conflated two issues on datatype extensibility, namely:
- Whether the query executor uses a special access method (i.e., index type) for the datatype.
- Whether the optimizer is aware of the datatype.
But as Teradata just pointed out, those two issues can indeed be separated from each other.
Categories: Data types, Data warehousing, GIS and geospatial, Teradata | 1 Comment |
Quick guide to Teradata’s announcements this week
The Teradata Partners (i.e., user) conference is this week. So there have been lots of press releases, some presentations, lots of meetings, and so on. A lot of Teradata’s messaging is in flux, as it moves fairly rapidly to correct what I believe have been some deficiencies in the past. One confusing result is that there was very little prebriefing about the actual announcement details, and we’re all scrambling to figure out what’s up.
Teradata does a good job of collecting its press releases at one URL. So without linking to most of them individually, let me jump in to an overview of Teradata news this week (whether or not in actual press release format): Read more
Categories: Data warehouse appliances, Data warehousing, Teradata | 9 Comments |
A data warehouse pricing complication: Software vs. appliances
Juan Loaiza of Oracle disagrees with a number of my opinions. We plan to talk about some of that when I visit on Thursday, after Teradata Partners. 🙂 But I’d like to throw one of his ideas out there right now. Juan contends that comparisons of Oracle Exadata pricing are apt to be misleading because, among other reasons, Oracle licenses can be reused on other hardware, in ways that appliance software cannot. (The same reasoning would of course apply to almost everybody else except Teradata and Netezza.) Read more
Categories: Data warehouse appliances, Data warehousing, Exadata, Oracle, Pricing | 2 Comments |
Patrick Walravens’ SAP/Teradata speculation doesn’t make much sense
A persistent analyst named Patrick Walravens keeps speculating about an SAP acquisition of Teradata. So far as I can tell, Walravens is the sole source of this rumor, evidently because he actually thinks the combination would make some kind of business sense.
An example of the “logic” behind this theory is:
Mr. Walravens’s latest evidence pointing to such a move stems from the expected departure of a SAP executive who had been running the company’s NetWeaver software line, which includes a data warehouse package.
At a guess, Walravens is saying that Teradata’s products and SAP’s BI Accelerator somehow substitute for each other in the marketplace. If you believe that comparison, I’d like to sell you a railroad locomotive made by Jaguar. Read more
Categories: Data warehousing, SAP AG, Teradata | 5 Comments |
Multitenancy hype is getting out of control
I posted recently on SaaS-data-integration-in-the-cloud, and a couple of vendors stopped by the comment thread to share what they do. One was Boomi, which has a blog that does a good job of spelling out its opinions. What the Boomi blog is not so good at, however, is giving any good reasons why one should share those opinions.
I refer specifically to a couple of posts claiming that multitenancy is somehow crucial for SaaS data integration to work. To this I can only say — huh? A decent data integration system should be able to handle many parallel threads at once, connecting many pairs of databases at once. So the hard part of multitenancy is pretty much “free.” If, even so, the integration provider chooses not to go fully multitenant, whose business is it but theirs? Read more
Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Software as a Service (SaaS) | 7 Comments |
Aster Data on online marketing data warehousing
Aster Data’s blog is getting to be like Vertica’s, in that I find myself recommending a large fraction of its posts.
The virtue of the latest one is that it strings together several customer examples in related areas of online marketing (which is pretty much the only sector Aster has so far sold into). I’ve tended to overgeneralize a bit, and use terms like “web analytics” or “clickstream analysis” even when they don’t wholly apply. The Aster post is a good antidote to that.
Categories: Application areas, Aster Data, Data warehousing, Web analytics | 1 Comment |