eBay

Discussion of eBay’s use of database and analytic technology.

October 15, 2008

eBay doesn’t love MapReduce

The first time I ever heard from Oliver Ratzesberger of eBay, the subject line of his email mentioned MapReduce.  That was early this year.  Subsequently, however, eBay seems to have become a MapReduce non-fan.  The reason is simple: eBay’s parallel efficiency tests show that MapReduce leaves most processors idle most of the time.  The specific figure they mentioned was parallel efficiency of 18%.
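For readers who want to see what an 18% figure implies, here is a minimal sketch using the usual textbook definition of parallel efficiency: speedup divided by processor count. That definition is my assumption, since I don't know exactly how eBay defined or measured it, and the timings and processor count below are hypothetical numbers picked only to land on 18%.

```python
def parallel_efficiency(serial_seconds: float,
                        parallel_seconds: float,
                        processors: int) -> float:
    """Return speedup / processors; 1.0 would mean perfect scaling.

    Textbook definition, assumed here. Not necessarily eBay's metric.
    """
    if processors <= 0 or parallel_seconds <= 0:
        raise ValueError("processors and parallel_seconds must be positive")
    speedup = serial_seconds / parallel_seconds
    return speedup / processors


# Hypothetical job: 1800 s on one processor, 100 s on 100 processors.
eff = parallel_efficiency(serial_seconds=1800.0,
                          parallel_seconds=100.0,
                          processors=100)
print(f"parallel efficiency: {eff:.0%}")
```

Under these made-up numbers, 100 processors deliver only an 18x speedup, which is another way of saying that most of the hardware is idle most of the time.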

October 15, 2008

Teradata’s Petabyte Power Players

As previously hinted, Teradata has now announced 4 of the 5 members of its “Petabyte Power Players” club.  These are enterprises with 1+ petabyte of data on Teradata equipment.  As is commonly the case when Teradata discusses such figures, there’s some confusion as to how they’re actually counting.  But as best I can tell, Teradata is counting:

March 25, 2008

The eBay analytics guys have a blog now

Oliver Ratzesberger and his crew have started a blog, focusing on xldb analytics. Naturally, one of the early posts gives a quick overview of their system stats. Highlights include:

Incoming data volumes exceed 40TB per day, with more than 10^11 new items/lines/records being added per day. Our analytical processing infrastructure exceeds 6PB of physical storage with over 2.9PB(1.4+1.5) in our largest cluster.

We leverage compression technologies wherever possible and are achieving compression ratios as high as 99% on our highest volume data feeds.

On any given day our massive parallel systems process more than 27PB of data, not factoring in various levels of caches that serve similar activities or processes and reduce the amount of physical IOs significantly.

We execute millions of requests on a daily basis, spanning from near realtime highly localized access to enormous jobs that span 100s of TB in a single or series of models.
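A quick back-of-the-envelope check on the first of those stats, as a rough sketch only. It assumes decimal terabytes (10^12 bytes), that the 40TB and 10^11 figures describe the same incoming data, and that load is spread evenly over the day. None of that is stated in the eBay post, and both figures are quoted as lower bounds.

```python
TB = 10**12                # decimal terabyte in bytes (assumption)
SECONDS_PER_DAY = 86_400

daily_bytes = 40 * TB      # "exceed 40TB per day"
daily_records = 10**11     # "more than 10^11 new items/lines/records"

# Average record size implied by the two quoted figures.
avg_bytes_per_record = daily_bytes / daily_records

# Average ingest rate if records arrived evenly across 24 hours.
records_per_second = daily_records / SECONDS_PER_DAY

print(f"average record size: {avg_bytes_per_record:.0f} bytes")
print(f"average ingest rate: {records_per_second:,.0f} records/second")
```

So the quoted volumes work out to fairly small records, a few hundred bytes each, arriving at an average rate of over a million per second.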

February 27, 2008

eBay OLTP architecture

I’ve posted a couple times about eBay’s analytics side. As a companion, Don Burleson pointed me at a fascinating November, 2006 slide presentation outlining eBay’s transactional architecture and evolution. Highlights include:

The presentation has a bunch of specific numbers, in case anybody wants to dive in.

February 26, 2008

The biggest eBay database

There’s been some confusion over my post about eBay’s multiple petabytes of data. So to clarify, let me say:

February 11, 2008

eBay is over 5 petabytes now

Single largest database >1.4 petabytes.

From Oliver Ratzesberger’s LinkedIn profile:

Our systems process in excess of 10 billion records per day, serving thousands of users and delivering hundreds of millions of queries per month in a true global 24×7 operation with distributed teams around the globe on systems over 5 PB in size (largest single system >1.4PB).

October 9, 2007

Marketing versus reality on the one-petabyte barrier

Usually, I don’t engage in the kind of high-speed quick-response blogging I have over the past couple of days from the Teradata Partners conference (and more generally have for the past week or so). And I’m not sure it’s working out so well.

For example, the claim that Teradata has surpassed the one-petabyte mark comes as quite a surprise to a variety of Teradata folks, not to mention at least one reliable outside anonymous correspondent. That claim may indeed be true about raw disk space on systems sold. But the real current upper limit, according to CTO Todd Walter,* is 500-700 terabytes of user data. He thinks half a dozen or so customers are in that range. I’d guess quite strongly that three of those are Wal-Mart, eBay, and an unspecified US intelligence agency.

*Teradata seems to have quite a few CTOs. But I’ve seen things much sillier than that in the titles department, and accordingly shan’t scoff further — at least on that particular subject. 😉

On the other hand, if anybody did want to buy a 10 petabyte system, Teradata could ship them one. And by the way, the Teradata people insist Sybase’s claims in the petabyte area are quite bogus. Teradata claims to have had bigger internal systems tested earlier than the one Sybase writes about.

October 8, 2007

Teradata apparently has crossed the petabyte barrier

According to a hurried conversation I had with Chief Marketing Officer Darryl MacDonald, Teradata has customers with over 1 petabyte of user data in a single instance. He wouldn’t disclose any names, but I’d guess one is eBay, who he did confirm is a customer. The intelligence area is another one where I’d speculate there are Very Large Databases.

However, since Darryl mentioned testing systems internally up to 4 petabytes, I’d guess the upper limit of Teradata deployments is in the 1-2 petabyte range.

EDIT: I’m now guessing that Teradata’s largest classified database — which previously was the largest overall — isn’t much over a petabyte in size. And there’s a strong chance this is larger than any unclassified one.

Update: That wasn’t really 1+ petabyte of user data.


August 8, 2006

eBay’s version of DBMS2

Every sufficiently large or agile enterprise needs to follow the DBMS2 approach. The following is from an article on eBay’s version:

“eBay has built a software-based Integration Tier. This contains both a data access layer (DAL) and a services framework. The Integration Tier acts as an abstraction layer for software engineers to work with many disparate back-end data sources through a consistent set of abstractions.”
