Discussion of eBay’s use of database and analytic technology. Related subjects include:

May 22, 2010

Notes on SciDB and scientific data management

I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.

The main new activity I know of has been in the open source SciDB project.   Read more

October 27, 2009

Teradata’s nebulous cloud strategy

As the pun goes, Teradata’s cloud strategy is – well, it’s somewhat nebulous. More precisely, for the foreseeable future, Teradata’s cloud strategy is a collection of rather disjointed parts, including:

Teradata openly admits that its direction is heavily influenced by Oliver Ratzesberger at eBay. Like Teradata, Oliver and eBay favor virtual data marts over physical ones. That is, Oliver and eBay believe that the ideal scenario is that every piece of data is only stored once, in an integrated Teradata warehouse. But eBay believes and Teradata increasingly agrees that users need a great deal of control over their use of this data, including the ability to import additional data into private sandboxes, and join it to the warehouse data already there. Read more

September 12, 2009

Introduction to the XLDB and SciDB projects

Before I write anything else about the overlapping efforts known as XLDB and SciDB, I probably should explain and disambiguate what they are as best I can. XLDB was organized and still is run by guys who want to solve a scientific problem in eXtremely Large DataBase Management, most especially Jacek Becla of SLAC (the organization previously known as Stanford Linear Accelerator Center). Becla’s original motivation was that he needs a DBMS to manage what will be 55 petabytes of raw image data and 100 petabytes of astronomical data total for LSST (Large Synoptic Survey Telescope). Read more

June 8, 2009

The future of data marts

Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept — mixing mine and Greenplum’s together — include:

In essence, Greenplum is pitching the story:

When put that starkly, it’s overstated, not least because

Specialized Analytic DBMS != Data Warehouse Appliance

But basically it makes sense, for two main reasons:

Read more

April 30, 2009

eBay’s two enormous data warehouses

A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I’ve already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn’t like) and the astonishingly great difference between high- and low-end disk drives (to which eBay clued me in). Now I’m finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.

Metrics on eBay’s main Teradata data warehouse include:

Metrics on eBay’s Greenplum data warehouse (or, if you like, data mart) include:

Read more

April 28, 2009

Data warehouse storage options — cheap, expensive, or solid-state disk drives

This is a long post, so I’m going to recap the highlights up front. In the opinion of somebody I have high regard for, namely Carson Schmidt of Teradata:

In other news, Carson likes 10 Gigabit Ethernet, dislikes Infiniband, and is “ecstatic” about Intel’s Nehalem, which will be the basis for Teradata’s next generation of servers.

Read more

April 14, 2009

eBay thinks MPP DBMS clobber MapReduce

I talked with Oliver Ratzesberger and his team at eBay last week, who I already knew to be MapReduce non-fans. This time I added more detail.

Oliver believes that, on the whole, MapReduce is 6-8X slower than native functionality in an MPP DBMS, and hence should only be used sporadically. This view is based on part on simulations eBay ran of the Terasort benchmark. On 72 Teradata nodes or 96 lower-powered nodes running another (currently unnamed, as per yet another of my PR fire drills) MPP DBMS, a simulation of Terasort executed in 78 and 120 secs respectively, which is very comparable to the times Google and Yahoo got on 1000 nodes or more.

And by the way, if you use many fewer nodes, you also consume much less floor space or electric power.

March 2, 2009

Named customer silliness

Neither Greenplum nor eBay will say for the record that eBay is a Greenplum customer. Indeed, saying that is quite verboten. On the other hand, Greenplum’s press release boilerplate says that Skype is a Greenplum customer, and Skype is of course a subsidiary of eBay.  (Edit: Speaking of silliness, fixed a typo there.)

The point of such distinctions is sometimes lost on me.

In related news, of Greenplum’s two customers who back in August were supposedly heading into production soon with petabyte-plus databases, one hasn’t yet made it to that size. (“As we speak” turned out to be a longer conversation than I might have anticipated ….) The other (of course unnamed) customer has, Greenplum assures me, made it that high.  But upon checking with that (unnamed, in case I forgot to mention the point) customer, I don’t detect a whole lot of enthusiasm about Greenplum.

February 26, 2009

Data warehousing business trends

I’ve talked with a whole lot of vendors recently, some here at TDWI, as well as users, fellow analysts, and so on. Repeated themes include: Read more

February 25, 2009

Talend update

I chatted yesterday at TDWI with Yves de Montcheuil of Talend, as a follow-up to some chats at Teradata Partners in October. This time around I got more metrics, including:

It seems that Talend’s revenue was somewhat shy of $10 million in 2008.

Specific large paying customers Yves mentioned include: Read more

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.