May 22, 2010

Notes on SciDB and scientific data management

I firmly believe that, as a community, we should look for ways to support scientific data management and related analytics. That’s why, for example, I went to XLDB3 in Lyon, France at my own expense. Eight months ago, I wrote about issues in scientific data management. Here’s some of what has transpired since then.

October 27, 2009

Teradata’s nebulous cloud strategy

As the pun goes, Teradata’s cloud strategy is – well, it’s somewhat nebulous. More precisely, for the foreseeable future, Teradata’s cloud strategy is a collection of rather disjointed parts, including:

September 12, 2009

Introduction to the XLDB and SciDB projects

June 8, 2009

The future of data marts

Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept, mixing my own with Greenplum's, include:

In essence, Greenplum is pitching the story:

When put that starkly, it’s overstated, not least because

Specialized Analytic DBMS != Data Warehouse Appliance

But basically it makes sense, for two main reasons:

April 30, 2009

eBay’s two enormous data warehouses

A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I’ve already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn’t like) and the astonishingly great difference between high- and low-end disk drives (to which eBay clued me in). Now I’m finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.

Metrics on eBay’s main Teradata data warehouse include:

Metrics on eBay’s Greenplum data warehouse (or, if you like, data mart) include:

April 28, 2009

Data warehouse storage options — cheap, expensive, or solid-state disk drives

This is a long post, so I’m going to recap the highlights up front. In the opinion of somebody I have high regard for, namely Carson Schmidt of Teradata:

In other news, Carson likes 10 Gigabit Ethernet, dislikes Infiniband, and is “ecstatic” about Intel’s Nehalem, which will be the basis for Teradata’s next generation of servers.

April 14, 2009

eBay thinks MPP DBMS clobber MapReduce

Last week I talked with Oliver Ratzesberger and his team at eBay, whom I already knew to be MapReduce non-fans. This time I got more detail.

Oliver believes that, on the whole, MapReduce is 6-8X slower than native functionality in an MPP DBMS, and hence should only be used sporadically. This view is based in part on simulations eBay ran of the Terasort benchmark. On 72 Teradata nodes, or on 96 lower-powered nodes running another MPP DBMS (currently unnamed, as per yet another of my PR fire drills), a simulation of Terasort executed in 78 and 120 seconds respectively. That is very comparable to the times Google and Yahoo got on 1000 nodes or more.

And by the way, if you use many fewer nodes, you also consume much less floor space or electric power.
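To make the node-count arithmetic concrete, here is a minimal sketch that treats node-seconds (nodes × elapsed seconds) as a rough proxy for hardware consumed. That proxy, and the names `node_seconds`, `node_ratio`, and `GOOGLE_YAHOO_MIN_NODES`, are illustrative assumptions, not anything eBay described. The only figures used are the ones quoted above; Google's and Yahoo's actual times aren't given here, so only their node count appears.

```python
# Rough cost comparison for the Terasort figures quoted above.
# Assumption (illustrative, not eBay's method): node-seconds = nodes x elapsed
# seconds is a usable proxy for hardware consumed. It ignores per-node power
# differences, and the 96-node system used lower-powered nodes.

GOOGLE_YAHOO_MIN_NODES = 1000  # "1000 nodes or more"; a lower bound


def node_seconds(nodes: int, seconds: float) -> float:
    """Total node-seconds a run consumes."""
    return nodes * seconds


def node_ratio(nodes: int, baseline: int = GOOGLE_YAHOO_MIN_NODES) -> float:
    """How many times as many nodes the baseline cluster used (a lower bound)."""
    return baseline / nodes


# (nodes, elapsed seconds) from eBay's Terasort simulations
RUNS = {
    "Teradata": (72, 78),
    "unnamed MPP DBMS": (96, 120),
}

if __name__ == "__main__":
    for name, (nodes, secs) in RUNS.items():
        print(f"{name}: {node_seconds(nodes, secs):,.0f} node-seconds; "
              f"baseline used >= {node_ratio(nodes):.1f}x as many nodes")
```

Under this simplification, with elapsed times in the same range, a 1000-node run burns very roughly 10-14x the node-seconds, and floor space and power scale down along with node count.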

March 2, 2009

Named customer silliness

Neither Greenplum nor eBay will say for the record that eBay is a Greenplum customer. Indeed, saying that is quite verboten. On the other hand, Greenplum’s press release boilerplate says that Skype is a Greenplum customer, and Skype is of course a subsidiary of eBay.  (Edit: Speaking of silliness, fixed a typo there.)

The point of such distinctions is sometimes lost on me.

In related news, of Greenplum’s two customers who back in August were supposedly heading into production soon with petabyte-plus databases, one hasn’t yet made it to that size. (“As we speak” turned out to be a longer conversation than I might have anticipated ….) The other (of course unnamed) customer has, Greenplum assures me, made it that high.  But upon checking with that (unnamed, in case I forgot to mention the point) customer, I don’t detect a whole lot of enthusiasm about Greenplum.

February 26, 2009

Data warehousing business trends

February 25, 2009

Talend update

I chatted yesterday at TDWI with Yves de Montcheuil of Talend, as a follow-up to some chats at Teradata Partners in October. This time around I got more metrics, including:

It seems that Talend’s revenue was somewhat shy of $10 million in 2008.

