Yahoo

Discussion of Yahoo’s use of database and analytic technology.

May 11, 2009

Facebook, Hadoop, and Hive

A few weeks ago, I posted about a conversation I had with Jeff Hammerbacher of Cloudera, in which he discussed a Hadoop-based effort at Facebook he previously directed. Subsequently, Ashish Thusoo and Joydeep Sarma of Facebook contacted me to expand upon and in a couple of instances correct what Jeff had said. They also filled me in on Hive, a data-manipulation add-on to Hadoop that they developed and subsequently open-sourced.

Updating the metrics in my Cloudera post,

Nothing else in my Cloudera post was called out as being wrong.

In a new-to-me metric, Facebook has 610 Hadoop nodes, running in a single cluster, due to be increased to 1000 soon. Facebook thinks this is the second-largest* Hadoop installation, or else close to it. What’s more, Facebook believes it is unusual in spreading all its apps across a single huge cluster, rather than doing different kinds of work on different, smaller sub-clusters.


September 24, 2008

Some of Oracle’s largest data warehouses

Googling around, I came across an Oracle presentation – given some time this year – that lists some of Oracle’s largest data warehouses. Ten databases in total are listed at more than 16 TB, which is fairly consistent with Larry Ellison’s confession during the Exadata announcement that Oracle has trouble over 10 TB (which is something I’ve gotten a lot of flak from a few Oracle partisans for pointing out … :D ).

However, what’s being measured is probably not the same in all cases. For example, I think the Amazon 70 TB figure is obviously for spinning disk (elsewhere in the presentation it’s stated that Amazon has 71 TB of disk). But the 16 TB British Telecom figure probably is user data — indeed, it’s the same figure Computergram cited for BT user data way back in 2001.

The list is:


May 29, 2008

Yahoo scales its web analytics database to petabyte range

Information Week has an article with details on what sounds like Yahoo’s core web analytics database. Highlights include:
