May 29, 2008

Yahoo scales its web analytics database to petabyte range

Information Week has an article with details on what sounds like Yahoo’s core web analytics database. Highlights include:

Comments

9 Responses to “Yahoo scales its web analytics database to petabyte range”

  1. dave on May 30th, 2008 7:06 am

    actually there was a bit more to it - the article described a small startup acquired by yahoo that supported their changes around mysql architecture (mahit? i forget the exact name)

    as for their column-store comment, i saw that too, but they offered zero in the way of supporting evidence. one may assume that too many yahoo properties rely on multivariable-intensive searches and so wouldn’t be quite as well served…

  2. Curt Monash on May 30th, 2008 5:04 pm

    On the column store point — sometimes there isn’t much difference between a vertically partitioned row store and a true column store, as per http://www.dbms2.com/2007/03/19/datallegro-versus-vertica-columnar-systems/.

    CAM

  3. Curt Monash on May 30th, 2008 5:08 pm

    Eric Lai of Computerworld has an article too: http://www.infoworld.com/news/feeds/08/05/22/Yahoo-claims-2-petabyte-database-is-worlds-biggest–busiest.html

    Again, he says PostgreSQL, not MySQL. The fact that it’s an acquisition may help explain why it’s not MySQL. :)

    The name was Mahat, which I gather is a philosophically inspiring word in Sanskrit or something.

    CAM

  4. Daniel Weinreb on May 31st, 2008 6:55 am

    The claim that they got great performance advantages by optimizing for a specific application sounds very plausible to me. If you look at the published literature from companies like Amazon and Google about their high-performance, high-availability systems, these papers explain all kinds of interesting techniques that buy lots of performance by providing semantics that are unconventional, but carefully optimized for the particular needs and tradeoffs of their applications.

  5. Response to Rita Sallam of Oracle | DBMS2 -- DataBase Management System Services on June 28th, 2008 4:35 am

    [...] by the way — the largest Oracle warehouse by far on that list is at Yahoo. But Oracle isn’t Yahoo’s major data warehouse software provider. If a shared disk architecture is not scalable, then how is it that Oracle is the leader in Data [...]

  6. david bandel on August 25th, 2008 12:02 pm

    because Oracle won’t support their custom DB structure.

  7. Yahoo reaches 1-Petabyte… « Wisps in the Ethereal on August 25th, 2008 2:14 pm

    [...] You can read more on the Yahoo side of things HERE. [...]

  8. Infobright’s open source move has a lot of potential | DBMS2 -- DataBase Management System Services on September 15th, 2008 8:05 am

    [...] data. Those outfits have already been buying massive data warehouse appliances – or doing things even more dramatic — and don’t need Infobright. But for anybody else in the MySQL world who needs [...]

  9. Some of Oracle’s largest data warehouses | DBMS2 -- DataBase Management System Services on September 24th, 2008 8:22 pm

    [...] one of Greenplum’s flagship accounts. And despite its ongoing Oracle relationship Yahoo has a much bigger data warehouse based on Postgres [...]
