May 29, 2008

Yahoo scales its web analytics database to petabyte range

Information Week has an article with details on what sounds like Yahoo’s core web analytics database. Highlights include:


13 Responses to “Yahoo scales its web analytics database to petabyte range”

  1. dave on May 30th, 2008 7:06 am

    actually there was a bit more to it – the article described a small startup acquired by yahoo that supported their changes around mysql architecture (mahit? i forget the exact name)

    as for their column-store comment, i saw that too though they offered zero in the way of supporting evidence, though one may assume that too many yahoo properties rely on multivariable-intensive searches and so wouldn’t be quite as well served…

  2. Curt Monash on May 30th, 2008 5:04 pm

    On the column store point — sometimes there isn’t much difference between a vertically partitioned row store and a true column store, as per


  3. Curt Monash on May 30th, 2008 5:08 pm

    Eric Lai of Computerworld has an article too:–busiest.html

    Again, he says PostgreSQL, not MySQL. The fact that it’s an acquisition may help explain why it’s not MySQL. 🙂

    The name was Mahat, which I gather is a philosophically-inspiring word in Sanskrit or something.


  4. Daniel Weinreb on May 31st, 2008 6:55 am

    The claim that they got great performance advantages by optimizing for a specific application sounds very plausible to me. If you look at the published literature from companies like Amazon and Google about their high-performance, high-availability systems, these papers explain all kinds of interesting techniques that buy lots of performance by providing semantics that are unconventional, but carefully optimized for the particular needs and tradeoffs of their applications.

  5. Response to Rita Sallam of Oracle | DBMS2 -- DataBase Management System Services on June 28th, 2008 4:35 am

    […] by the way — the largest Oracle warehouse by far on that list is at Yahoo.  But Oracle isn’t Yahoo’s major data warehouse software provider. If a shared disk architecture is not scalable, then how is it that Oracle is the leader in Data […]

  6. david bandel on August 25th, 2008 12:02 pm

    because Oracle won’t support their custom DB structure.

  7. Yahoo reaches 1-Petabyte… « Wisps in the Ethereal on August 25th, 2008 2:14 pm

    […] You can read more on the Yahoo side of things HERE. […]

  8. Infobright’s open source move has a lot of potential | DBMS2 -- DataBase Management System Services on September 15th, 2008 8:05 am

    […] data. Those outfits have already been buying massive data warehouse appliances – or doing things even more dramatic — and don’t need Infobright. But for anybody else in the MySQL world who needs […]

  9. Some of Oracle’s largest data warehouses | DBMS2 -- DataBase Management System Services on September 24th, 2008 8:22 pm

    […] one of Greenplum’s flagship accounts. And despite its ongoing Oracle relationship Yahoo has a much bigger data warehouse based on Postgres […]

  10. eBay’s two enormous data warehouses | DBMS2 -- DataBase Management System Services on April 30th, 2009 6:25 am

    […] web/network events database, running on proprietary software, sounded about 1/6th the size of eBay’s Greenplum system when it was described about a year […]

  11. Analytics Team » Blog Archive » Web analytics databases keep getting bigger on April 30th, 2009 10:23 pm

    […] Ebay has a 6.5 petabyte Greenplum warehouse and a 2.5 petabyte Teradata warehouse. This system ingests hundreds of billions of new rows of data every day. Facebook has a 2.5 petabyte Hadoop system Yahoo has more than 1 petabyte running on their homemade system […]

  12. Yahoo is up to 10 petabytes now? | DBMS2 -- DataBase Management System Services on July 6th, 2009 2:03 am

    […] to somebody (I forget who) who attended Yahoo’s SIGMOD presentation last week, the big Yahoo database is now up to 10 petabytes in size, in line with Yahoo’s predictions last year.  Apparently, […]

  13. The fall (and rise?) of Yahoo: How the web giant crumbled and built some great tech in the process — Tech News and Analysis on November 27th, 2013 9:01 am

    […] relational databases, NoSQL databases and even a columnar analytic database called Everest that was designed for querying big data related to targeted advertising. He views Yahoo’s decision to port so many workloads to […]
