September 24, 2011

Confusion about Teradata’s big customers

Evidently further attempts to get information on this subject would be fruitless, but anyhow:

Comments

9 Responses to “Confusion about Teradata’s big customers”

  1. Oliver Ratzesberger on September 25th, 2011 8:06 am

    FYI From our public presentations:
    Our largest system (Singularity) is currently at 37PB raw storage and at 5.5x compression for the vast majority of semi-structured data. That is raw un-modeled data, mainly name/value pairs generated by upstream systems.
    Largest single table is 2PB after compression, 2+ trillion records, holding 200+ trillion name/value pairs.
    This table alone is accessed 20k+ times per day with an average response time of 18s.
    e.g. Taking a full day's worth of eBay data, extracting 100+ billion search impressions (all items shown on all search pages) out of semi-structured name/value pairs, pivoting them, counting them and sorting the counts descending to find the highest impression counts runs for about 30s.
    e.g. Sorting a raw TB takes about 9s.
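
For readers who want to picture the impression query described above, here is a minimal single-machine sketch of its shape: pull item identifiers out of name/value pairs, count them (the group-by), and sort the counts descending. The record layout, the `"item"` field name, and the function `count_impressions` are assumptions made for illustration; the comment does not describe eBay's actual schema or SQL, and this says nothing about how the query runs on an MPP system.

```python
from collections import Counter


def count_impressions(records):
    """Count how often each item appears across search-page records.

    Each record is a hypothetical list of (name, value) pairs; the
    name "item" marking a displayed item is an assumed convention.
    """
    counts = Counter()
    for pairs in records:
        for name, value in pairs:
            if name == "item":
                counts[value] += 1
    # Sort by count descending, then by item id so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Three made-up search pages, each showing two items.
records = [
    [("query", "lamp"), ("item", "A"), ("item", "B")],
    [("query", "desk"), ("item", "B"), ("item", "C")],
    [("query", "lamp"), ("item", "B"), ("item", "A")],
]
ranked = count_impressions(records)
```

Only the grouped result is sorted here, not the raw impressions, which matches the clarification given later in the thread.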

  2. Curt Monash on September 25th, 2011 9:51 am

    Hi, Oliver!

    Cool stats!

    37 PB would have to be sliced up into a primary copy of the data, a mirror, and temp space. Anything else major, or is roughly 1/3 of it holding the actual database?

  3. Vlad Rodionov on September 25th, 2011 5:05 pm

    Just one note. You do not have to sort 100+ billion items to find the list of top x ones. Otherwise, everything (if it is true) is very impressive.
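
Vlad's point can be illustrated with a bounded heap: selecting the top x entries needs only a heap of size x, so the cost is roughly O(n log x) rather than the O(n log n) of a full sort. The sketch below uses Python's standard `heapq.nlargest`; the function name `top_k` and the sample counts are made up for illustration and are not taken from the system discussed in this thread.

```python
import heapq


def top_k(counts, k):
    """Return the k (item, count) pairs with the highest counts.

    heapq.nlargest keeps at most k candidates at a time, so the full
    set of counts is never sorted.
    """
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])


# Hypothetical impression counts per item.
counts = {"A": 2, "B": 3, "C": 1, "D": 5}
best_two = top_k(counts, 2)
```

With these sample counts, `best_two` holds the entries for `D` and `B`, in that order.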

  4. Paul Johnson on September 26th, 2011 2:19 pm

    Very cool stats indeed.

    As a general guide, the 1/3 figure for the amount of spinning disk available for data storage is about right.

    The rest goes in mirroring, overhead and spool (work) space.

  5. Oliver Ratzesberger on September 27th, 2011 9:23 pm

    @Vlad: Correct, and I did not state that. I meant to say counting (basically group by) and then sorting the result. It's still way north of a hundred million for the result set. The stat about sorting a TB is a totally separate stat.

    @Curt: 1/2 is for the mirror and a few % for the file system. The MPP RDBMS gets to see all the rest as one. The traditional 30% for temp or spool does not apply to these extremely large BigData systems. Most of the time it runs well below 1% temp space or spool, with only occasional spikes to 5%, which is almost a PB; times 5.5x compression, that is more like 5PB of raw data.

  6. Curt Monash on September 27th, 2011 9:43 pm

    Oliver,

    That sounds like 37 PB of raw storage should be multiplied by about 5/2 to get the total amount of data under management. Yikes! No wonder you’re proud of how big it is. :)
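
The arithmetic behind the "about 5/2" multiplier, and behind Oliver's spool figures, can be checked with a short calculation. The 37 PB, the 1/2 for mirroring, the 5.5x compression and the 5% spool spike come from the comments above; the 3% file-system overhead is an assumed stand-in for "a few %", and treating all usable space as holding data compressed at 5.5x is a simplification, so the result is a rough upper bound rather than a reported number.

```python
RAW_PB = 37.0            # raw storage, from comment 1
MIRROR_FRACTION = 0.5    # "1/2 is for the mirror", from comment 5
FS_OVERHEAD = 0.03       # assumption: stand-in for "a few %"
COMPRESSION = 5.5        # compression ratio, from comment 1
SPOOL_SPIKE = 0.05       # occasional spool peak, from comment 5

# Space the RDBMS can actually use after mirroring and file-system overhead.
usable_pb = RAW_PB * (1 - MIRROR_FRACTION) * (1 - FS_OVERHEAD)

# Uncompressed-equivalent data if all usable space held 5.5x-compressed data.
uncompressed_pb = usable_pb * COMPRESSION
multiplier = uncompressed_pb / RAW_PB

# A 5% spool spike, in compressed and uncompressed-equivalent terms.
spool_pb = usable_pb * SPOOL_SPIKE
spool_uncompressed_pb = spool_pb * COMPRESSION

print(f"usable: {usable_pb:.1f} PB, multiplier: {multiplier:.2f}")
print(f"spool spike: {spool_pb:.2f} PB compressed, "
      f"{spool_uncompressed_pb:.1f} PB uncompressed-equivalent")
```

Under these assumptions the multiplier comes out a little above 5/2, and the 5% spike is a bit under 1 PB compressed, or roughly 5 PB of uncompressed data, in line with both comments.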

  7. Neil Raden on September 30th, 2011 9:52 pm

    Curt,

    I merely said Teradata announced that it had 20 petabyte-level installations. I made no claim to validate it, only to mention it. I did not say they convinced me.

  8. Curt Monash on October 1st, 2011 12:48 am

    Ahh.

    Since you seemingly stated it as fact, without any kind of qualification I could detect, I hope you’ll forgive me my error of interpretation. :)

    That kind of thing happens when there are 140 character limits …

  9. Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same : DBMS 2 : DataBase Management System Services on February 9th, 2012 4:21 am

    […] has an outstanding track record both for managing large data volumes and for high-concurrency mixed […]
