October 6, 2010

eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more

I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla’s home base of SLAC, which used to stand for “Stanford Linear Accelerator Center”). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included: 

Comments

30 Responses to “eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more”

  1. M-A-O-L » eBay replaces Greenplum with Teradata on October 7th, 2010 12:50 am

    […] quicky: eBay followup — Greenplum out, Teradata > 10 petabytes, Hadoop has some value, and more. Interesting to see that the impression is that Greenplum got thrown out more for reliability […]

  2. Paul Johnson on October 7th, 2010 8:50 am

    Many sites try to keep Teradata on their toes through either the threat or the actual deployment of a competing technology.

    Along with credibility gained through being presented as the ‘Sun Data Warehouse’ (Greenplum on Thumper), I saw this as the main driver behind Greenplum’s adoption at eBay. It has been happening increasingly since Netezza came on the scene in ~2003.

    Now that Sun has gone to Oracle and Greenplum to EMC (ironically one of Teradata’s disk sub-system suppliers), the roadmap in support of eBay’s continued use of Sun/Greenplum must be non-existent.

    Big companies expect multi-year technology roadmaps, and some expect to be able to bend the roadmap significantly to meet their own requirements. This must have been a factor, surely?

    The name-value pair database is interesting given that Nokia presented with Teradata at last year’s Teradata Partners event in Washington and basically said the POC they carried out couldn’t make the data easy to use, with multiple self-joins required for even the simplest query.

    Does eBay have new Teradata features in support of this approach, I wonder?
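
The self-join problem described in this comment can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and a made-up name-value pair table; the table name `event_attrs`, its columns, the sample rows, and the helper `users_for` are all hypothetical and are not taken from eBay's or Nokia's actual schemas. The point is only that each attribute tested in a query needs its own pass over the same table.

```python
import sqlite3

# Hypothetical name-value pair (NVP) table: one row per attribute of an event.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_attrs (event_id INTEGER, name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO event_attrs VALUES (?, ?, ?)",
    [
        (1, "user", "alice"), (1, "page", "search"), (1, "country", "US"),
        (2, "user", "bob"), (2, "page", "checkout"), (2, "country", "DE"),
        (3, "user", "carol"), (3, "page", "search"), (3, "country", "DE"),
    ],
)


def users_for(conn, page, country):
    """Return users whose event matches both a page and a country.

    Each attribute lives in a separate row, so filtering on two attributes
    and returning a third requires joining the table to itself twice.
    """
    query = """
        SELECT u.value
        FROM event_attrs AS u
        JOIN event_attrs AS p
          ON p.event_id = u.event_id AND p.name = 'page'
        JOIN event_attrs AS c
          ON c.event_id = u.event_id AND c.name = 'country'
        WHERE u.name = 'user' AND p.value = ? AND c.value = ?
        ORDER BY u.value
    """
    return [row[0] for row in conn.execute(query, (page, country))]
```

In a conventional schema with `user`, `page`, and `country` columns, the same question would be a single-table filter with no joins, which is roughly the usability gap the comment describes.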

  3. Curt Monash on October 7th, 2010 9:51 am

    Yes, there are new features.
    No, I don’t know what they are. :)

  4. Ramakrishna Vedantam on October 7th, 2010 10:18 am

    Greenplum was bought by EMC, and the platform was bought by a database company. Does that mean the product has become lackluster now?

  5. Michael McIntire on October 7th, 2010 12:01 pm

    Curt, the fundamental thrust of this post, that GP was thrown out, is simply not true and implies that GP is not viable in the MPP space, which is also not true.

    eBay and GP did a research project, and they both learned a great deal. Sun, to its credit, stepped up and put a lot of time and energy into the 45xx platform.

    The issue with NVP is the need for branch and loop control in expressions. If your database cannot branch or loop in an expression, you cannot process NVP.

    A lot of technology was developed by all three companies, so when eBay goes and does the next big and unexpected thing, don’t throw Teradata under the bus like you just did GP.

    In the interest of fair disclosure, I led the Singularity project and the research relationships between eBay and GP, and eBay and TD. Two months ago, I became the Chief Architect for User Data and Analytics at Yahoo!.

    Michael McIntire

  6. Curt Monash on October 7th, 2010 1:53 pm

    Michael,

    I neither implied nor believe that GP is not viable in the MPP space — that would be pretty silly.

    Umm — NVP?

    Thanks,

    CAM

  7. Michael McIntire on October 7th, 2010 2:52 pm

    Curt – NVP = Name Value Pair. -Michael

  8. Implementation Engineer on October 8th, 2010 1:07 pm

    I have worked on both technologies and respect both. Honestly, I think I smell marketing. For people who make decisions based on comments, I’d encourage you all to do an evaluation before you go with any solution.

  9. Michael McIntire on October 8th, 2010 3:51 pm

    BTW – the simple calculation of usable space works like this….

    8PB Raw = 4PB mirrored.
    4PB Less File System + Overhead (30%) = 2.8PB.

    Compression:
    60% avg = 2.8PB * 2.5 = 7PB Usable
    70% avg = 2.8PB * 3.3 = 9.3PB Usable

    Ergo, a 20PB box would be:
    60% avg = 7PB * 2.5 = 17.5PB Usable
    70% avg = 7PB * 3.3 = 23.1PB Usable

    The basic rule of thumb is:
    with generalized compression, raw disk size ≈ usable disk size
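
The arithmetic in this comment can be sketched in a few lines of Python. This is only an illustration of the rule of thumb as stated: the function name and parameter names are hypothetical, while the mirroring and 30% overhead assumptions are taken from the comment itself. Note that the comment rounds the 70% expansion factor to 3.3; the exact factor 1 / (1 - 0.7) gives slightly larger results.

```python
def usable_petabytes(raw_pb, compression_saving, overhead=0.30, mirrored=True):
    """Estimate usable (uncompressed-equivalent) capacity in petabytes.

    raw_pb             -- raw disk capacity in PB
    compression_saving -- average fraction of space saved, e.g. 0.60 for 60%
    overhead           -- fraction lost to file system and other overhead
    mirrored           -- if True, half the raw capacity holds mirror copies
    """
    if not 0 <= compression_saving < 1:
        raise ValueError("compression_saving must be in [0, 1)")
    after_mirroring = raw_pb / 2 if mirrored else raw_pb
    after_overhead = after_mirroring * (1 - overhead)
    # Saving 60% of space means each stored byte represents 1 / 0.4 = 2.5 bytes.
    expansion = 1 / (1 - compression_saving)
    return after_overhead * expansion


if __name__ == "__main__":
    for raw in (8, 20):
        for saving in (0.60, 0.70, 0.80):
            print(raw, saving, round(usable_petabytes(raw, saving), 1))
```

Under these assumptions, 8 PB raw at 60% compression works out to about 7 PB usable, and at 80% compression to about 14 PB, which is where the raw-size-equals-usable-size rule of thumb comes from at moderate compression rates.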

  10. Curt Monash on October 8th, 2010 9:58 pm

    So using Oliver’s 80% figure, we’re talking 14 PB?

    Or is 80% more like a peak than an average number?

  11. Oliver Ratzesberger on October 9th, 2010 12:13 am

    Curt,

    As I stated previously the “thrown out” part of your statement could not be further from the truth.

    The answer to a casual question over lunch was: Do you still use vendor XYZ? And my response was a simple No.

    For various reasons that I will not go into in this forum, we have simply selected a different vendor for V2 of our Singularity project. The same is true for many areas of our business. That said, we treat our vendors, current or past, with respect. The guys at Greenplum have gone above and beyond during the time we worked with them on a next generation prototype.
    We value and respect their entire team and, as Michael previously stated, have simply selected a different vendor for the next generation system implementation.

    I realize that provocative statements drive traffic to your blog, but I would appreciate it if you could remove any exaggerations from what reads like a quote from me.

    I am sorry but I cannot support your statements in this blog post.

  12. Curt Monash on October 9th, 2010 10:37 am

    Oliver,

    Thank you (and ditto Michael) for correcting any connotations you feel people may have wrongly inferred from what I wrote.

    Usually that’s something I have to do myself.

    Best,

    CAM

  13. Partnering with Cloudera | DBMS 2 : DataBase Management System Services on October 10th, 2010 12:40 pm

    […] Owners of that much data commonly like to store it using free or quasi-free software, especially if the data isn’t structured in such a way that relational tables are a great fit in the first place. HDFS (Hadoop Distributed File System) is the default choice. (Of course, there always are exceptions.) […]

  14. La petite revue de presse du décisionnel | www.LeGrandBI.com on October 10th, 2010 2:41 pm

    […] of Teradata also sat in on the latter part of the conversation. Things I learned included… Read the article. Related articles: Will IBM acquire Netezza? The summer press review […]

  15. Notes and links October 22, 2010 | DBMS 2 : DataBase Management System Services on October 22nd, 2010 2:48 am

    […] That eBay comment was particularly interesting. […]

  16. A few notes from XLDB 4 | DBMS 2 : DataBase Management System Services on January 25th, 2011 3:09 am

    […] (ditto), Luke Lonergan (ditto), Todd Walter (almost unrecognizable without his usual cowboy gear), Oliver Ratzesberger, and a bunch of actual science […]

  17. Newbie on April 7th, 2011 1:47 pm

    Can anyone tell me what the job of an ETL developer at eBay might be?

    Thanks in advance.

  18. Quora on June 10th, 2011 3:21 am

    What are the scalability limits of existing data warehouse products?…

    Those aren’t all the same thing. Oracle RAC isn’t shared-nothing, although Exadata gets some of the shared-nothing benefits. Microsoft’s shared-nothing offering is very immature, as it was based on the troubled DATAllegro acquisition. Anyhow, Terada…

  19. No, companies are NOT entitled to manage news about themselves | Strategic Messaging on June 21st, 2011 4:49 pm

    […] I got into a flap with EMC Greenplum. I blindsided them on a story; they retaliated for the story by, among other things, screwing me over business-wise. Why did I […]

  20. Data management at Zynga and LinkedIn | DBMS 2 : DataBase Management System Services on September 6th, 2011 2:50 am

    […] this time that is indeed the phrase that was […]

  21. Data Management at Zynga and LinkedIn | Inside-BigData.com on September 24th, 2011 11:02 am

    […] ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like eBay‘s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into […]

  22. Social technology in the enterprise | Text Technologies on November 16th, 2011 5:17 am

    […] an important concept on the monitoring-oriented side of business intelligence and — if Oliver Ratzesberger is to be believed — in investigative analytics as well. But the operational side may actually […]

  23. Data into results » Big data and mobile BI : New hype but same old issue on December 29th, 2011 2:22 pm

    […] the first issue, which is size, let me point out that eBay have two data warehouse with many petabytes running Teradata. Obviously, Teradata is far from cutting edge new stuff. I didn’t heard of a […]

  24. Big data and mobile BI : New hype but same old issue - Business Intelligence Weekly on January 17th, 2012 9:15 pm

    […] the first issue, which is size, let me point out that eBay have two data warehouse with many petabytes running Teradata. Obviously, Teradata is far from cutting edge new stuff. I didn’t heard of a […]

  25. Under the covers of eBay’s big data operation — Cloud Computing News on January 31st, 2012 12:57 pm

    […] says eBay’s traffic volumes produce huge data, not just big data. In late 2010, eBay predicted its Teradata deployment would grow from about 10 petabytes to 20 petabytes (or 20,000 terabytes — equivalent to about 266 years […]

  26. Under the covers of eBay’s big data operation | Ubuntu Cloud Portal on January 31st, 2012 1:34 pm

    […] says eBay’s traffic volumes produce huge data, not just big data. In late 2010, eBay predicted its Teradata deployment would grow from about 10 petabytes to 20 petabytes (or 20,000 terabytes — equivalent to about 266 years […]

  27. SquareCows.com » Under the covers of eBay’s big data operation on February 1st, 2012 1:04 am

    […] says eBay’s traffic volumes produce huge data, not just big data. In late 2010, eBay predicted its Teradata deployment would grow from about 10 petabytes to 20 petabytes (or 20,000 terabytes — equivalent to about 266 years […]

  28. SquareCows.com » Under the covers of eBay’s big data operation on February 1st, 2012 1:04 am

    […] says eBay’s traffic volumes produce huge data, not just big data. In late 2010, eBay predicted its Teradata deployment would grow from about 10 petabytes to 20 petabytes (or 20,000 terabytes — equivalent to about 266 years […]

  29. What those nested data structures are about | DBMS 2 : DataBase Management System Services on January 28th, 2013 6:30 am

    […] explanation was led by Oliver Ratzesberger, late of eBay* and progenitor of eBay’s Singularity project. In simplest terms, one event can spawn a lot of event attribute information, perhaps in the form […]

  30. Schema-on-need | DBMS 2 : DataBase Management System Services on October 30th, 2013 10:30 am

    […] ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like eBay‘s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into […]
