April 14, 2009

eBay thinks MPP DBMS clobber MapReduce

I talked with Oliver Ratzesberger and his team at eBay last week, whom I already knew to be MapReduce non-fans. This time I got more detail.

Oliver believes that, on the whole, MapReduce is 6-8X slower than native functionality in an MPP DBMS, and hence should only be used sporadically. This view is based in part on simulations eBay ran of the Terasort benchmark. On 72 Teradata nodes or 96 lower-powered nodes running another (currently unnamed, as per yet another of my PR fire drills) MPP DBMS, a simulation of Terasort executed in 78 and 120 seconds respectively, which is very comparable to the times Google and Yahoo got on 1,000 nodes or more.

And by the way, if you use many fewer nodes, you also consume much less floor space or electric power.

Comments

11 Responses to “eBay thinks MPP DBMS clobber MapReduce”

  1. UnHolyGuy on April 14th, 2009 12:29 pm

    Yet you don’t see eBay or Teradata publishing a Terasort benchmark of their own for some reason.

    Also, while MapReduce might be 6-8 times slower, if it ends up being a hundred times cheaper, and it scales linearly as you add hardware, then it starts not to matter, don’t you think? Except, as you said, for floor and rack space.

    Would be interesting to see some solid, unbiased numbers on how much a Hadoop node costs, TCO, as compared to a Teradata node. I’m not sure what eBay is paying for their Teradata nodes, but Teradata list price is easily 100 times marked up from base hardware cost.

    I think there are really three interacting factors here:

    1: Cost of actual hardware/footprint as it compares to software licensing cost
    2: Pros/cons of distributed DBs as compared to map/reduce infrastructures
    3: Ease of use, target market, target user

  2. Curt Monash on April 14th, 2009 5:01 pm

    Purchase and maintenance prices for hardware are no longer dispositive if viewed in isolation from power and floor space costs.

    eBay talked about that to me, and it’s also in the link above.

    Putting a big markup on hardware because you add your software doesn’t add to the power consumption.

  3. UnHolyGuy on April 14th, 2009 5:53 pm

    Perfectly possible to back into the break-even point given the inputs.

    Assume parity of functionality between the two systems and linear scalability.

    Assume that the open-source Map/Reduce system (Hadoop) requires 8 times the compute resources on identical hardware to solve the same problem as the proprietary one (Teradata).

    Let N = #nodes
    Let C = software cost
    Let K = fully loaded cost to operate a node for a year (power, rack maintenance )

    Parity is solved by

    C*N + K*N = 8*K*N; the N’s cancel, giving

    C + K = 8K, i.e. C = 7K

    Plug in $200,000 a node for Teradata.

    The cost to operate a machine has to be around $28,000/year before the Teradata solution is cost effective.

    If you assume a fully loaded total cost to operate commodity hardware of around $1000/year (I’m not sure this is correct but it is likely ballpark)

    then Teradata needs to come down to about $7000/node to break even with the open source solution.
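    The arithmetic above can be checked mechanically. A minimal sketch in Python, where the $200,000 node price, 8x slowdown factor, and $1,000/year opex are the figures assumed in the comment, not measured costs:

```python
# Break-even model from the comment above, ignoring hardware purchase price.
# Parity condition: C*N + K*N = 8*K*N, so C = 7*K.

def breakeven_opex(software_cost_per_node):
    """Yearly per-node operating cost K at which both sides cost the same."""
    return software_cost_per_node / 7.0

def breakeven_software_cost(opex_per_node):
    """Per-node software cost C that breaks even at a given yearly opex K."""
    return 7.0 * opex_per_node

# $200,000/node for Teradata: opex must reach ~$28,571/year ("around $28,000").
print(round(breakeven_opex(200_000)))
# At $1,000/year commodity opex, Teradata must fall to $7,000/node.
print(breakeven_software_cost(1_000))
```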

  4. UnHolyGuy on April 14th, 2009 6:13 pm

    Whoops, left out the hardware purchase cost.

    Assume parity of functionality between the two systems and linear scalability.

    Assume that the open-source Map/Reduce system (Hadoop) requires 8 times the compute resources on identical hardware to solve the same problem as the proprietary one (Teradata).

    Assume nodes cost $3000

    Let N = #nodes in the Teradata cluster
    Let C = software cost
    Let K = fully loaded cost to operate a node for a year (power, rack maintenance )

    Parity is solved by

    C*N + K*N = 8*K*N + $3,000*8*N

    The N’s cancel:

    C = 7K + $24,000

    Plug in $200,000 a node for Teradata

    The cost to operate a machine has to be around $25,000/year before the Teradata solution is cost effective.

    If you assume a fully loaded total cost to operate commodity hardware of around $1000/year (I’m not sure this is correct but it is likely ballpark)

    then Teradata needs to come down to about $31,000/node to break even with the open source solution.
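    The corrected model, with the $3,000 commodity-node purchase price folded in, can be sketched the same way; again, all dollar figures and the 8x factor are the comment's assumptions:

```python
# Corrected parity condition including commodity hardware purchase:
# C*N + K*N = 8*K*N + 3000*8*N, so C = 7*K + 24000.

HW_COST = 3_000     # assumed purchase price of one commodity node
FACTOR = 8          # assumed Hadoop nodes needed per Teradata node

def breakeven_opex(software_cost):
    """Yearly opex K at break-even: K = (C - FACTOR*HW_COST) / (FACTOR - 1)."""
    return (software_cost - FACTOR * HW_COST) / (FACTOR - 1)

def breakeven_software_cost(opex):
    """Software cost C at break-even: C = (FACTOR - 1)*K + FACTOR*HW_COST."""
    return (FACTOR - 1) * opex + FACTOR * HW_COST

# $200,000/node Teradata: opex must reach ~$25,143/year ("around $25,000").
print(round(breakeven_opex(200_000)))
# At $1,000/year commodity opex, Teradata must fall to $31,000/node.
print(breakeven_software_cost(1_000))
```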

  5. Mark Callaghan on April 15th, 2009 1:09 am

    @UnHolyGuy – stop it with all of the facts. I prefer to read that MapReduce is bad because a big Teradata customer says so.

  6. Curt Monash on April 15th, 2009 2:29 am

    If I understood correctly, eBay told me that it costs $10-20K/year to operate a node. One data point was that power tends to run 3X what you’d think it would just looking at a device’s power rating.

    That said, I think the real competitor at eBay for MapReduce isn’t Teradata but rather a much cheaper MPP database management system alternative, whose vendor’s executives I’m quite displeased with at the moment because of the stupid games they’re playing around yea or nay having their name mentioned.

    CAM

  7. Mark Callaghan on April 15th, 2009 2:59 am

    @Curt – have vendors such as Teradata become more open about the cost of their systems (per node, per GB, …) given the availability of open-source alternatives?

  8. Curt Monash on April 15th, 2009 3:51 am

    Mark,

    I think the driver was doing business over the Web. Large vendors started publishing very clear price sheets, just in case somebody wanted to place an online order. Perhaps in response to this transparency, other vendors did the same thing, even if they didn’t take online orders. I’ve certainly posted references to detailed price sheets from Oracle and Teradata.

    Smaller vendors are more circumspect. In some cases I’ve had to beat them up pretty hard to get any clarity.

    It’s also a little bit academic, as there are huge quantity discounts for much of this stuff. I’m glad Vertica (posted a while ago) and Greenplum (coming soon) gave me permission to post their list prices, but frankly those are not very precise guides to the actual cost of buying a Vertica or Greenplum system.

  9. UnHolyGuy on April 15th, 2009 12:01 pm

    Unless you are running your servers on uranium power packs, power does not cost $20K a year for a $3K server.

    However, the fully loaded cost might indeed come close to that.

    “Fully loaded” is such a hard thing to come to grips with, however. Certainly 1000 exactly identical, low-availability, very simple Hadoop servers are going to be a lot less expensive than 1000 one-offs.

    I think another one of the hidden values of the map/reduce architecture is the extreme simplicity of the whole stack, and the lack of any uptime requirement on an individual server. Something goes wrong, kill the server, replace it with a freshly minted copy, at your leisure.

    However, in all fairness, current map/reduce infrastructures offer NOTHING like the ease of use and interoperability that something like a distributed database offers.

    My opinion is as follows:
    1: Map/Reduce and distributed databases are synergistic, not competitive, technologies and will converge
    2: The price point on a distributed database node cannot stay above $30K, any more than Sun could continue to charge $200K for an 8-processor E4500
    3: Pricing per core is kind of dumb in general; the price point should be driven off data under management

  10. There always seems to be a fire drill around MapReduce news | DBMS2 -- DataBase Management System Services on July 6th, 2009 3:00 am

    [...] segments. (For the record, I probably put more weight on that reason than Aster itself does.) eBay seems even more negative on MapReduce, if that is possible, than it previously was. Also, I gathered I should talk with Hadoop-centric [...]

  11. Cloudera Enterprise and Hadoop evolution | DBMS2 -- DataBase Management System Services on June 30th, 2010 1:22 pm

    [...] eBay’s prior skepticism about MapReduce, it is quoted saying nice things in a Cloudera press release, and has apparently become quite a [...]
