February 23, 2009

MapReduce user eHarmony chose Netezza over Aster or Greenplum

Depending on which IDG reporter you believe, eHarmony has either 4 TB of data or more than 12 TB, stored in Oracle but now analyzed on Netezza.  Interestingly, eHarmony is a Hadoop/MapReduce shop, but chose Netezza over Aster Data or Greenplum even so.  Price was apparently an important aspect of the purchase decision. Netezza also seems to have had a very smooth POC.

Chris Kanaracus says:

EHarmony has more than 20 million registered users and pushes up to 12TB of information at a time into the Netezza system, …

The company uses the Hadoop large-scale computing framework to run scoring algorithms against its pool of users to match potential mates,  …

The Netezza system, which eHarmony began using in October, is “more or less working as advertised. … It runs complicated queries, it’s fantastic in terms of table scanning and those sorts of things.” …

In addition, Netezza provided “really all of the plug-ins we needed” for Oracle, MicroStrategy and other platforms, he said.

Implementation was straightforward. “The way we did a proof of concept with them was, they shipped us a box, we put it into our data center and plugged into our network,” he said. “Within 24 hours we were up and running. I’m not exaggerating, it was that easy.”

EHarmony also looked at systems from startup Aster Data Systems and Greenplum. Aster’s technology was intriguing but the company was too immature to take a chance on, he said.  …

Netezza also “worked very aggressively” with eHarmony on pricing.

Robert Mitchell tells it a little differently:

eHarmony has one of the most sophisticated data centers. Joseph Essas, vice president of technology, says the company stores 4 terabytes of data on some 20 million registered users, each of whom has filled out a 400-question psychological profile (eHarmony’s founder is a clinical psychologist).

The company uses proprietary algorithms to score that data against 29 “dimensions of compatibility” — such as values, personality styles, attitudes and interests — and match up customers with the best possible prospects for a long-term relationship.

A giant Oracle 10G database spits out a few preliminary candidates immediately after a user signs up, to prime the pump, but the real matching work happens later, after eHarmony’s system scores and matches up answers to hundreds of questions from thousands of users. The process requires just under 1 billion calculations that are processed in a giant batch operation each day. These MapReduce operations execute in parallel on hundreds of computers and are orchestrated using software written to the open-source Hadoop software platform.

Once matches are sent to users, the users’ actions and outcomes are fed back into the model for the next day’s calculations. For example, if a customer clicked on many matches that were at the outset of his or her geographical range — say, 25 miles away — the system would assume distance wasn’t a deal-breaker and next offer more matches that were just a bit farther away.

“Our biggest challenge is the amount of data that we have to constantly score, move, apply and serve to people, and that is fluid,” Essas says. To that end, the architecture is designed to scale quickly to meet growth and demand peaks around major holidays. The highest demand comes just before Valentine’s Day. “Our demand doubles, if not quadruples,” Essas says.
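To make the batch-scoring description concrete, here is a minimal single-process Python sketch of the map/shuffle/reduce pattern that a Hadoop matching job follows. Everything in it is hypothetical: the profile fields, the `compatibility` rule, the top-`k` cutoff, and the `update_radius` feedback rule are illustrative assumptions, not eHarmony's proprietary algorithms. A real Hadoop job would also spread the map and reduce phases across many machines instead of running them in one process.

```python
from collections import defaultdict
from heapq import nlargest
from itertools import combinations

# Hypothetical profiles: one score per compatibility dimension.
# eHarmony's actual 29 dimensions and scoring model are proprietary; these are made up.
PROFILES = {
    "alice": {"values": 0.9, "personality": 0.4, "interests": 0.7},
    "bob":   {"values": 0.8, "personality": 0.5, "interests": 0.6},
    "carol": {"values": 0.2, "personality": 0.9, "interests": 0.1},
    "dave":  {"values": 0.85, "personality": 0.45, "interests": 0.75},
}


def compatibility(a, b):
    """Hypothetical score: 1 minus the mean absolute difference over shared dimensions."""
    dims = a.keys() & b.keys()
    return 1.0 - sum(abs(a[d] - b[d]) for d in dims) / len(dims)


def map_phase(pair):
    """Map: score one candidate pair and emit the result under both users' keys."""
    u, v = pair
    s = compatibility(PROFILES[u], PROFILES[v])
    yield u, (s, v)
    yield v, (s, u)


def shuffle(mapped):
    """Shuffle: group (key, value) records by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_phase(user, scored, k):
    """Reduce: keep each user's k highest-scoring candidates."""
    return user, [cand for _, cand in nlargest(k, scored)]


def run_batch(k=2):
    """Run one 'daily' batch over every pair of users."""
    pairs = combinations(sorted(PROFILES), 2)
    mapped = (rec for pair in pairs for rec in map_phase(pair))
    return dict(reduce_phase(u, s, k) for u, s in shuffle(mapped).items())


def update_radius(radius_miles, clicked_distances, edge_fraction=0.8, step=1.2):
    """Hypothetical feedback rule: if most clicked matches were near the edge of the
    current search radius, widen the radius for the next day's batch."""
    if not clicked_distances:
        return radius_miles
    near_edge = sum(d >= edge_fraction * radius_miles for d in clicked_distances)
    if near_edge / len(clicked_distances) > 0.5:
        return radius_miles * step
    return radius_miles


if __name__ == "__main__":
    print(run_batch(k=2))
    print(update_radius(25, [22, 24, 10]))
```

The all-pairs `combinations` step is what makes the workload grow roughly quadratically with the user base, which is why this kind of job is farmed out to a cluster; the reduce step keeps only a short candidate list per user so the output stays small.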

Comments

5 Responses to “MapReduce user eHarmony chose Netezza over Aster or Greenplum”

  1. Joseph Essas on February 24th, 2009 4:20 pm

    Curt,

    Thanks for your interest in our analytical infrastructure and for a great technology blog.

    I just wanted to clarify that we did look at Aster and others, besides Netezza. What we found in particular was that Aster had the most impressive analytical capabilities, both with their MapReduce framework and their analytical operators (like nPath for clickstream analytics), which would have been a great fit for eHarmony’s business. They also came with very strong customer references, so we had no concerns about the maturity of the product; those concerns were overstated in some press articles. However, Netezza provided the easiest migration path from Oracle to the new DW environment and also priced themselves very aggressively to win our business. In the end, that’s what tilted our decision towards their system. We keep in touch with the Aster team and we will consider their offerings again in the future.

    Joseph Essas, VP Technology, eHarmony.com

  2. Curt Monash on February 25th, 2009 11:52 am

    Hi Joseph, and thanks for clarifying!

    I understand the importance of price. 🙂 But I’m curious as to how the migration to Netezza was easier than the migration to Aster. Just more available tools or something?

    Thanks!

    CAM

  3. Charney Hoffmann on March 2nd, 2009 11:48 am

    Thanks for the fascinating information. I was looking for information on Netezza, and learned some cool stuff about eHarmony too! Great blog.

  4. Todd Fin on April 23rd, 2009 11:08 pm

    eHarmony used WisdomForce FastReader extensively to migrate the data from Oracle to Netezza. FastReader has good Netezza integration.

  5. Rodrick’s Web Log !! » Blog Archive » MapReduce & Netezza on October 30th, 2009 8:52 pm

    […] recently came across this article about how eHarmony uses Netezza and Hadoop to match potential love interest very from the […]
