August 25, 2008

Greenplum is in the big leagues

After a March 2007 call, I didn’t talk again with Greenplum until earlier this month. That changed fast. I flew out to see Greenplum last week and spent over a day with president/co-founder Scott Yara, CTO/co-founder Luke Lonergan, marketing VP Paul Salazar, and product management/marketing director Ben Werther. Highlights – besides some really great sushi at Sakae in Burlingame – start with an eye-opening set of customer proof points, such as:

Even though the bulk of Greenplum’s revenue comes from the Sun appliance relationship, 20 paying customers run Greenplum on Linux. Another interesting demographic is that 25-40% of Greenplum’s revenue tends to come from Asia (obviously, the figure fluctuates greatly from quarter to quarter). Perhaps not coincidentally, one of Greenplum’s three salespeople last year was based in Asia. (The current total is 15, and growing fast.)

Technical highlights include:

Comments

19 Responses to “Greenplum is in the big leagues”

  1. Greenplum’s single biggest customer | DBMS2 -- DataBase Management System Services on August 25th, 2008 3:04 pm

    […] offered a bit of clarification regarding the usage figures I posted last night. Everything on the list is in production, except […]

  2. Greenplum’s single biggest customer | DBMS2 -- DataBase Management System Services on August 25th, 2008 3:04 pm

    […] offered a bit of clarification regarding the usage figures I posted last night. Everything on the list is in production, except […]

  3. Confluence: Office of the CTO on August 26th, 2008 1:29 pm

    BI Futurewatch…

    This page tracks trends in BI which are not likely to be implemented at PayCycle in the near term…….

  4. Anon on August 27th, 2008 10:52 am

    Dead link on “Confluence: …”

  5. Sales figures for analytic DBMS | DBMS2 -- DataBase Management System Services on August 29th, 2008 11:25 pm

    […] claims 50 paying customers, all within the past year. Greenplum also claims 50 paying customers, almost all within the past […]

  6. Are analytic DBMS vendors overcomplicating their interconnect architectures? | DBMS2 -- DataBase Management System Services on August 30th, 2008 2:10 am

    […] is a big challenge there. Among the very-large-scale MPP data warehouse software vendors, Greenplum is unusual in that its interconnect of choice is (sufficiently many) cheap 1 gigabit Ethernet […]

  7. Estimating user data vs. spinning disk | DBMS2 -- DataBase Management System Services on September 1st, 2008 6:01 am

    […] just to confuse things — compression can get most or all of that back. For example, at a multi-petabyte customer that is loading up its Greenplum/Thor machines now, early indications suggest a compression factor […]

  8. Phil Rack on September 9th, 2008 4:04 pm

    So how do you use R for large scale analytics when it has to hold its data in memory?

  9. Luke Lonergan on September 9th, 2008 6:01 pm

    Hi Phil,

    That’s one problem with R in general, it holds its results in RAM.

    With Greenplum, we enable you to run R programs as stored procedures, which provides you the ability to reuse the math routines in R to some extent, specifically to help you calculate intermediate results as part of WINDOW functions or other OLAP use-cases.

    We have also re-implemented some of the routines that R provides as native parallel functions within Greenplum, including multi-variable linear regression, a naive bayes classifier and some others.

  10. Curt Monash on September 9th, 2008 6:05 pm

    Luke,

    It might be helpful if you listed a few ways R results might wind up on disk — if indeed there are a few different ways. :)

    Thanks,

    CAM

  11. Luke Lonergan on September 9th, 2008 6:15 pm

    I see – it’s actually still an in-memory proposition in Greenplum within the R functions themselves, but we can stream data through the R functions and the output may end up spooling to disk if our optimizer thinks it has to.

    An example use-case where we’ve used R as a UDF: doing various forms of linear regression required the use of a matrix pseudo-inverse routine to solve the eigenvalue problem. Rather than writing our own pseudo-inverse routine, we used the one that comes with R to evaluate different approaches. The matrix solve part is actually pretty small, so we were able to do it in memory as the final stage of processing and the R routine was a good fit.

    In the end, we ended up implementing our own pseudo-inverse routine, now available as the ‘pinv()’ from within Greenplum. It’s written in C internally and is blazingly fast.

    So – the embedded R UDF capability within Greenplum is useful, but it’s often good to re-write the routine for performance optimization when moving to production. We provide many of these kinds of functions to our customers in the form of libraries. Note that we also provide a large array of built-in matrix manipulation routines as well.

  12. Phil Rack on September 9th, 2008 7:01 pm

    Interesting indeed. I’ve been writing some software for WPS as well as SAS so that users can have access to R routines and R graphics. Of course, part of the problem is the memory constraint issue. I’ve been playing with executing R where the user can determine which R routines/programs they want to run in parallel and have WPS or SAS collect the output and write it back into the appropriate windows. This actually works but I’m not satisfied with what I have done.

    Since I don’t see R going 64-bit on Windows anytime soon, I’m starting the process of spec’ing out a system where R runs on a 64-bit Linux OS and has access to a lot more memory space to solve statistical problems. Currently, the idea is to make the Linux system a VM that is easily installed and has quite a few of the R libraries already installed.

    All I need is time!

  13. Web analytics — clickstream and network event data | DBMS2 -- DataBase Management System Services on September 22nd, 2008 6:10 am

    […] believe that both of the previously mentioned petabyte+ databases on Greenplum will feature clickstream […]

  14. Greenplum pushes envelope with MapReduce and parallelism enhancements to its extreme-scale data offering | Dana Gardner’s BriefingsDirect | ZDNet.com on September 29th, 2008 9:50 am

    […] promise to wrap MapReduce into the newest version of its data solutions. The announcement from the data warehousing and analytics supplier comes to a fast-changing landscape, given last week’s HP-Oracle Exadata […]

  15. Infology.Ru » Blog Archive » Estimating storage system efficiency: what share of storage system capacity user data occupies on October 21st, 2008 5:14 pm

    […] get all or almost all of that back. For example, for a customer whose storage volume is several p… and who is now loading data into its systems […]

  16. Greenplum – Reaching Escape Velocity « Market Strategies for IT Suppliers on May 11th, 2009 6:50 pm

    […] there’s no need for me to do that here – Curt Monash does an excellent job on this post from 2008, and he recently talked with Ebay about their use of Greenplum on a massive scale in this article. […]

  17. Greenplum update — Release 3.3 and so on | DBMS2 -- DataBase Management System Services on September 21st, 2009 4:52 am

    […] Greenplum had about 65 paying customers at the end of Q1. I’ve forgotten how that jibes with a figure of 50 customers last August. […]

  18. Greenplum customer notes | DBMS2 -- DataBase Management System Services on October 18th, 2009 2:43 pm

    […] As of the past quarter or two, <10% of Greenplum’s sales activity is on Sun, which works out to maybe one sale per quarter and at most a small number of sales cycles. (That’s down from 50%+ not that long ago.) […]

  19. Greenplum Chorus and Greenplum 4.0 | DBMS2 -- DataBase Management System Services on April 12th, 2010 7:54 am

    […] customers, including Fox/MySpace, eBay, Sears, and T-Mobile. While Fox/MySpace never got to the predicted 1-petabyte level of user data, T-Mobile is loosely projected to indeed get there. The same […]
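
The pattern Luke Lonergan describes in comment 11 (stream the rows through a function, keep only a small matrix problem for an in-memory final stage) can be illustrated with a rough sketch. This is not Greenplum's `pinv()` and not R's routine: it is a plain-Python illustration that assumes the feature matrix has full column rank, in which case the pseudo-inverse solution coincides with solving the normal equations. The names `accumulate`, `solve`, and `fit_linear` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (feature vector, target value)


def accumulate(rows: Iterable[Row], k: int) -> Tuple[List[List[float]], List[float]]:
    """Stream over rows once, keeping only X^T X (k x k) and X^T y (length k).

    Memory use depends on the number of features k, not the number of rows.
    """
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, y in rows:
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a @ w = b by Gaussian elimination
    with partial pivoting. Raises ValueError if a is (near-)singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; a true pseudo-inverse is needed")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        rest = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - rest) / m[i][i]
    return w


def fit_linear(rows: Iterable[Row], k: int) -> List[float]:
    """Least-squares weights: streaming accumulation, then a small in-memory solve."""
    xtx, xty = accumulate(rows, k)
    return solve(xtx, xty)


# Toy data generated from y = 1 + 2*x1 + 3*x2; the leading 1.0 is the intercept column.
data = [
    ((1.0, 0.0, 0.0), 1.0),
    ((1.0, 1.0, 0.0), 3.0),
    ((1.0, 0.0, 1.0), 4.0),
    ((1.0, 1.0, 1.0), 6.0),
    ((1.0, 2.0, 1.0), 8.0),
]
weights = fit_linear(iter(data), 3)
print(weights)
```

Because the accumulated sums are additive, partial results computed on separate chunks of rows could in principle be summed before the final solve, which is one reason this shape of computation suits a parallel database. How Greenplum actually does it is not stated in the thread.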
