Greenplum
Analysis of data warehouse DBMS vendor Greenplum.
Links and observations
I’m back from a trip to the SF Bay area, with a lot of writing ahead of me. I’ll dive in with some quick comments here, then write at greater length about some of these points when I can. From my trip:
Notes on EMC’s Greenplum subsidiary
I spent considerable time last week with my clients at both Greenplum and EMC (if we ignore the fact that the deal has closed and they’re now the same company). I also had more of a hardcore engineering discussion than I’ve had with Greenplum for quite a while (I should have been pushier about that earlier). Takeaways included:
- This is starting off as a honeymoon deal. Everything Greenplum was planning to do is being continued. Additional resources are being poured into Greenplum to do more.
- Some Greenplum execs seem to envision staying long term, some seem to envision moving on to their next startups. The ones who envision moving on are, however, going to work hard first to make the merger a success.
- Greenplum has, for quite a while, had more of an advanced analytics/embedded predictive modeling story than I realized. Bad on them for not fleshing it out more in marketing and product packaging alike.
- Greenplum both denies the concurrency problems I previously noted and also has a very credible story as to how it will eliminate them.
Seriously, Greenplum tells of one customer that routinely runs 150 simultaneous queries (on what I think is not a terribly big system) and of a number of POCs (Proofs of Concept) that simulated similar levels of concurrency.
More on Greenplum and EMC
I talked with Ben Werther of Greenplum for about 40 minutes, which was my first post-merger Greenplum/EMC briefing. “Historical” highlights include:
- Ben says Greenplum wasn’t being shopped, by which he means Greenplum was out raising more capital and the fund-raising was going well. Note: Half or so of Greenplum’s deals were subscription-priced, so it had weaker cash flow than it would have if it were doing equally well selling perpetual licenses.
- However, joint engineering was also going well with, e.g., Greenplum CTO Luke Lonergan spending time at EMC facilities in Cork, Ireland. And one thing led to another …
- Greenplum has ~140 customers, vs. ~65 five quarters ago, 100+ at year-end, and an acquisition rate of 12-15/quarter last fall.
- A typical “small” paying customer for Greenplum starts with 10-20 TB of data.
- Greenplum Chorus isn’t generally available yet, with rollout energy being focused on Greenplum 4.0. Note: As important as it is for overall industry direction, Greenplum Chorus is a product which won’t be a terribly big deal in Release 1 anyway.
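A quick back-of-the-envelope check on the customer counts above, as a minimal sketch that treats the approximate figures (~140 now, ~65 five quarters ago) as exact. The function and variable names are purely illustrative.

```python
# Approximate customer counts quoted above, treated as exact for illustration.
customers_now = 140
customers_five_quarters_ago = 65
quarters_elapsed = 5


def average_net_adds(start: int, end: int, quarters: int) -> float:
    """Average net new customers per quarter between two counts."""
    return (end - start) / quarters


avg_adds = average_net_adds(customers_five_quarters_ago, customers_now, quarters_elapsed)
print(f"Average net new customers per quarter: {avg_adds:.1f}")
```

On those assumptions the average works out to about 15 per quarter, which is the top of the 12-15/quarter range quoted for last fall.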
Highlights looking forward include:
EMC is buying Greenplum
EMC is buying Greenplum. Most of the press release is a general recapitulation of Greenplum’s marketing messages, the main exceptions being (emphasis mine):
The acquisition of Greenplum will be an all-cash transaction and is expected to be completed in the third quarter of 2010, subject to customary closing conditions and regulatory approvals. The acquisition is not expected to have a material impact to EMC GAAP and non-GAAP EPS for the full 2010 fiscal year. Upon close, Bill Cook will lead the new data computing product division and report to Pat Gelsinger. EMC will continue to offer Greenplum’s full product portfolio to customers and plans to deliver new EMC Proven reference architectures as well as an integrated hardware and software offering designed to improve performance and drive down implementation costs.
Greenplum is one of my biggest vendor clients, and EMC is just becoming one, but of course neither side gave me a heads-up before the deal happened, nor have I yet been briefed subsequently. With those disclaimers out of the way, some of my early thoughts include:
- I wish my clients would never buy each other, but it’s inevitable.
- I don’t think anybody evaluating Greenplum should be much influenced by this deal one way or the other. (Whether they will be is of course a different matter.)
- EMC tends to run its bigger software acquisitions in a fairly hands-off manner. There’s no particular FUD (Fear/Uncertainty/Doubt) reason why this deal should stop anybody from buying Greenplum software.
- I also don’t think adding a rich parent adds much of a reason to buy from Greenplum. But if you’re the type who’s nervous about smaller vendors — well, Greenplum now isn’t so small.
- Greenplum Chorus could, in principle, work with non-Greenplum DBMS. That possibility suddenly looks a lot more realistic.
- The list of analytic DBMS vendors with an appliance orientation is pretty impressive, including:
- Oracle, with Exadata
- Microsoft, partially
- Teradata
- Netezza
- Now EMC/Greenplum, at least partially
- Weaker players such as:
- The ailing Kickfire, which a client (not Kickfire itself) tells me is being shopped around
- The reeling HP Neoview
- XtremeData, but I’m still waiting to hear of XtremeData’s first real sale
- Greenplum is something of a specialist in large databases. EMC has to love that.
- Greenplum’s weakness is concurrency.
- Greenplum’s “polymorphic storage” is a good fit for a storage vendor with appliance-y ideas.
- And finally — I think that even software-only analytic DBMS vendors should design their systems in an increasingly storage-aware manner, and have been advising my vendor clients of same. I’ll blog that line of reasoning separately when I get a chance, and edit in a link here after I do.
Related links (edit)
- Here’s the promised post as to why analytic DBMS need to be ever more storage-aware.
- Dave Kellogg crunched the EMC/Greenplum numbers, coming up with an estimated valuation range of $300-400 million, the high end of which is rumored to be correct.
- Merv Adrian suggests the big EMC/Greenplum loser is ParAccel, a viewpoint which presumably presupposes that the EMC/ParAccel partnership was significant in the first place.
- I talked with Ben Werther and posted more about Greenplum and EMC.
Greenplum et alia’s BigDataNews.com site
Greenplum recently started a website BigDataNews.com, and quickly signed up Aster Data as a co-sponsor. (Edit: As per a comment below, the decision to sign up additional sponsors was made by the site’s independent publisher.) It’s actually being run by Brett Sheppard, a former Gartner/DataQuest analyst who now gets involved in this kind of thing. (Brett and I may be working on another project soon, with Greenplum funding.)
The heart of the site is feeds* from a variety of high-profile blogs (DBMS2, Daniel Abadi’s, Joe Hellerstein’s, James Kobielus’, et al.), plus some additional posts written by Brett (primarily) or Greenplum folks. Highlights of Brett’s posts include:
- What I am told was an unauthorized revelation that Greenplum Chorus is built on CouchDB and Erlang.
- An impassioned defense of the integrity of Gartner’s analysis.
*At least in my case, that’s just a post title or snippet, plus a link back to the main post. The same goes for mapreduce.org, actually.
Story of an analytic DBMS evaluation
One of our readers was kind enough to walk me through his analytic DBMS evaluation process. The story is:
- The X Company (XCo) has a <1 TB database.
- 100s of XCo’s customers log in at once to run reports. 50-200 concurrent queries is a good target number.
- XCo had been “suffering” with Oracle and wanted to upgrade.
- XCo didn’t have a lot of money to spend. Netezza pulled out of the sales cycle early due to budget (and this was recently enough that Netezza Skimmer could have been bid).
- Greenplum didn’t offer any references that approached the desired number of concurrent users.
- Ultimately the evaluation came down to Vertica and ParAccel.
- Vertica won.
Notes on the Vertica vs. ParAccel selection include:
Greenplum Chorus and Greenplum 4.0
Greenplum is making two product announcements this morning. Greenplum 4.0 is a revision of the core Greenplum database technology. In addition, Greenplum is announcing Greenplum Chorus, which is the first product release instantiating last year’s EDC (Enterprise Data Cloud) vision statement and marketing campaign.
Greenplum 4.0 highlights and related observations include:
Intelligent Enterprise’s Editors’/Editor’s Choice list for 2010
As he has before, Intelligent Enterprise Editor Doug Henschen
- Personally selected annual lists of 12 “Most influential” companies and 36 “Companies to watch” in analytics- and database-related sectors.
- Made it clear that these are his personal selections.
- Nonetheless has called it an Editors’ Choice list, rather than Editor’s Choice.
(Actually, he’s really called it an “award.”)
Comments on the Gartner 2009/2010 Data Warehouse Database Management System Magic Quadrant
At intervals of a little over a year, Gartner Group publishes a Data Warehouse Database Management System Magic Quadrant. Gartner’s 2009 data warehouse DBMS Magic Quadrant (actually, January 2010) is now out.* For many reasons, including those I noted in my comments on Gartner’s 2008 Data Warehouse DBMS Magic Quadrant, the Gartner quadrant pictures are a bad use of good research. Rather than rehash that this year, I’ll merely call out some points in the surrounding commentary that I find interesting or just plain strange.
Greenplum Single-Node Edition — sometimes free is a real cool price
Greenplum is announcing today that you can run Greenplum software on a single 8-core commodity server, free. First and foremost, that’s a strong statement that Greenplum wants enterprises to pay it for Greenplum’s parallelization/“private cloud” capabilities. Second, it may be an attractive gift to a variety of folks who want to extract insight from terabyte-scale databases of various kinds.
Greenplum Single-Node Edition:
- Is free of charge, although you can buy support.
- Has no restrictions on use, production or otherwise.
- Has no restrictions on database size.
- Is closed-source.
For those who want free, terabyte-scale data warehousing software, Greenplum Single-Node Edition may be quite appealing, considering that the main available alternatives are:
- General-purpose open-source DBMS, such as PostgreSQL and MySQL (lacking analytic DBMS performance and features)
- Infobright Community Edition (the other best choice – Infobright’s commercial sales success indicates the solidity of Infobright’s technology)
- Rough research-project code and other questionable open source offerings
- Crippleware from other commercial analytic DBMS vendors (e.g., Teradata)
For example, comparing PostgreSQL-based Greenplum with PostgreSQL itself, Greenplum offers:
- The ability to scale out queries across all cores in your box (and no, pgpool is not a serious alternative)
- Storage alternatives such as columnar (I am told that EnterpriseDB recently stopped funding a project for a PostgreSQL columnar option)
