Discussion of storage titan EMC, especially its efforts in the data warehouse appliance market.
1. EMC Greenplum has evolved its appliance product line. As I read that, the latest announcement boils down to saying that you can neatly network together various Greenplum appliances in quarter-rack increments. If you take a quarter rack each of four different things, then Greenplum says “Hooray! Our appliance is all-in-one!” Big whoop.
2. That said, the Hadoop part of EMC’s story is based on MapR, which so far as I can tell is actually a pretty good Hadoop implementation. More precisely, MapR makes strong claims about performance and so on, and Apache Hadoop folks don’t reply “MapR is full of &#$!” Rather, they say “We’re going to close the gap with MapR a lot faster than the MapR folks like to think, and by the way, guys, thanks for the butt-kick.” A lot more precision about MapR may be found in this M. C. Srivas SlideShare.
|Categories: Data warehouse appliances, eBay, EMC, Greenplum, Hadoop, MapR, MapReduce, Open source, Oracle||2 Comments|
Alan Scott commented with concern about Parallel Iron’s patent lawsuit attacking HDFS (Hadoop Distributed File System), filed in — where else? — Eastern Texas. The patent in question — US 7,415,565 — seems to in essence cover any shared-nothing block storage that exploits a “configurable switch fabric”; indeed, it’s more oriented to OLTP (OnLine Transaction Processing) than to analytics. For example, the Background section starts: Read more
There’s been a flurry of announcements recently in the Hadoop world. Much of it has been concentrated on Hadoop data storage and management. This is understandable, since HDFS (Hadoop Distributed File System) is quite a young (i.e. immature) system, with much strengthening and Bottleneck Whack-A-Mole remaining in its future.
Known HDFS and Hadoop data storage and management issues include but are not limited to:
- Hadoop is run by a master node, and specifically a namenode, that’s a single point of failure.
- HDFS compression could be better.
- HDFS likes to store three copies of everything, whereas many DBMS and file systems are satisfied with two.
- Hive (the canonical way to do SQL joins and so on in Hadoop) is slow.
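To put the three-copies point in rough numbers, here is a minimal Python sketch. The function name and the 100 TB data set are hypothetical, chosen only for illustration, and the arithmetic ignores compression and any other overhead.

```python
def raw_capacity_tb(logical_tb: float, copies: int) -> float:
    """Raw disk needed to hold `logical_tb` of data when every block is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return logical_tb * copies


# Hypothetical data set size, used only for illustration.
logical_tb = 100.0

three_copies = raw_capacity_tb(logical_tb, 3)  # three copies of everything, as HDFS likes
two_copies = raw_capacity_tb(logical_tb, 2)    # two copies, as many DBMS and file systems accept

print(f"3 copies: {three_copies:.0f} TB raw")
print(f"2 copies: {two_copies:.0f} TB raw")
print(f"Extra raw storage for the third copy: {three_copies - two_copies:.0f} TB")
```

Under these assumptions, the third copy costs as much raw disk again as the data set itself.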
Different entities have different ideas about how such deficiencies should be addressed.
|Categories: Aster Data, Cassandra, Cloudera, Data warehouse appliances, DataStax, EMC, Greenplum, Hadapt, Hadoop, IBM and DB2, MapReduce, MongoDB and 10gen, Netezza, Parallelization||22 Comments|
I talked with SAS about its new approach to parallel modeling. The two key points are:
- SAS no longer plans to go as far with in-database modeling as it previously intended.
- Rather, SAS plans to run in RAM on MPP DBMS appliances, exploiting MPI (Message Passing Interface).
The whole thing is called SAS HPA (High-Performance Analytics), in an obvious reference to HPC (High-Performance Computing). It will run initially on RAM-heavy appliances from Teradata and EMC Greenplum.
A lot of what’s going on here is that SAS found it annoyingly difficult to parallelize modeling within the framework of a massively parallel DBMS such as Teradata. Notes on that aspect include:
- SAS wasn’t exploiting the capabilities of individual DBMS to their fullest; rather, it was looking for an approach that would work across multiple brands of DBMS. Thus, for example, the fact that Aster’s analytic platform architecture is more flexible or powerful than Teradata’s didn’t help much with making SAS run within the Aster nCluster database.
- Notwithstanding everything else, SAS did make a certain set of modeling procedures run in-database.
- SAS’ previous plans to run in-database modeling in Aster and/or Netezza DBMS may never come to fruition.
|Categories: Aster Data, Data warehouse appliances, Data warehousing, EMC, Greenplum, Memory-centric data management, Netezza, Parallelization, Predictive modeling and advanced analytics, SAS Institute, Teradata, Workload management||7 Comments|
A well-connected tipster believes:
- EMC Greenplum’s revenue target for Q1 had been $35 million.
- Actual EMC Greenplum revenue for Q1 was $3 million, or maybe it was $8 million.
- EMC Greenplum had 75 sales teams trying to generate this revenue.
I am annoyed with my former friends at Greenplum, who took umbrage at a brief sentence I wrote in October, namely “eBay has thrown out Greenplum”. Their reaction included:
- EMC Greenplum no longer uses my services.
- EMC Greenplum no longer briefs me.
- EMC Greenplum reneged on a commitment to fund an effort in the area of privacy.
The last one really hurt, because in trusting them, I put in quite a bit of effort, and discussed their promise with quite a few other people.
|Categories: Analytic technologies, Aster Data, Data integration and middleware, Data warehouse appliances, Data warehousing, EAI, EII, ETL, ELT, ETLT, EMC, Greenplum, SAS Institute, Solid-state memory||8 Comments|
Edit: This disclosure has been superseded by a March, 2012 version.
From time to time, I disclose our vendor client lists. Another iteration is below. To be clear:
- This is a list of Monash Advantage members.
- All our vendor clients are Monash Advantage members, unless …
- … we work with them primarily in their capacity as technology users. (A large fraction of our user clients happen to be SaaS vendors.)
- We do not usually disclose our user clients.
- We do not usually disclose our venture capital clients, nor those who invest in publicly-traded securities.
- Included in the list below are two expired Monash Advantage members who haven’t said they will renew, as mentioned in my recent post on analyst bias. (You can probably imagine a couple of reasons for that obfuscation.)
With that said, our vendor client disclosures at this time are:
- Aster Data
- SAND Technology
- Schooner Information Technology
The Forrester Wave: Enterprise Data Warehouse Platforms, Q1 2011 is now out, hot on the heels of the Gartner Magic Quadrant. Unfortunately, this particular Forrester Wave is riddled with inaccuracy.
|Categories: Analytic technologies, Columnar database management, Data warehousing, EMC, Exadata, Greenplum, Netezza, Oracle, Pricing, SAP AG, Sybase, Teradata, Vertica Systems||8 Comments|
Edit: Comments on the February, 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems — and on the companies reviewed in it — are now up.
The Gartner 2010 Data Warehouse Database Management Systems Magic Quadrant is out. I shall now comment, just as I did to varying degrees on the 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants.
Note: Links to Gartner Magic Quadrants tend to be unstable. Please alert me if any problems arise; I’ll edit accordingly.
In my comments on the 2008 Gartner Data Warehouse Database Management Systems Magic Quadrant, I observed that Gartner’s “completeness of vision” scores were generally pretty reasonable, but their “ability to execute” rankings were somewhat bizarre; the same remains true this year. For example, Gartner ranks Ingres higher by that metric than Vertica, Aster Data, ParAccel, or Infobright. Yet each of those companies is growing nicely and delivering products that meet serious cutting-edge analytic DBMS needs, neither of which has been true of Ingres since about 1987.
I posted last October about PADB (ParAccel Analytic DataBase), but held back on various topics since PADB 3.0 was still under NDA. By the time PADB 3.0 was released, I was on blogging hiatus. Let’s do a bit of ParAccel catch-up now.
|Categories: Analytic technologies, Data warehousing, EMC, MapReduce, ParAccel, Parallelization, Storage||2 Comments|