Analysis of data warehouse DBMS vendor Greenplum and its successor, EMC’s Data Computing division. Related subjects include:
My former friends at Greenplum no longer talk to me, so in particular I wasn’t briefed on Pivotal HD and Greenplum HAWQ. Pivotal HD seems to be yet another Hadoop distribution, with the idea that you use Greenplum’s management tools. Greenplum HAWQ seems to be Greenplum tied to HDFS.
The basic idea seems to be much like what I mentioned a few days ago — the low-level file store for Greenplum can now be something else one has heard of before, namely HDFS (Hadoop Distributed File System, which is also an option for, say, NuoDB). Beyond that, two interesting quotes in a Greenplum blog post are:
When a query starts up, the data is loaded out of HDFS and into the HAWQ execution engine.
In addition, it has native support for HBase, supporting HBase predicate pushdown, hive[sic] connectivity, and offering a ton of intelligent features to retrieve HBase data.
The first sounds like the invisible loading that Daniel Abadi wrote about last September on Hadapt’s blog. (Edit: Actually, see Daniel’s comment below.) The second sounds like a good idea that, again, would also be a natural direction for vendors such as Hadapt.
Comments on Gartner’s 2012 Magic Quadrant for Data Warehouse Database Management Systems — evaluations
To my taste, the most glaring mis-rankings in the 2012/2013 Gartner Magic Quadrant for Data Warehouse Database Management are that it is too positive on Kognitio and too negative on Infobright. Secondarily, it is too negative on HP Vertica, and too positive on ParAccel and Actian/VectorWise. So let’s consider those vendors first.
Gartner seems confused about Kognitio’s products and history alike.
- Gartner calls Kognitio an “in-memory” DBMS, which is not accurate.
- Gartner doesn’t remark on Kognitio’s worst-in-class* compression.
- Gartner gives Kognitio oddly high marks for a late, me-too Hadoop integration strategy.
- Gartner writes as if Kognitio’s next attempt at the US market will be the first one, which is not the case.
- Gartner says that Kognitio pioneered data warehouse SaaS (Software as a Service), which actually has existed since the pre-relational 1970s.
Gartner is correct, however, to note that Kognitio doesn’t sell much stuff overall.
In the cases of HP Vertica, Infobright, ParAccel, and Actian/VectorWise, the 2012 Gartner Magic Quadrant for Data Warehouse Database Management’s facts are fairly accurate, but I dispute Gartner’s evaluation. When it comes to Vertica: Read more
I was asked to clarify one of my July comments on Oracle12c,
I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.
by somebody smart who however seemed to have half-forgotten my post comparing (hybrid) columnar compression to (hybrid) columnar storage.
In simplest terms:
- Columnar storage and columnar compression are two different things. The main connections are:
- Columnar storage can make columnar compression more effective.
- In different ways, both technologies reduce I/O.
- EMC Greenplum, Teradata Aster, and Teradata Classic are all originally row-based systems that have gone hybrid columnar.
- Vertica is an originally column-based system that has gone hybrid columnar.
|Categories: Aster Data, Columnar database management, Data warehousing, Database compression, Greenplum, Oracle, Teradata, Vertica Systems||4 Comments|
This is a draft entry for the DBMS2 analytic glossary. Please comment with any ideas you have for its improvement!
Note: Words and phrases in italics will be linked to other entries when the glossary is complete.
A data warehouse appliance is a combination of hardware and software that includes an analytic DBMS (DataBase Management System). However, some observers incorrectly apply the term “data warehouse appliance” to any analytic DBMS.
The paradigmatic vendors of data warehouse appliances are:
- Teradata, which embraced the term “data warehouse appliance” in 2008.
- Netezza — now an IBM company — which popularized the term “data warehouse appliance” in the 2000s.
Further, vendors of analytic DBMS commonly offer — directly or through partnerships — optional data warehouse appliance configurations; examples include:
- Greenplum, now part of EMC.
- Vertica, now an HP company.
- IBM DB2, under the brand “Smart Analytic System”.
- Microsoft (Parallel Data Warehouse).
Oracle Exadata is sometimes regarded as a data warehouse appliance as well, despite not being solely focused on analytic use cases.
Data warehouse appliances inherit marketing claims from the category of analytic DBMS, such as: Read more
|Categories: Analytic glossary, Data warehouse appliances, Data warehousing, EMC, Exadata, Greenplum, HP and Neoview, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Teradata||4 Comments|
In a call Monday with a prominent company, I was told:
- Teradata, Netezza, Greenplum and Vertica aren’t relational.
- Teradata, Netezza, Greenplum and Vertica are all data warehouse appliances.
That, to put it mildly, is not accurate. So I shall try, yet again, to set the record straight.
In an industry where people often call a DBMS just a “database” — so that a database is something that manages a database! — one may wonder why I bother. Anyhow …
1. The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.
2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:
- Was designed to do relational stuff* from the get-go, even if it now does other things too.
- Supports a lot of SQL.
- Was designed primarily to do non-relational things.*
- Doesn’t support all that much SQL.
*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.
3. There are two chief kinds of relational DBMS: Read more
I’d like to survey a few related ideas:
- Enterprises should each have a variety of different analytic data stores.
- Vendors — especially but not only IBM and Teradata — are acknowledging and marketing around the point that enterprises should each have a number of different analytic data stores.
- In addition to having multiple analytic data management technology stacks, it is also desirable to have an agile way to spin out multiple virtual or physical relational data marts using a single RDBMS. Vendors are addressing that need.
- Some observers think that the real essence of analytic data management will be in data integration, not the actual data management.
Here goes. Read more
|Categories: Data warehousing, Database diversity, EAI, EII, ETL, ELT, ETLT, Exadata, Greenplum, Hadoop, Hortonworks, IBM and DB2, Informatica, Netezza, Oracle, Sybase, Teradata, Workload management||11 Comments|
This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:
- In general, I regard Gartner Magic Quadrants as a bad use of good research.
- Illustrating the uselessness of — or at least poor execution on — the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant’s dimensions.
- I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous year’s versions. Two factors jump to mind as possible reasons:
- This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn’t add as much discussion of overall trends. So there’s less to (potentially) disagree with.
- Merv Adrian is now at Gartner.
- Whatever the problems may be with Gartner’s approach, the whole thing comes out better than do Forrester’s failed imitations.
*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.
Specific company comments, roughly in line with Gartner’s rough single-dimensional rank ordering, include: Read more
Forrester has released its Q1 2012 Forrester Wave: Enterprise Hadoop Solutions. (Googling turns up a direct link, but in case that doesn’t prove stable, here also is a registration-required link from IBM’s Conor O’Mahony.) My comments include:
- The Forrester Wave’s relative vendor rankings are meaningless, in that the document compares apples, peaches, almonds, and peanuts. Apparently, it covers any vendor that includes a distribution of Apache Hadoop MapReduce into something it offers, and that offered at least two (not necessarily full production) references for same.
- The Forrester Wave for “enterprise Hadoop” contradicts itself on the subject of Hortonworks.
- The Forrester Wave for “enterprise Hadoop” is correct when it says “Hortonworks … has Hadoop training and professional services offerings that are still embryonic.”
- Peculiarly, the Forrester Wave for “enterprise Hadoop” also says “Hortonworks offers an impressive Hadoop professional services portfolio”. Hortonworks will likely win one or more nice partnership deals with vendors in adjacent fields, but even so its professional services capabilities are … well, a good word might be “embryonic”.
- Forrester Waves always seem to have weird implicit definitions of “data warehousing”. This one is no exception.
- Forrester gave top marks in “Functionality” to 11 of 13 “enterprise Hadoop” vendors. This seems odd.
- I don’t know why MapR, which doesn’t like HDFS (Hadoop Distributed File System), got top marks in “Subproject integration”.
- Forrester gave top marks in “Storage” to Datameer. It also gave higher marks to MapR than to EMC Greenplum, even though EMC Greenplum’s technology is a superset of MapR’s. Very strange. (Edit: Actually, as per a comment below, there is some uncertainty about the EMC/MapR relationship.)
- Forrester gave higher marks in “Acceleration and optimization” to Hortonworks than to Cloudera and IBM, and higher marks yet to Pentaho. Very odd.
- I’m not sure what Forrester is calling a “Distributed EDW file store connector”, but it sounds like something that Cloudera has provided via partnership to a number of analytic DBMS vendors.
- Forrester’s “Strategy” rankings seem to correlate to a metric of “We’re a large enough vendor to go in N directions at once”, for various values of N.
- Forrester is correct to rank Cloudera’s “Adoption” as being stronger than EMC/Greenplum’s or MapR’s. But Hortonworks’ strong mark for “Adoption” baffles me.
|Categories: Cloudera, Data warehousing, EMC, Greenplum, Hadoop, Hortonworks, MapR, MapReduce, Pentaho||11 Comments|
In a comedy of briefing errors, I’m not too clear on the details of my client salesforce.com’s new PostgreSQL-as-a-service offering, nor exactly on what my clients at VMware are bringing to the PostgreSQL virtualization/cloud party. That said:
- PostgreSQL is good technology.
- MySQL is narrowing the gap, but PostgreSQL is still ahead of MySQL in some ways. (Database extensibility if nothing else.)
- PostgreSQL has a lot of users. (Many of them in academia and/or Russia.)
- Neither EnterpriseDB (which now calls itself “The enterprise PostgreSQL company”) nor the PostgreSQL community leadership have covered themselves with stewardship glory.
- A significant number of interesting DBMS products can be regarded as PostgreSQL forks (e.g. Greenplum, Aster Data nCluster, Netezza if you squint, and Vertica if you stand on your head*).
- PostgreSQL advancement is not dead. For example, Hadapt beta users are running actual PostgreSQL on many nodes each.
- There’s no assurance that Oracle will be a benevolent MySQL steward forever. (Specifically, Oracle’s “Play nicely with others” antitrust commitments expire in 2014.)
So I think it would be cool if one or the other big company put significant wood behind the PostgreSQL arrow.
*While Vertica was originally released using little or no PostgreSQL code — reports varied — it featured high degrees of PostgreSQL compatibility.
|Categories: Aster Data, EnterpriseDB and Postgres Plus, Greenplum, MySQL, Netezza, Open source, salesforce.com, Vertica Systems||8 Comments|
As a new year approaches, it’s the season for lists, forecasts and general look-ahead. Press interviews of that nature have already begun. And so I’m working on a trilogy of related posts, all based on an inquiry about hot analytic trends for 2012.
This post is a moderately edited form of an actual interview. Two other posts cover analytic trends to watch (planned) and analytic vendor execution challenges to watch (already up).
|Categories: Business intelligence, Cloud computing, Data warehouse appliances, Data warehousing, EMC, Greenplum, HP and Neoview, QlikTech and QlikView, SAP AG, Software as a Service (SaaS), Tableau Software, Vertica Systems||4 Comments|