Emulation, transparency, portability
Analysis of products that support the emulation of market-leading database management systems.
Hortonworks, IBM, EMC Pivotal and others have announced a project called “Open Data Platform” to do … well, I’m not exactly sure what. Mainly, it sounds like:
- An attempt to minimize the importance of any technical advantages Cloudera or MapR might have.
- A face-saving way to admit that IBM’s and Pivotal’s insistence on having their own Hadoop distributions has been silly.
- An excuse for press releases.
- A source of an extra logo graphic to put on marketing slides.
Edit: Now there’s a press report saying explicitly that Hortonworks is taking over Pivotal’s Hadoop distro customers (which basically would mean taking over the support contracts and then working to migrate them to Hortonworks’ distro).
The claim is being made that this announcement solves some kind of problem about developing to multiple versions of the Hadoop platform, but to my knowledge that's a problem rarely encountered in real life. When you already have a multi-enterprise open source community agreeing on APIs (Application Programming Interfaces), what API inconsistency remains for a vendor consortium to painstakingly resolve?
Anyhow, it now seems clear that if you want to use a Hadoop distribution, there are three main choices:
- Cloudera’s flavor, whether as software (from Cloudera) or in an appliance (e.g. from Oracle).
- MapR’s flavor, as software from MapR.
- Hortonworks’ flavor, from a number of vendors, including Hortonworks, IBM, Pivotal, Teradata et al.
In saying that, I’m glossing over a few points, such as: Read more
There is much confusion about migration, by which I mean applications or investment being moved from one “platform” technology — hardware, operating system, DBMS, Hadoop, appliance, cluster, cloud, etc. — to another. Let’s sort some of that out. For starters:
- There are several fundamentally different kinds of "migration":
  - You can re-host an existing application.
  - You can replace an existing application with another one that does similar (and hopefully also new) things. This new application may be on a different platform than the old one.
  - You can build or buy a wholly new application.
  - There's also the in-between case in which you extend an old application with significant new capabilities — which may not be well-suited for the existing platform.
- Motives for migration generally fall into a few buckets. The main ones are:
  - You want to use a new app, and it only runs on certain platforms.
  - The new platform may be cheaper to buy, rent or lease.
  - The new platform may have lower operating costs in other ways, such as administration.
  - Your employees may like the new platform's "cool" aspect. (If the employee is sufficiently high-ranking, substitute "strategic" for "cool".)
- Different apps may be much easier or harder to re-host. At two extremes (see the sketch after this list):
  - It can be forbiddingly difficult to re-host an OLTP (OnLine Transaction Processing) app that is heavily tuned, tightly integrated with your other apps, and built using your DBMS vendor's proprietary stored-procedure language.
  - It might be trivial to migrate a few long-running SQL queries to a new engine, and pretty easy to handle the data connectivity part of the move as well.
- Certain organizations, usually packaged software companies, design portability into their products from the get-go, with at least partial success.
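To make those two extremes concrete, here's a minimal sketch; the tables and procedure are hypothetical. The first block is vendor-specific PL/SQL of the kind that resists re-hosting, while the second is plain ANSI SQL that most engines can run unchanged:

```sql
-- Hard to re-host: Oracle-dialect PL/SQL (hypothetical order-processing logic).
-- Vendor-proprietary constructs (PL/SQL blocks, %TYPE, DBMS_OUTPUT) must be
-- rewritten by hand, or run through an emulation layer, on any other DBMS.
CREATE OR REPLACE PROCEDURE apply_discount(p_order_id IN NUMBER) AS
  v_total orders.total%TYPE;
BEGIN
  SELECT total INTO v_total FROM orders WHERE order_id = p_order_id;
  IF v_total > 1000 THEN
    UPDATE orders SET total = v_total * 0.95 WHERE order_id = p_order_id;
  END IF;
  DBMS_OUTPUT.PUT_LINE('Processed order ' || p_order_id);
END;
/

-- Trivial to migrate: a long-running analytic query in plain ANSI SQL.
-- Point it at the new engine, check the query plan, and you're mostly done.
SELECT customer_id, SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id
ORDER BY lifetime_value DESC;
```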
A significant fraction of IT professional services industry revenue comes from data integration. But as a software business, data integration has been more problematic. Informatica, the largest independent data integration software vendor, does $1 billion in revenue. INFA's enterprise value (market capitalization after adjusting for cash and debt) is $3 billion, which puts it way short of other category leaders such as VMware, and even behind Tableau.* When I talk with data integration startups, I ask questions such as "What fraction of Informatica's revenue are you shooting for?" and, as a follow-up, "Why would that be grounds for excitement?"
*If you believe that Splunk is a data integration company, that changes these observations only a little.
On the other hand, several successful software categories have, at particular points in their history, been focused on data integration. One of the major benefits of 1990s business intelligence was "Combines data from multiple sources on the same screen" and, in some cases, even "Joins data from multiple sources in a single view" (sketched below). In the last few years before application servers were commoditized, data integration was one of their chief benefits. Data warehousing and Hadoop both, of course, have a "collect all your data in one place" part to their stories — which I call data mustering — and Hadoop is a data transformation tool as well.
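As a sketch of what "joins data from multiple sources in a single view" meant in practice, assume the integration layer exposes each source system as a schema; the erp and crm names below are hypothetical:

```sql
-- 1990s-BI-style integration: a single view joining two source systems.
-- crm and erp are hypothetical schemas standing in for two separate sources
-- that the integration layer has made queryable side by side.
CREATE VIEW customer_360 AS
SELECT c.customer_id, c.name, SUM(o.total) AS erp_revenue
FROM crm.customers c
JOIN erp.orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
```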
The third of the three MySQL-oriented clients I alluded to yesterday is MemSQL. When I wrote about MemSQL last June, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today.
MemSQL's flagship reference is Zynga, across hundreds of servers. Beyond that, the company claims (to quote a late draft of the press release):
Enterprises are already using distributed MemSQL in production for operational analytics, network security, real-time recommendations, and risk management.
All four of those use cases fit MemSQL’s positioning in “real-time analytics”. Besides Zynga, MemSQL cites penetration into traditional low-latency markets — financial services (various subsectors) and ad-tech.
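Because MemSQL presents itself as a MySQL workalike, ordinary MySQL clients and drivers should be able to talk to a distributed cluster, with the distribution expressed in DDL. Here's a minimal sketch, assuming a hypothetical click-stream table; the SHARD KEY clause follows MemSQL's public descriptions of its distributed tables, so treat the details as illustrative rather than definitive:

```sql
-- Hypothetical click-stream table on a distributed MemSQL cluster.
-- SHARD KEY tells the cluster how to partition rows across leaf nodes
-- (here by user_id, so per-user work stays local to one node);
-- the dialect is otherwise ordinary MySQL.
CREATE TABLE clicks (
  click_id  BIGINT NOT NULL,
  user_id   BIGINT NOT NULL,
  ts        DATETIME NOT NULL,
  url       VARCHAR(2048),
  PRIMARY KEY (click_id, user_id),
  SHARD KEY (user_id)
);

-- A "real-time analytics" style query: recent per-user activity.
SELECT user_id, COUNT(*) AS clicks_last_hour
FROM clicks
WHERE ts >= NOW() - INTERVAL 1 HOUR
GROUP BY user_id
ORDER BY clicks_last_hour DESC
LIMIT 20;
```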
Highlights of MemSQL’s new distributed architecture start: Read more
As vendors so often do, Teradata has caused itself some naming confusion. SQL-H was introduced as a facility of Teradata Aster, to complement SQL-MR.* But while SQL-MR is in essence a set of SQL extensions, SQL-H is not. Rather, SQL-H is a transparency interface that makes Hadoop data accessible to the same code that would work on Teradata Aster …
*Speaking of confusion — Teradata Aster seems to use the spellings SQL/MR and SQL-MR interchangeably.
… except that now there’s also a SQL-H for regular Teradata systems as well. While it has the same general features and benefits as SQL-H for Teradata Aster, the details are different, since the underlying systems are.
I hope that’s clear.
The 2012 Gartner Magic Quadrant for Data Warehouse Database Management Systems is out. I’ll split my comments into two posts — this one on concepts, and a companion on specific vendor evaluations.
- Maintaining working links to Gartner Magic Quadrants is an adventure. But as of early February, 2013, this link seems live.
- I also commented on the 2011, 2010, 2009, 2008, 2007, and 2006 Gartner Magic Quadrants for Data Warehouse DBMS.
Let’s start by again noting that I regard Gartner Magic Quadrants as a bad use of good research. On the facts:
- Gartner collects a lot of input from traditional enterprises. I envy that resource.
- Gartner also does a good job of rounding up vendor claims about user base sizes and the like. If nothing else, you should skim the MQ report for that reason.
- Gartner observations about product feature sets are usually correct, although not so consistently that they should be relied on.
When it comes to evaluations, however, the Gartner Data Warehouse DBMS Magic Quadrant doesn’t do as well. My concerns (which overlap) start:
- The Gartner MQ conflates many different use cases into one ranking (inevitable in this kind of work, but still regrettable).
- A number of the MQ vendor evaluations seem hard to defend. So do some of Gartner’s specific comments.
- Some of Gartner’s criteria seemingly amount to “parrots back our opinions to us”.
- Like me, Gartner thinks a vendor's business and financial strength are important. But Gartner overdoes the matter, drilling down into picky issues it can't hope to judge, such as assessing a vendor's "ability to generate and develop leads." *
- The 2012 Gartner Data Warehouse DBMS Magic Quadrant is closer to being a 1-dimensional ranking than 2-dimensional, in that entries are clustered along the line x=y. This suggests strong correlation among the results on various specific evaluation criteria.
When I grumbled about the conference-related rush of Hadoop announcements, one example of many was Teradata Aster’s SQL-H. Still, it’s an interesting idea, and a good hook for my first shot at writing about HCatalog. Indeed, other than the Talend integration bundled into Hortonworks’ HDP 1, Teradata SQL-H is the first real use of HCatalog I’m aware of.
The Teradata SQL-H idea is:
- Register your Hadoop data to HCatalog. I'll confess to being unclear about the details of how that works, for example in the case of data that just doesn't fit well into flat relational tables. Stay tuned for future posts. For now, I'll just note that:
  - HCatalog is closely based on Hive's metadata management. If you've run Hive against the data, HCatalog should already know about it.
  - HCatalog can handle Pig and HBase data as well.
- Write SQL DDL (Data Definition Language) so that your Aster cluster knows about the data.
- Write any Teradata Aster SQL/MR against that data. Some of the execution will be done on the Hadoop cluster, but pulling data back into Aster may well be necessary. (A sketch of the whole flow follows this list.)
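Here's a minimal sketch of that flow. The table and HDFS path are hypothetical; the Hive DDL is standard HiveQL; and the Aster-side load_from_hcatalog call is based on public SQL-H descriptions, so its exact name and argument list should be treated as an assumption rather than documentation:

```sql
-- Step 1 (Hadoop side): register the data with Hive/HCatalog.
-- Standard HiveQL; once this runs, HCatalog knows the schema and location.
-- Table name and HDFS path are hypothetical.
CREATE EXTERNAL TABLE weblogs (
  ts      STRING,
  user_id BIGINT,
  url     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/weblogs';

-- Steps 2-3 (Aster side): query the HCatalog-registered table via SQL-H.
-- load_from_hcatalog is the SQL-MR-style connector described for SQL-H;
-- the signature shown here is an assumption, not documentation.
SELECT user_id, COUNT(*) AS hits
FROM load_from_hcatalog(
  ON mr_driver
  SERVER ('hcatalog-host')
  DBNAME ('default')
  TABLENAME ('weblogs')
)
GROUP BY user_id;
```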
At least in theory, Teradata SQL-H lets you use a full set of analytic tools against your Hadoop data, with little limitation except price and/or performance. Teradata thinks the performance of all this can be much better than if you just use Hadoop (35X was mentioned in one particularly favorable example), but perhaps much worse than if you just copy/extract the data to an Aster cluster in the first place.
So what might the use cases be for something like SQL-H? Offhand, I’d say:
- SQL-H use cases are probably focused in areas where copying the data to Aster in advance doesn’t make a lot of sense. So presumably …
- … the Hadoop clusters involved would hold a lot more data than you’d want to pay for storing in Teradata Aster. E.g., think of cases where Hadoop is used as a big bit bucket or archival data store.
- There could be a kind of investigative workflow. First you play around with the Hadoop data via SQL-H. Then when you think you’re onto something, you set up ETL (Extract/Transform/Load) to get the data into Aster and ratchet up the effort.
By way of contrast, the whole thing makes less sense for dashboarding kinds of uses, unless the dashboard users are very patient when they want to drill down.
Last Friday I stopped by Oracle for my first conversation since January, 2010, in this case for a chat with Andy Mendelsohn, Mark Townsend, Tim Shetler, and George Lumpkin, covering Exadata and the Oracle DBMS. Key points included: Read more
Edit: ANTs Software seems to have subsequently collapsed, which may be why some of these links broke too.
Jeff Pryslak of Sybase put up a post insulting ANTs Software and the general idea of ANTs-aided Sybase-to-DB2 migration. CEO Joe Kozak of ANTs hit back with a rambling diatribe, which came to my attention because he mentioned my name in it and made some rather fanciful remarks about the "long" relationship I used to have with ANTs Software. (I do recall at least one briefing, plus some attempts from them to buy my services under the condition that I agree to a ridiculous NDA, which I refused to sign.)
This piqued my interest, so — recalling that ANTs is a public company — I decided to take a look at just how successful their software products business is. Well, for the quarter ended March 31, 2010, ANTs’ 10-Q filing says (emphasis mine): Read more
EnterpriseDB has some deplorable business practices (my stories of being screwed by EnterpriseDB have been met by “Well, you’re hardly the only one”). But a couple of more successful DBMS vendors have happily partnered with EnterpriseDB even so, to help pick off Oracle users. IBM’s approach was in the vein of an EnterpriseDB-infused version of SQL handling within DB2.* Netezza just announced an EnterpriseDB-based Netezza Migrator that is rather different.
*The comment threads are the most informative parts of those posts.
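For context on what both partnerships leverage: EnterpriseDB's core pitch is Oracle compatibility layered on PostgreSQL, so Oracle-dialect code can run with little or no change. A hedged sketch, with a hypothetical table and procedure; EnterpriseDB has publicly claimed support for PL/SQL-style syntax and Oracle built-in packages such as DBMS_OUTPUT, but exact coverage varies by version:

```sql
-- Oracle-dialect PL/SQL of the sort EnterpriseDB's compatibility layer
-- aims to run unchanged on its PostgreSQL-based server.
-- The procedure and table are hypothetical.
CREATE OR REPLACE PROCEDURE report_balance(p_acct IN NUMBER) AS
  v_balance NUMBER;
BEGIN
  SELECT balance INTO v_balance FROM accounts WHERE acct_id = p_acct;
  DBMS_OUTPUT.PUT_LINE('Balance: ' || TO_CHAR(v_balance));
END;
```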
I’m a little unclear as to the Netezza Migrator details, not least because Netezza folks don’t seem to care too much about Netezza Migrator themselves. That said, the core ideas of Netezza Migrator are: Read more