Analysis of data warehouse appliance vendor DATAllegro and its products.
I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:
- Many of the independent vendors were swooped up by acquisition.
- None of those acquisitions was a big success.
  - Microsoft did little with DATAllegro.
  - Netezza struggled with R&D after being bought by IBM. An IBMer recently told me that their main analytic RDBMS engine was BLU.
  - I hear about Vertica more as a technology to be replaced than as a significant ongoing market player.
  - Pivotal open-sourced Greenplum. I have detected few people who care.
  - Ditto for Actian’s offerings.
  - Teradata claimed a few large Aster accounts, but I never hear of Aster as something to compete or partner with.
- Smaller vendors fizzled too. Hadapt and Kickfire went to Teradata as more-or-less acquihires. InfiniDB folded. Etc.
- Impala and other Hadoop-based alternatives are technology options.
- Oracle, Microsoft, IBM and to some extent SAP/Sybase are still pedaling along … but I rarely talk with companies that big.
Simply reciting all that, however, raises the question of whether one should still care about analytic RDBMS at all.
My answer, in a nutshell, is:
Analytic RDBMS (whether on premises in software, in the form of data warehouse appliances, or in the cloud) are still great for hard-core business intelligence, where “hard-core” can refer to ad-hoc query complexity, reporting/dashboard concurrency, or both. But they aren’t good for much else.
Two subjects in one post, because they were too hard to separate from each other
Any sufficiently complex software is developed in modules and subsystems. DBMS are no exception; the core trinity of parser, optimizer/planner, and execution engine merely starts the discussion. But increasingly, database technology is layered in a more fundamental way as well, to the extent that different parts of what would seem to be an integrated DBMS can sometimes be developed by separate vendors.
Major examples of this trend — where by “major” I mean “spanning a lot of different vendors or projects” — include:
- The object/relational, aka universal, extensibility features developed in the 1990s for Oracle, DB2, Informix, Illustra, and Postgres. The most successful extensions probably have been:
  - Geospatial indexing via ESRI.
  - Full-text indexing, notwithstanding questionable features and performance.
- MySQL storage engines.
- MPP (Massively Parallel Processing) analytic RDBMS relying on single-node PostgreSQL, Ingres, and/or Microsoft SQL Server — e.g. Greenplum (especially early on), Aster (ditto), DATAllegro, DATAllegro’s offspring Microsoft PDW (Parallel Data Warehouse), or Hadapt.
- Splits in which a DBMS has serious processing both in a “database” layer and in a predicate-pushdown “storage” layer — most famously Oracle Exadata, but also MarkLogic, InfiniDB, and others.
- SQL-on-HDFS — Hive, Impala, Stinger, Shark and so on (including Hadapt).
Other examples on my mind include:
- Data manipulation APIs being added to key-value stores such as Couchbase and Aerospike.
- TokuMX, the Tokutek/MongoDB hybrid I just blogged about.
- NuoDB’s willing reliance on third-party key-value stores (or HDFS in the role of one).
- FoundationDB’s strategy, and specifically its acquisition of Akiban.
And there are several others I hope to blog about soon, e.g. current-day PostgreSQL.
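To make the “database” layer versus predicate-pushdown “storage” layer split a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Exadata, MarkLogic, InfiniDB or any other product mentioned here is actually built; the class names `StorageLayer` and `DatabaseLayer`, the `scan` method, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, object]


class StorageLayer:
    """Hypothetical storage tier that can filter and project rows itself."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def scan(
        self,
        predicate: Optional[Callable[[Row], bool]] = None,
        columns: Optional[List[str]] = None,
    ) -> Iterator[Row]:
        # The predicate and column list are "pushed down": rows that fail
        # the filter never leave the storage tier, and only the requested
        # columns are shipped upward.
        for row in self.rows:
            if predicate is not None and not predicate(row):
                continue
            if columns is None:
                yield dict(row)
            else:
                yield {c: row[c] for c in columns}


class DatabaseLayer:
    """Hypothetical upper tier: plans the query and does the aggregation."""

    def __init__(self, storage: StorageLayer) -> None:
        self.storage = storage

    def total_by_region(self, min_amount: int) -> Tuple[Dict[str, int], int]:
        totals: Dict[str, int] = {}
        rows_received = 0
        # Filtering and projection happen below; grouping happens here.
        for row in self.storage.scan(
            predicate=lambda r: r["amount"] >= min_amount,
            columns=["region", "amount"],
        ):
            rows_received += 1
            region = row["region"]
            totals[region] = totals.get(region, 0) + row["amount"]
        return totals, rows_received


if __name__ == "__main__":
    storage = StorageLayer(
        [
            {"id": 1, "region": "east", "amount": 50},
            {"id": 2, "region": "west", "amount": 5},
            {"id": 3, "region": "east", "amount": 20},
        ]
    )
    totals, rows_received = DatabaseLayer(storage).total_by_region(10)
    print(totals, rows_received)
```

The point of the split is that the upper layer sees only the rows and columns it needs, so the two layers can in principle be built, tuned, or even sold separately.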
In an overlapping trend, DBMS increasingly have multiple data manipulation APIs.
- Vertica is putting out a press release today touting its 100th customer, and talking of triple digit growth last year.
- Multiple sources have told me that the DATAllegro system is being thrown out of Dell, so evidently Dell is telling this to one and all. If that goes through, this would presumably leave TEOCO as DATAllegro’s single happy customer. (I haven’t checked with Microsoft for its view.)
- A rumor has it that Infiniband technology vendor Voltaire, Ltd. privately claims triple-digit sales of switches for Exadata 1 (I think that one would be one switch per Exadata installation, not per rack). Based just on a quick glance, this is far from confirmed by Voltaire’s earnings conference call transcripts or SEC filings. However, the most recent transcript does seem to indicate Voltaire got multiple Exadata deals in the telecommunications sector, and suggests some Exadata penetration in other sectors as well.
- I was told of a classified-agency user that has >1 petabyte of data on Exadata 1 and 600 terabytes or so on Netezza. My not-obviously-biased source says the agency is distinctly happier with Netezza than Exadata.
- Like ParAccel, Oracle just got dinged for TPC-related misbehavior.
- Rumor has it that Sun has no intention of helping ParAccel rerun its withdrawn TPC-H benchmark.
- ParAccel has withdrawn the claim from its home page to be the “CERTIFIED” price-performance leader. This seems to confirm that the claim was a reference to the TPC-H. In my opinion, that was a gross misrepresentation of what the TPC-H shows.
Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept — mixing mine and Greenplum’s together — include:
- Data marts aren’t just for performance (or price/performance). They also exist to give individual analysts or small teams control of their analytic destiny.
- Thus, it would be really cool if business users could have their own analytic “sandboxes” — virtual or physical analytic databases that they can manipulate without breaking anything else.
- In any case, business users want to analyze data when they want to analyze it. It is often unwise to ask business users to postpone analysis until after an enterprise data model can be extended to fully incorporate the new data they want to look at.
- Whether or not you agree with that, it’s an empirical fact that enterprises have many legacy data marts (or even, especially due to M&A, multiple legacy data warehouses). Similarly, it’s an empirical fact that many business users have the clout to order up new data marts as well.
- Consolidating data marts onto one common technological platform has important benefits.
In essence, Greenplum is pitching the story:
- Thesis: Enterprise Data Warehouses (EDWs)
- Antithesis: Data Warehouse Appliances
- Synthesis: Greenplum’s Enterprise Data Cloud vision
When put that starkly, it’s overstated, not least because
Specialized Analytic DBMS != Data Warehouse Appliance
But basically it makes sense, for two main reasons:
- Analysis is performed on all sorts of novel data, from sources far beyond an enterprise’s core transactions. This data neither has to fit nor particularly benefits from being tightly fitted into the core enterprise data model. Requiring it to do so is just an unnecessary and painful bureaucratic delay.
- On the other hand, consolidation can be a good idea even when systems don’t particularly interoperate. Data marts, which commonly do in part interoperate with central data stores, have all the more reason to be consolidated onto a central technology platform/stack.
Microsoft purchased DATAllegro for $275 million
Technically, that needn’t shut down the rumor mill altogether, since given the way deals are structured and reported, it’s unlikely that Microsoft actually cut checks to DATAllegro stockholders in the aggregate amount of $275 million promptly after the close of the acquisition.
Still, it’s a data point of some weight.
Hat tip to Mark Myers.
I’m prepared to call an end to the “Guess DATAllegro’s customers” game. Bottom line is that there are three in all, two of which are TEOCO and Dell, and the third of which is a semi-open secret. I wrote last week:
The number of DATAllegro production references is expected to double imminently, from one to two. Few will be surprised at the identity of the second reference. I imagine the number will then stay at two, as DATAllegro technology is no longer being sold, and the third known production user has never been reputed to be particularly pleased with it.
Dell did indeed disclose at TDWI that it was a large DATAllegro user, notwithstanding that Dell is a huge Teradata user as well. No doubt, Dell is gearing up to be a big user of Madison too.
Also at TDWI, I talked with some former DATAllegro employees who now work for rival vendors. None thinks DATAllegro has more than three customers. Neither do I.
Edit: Subsequently, the DATAllegro customer count declined to 1.
Microsoft said they’d prebrief me on at least the DATAllegro part of tomorrow’s SQL Server announcements, but that didn’t turn out to happen (at least as of 9 pm Eastern time Sunday night). An embargoed press release did just arrive, but it’s so concise and high-level as to contain almost nothing of interest.
So I might as well post sound bites in advance. Here goes:
- With the DATAllegro acquisition, Microsoft leapfrogged Oracle. But with Exadata, Oracle leapfrogged Microsoft back. Exadata is actually shipping.
- There’s no assurance that the first DATAllegro/Microsoft release will inherit SQL Server’s level of concurrency. After all, DATAllegro/Ingres wasn’t as concurrent as plain Ingres.
- Porting DATAllegro from Ingres to SQL Server is likely to be straightforward. If they screw up it will be because they tried to do too much else at the same time, not because the basic port failed.
- Porting DATAllegro from Linux to Windows should also be OK. DATAllegro doesn’t stress the operating system in the areas where Windows remains weak.
- Earlier this year, DATAllegro had exactly one customer known to be in production, but I’ve spoken with that one. It’s TEOCO, which has a multi-hundred terabyte DATAllegro installation. TEOCO is a very price-oriented buyer.
- DATAllegro reports that two more customers are in production with large systems now. Neither of those is believed by industry sources to be especially in love with DATAllegro. Otherwise, nobody seems able and willing to identify other DATAllegro customers.
I’m going to be pretty busy Monday anyway. Linda is having a bit of oral surgery. And if I get back from that in time, I have calls set up with a couple of clients.
Edit: Actually, an email did eventually wend its way to me about a day later, which evidently had run into major congestion somewhere in the intertubes.
My resolve to eschew scathing sarcasm is being sorely tested tonight. The latest trial is my discovery that nobody thought to so much as email me a press release, let alone brief me, on Microsoft’s announcement of a timetable for DATAllegro/SQL Server integration. Per Ina Fried (with a hat tip to anonymous commenter L.J.), Microsoft says:
The final version of that product is slated for the first half of 2010, though Microsoft said it will begin giving customers and partners access to early “community technology preview” releases within the next 12 months.
One of my favorite pages on the Monash Research website is the list of many current and a few notable past customers. (Another favorite page is the one for testimonials.) For a variety of reasons, I won’t undertake to be more precise about my current customer list than that. But I don’t think it would hurt anything to list the data warehouse DBMS/appliance specialists in the group. They are:
- Aster Data
All of those are Monash Advantage members.
If you care about all this, you may also be interested in the rest of my standards and disclosures.
Over on my Network World blog, I asked the question “So who are DATAllegro’s actual current customers?” As regular readers know, that’s a fairly hard question to answer. TEOCO is widely known as DATAllegro’s flagship reference, but after that the list gets thin in a hurry.
As a by-the-by to other discussions, DATAllegro CEO Stuart Frost undertook to respond in part himself. Specifically, he gave me the names of two other happy customers that are or imminently will be running DATAllegro against 100+ terabytes of user data.