A significant fraction of IT professional services industry revenue comes from data integration. But as a software business, data integration has been more problematic. Informatica, the largest independent data integration software vendor, does $1 billion in revenue. INFA’s enterprise value (market capitalization after adjusting for cash and debt) is $3 billion, which puts it way short of other category leaders such as VMware, and even sits behind Tableau.* When I talk with data integration startups, I ask questions such as “What fraction of Informatica’s revenue are you shooting for?” and, as a follow-up, “Why would that be grounds for excitement?”
*If you believe that Splunk is a data integration company, that changes these observations only a little.
On the other hand, several successful software categories have, at particular points in their history, been focused on data integration. One of the major benefits of 1990s business intelligence was “Combines data from multiple sources on the same screen” and, in some cases, even “Joins data from multiple sources in a single view”. The last few years before application servers were commoditized, data integration was one of their chief benefits. Data warehousing and Hadoop both of course have a “collect all your data in one place” part to their stories — which I call data mustering — and Hadoop is a data transformation tool as well.
And it’s not as if successful data integration companies have no value. IBM bought a few EAI (Enterprise Application Integration) companies, plus top Informatica competitor Ascential, plus Cast Iron Systems. DataDirect (I mean the ODBC/JDBC guys, not the storage ones) has been a decent little business through various name changes and ownerships (independent under a couple of names, then Intersolv/Merant, then independent again, then Progress Software). Master data management (MDM) and data cleaning have had some passable exits. Talend raised $40 million last December, which is a nice accomplishment if you’re French.
I can explain much of this in seven words: Data integration is both important and fragmented. The “important” part is self-evident; I gave examples of “fragmented” a couple years back. Beyond that, I’d say:
- A new class of “engine” can be a nice business — consider for example Informatica/Ascential/Ab Initio, or the MDM players (who sold out to bigger ETL companies), or Splunk. Indeed, much early Hadoop adoption was for its capabilities as a data transformation engine.
- Data transformation is a better business to enter than data movement. Differentiated value in data movement comes in areas such as performance, reliability and maturity, where established players have major advantages. But differentiated value in data transformation can come from “intelligence”, which is easier to excel in as a start-up.
- “Transparent connectivity” is a tough business. It is hard to offer true transparency, with minimal performance overhead, among enough different systems for anybody to much care. And without that you’re probably offering a low-value/niche capability. Migration aids are not an exception; the value in those is captured by the vendor of what’s being migrated to, not by the vendor who actually does the transparent translation. Indeed …
- … I can’t think of a single case in which migration support was a big software business. (Services are a whole other story.) Perhaps Cast Iron Systems came closest, but I’m not sure I’d categorize it as either “migration support” or “big”.
And I’ll stop there, because I’m not as conversant with some of the new “smart data transformation” companies as I’d like to be.
- DBMS transparency layers never seem to sell well (April, 2009)
- ClearStory’s approach to data integration (September, 2013)
- Judging opportunities (July, 2014)