EAI, EII, ETL, ELT, ETLT
Analysis of data integration products and technologies, especially ones related to data warehousing, such as ETL (Extract/Transform/Load). Related subjects include:
Expressor pre-announces a data loading benchmark leapfrog
Expressor Software plans to blow the Vertica/Syncsort “benchmark” out of the water, to wit:
What I know already is that our numbers will between 7 and 8 min to load one TB of data and will set another world record for the tpc-h benchmark.
The whole blog post has a delightful air of skepticism, e.g.:
Sometimes the mention of a join and lookup are documented but why? If the files are load ready what is there to join or lookup?
… If the files are load ready and the bulk load interface is used, what exactly is done with the DI product?
My guess… nothing.
… But what I can’t figure out is what is so complex about this test in the first place?
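For scale, the quoted “between 7 and 8 min to load one TB” can be converted into the terabytes-per-hour unit used for the Vertica/Syncsort figure of 5 ½ terabytes per hour. The snippet below is only a unit conversion; the function name is made up for illustration, and it says nothing about whether the two tests are comparable.

```python
# Convert a load time quoted in minutes per terabyte into terabytes per hour.
def tb_per_hour(minutes_per_tb: float) -> float:
    return 60.0 / minutes_per_tb

claimed_slow = tb_per_hour(8)  # 8 minutes per TB
claimed_fast = tb_per_hour(7)  # 7 minutes per TB
vertica_syncsort = 5.5         # TB/hour, as reported for the Vertica/Syncsort run

print(f"{claimed_slow:.2f} to {claimed_fast:.2f} TB/hour vs {vertica_syncsort} TB/hour")
# prints: 7.50 to 8.57 TB/hour vs 5.5 TB/hour
```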
Data warehouse load speeds in the spotlight
Syncsort and Vertica combined to devise and run a benchmark in which a data warehouse got loaded at 5 ½ terabytes per hour, which is several times faster than the figures used in any other vendors’ similar press releases in the past. Takeaways include:
- Syncsort isn’t just a mainframe sort utility company, but also does data integration. Who knew?
- Vertica’s design to overcome the traditional slow load speed of columnar DBMS works.
The latter is unsurprising. Back in February, I wrote at length about how Vertica makes rapid columnar updates. I don’t have a lot of subsequent new detail, but it made sense then and now.
High-performance analytics
For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics – i.e., beyond straightforward query — is becoming increasingly important. And I’ve written about some of them at length. For example:
- MapReduce – controversial or in some cases even disappointing though it may be – has a lot of use cases.
- It’s early days, but Netezza and Teradata (and others) are beefing up their geospatial analytic capabilities.
- Memory-centric analytics is in the spotlight.
Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?
Another area that’s come up which I haven’t blogged about so much is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including:
| Categories: Analytic technologies, Aster Data, Data warehousing, EAI, EII, ETL, ELT, ETLT, Greenplum, MapReduce, Netezza, Oracle, Parallelization, SAS Institute, Teradata | 5 Comments |
Introduction to Talend
I didn’t spend much time on the show floor at Teradata Partners, but I did connect with Yves de Montcheuil of Talend for a couple of little chats. Highlights of the Talend story include: Read more
Multitenancy hype is getting out of control
I posted recently on SaaS-data-integration-in-the-cloud, and a couple of vendors stopped by the comment thread to share what they do. One was Boomi, which has a blog that does a good job of spelling out its opinions. What the Boomi blog is not so good at, however, is giving any good reasons why one should share those opinions.
I refer specifically to a couple of posts claiming that multitenancy is somehow crucial for SaaS data integration to work. To this I can only say — huh? A decent data integration system should be able to handle many parallel threads at once, connecting many pairs of databases at once. So the hard part of multitenancy is pretty much “free.” If, even so, the integration provider chooses not to go fully multitenant, whose business is it but theirs? Read more
| Categories: Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Software as a Service (SaaS) | 6 Comments |
Everybody’s putting integration services in the cloud
Both Pervasive Software and Cast Iron Systems told me recently of fairly pure cloud offerings. In this, they’re joining Informatica, which started offering Salesforce.com integration-as-a-service back in 2006. So far as I can tell, the three vendors are doing somewhat different things. Read more
| Categories: Cast Iron Systems, Cloud computing, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Informatica, Pervasive Software, Software as a Service (SaaS) | 8 Comments |
Schema flexibility and XML data management
Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses — schema flexibility — as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:
- Tax authorities change their rules and forms every year, but don’t want to do total rewrites of their electronic submission and processing software.
- The financial services industry keeps inventing new products, which don’t just have different terms and conditions, but may also have different kinds of terms and conditions.
- The same, to some extent, goes for the travel industry, which also keeps adding different destinations and different kinds of destinations.
- The energy industry keeps adding new kinds of highly complex equipment it has to manage.
Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange.
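To make the schema-flexibility point concrete, here is a minimal sketch in Python. It is not DB2 pureXML and not anything IBM showed me; the element names (`filing`, `taxpayer`, `income`, `energyCredit`) are invented for illustration. The idea is just that a form which gains a new kind of field in a later year can flow through the same code with no schema migration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical tax filings; the later one carries an element the earlier lacks.
FILING_2007 = '<filing year="2007"><taxpayer>A. Smith</taxpayer><income>50000</income></filing>'
FILING_2008 = ('<filing year="2008"><taxpayer>B. Jones</taxpayer><income>60000</income>'
               '<energyCredit>500</energyCredit></filing>')

KNOWN_FIELDS = {"taxpayer", "income"}

def summarize(xml_text):
    """Read the fields we know about and keep any others instead of rejecting them."""
    root = ET.fromstring(xml_text)
    return {
        "year": root.get("year"),
        "taxpayer": root.findtext("taxpayer"),
        "income": int(root.findtext("income")),
        # Elements added after this code was written end up here.
        "extras": {child.tag: child.text for child in root if child.tag not in KNOWN_FIELDS},
    }

for filing in (FILING_2007, FILING_2008):
    print(summarize(filing))
```

In a fixed relational schema, the new credit would typically mean a new column or table before the 2008 filing could even be stored.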
| Categories: Data models and architecture, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Native XML, pureXML | 3 Comments |
Vertical market XML standards
Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.
Among the most important or successful IBM pureXML-supported standards, in terms of downloads and other evidence of customer interest, are:
| Categories: Application areas, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Native XML, pureXML | 2 Comments |
Overview of IBM DB2 pureXML
On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)
As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.
What I understand so far about the basic DB2 pureXML architecture goes like this:
| Categories: EAI, EII, ETL, ELT, ETLT, IBM and DB2, Native XML, pureXML | 4 Comments |
Three approaches to parallelizing data transformation
Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). I.e., needed data transformations are done on the MPP system, rather than on the — probably SMP — system the data comes from.* If the data transformation is being applied on a record-by-record basis, then it’s automatically fully parallelized. Even if the transforms are more complex, considerable parallel processing may still be going on.
*Or it’s some of each, at which point it’s called ETLT — I bet you can work out what that stands for.
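The claim that record-by-record transformation is automatically parallel can be sketched as follows. This is a toy, assuming a hypothetical `transform` function and hash partitioning on an `id` field. Threads stand in for MPP nodes, so it shows the structure rather than any real speedup, and it is not any particular vendor's mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

def transform(record):
    # Hypothetical per-record transform: tidy the name, convert cents to dollars.
    return {
        "id": record["id"],
        "name": record["name"].strip().upper(),
        "amount": record["amount_cents"] / 100,
    }

def partition(records, n_nodes):
    # Distribute loaded records across "nodes" by hashing on id.
    parts = [[] for _ in range(n_nodes)]
    for record in records:
        parts[record["id"] % n_nodes].append(record)
    return parts

def transform_partition(part):
    # Each node transforms only its own records; no cross-node communication is needed.
    return [transform(record) for record in part]

def elt_transform(records, n_nodes=4):
    parts = partition(records, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(transform_partition, parts))
    return [row for part in results for row in part]

records = [{"id": i, "name": f" cust{i} ", "amount_cents": i * 150} for i in range(10)]
serial = sorted((transform(r) for r in records), key=lambda r: r["id"])
parallel = sorted(elt_transform(records), key=lambda r: r["id"])
print(serial == parallel)
```

Because each output row depends only on its own input row, the partitioned run gives the same rows as the serial one. Transforms that need joins or aggregates across rows are where the “more complex” caveat comes in.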
| Categories: Aster Data, Data integration and middleware, Data warehousing, EAI, EII, ETL, ELT, ETLT, MapReduce, Parallelization, Pervasive Software | 7 Comments |
Pervasive is also pursuing simplicity and SaaS integration
I blogged recently about Cast Iron Systems, a simplicity-oriented data integration appliance vendor that is increasingly focusing on the SaaS market. Well, Pervasive Software is doing something similar.
Via Data Integrator, Pervasive is a leader in the low-cost integration market, with revenue split about 50/25/25 between direct sales, ISVs, and SaaS. Pervasive fondly believes that its products cost half as much as Cast Iron’s, and wind up taking no more installation effort when you factor in Pervasive’s broader capabilities in areas such as workflow. However, there’s some doubt as to whether this is apples-to-apples. Cast Iron does include hardware, after all, and as Pervasive itself points out, Cast Iron will bundle some professional services into a sale if you ask nicely.
Two things are new. Read more
| Categories: Cloud computing, EAI, EII, ETL, ELT, ETLT, Pervasive Software, Software as a Service (SaaS) | 5 Comments |
Cast Iron Systems focuses on SaaS data integration
When I wrote about data integration vendor Cast Iron Systems a year ago, its core message was “simplicity, simplicity, simplicity.” Supporting points included:
- An appliance delivery format.
- Lots of heuristics for automatic mapping and quick set-up. E.g., Cast Iron claims that 70% of a typical SAP-Salesforce.com connection can be done straight out of the box.
- The absence of data cleaning/transformation features that might complicate things.
Cast Iron still believes in all that.
Even so, its messaging has changed a bit. Cast Iron now bills itself, in the first sentence of its press release boilerplate, as “the fastest growing SaaS integration appliance vendor.” And when I talked with marketing chief Simon Peel today, the only use cases we discussed were connections between SaaS and on-premises apps. Read more
| Categories: Cast Iron Systems, Cloud computing, EAI, EII, ETL, ELT, ETLT, Informatica, Software as a Service (SaaS) | 2 Comments |
Kalido — CASE for complex data warehouses
Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:
- Kalido Business Information Modeler (the newest part)
- Kalido Dynamic Information Warehouse
- Kalido Universal Information Director
- Kalido Master Data Management
But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.
| Categories: Data integration and middleware, Data models and architecture, Data warehousing, EAI, EII, ETL, ELT, ETLT, Kalido, Theory and architecture | 1 Comment |
Load speeds and related issues in columnar DBMS
Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.
I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zdonik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:
1. Can columnar DBMS do operational BI?
2. Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL)?
3. Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?
OK, now I get it — the guys at Ab Initio have something to spin or hide
According to the comments on this blog post, Ab Initio has been throwing analysts out of their trade show booths and being otherwise rude for at least two years, and probably a lot longer. That goes beyond marketing strategy or quirkiness. It means Ab Initio has some secrets it desperately doesn’t want to have found out, or at least that it wants to conceal unless there are Ab Initio salespeople present to spin the prospects’ response to the news. Read more
| Categories: Ab Initio Software, EAI, EII, ETL, ELT, ETLT | 13 Comments |
Wrinkles in the Informatica versus Business Objects patent litigation
Business Objects recently lost a patent lawsuit to Informatica in the area of data integration. While I was at the Business Objects conference, I asked about it, and was told in effect “It’s no big deal. In fact, the monetary award was reduced. Anyhow, we shipped a non-infringing version within 12 days after the decision, and sales are rolling along.” I then reflected that answer back to Informatica’s stellar analyst relations guy Chas Kielt. He checked with corporate counsel, and sent back the detailed clarification below. Since I got my Business Objects answers from a couple of caught-off-guard non-lawyer French guys, while Chas got a careful explanation of an American court’s judgment from an American lawyer, I’m inclined to think that in any details where they might conflict, Chas’ version is more likely to be accurate.
There’s a more substantive disagreement as to whether the features deleted from BOBJ’s product due to the injunction are actually important in the marketplace. I’m looking into that subject, and hope to post about it in the near future. Read more
| Categories: Business Objects, EAI, EII, ETL, ELT, ETLT, Informatica | Leave a Comment |
Oracle and BEA — sometimes I am waaaay early
Back in December, 2002, I wrote up the rationale for an Oracle acquisition of BEA. The deal finally seems like it may be happening. Oddly, when I proposed it then, I was accused by Oracle’s analyst relations department of being “unprofessional” for having the temerity to suggest it. And while the specific individual who threw that tantrum is long gone, I haven’t talked all that much with Oracle’s core server groups since … but I digress.
Actually, the logic of an Oracle/BEA deal now isn’t much different from what it was way back then. One exception is that in the intervening half-decade Oracle has acquired a formidable amount of experience in integrating large and/or technically overlapping acquisitions. Technically, however, the story remains pretty much the same. Oracle’s app server and BEA Weblogic do pretty similar things, more or less compliant to standards, only with different add-on functionality. And BEA’s most important add-ons are in an area — integration with outside applications — where Oracle has long needed to improve. Read more
| Categories: Application servers, Data integration and middleware, EAI, EII, ETL, ELT, ETLT, Oracle | 3 Comments |
FileMaker for composite application development
It’s not accurate to judge a product by its most obnoxious or least clueful partisans. Hence, even though some insult-spewers take umbrage at an accurate description of FileMaker’s capabilities,* it wouldn’t be fair to write the product off entirely.
*Mercifully, none of said insult-spewers seems to actually work at the company. I must confess that this makes it easier for me to take the (somewhat) high road here.
Possibly due to an actual understanding of enterprise technology, Tim Dietrich has weighed in on the discussion from a different angle. Here’s a quote in which he gives an example of very successful FileMaker use:
Read more
| Categories: EAI, EII, ETL, ELT, ETLT, FileMaker | 2 Comments |
More on Cast Iron Systems
I chatted again recently with Simon Peel of Cast Iron Systems, and this time I got a better understanding of Cast Iron’s simplicity claim. It refers largely to a drag-and-drop interface that furthermore provides default mappings between pairs of application suites. Simon bristled a bit when I referred to this as mapping “like to like,” because he’s proud that it’s a little smarter than that. Still, “like to like” seems to be what it typically amounts to — customers go to customers, customer addresses go to customer addresses, and so on. Read more
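For illustration only, here is roughly what the naive “like to like” baseline could look like in Python. This is not Cast Iron’s algorithm, which I haven’t seen; the field names and the name-normalization rule are assumptions of mine.

```python
import re

def normalize(field_name):
    # Lower-case and drop punctuation so "Customer_Name" and "customerName" compare equal.
    return re.sub(r"[^a-z0-9]", "", field_name.lower())

def default_mapping(source_fields, target_fields):
    """Pair each source field with the target field of the same normalized name, if any."""
    targets = {normalize(t): t for t in target_fields}
    mapping, unmapped = {}, []
    for source in source_fields:
        target = targets.get(normalize(source))
        if target is not None:
            mapping[source] = target
        else:
            unmapped.append(source)  # left for a human to map by drag-and-drop
    return mapping, unmapped

# Hypothetical field lists for two application suites.
crm_fields = ["Customer_Name", "Customer_Address", "Lead_Score"]
erp_fields = ["customerName", "customerAddress", "creditLimit"]

mapping, unmapped = default_mapping(crm_fields, erp_fields)
print(mapping)   # {'Customer_Name': 'customerName', 'Customer_Address': 'customerAddress'}
print(unmapped)  # ['Lead_Score']
```

Anything “a little smarter than that” would presumably go beyond exact name matches, e.g. synonyms or type-aware matching, but that is speculation on my part.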
The boom in Salesforce.com integration
SaaS integration is in the air.
- I recently talked with Pervasive Software about their data integration line. A large part of Pervasive’s new business is Salesforce.com integration, including at some big-name software vendors as customer/partner switch-hitters.
- I just rechecked my notes from my January talk with Cast Iron Systems. A large part of Cast Iron’s new business is also integration with Salesforce.com, NetSuite, and other SaaS vendors.
- Informatica keeps putting out press releases about Salesforce.com integration, most recently by offering replication in SaaS form itself.
But of course this makes sense. Without good data integration, SaaS applications would be pretty useless, at least at large and medium-sized enterprises.
| Categories: Cast Iron Systems, EAI, EII, ETL, ELT, ETLT, Informatica, Pervasive Software, Software as a Service (SaaS) | Leave a Comment |
Data integration appliance vendor Cast Iron Systems
I’ve been doing a lot of research lately into computing appliances – not just data warehouse appliances, but security, anti-spam and other appliance types as well. Today I added Cast Iron Systems to the list.
Essentially, they offer data integration without the common add-ons. I.e., there’s little or nothing in the way of data cleansing, composite apps, business process management, and/or business activity monitoring. Data just gets imported, extracted, and/or synchronized, whether between pairs of transactional systems, or between a transactional system and a reporting database. A particularly hot area of application for them seems to be SaaS/on-demand app integration (Salesforce.com, NetSuite, etc.). In particular, they boast both Lawson and Salesforce.com as internal users, and at least at Lawson they are used for a Salesforce/Lawson integration.
The big advantage to this strategy is that their integrator is simple enough for appliance deployment.
| Categories: Cast Iron Systems, EAI, EII, ETL, ELT, ETLT | 3 Comments |
Oracle and Microsoft in data warehousing
Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:
- Shared-everything architecture
- End-to-end solution story
- OLTP industrial-strengthness carried over to data warehousing
In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle10g.
| Categories: DATAllegro, Data warehouse appliances, EAI, EII, ETL, ELT, ETLT, IBM and DB2, Microsoft and SQL*Server, Netezza, Oracle, Parallelization, Teradata | 1 Comment |
More on data warehouse architecture choices
The very name of this blog comes from the kind of “horses for courses” data store strategy implied by my recent post on different kinds of data warehouse uses. A number of other commentators have recently made similar points, although they may not agree with every detail. For example, William McKnight pretty much makes the pure DBMS2 argument, pointing out that a partially virtual warehouse is often superior to a fully centralized physical one. And Andy Hayler of Kalido says pretty much the same thing, although he strongly calls out his difference in emphasis from William’s view.
A tip of the hat to Mark Rittman for pointing me to those two and others.
| Categories: Data warehouse appliances, EAI, EII, ETL, ELT, ETLT, Theory and architecture | Leave a Comment |
Business Objects on EIM, ETL, etc.
I chatted with some Business Objects ETL/EIM (Enterprise Information Management) folks today, in a call that was a direct response to what I heard from and posted about Informatica. The core of the Business Objects story can be summarized (albeit brutally!) like this:
| Categories: Business Objects, Business intelligence, EAI, EII, ETL, ELT, ETLT, Theory and architecture | 1 Comment |
eBay’s version of DBMS2
Every sufficiently large or agile enterprise needs to follow the DBMS2 approach. The following is from an article on eBay’s version:
“eBay has built a software-based Integration Tier. This contains both a data access layer (DAL) and a services framework. The Integration Tier acts as an abstraction layer for software engineers to work with many disparate back-end data sources through a consistent set of abstractions.”
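As a rough illustration of what “a consistent set of abstractions” over disparate back ends can mean, here is a minimal Python sketch. It is not eBay’s code or design; every class, table, and method name is hypothetical, and the two back ends (a dict and an in-memory SQLite database) are stand-ins.

```python
import sqlite3
from abc import ABC, abstractmethod

class DataSource(ABC):
    """The one interface application code sees, whatever the back end."""
    @abstractmethod
    def get_item(self, item_id):
        ...

class InMemorySource(DataSource):
    def __init__(self, items):
        self._items = dict(items)

    def get_item(self, item_id):
        return self._items.get(item_id)

class SqliteSource(DataSource):
    def __init__(self, conn):
        self._conn = conn

    def get_item(self, item_id):
        row = self._conn.execute(
            "SELECT title FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        return row[0] if row else None

class DataAccessLayer:
    """Tries each back end in turn; callers never learn which one answered."""
    def __init__(self, sources):
        self._sources = list(sources)

    def get_item(self, item_id):
        for source in self._sources:
            value = source.get_item(item_id)
            if value is not None:
                return value
        return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO items VALUES (2, 'vintage camera')")

dal = DataAccessLayer([InMemorySource({1: "rare stamp"}), SqliteSource(conn)])
print(dal.get_item(1), "|", dal.get_item(2), "|", dal.get_item(3))
# prints: rare stamp | vintage camera | None
```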
| Categories: EAI, EII, ETL, ELT, ETLT, Specific users, eBay | Leave a Comment |
