EAI, EII, ETL, ELT, ETLT

Analysis of data integration products and technologies, especially ones related to data warehousing, such as ETL (Extract/Transform/Load). Related subjects include:

August 26, 2008

Three approaches to parallelizing data transformation

Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). I.e., needed data transformations are done on the MPP system, rather than on the — probably SMP — system the data comes from.* If the data transformation is being applied on a record-by-record basis, then it’s automatically fully parallelized. Even if the transforms are more complex, considerable parallel processing may still be going on.

*Or it’s some of each, at which point it’s called ETLT — I bet you can work out what that stands for.
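To see why record-by-record transforms parallelize so easily, here is a minimal sketch in Python. It is not any vendor's implementation. The `transform` rule, the round-robin `partition` helper, and the use of a thread pool as a stand-in for MPP nodes are all assumptions made for illustration. The point is only that each partition can be transformed with no knowledge of the others.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record):
    """Hypothetical per-record cleanup; it looks at this one record only."""
    return {
        "customer": record["customer"].strip().title(),
        "amount_cents": round(float(record["amount"]) * 100),
    }


def partition(records, n_nodes):
    """Deal records round-robin into n_nodes lists (a stand-in for MPP data distribution)."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def transform_partition(rows):
    """Work done by one simulated node: transform only its own rows."""
    return [transform(r) for r in rows]


def parallel_transform(records, n_nodes=4):
    """Transform every partition independently, then concatenate the results."""
    parts = partition(records, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        results = list(pool.map(transform_partition, parts))
    return [row for part in results for row in part]


def by_customer(row):
    """Sort key used to compare results regardless of row order."""
    return row["customer"]


raw = [
    {"customer": "  acme corp ", "amount": "12.50"},
    {"customer": "globex", "amount": "7.25"},
    {"customer": "INITECH ", "amount": "100.00"},
    {"customer": "umbrella", "amount": "3.10"},
]

serial = [transform(r) for r in raw]
parallel = parallel_transform(raw, n_nodes=3)

# The rows are the same either way; only their order may differ after partitioning.
print(sorted(parallel, key=by_customer) == sorted(serial, key=by_customer))
```

Transforms that need to see more than one record at a time (deduplication, say, or lookups against another table) would require data movement between partitions, which is where the parallelism stops being automatic.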

Read more

March 26, 2008

Pervasive is also pursuing simplicity and SaaS integration

I blogged recently about Cast Iron Systems, a simplicity-oriented data integration appliance vendor that is increasingly focusing on the SaaS market. Well, Pervasive Software is doing something similar.

Via Data Integrator, Pervasive is a leader in the low-cost integration market, with revenue split about 50/25/25 between direct sales, ISVs, and SaaS. Pervasive fondly believes that its products cost half as much as Cast Iron’s, and wind up taking no more installation effort when you factor in Pervasive’s broader capabilities in areas such as workflow. However, there’s some doubt as to whether this is apples-to-apples. Cast Iron does include hardware, after all, and as Pervasive itself points out, Cast Iron will bundle some professional services into a sale if you ask nicely.

Two things are new. Read more

March 21, 2008

Cast Iron Systems focuses on SaaS data integration

When I wrote about data integration vendor Cast Iron Systems a year ago, its core message was “simplicity, simplicity, simplicity.” Supporting points included:

  1. An appliance delivery format.
  2. Lots of heuristics for automatic mapping and quick set-up. E.g., Cast Iron claims that 70% of a typical SAP-Salesforce.com connection can be done straight out of the box.
  3. The absence of data cleaning/transformation features that might complicate things.

Cast Iron still believes in all that.

Even so, its messaging has changed a bit. Cast Iron now bills itself, in the first sentence of its press release boilerplate, as “the fastest growing SaaS integration appliance vendor.” And when I talked with marketing chief Simon Peel today, the only use cases we discussed were connections between SaaS and on-premises apps. Read more

February 19, 2008

Kalido — CASE for complex data warehouses

Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:

But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.

Read more

February 8, 2008

Load speeds and related issues in columnar DBMS

Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.

I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zdonik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:

  1. Can columnar DBMS do operational BI?
  2. Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL)?
  3. Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?

Read more

November 16, 2007

OK, now I get it — the guys at Ab Initio have something to spin or hide

According to the comments on this blog post, Ab Initio has been throwing analysts out of their trade show booths and being otherwise rude for at least two years, and probably a lot longer. That goes beyond marketing strategy or quirkiness. It means Ab Initio has some secrets it desperately doesn’t want to have found out, or at least that it wants to conceal unless there are Ab Initio salespeople present to spin the prospects’ response to the news. Read more

October 20, 2007

Wrinkles in the Informatica versus Business Objects patent litigation

Business Objects recently lost a patent lawsuit to Informatica in the area of data integration. While I was at the Business Objects conference, I asked about it, and was told in effect “It’s no big deal. In fact, the monetary award was reduced. Anyhow, we shipped a non-infringing version within 12 days after the decision, and sales are rolling along.” I then reflected that answer back to Informatica’s stellar analyst relations guy Chas Kielt. He checked with corporate counsel, and sent back the detailed clarification below. Since I got my Business Objects answers from a couple of caught-off-guard non-lawyer French guys, while Chas got a careful explanation of an American court’s judgment from an American lawyer, I’m inclined to think that in any details where they might conflict, Chas’ version is more likely to be accurate.

There’s a more substantive disagreement as to whether the features deleted from BOBJ’s product due to the injunction are actually important in the marketplace. I’m looking into that subject, and hope to post about it in the near future. Read more

October 12, 2007

Oracle and BEA — sometimes I am waaaay early

Back in December, 2002, I wrote up the rationale for an Oracle acquisition of BEA. The deal finally seems like it may be happening. Oddly, when I proposed it then, I was accused by Oracle’s analyst relations department of being “unprofessional” for having the temerity to suggest it. And while the specific individual who threw that tantrum is long gone, I haven’t talked all that much with Oracle’s core server groups since … but I digress.

Actually, the logic of an Oracle/BEA deal now isn’t much different from what it was way back then. One exception is that in the intervening half-decade Oracle has acquired a formidable amount of experience in integrating large and/or technically overlapping acquisitions. Technically, however, the story remains pretty much the same. Oracle’s app server and BEA Weblogic do pretty similar things, more or less compliant to standards, only with different add-on functionality. And BEA’s most important add-ons are in an area — integration with outside applications — where Oracle has long needed to improve. Read more

July 26, 2007

FileMaker for composite application development

It’s not accurate to judge a product by its most obnoxious or least clueful partisans. Hence, even though some insult-spewers take umbrage at an accurate description of FileMaker’s capabilities,* it wouldn’t be fair to write the product off entirely.

*Mercifully, none of said insult-spewers seems to actually work at the company. I must confess that this makes it easier for me to take the (somewhat) high road here.

Possibly due to an actual understanding of enterprise technology, Tim Dietrich has weighed in on the discussion from a different angle. Here’s a quote in which he gives an example of very successful FileMaker use:
Read more

April 26, 2007

More on Cast Iron Systems

I chatted again recently with Simon Peel of Cast Iron Systems, and this time I got a better understanding of Cast Iron’s simplicity claim. It refers largely to a drag-and-drop interface that furthermore provides default mappings between pairs of application suites. Simon bristled a bit when I referred to this as mapping “like to like,” because he’s proud that it’s a little smarter than that. Still, “like to like” seems to be what it typically amounts to — customers go to customers, customer addresses go to customer addresses, and so on. Read more
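As a rough illustration of what “like to like” default mapping could look like, here is a small Python sketch. It is not Cast Iron’s actual heuristic, which isn’t described here. The `normalize` rule and the field names are hypothetical. The sketch pairs fields whose names match once case and punctuation are ignored, and reports the rest as needing manual attention.

```python
def normalize(name):
    """Reduce a field name to lowercase letters and digits so naming styles compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def default_mapping(source_fields, target_fields):
    """Pair each source field with the target field of the same normalized name, if any."""
    targets = {normalize(t): t for t in target_fields}
    mapping, unmapped = {}, []
    for field in source_fields:
        match = targets.get(normalize(field))
        if match is not None:
            mapping[field] = match
        else:
            unmapped.append(field)
    return mapping, unmapped


# Hypothetical field lists for two applications being connected.
crm_fields = ["Customer_Name", "Customer Address", "Lead Score"]
erp_fields = ["customerName", "CUSTOMER_ADDRESS", "creditLimit"]

mapping, unmapped = default_mapping(crm_fields, erp_fields)
print(mapping)
print(unmapped)
```

Anything “a little smarter” than this, such as matching on synonyms or data types, would replace the exact-match lookup with a scoring step.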

March 17, 2007

The boom in Salesforce.com integration

SaaS integration is in the air.

But of course this makes sense. Without good data integration, SaaS applications would be pretty useless, at least at large and medium-sized enterprises.

January 4, 2007

Data integration appliance vendor Cast Iron Systems

I’ve been doing a lot of research lately into computing appliances – not just data warehouse appliances, but security, anti-spam and other appliance types as well. Today I added Cast Iron Systems to the list.

Essentially, they offer data integration without the common add-ons. I.e., there’s little or nothing in the way of data cleansing, composite apps, business process management, and/or business activity monitoring. Data just gets imported, extracted, and/or synchronized, whether between pairs of transactional systems, or between a transactional system and a reporting database. A particularly hot area of application for them seems to be SaaS/on-demand app integration (Salesforce.com, NetSuite, etc.). In particular, they boast both Lawson and Salesforce.com as internal users, and at least at Lawson they are used for a Salesforce/Lawson integration.

The big advantage to this strategy is that their integrator is simple enough for appliance deployment.

Read more

September 27, 2006

Oracle and Microsoft in data warehousing

Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:

  1. Shared-everything architecture
  2. End-to-end solution story
  3. OLTP industrial-strengthness carried over to data warehousing

In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle10g.

Read more

September 24, 2006

More on data warehouse architecture choices

The very name of this blog comes from the kind of “horses for courses” data store strategy implied by my recent post on different kinds of data warehouse uses. A number of other commentators have recently made similar points, although they may not agree with every detail. For example, William McKnight pretty much makes the pure DBMS2 argument, pointing out that a partially virtual warehouse is often superior to a fully centralized physical one. And Andy Hayler of Kalido says pretty much the same thing, although he strongly calls out his difference in emphasis from William’s view.

A tip of the hat to Mark Rittman for pointing me to those two and others.

August 17, 2006

Business Objects on EIM, ETL, etc.

I chatted with some Business Objects ETL/EIM (Enterprise Information Management) folks today, in a call that was a direct response to what I heard from and posted about Informatica. The core of the Business Objects story can be summarized (albeit brutally!) like this:

Read more

August 8, 2006

eBay’s version of DBMS2

Every sufficiently large or agile enterprise needs to follow the DBMS2 approach. The following is from an article on eBay’s version:

“eBay has built a software-based Integration Tier. This contains both a data access layer (DAL) and a services framework. The Integration Tier acts as an abstraction layer for software engineers to work with many disparate back-end data sources through a consistent set of abstractions.”
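The idea in that quote can be sketched in a few lines of Python. This is not eBay’s code. The class names, the `fetch` method, and the sample data are hypothetical. The only thing carried over from the quote is the shape: callers ask one layer for an entity by key, and that layer hides which back end holds it.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface that each back-end store is wrapped in."""

    @abstractmethod
    def fetch(self, key):
        """Return the record for key as a dict, or None if it is absent."""


class SqliteUsers(DataSource):
    """A relational back end, using an in-memory SQLite database for the demo."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute("INSERT INTO users VALUES (1, 'alice')")

    def fetch(self, key):
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (key,)
        ).fetchone()
        return None if row is None else {"id": row[0], "name": row[1]}


class InMemoryListings(DataSource):
    """A non-relational back end, here just a dict."""

    def __init__(self):
        self.items = {42: {"id": 42, "title": "vintage lamp"}}

    def fetch(self, key):
        return self.items.get(key)


class DataAccessLayer:
    """Routes each entity type to whichever source has been registered for it."""

    def __init__(self):
        self._sources = {}

    def register(self, entity, source):
        self._sources[entity] = source

    def get(self, entity, key):
        return self._sources[entity].fetch(key)


dal = DataAccessLayer()
dal.register("user", SqliteUsers())
dal.register("listing", InMemoryListings())

user = dal.get("user", 1)
listing = dal.get("listing", 42)
print(user, listing)
```

Swapping the store behind "listing" for a different one would change only the `register` call, not the code that calls `dal.get`.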

July 26, 2006

Informatica’s SaaS/Outsourcing story

The coolest part of Informatica’s visit today was the new SaaS story. Naturally, they’re starting with Salesforce.com, but they hope to use the technology they’re developing for Salesforce with other SaaS vendors, with Business Process Outsourcers, and with anybody else who needs robust cross-enterprise data integration. I don’t actually think there’s a lot of hard technology there; nonetheless, somebody had to build it. And they apparently have, in two main parts.

Read more

July 26, 2006

Informatica’s general story

Informatica came by today. In general their story is: Data integration is very important; all vendors except Informatica and IBM/Ascential are low end; IBM/Ascential is confused; most BI vendors except Business Objects are likely to follow Hyperion’s lead in partnering with them.

Read more

May 13, 2006

Hot times at Intersystems

About a year ago, I wrote a very favorable column focusing on Intersystems’ OODBMS Caché. Caché appears to be the one OODBMS product that has good performance even in a standard disk-centric configuration, notwithstanding that random pointer access seems to be antithetical to good disk performance.

Intersystems also has a hot new Caché-based integration product, Ensemble. They attempted to brief me on it (somewhat belatedly, truth be told) last Wednesday. Through no fault of the product, however, the briefing didn’t go so well. I still look forward to learning more about Ensemble.

May 2, 2006

DBMS2 at IBM

I had a chat a couple of weeks ago with Bob Picciano, who runs servers (i.e., DBMS) for IBM. I came away feeling that, while they don’t use that name, they’re well down the DBMS2 path. By no means is this SAP’s level of commitment; after all, they have to cater to traditional technology strategies as well. But they definitely seem to be getting there.

Why do I say that? Well, in no particular order:

The big piece of a DBMS2 strategy that IBM seems to be lacking is a data-oriented services repository. IBM has had disasters in the past with over-grand repository plans, so they’re treading cautiously this time around. There also might be an organizational issue; DBMS and integration technology sit in separate divisions, and I doubt it’s yet appreciated throughout IBM how central data is to an SOA strategy.

But that not-so-minor detail aside, IBM definitely seems to be developing a DBMS2-like technology vision.

December 9, 2005

SAP’s version of DBMS2

I just spent a couple of days at SAP’s analyst meeting, and realized something I’d somewhat forgotten – much of the DBMS2 concept was inspired by SAP’s technical strategy. That’s not to say that SAP’s techies necessarily agree with me on every last point. But I do think it is interesting to review SAP’s version of DBMS2, to the extent I understand it.

1. SAP’s Enterprise Services Architecture (ESA) is meant to be, among other things, an abstraction layer over relational DBMS. The mantra is that they’re moving to a “message-based architecture” as opposed to a “database architecture.” These messages are in the context of a standards-based SOA, with a strong commitment to remaining open and standards-based, at least on the data and messaging levels. (The main limitation on openness that I’ve detected is that they don’t think much of standards such as BPEL in the business process definition area, which aren’t powerful enough for them.)

2. One big benefit they see to this strategy is that it reduces the need to have grand integrated databases. If one application manages data for an entity that is also important to another application, the two applications can exchange messages about the entity. Anyhow, many of their comments make it clear that, between partner company databases (a bit of a future) and legacy app databases (a very big factor in the present day), SAP is constantly aware of situations in which a single integrated database is infeasible.

3. SAP is still deeply suspicious of redundant transactional data. They feel that with redundant data you can’t have a really clean model – unless, of course, you code up really rigorous synchronization. However, if for some reason synchronization is preferred – e.g., for performance reasons — it can be hidden from users and most developers.

4. One area where SAP definitely favors redundancy and synchronization is data warehousing. Indeed, they have an ever more elaborate staging system to move data from operational to analytic systems.

5. In general, they are far from being relational purists. For example, Shai Agassi referred to doing things that you can’t do in a pure relational approach. And Peter Zencke reminded me that this attitude is nothing new. SAP has long had complex business objects, and even done some of its own memory management to make them performant, when they were structured in a manner that RDBMS weren’t well suited for. (I presume he was referring largely to BAPI.)

6. That said, they’re of course using relational data stores today for most things. One exception is text/content, which they prefer to store in their own text indexing/management system TREX. Another example is their historical support for MOLAP, although they seem to be edging as far away from that as they can without offending the MOLAP-loving part of their customer base.

Incidentally, the whole TREX strategy is subject to considerable doubt too. It’s not a state-of-the-art product, and they currently don’t plan to make it into one. In particular, they have a prejudice against semi-automated ontology creation, and that has clearly become a requirement for top-tier text technologies.

7. One thing Peter said that confused me a bit came up when we were talking about nonrelational data retrieval. The example he used was retrieving information on all of a specific sales rep’s customers, or perhaps on several sales reps’ customers. I got the feeling he was talking about the ability to text search on multiple columns and/or multiple tables/objects/whatever at once, but I can’t honestly claim that I connected all the dots.

And of course, the memory-centric ROLAP tool BI Accelerator — technology that’s based on TREX — is just another example of how SAP is willing to go beyond passively connecting to a single RDBMS. And while their sponsorship of MaxDB isn’t really an example of that, it is another example of how SAP’s strategy is not one to gladden the hearts of the top-tier DBMS vendors.

October 18, 2005

EII marketing soup

In the comments to another thread, the subject of EII (Enterprise Information Integration) came up. It’s a tricky one, for several reasons.

First, it’s a marketing construction: a blend between ETL (Extract, Transform, Load) and EAI (Enterprise Application Integration). It’s a legitimate category; all those things are getting smushed together as near-real-time apps become more prominent. Still, it’s also an attempt to grab marketing turf.

Second, it’s commonly associated with a marketing overreach — the claim that an EII “platform” or “suite” will do everything a DBMS does (almost), but fully and heterogeneously distributed as well. Yeah, right.

Third, two of the sharpest proponents have been acquired by behemoths that tend to obscure their acquirees’ marketing pitches: Ascential by IBM and SeeBeyond by Sun.

Fourth, some of the best grand integrated EII suites (at least the ones that started as ETL, which is the side I’m more familiar with) aren’t complete yet. So vendors didn’t want to be too clear for fear of freezing current sales. I’m referring here mainly to Ascential and Informatica. They told analysts of their grand plans, but they haven’t been so eager to openly publicize the full details.

Fifth, the area is getting integrated with development tools for composite applications. Good examples there are SeeBeyond and Intersystems’ Caché.

Sixth, no EII vendor’s plans fully work unless they have full relational and XML integration, and nobody really has been doing a great job on that, typically being strong in one area or the other.

Obviously, this is an area I have to research actively; EII is the neuromuscular system that holds DBMS2 together. But all the research in the world won’t change the fact that as of now it’s the weak spot in the story. There’s lots of great database management technology, and lots of excellent reasons to use a variety of kinds of that technology in your enterprise. But the tools to knit the resulting heterogeneous databases together are still sadly deficient.
