EAI, EII, ETL, ELT, ETLT

Analysis of data integration products and technologies, especially ones related to data warehousing, such as ETL (Extract/Transform/Load). Related subjects include:

January 4, 2009

Expressor pre-announces a data loading benchmark leapfrog

Expressor Software plans to blow the Vertica/Syncsort “benchmark” out of the water, to wit:

What I know already is that our numbers will [be] between 7 and 8 min to load one TB of data and will set another world record for the tpc-h benchmark.

The whole blog post has a delightful air of skepticism, e.g.:

Sometimes the mention of a join and lookup are documented but why? If the files are load ready what is there to join or lookup?

… If the files are load ready and the bulk load interface is used, what exactly is done with the DI product?

My guess… nothing.

…  But what I can’t figure out is what is so complex about this test in the first place?
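To put the quoted claim in the same units as the 5 ½ terabytes per hour reported for the Syncsort/Vertica benchmark, the 7 to 8 minute figure can be converted to terabytes per hour. This is only a rough sketch: it assumes the quoted times are for exactly one terabyte at a constant load rate, and it says nothing about whether the two tests are otherwise comparable. The function name is made up for illustration.

```python
def tb_per_hour(terabytes, minutes):
    """Convert a load of `terabytes` completed in `minutes` to TB per hour."""
    return terabytes * 60 / minutes

# Expressor's pre-announced range: one TB in 7 to 8 minutes.
claimed_fast = tb_per_hour(1, 7)   # roughly 8.57 TB/hour
claimed_slow = tb_per_hour(1, 8)   # 7.5 TB/hour

# Figure reported for the Syncsort/Vertica benchmark.
syncsort_vertica = 5.5

print(f"Claimed range: {claimed_slow:.2f} to {claimed_fast:.2f} TB/hour")
print(f"Syncsort/Vertica: {syncsort_vertica:.2f} TB/hour")
```

Under those assumptions, even the slow end of the claimed range would exceed the earlier figure.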

December 2, 2008

Data warehouse load speeds in the spotlight

Syncsort and Vertica combined to devise and run a benchmark in which a data warehouse got loaded at 5 ½ terabytes per hour, which is several times faster than the figures used in any other vendors’ similar press releases in the past. Takeaways include:

The latter is unsurprising. Back in February, I wrote at length about how Vertica makes rapid columnar updates. I don’t have a lot of subsequent new detail, but it made sense then and now.


November 15, 2008

High-performance analytics

For the past few months, I’ve collected a lot of data points to the effect that high-performance analytics (i.e., beyond straightforward query) is becoming increasingly important. And I’ve written about some of them at length. For example:

Ack. I can’t decide whether “analytics” should be a singular or plural noun. Thoughts?

Another area that’s come up which I haven’t blogged about so much is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including:


October 17, 2008

Introduction to Talend

I didn’t spend much time on the show floor at Teradata Partners, but I did connect with Yves de Montcheuil of Talend for a couple of little chats. Highlights of the Talend story include:

October 10, 2008

Multitenancy hype is getting out of control

I posted recently on SaaS-data-integration-in-the-cloud, and a couple of vendors stopped by the comment thread to share what they do. One was Boomi, which has a blog that does a good job of spelling out its opinions. What the Boomi blog is not so good at, however, is giving any good reasons why one should share those opinions.

I refer specifically to a couple of posts claiming that multitenancy is somehow crucial for SaaS data integration to work. To this I can only say: huh? A decent data integration system should be able to handle many parallel threads at once, connecting many pairs of databases at once. So the hard part of multitenancy is pretty much “free.” If, even so, the integration provider chooses not to go fully multitenant, whose business is it but theirs?

October 9, 2008

Everybody’s putting integration services in the cloud

Both Pervasive Software and Cast Iron Systems told me recently of fairly pure cloud offerings. In this, they’re joining Informatica, which started offering Salesforce.com integration-as-a-service back in 2006. So far as I can tell, the three vendors are doing somewhat different things.

October 5, 2008

Schema flexibility and XML data management

Conor O’Mahony, marketing manager for IBM’s DB2 pureXML, talks a lot about one of my favorite hobbyhorses, schema flexibility, as a reason to use an XML data model. In a number of industries he sees use cases based around ongoing change in the information being managed:

Conor also thinks market evidence shows that XML’s schema flexibility is important for data interchange.


October 5, 2008

Vertical market XML standards

Tracking the alphabet soup of vertical market XML standards is hard. So as a starting point, I’m splitting a list I got from IBM into a standalone post.

Among the most important or successful IBM pureXML-supported standards, in terms of downloads and other evidence of customer interest, are:


October 5, 2008

Overview of IBM DB2 pureXML

On August 29, I had a great call with IBM about DB2 pureXML (most of the IBM side of the talking was done by Conor O’Mahony and Qi Jin). I’m finally getting around to writing it up now. (The world of tabular data warehousing has kept me just a wee bit busy …)

As I write it, I see there are a considerable number of holes, but that’s the way it seems to go when researching XML storage. I’m also writing up a September call from which I finally figured out (I think) the essence of how MarkLogic Server works – but only after five months of trying. It turns out that MarkLogic works rather differently from DB2 pureXML. Not coincidentally, IBM and Mark Logic focus on rather different use cases for native XML storage.

What I understand so far about the basic DB2 pureXML architecture goes like this:


August 26, 2008

Three approaches to parallelizing data transformation

Many MPP data warehousing vendors have told me their products are used for ELT (Extract/Load/Transform) instead of ETL (Extract/Transform/Load). I.e., needed data transformations are done on the MPP system, rather than on the — probably SMP — system the data comes from.* If the data transformation is being applied on a record-by-record basis, then it’s automatically fully parallelized. Even if the transforms are more complex, considerable parallel processing may still be going on.

*Or it’s some of each, at which point it’s called ETLT — I bet you can work out what that stands for.
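The difference between the two orderings can be sketched in a few lines. The example below is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for the target database (SQLite is not an MPP system), and the table names, sample rows, and `clean` function are all invented for the purpose. In the ETL path the record-by-record transform runs before the load; in the ELT path the raw rows are loaded first and the same transform is expressed as SQL inside the database, which is where an MPP system could apply it to each node's slice of the data in parallel.

```python
import sqlite3

# Hypothetical raw extract: untrimmed names and amounts stored as text.
RAW_ROWS = [("  alice ", "10.50"), ("BOB", "3"), ("carol", "7.25")]

def clean(name, amount):
    """Record-by-record transform: normalize the name, parse the amount."""
    return name.strip().lower(), float(amount)

def etl(conn):
    # ETL: transform outside the database, then load the finished rows.
    conn.execute("CREATE TABLE sales_etl (name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales_etl VALUES (?, ?)",
        [clean(n, a) for n, a in RAW_ROWS],
    )

def elt(conn):
    # ELT: load the raw rows as-is, then transform inside the database.
    conn.execute("CREATE TABLE staging (name TEXT, amount TEXT)")
    conn.executemany("INSERT INTO staging VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT lower(trim(name)) AS name, CAST(amount AS REAL) AS amount "
        "FROM staging"
    )

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute(
    "SELECT name, amount FROM sales_etl ORDER BY name").fetchall()
elt_rows = conn.execute(
    "SELECT name, amount FROM sales_elt ORDER BY name").fetchall()
print(etl_rows == elt_rows)
```

Both paths end with the same cleaned rows; what changes is which system does the transformation work.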


March 26, 2008

Pervasive is also pursuing simplicity and SaaS integration

I blogged recently about Cast Iron Systems, a simplicity-oriented data integration appliance vendor that is increasingly focusing on the SaaS market. Well, Pervasive Software is doing something similar.

Via Data Integrator, Pervasive is a leader in the low-cost integration market, with revenue split about 50/25/25 between direct sales, ISVs, and SaaS. Pervasive fondly believes that its products cost half as much as Cast Iron’s, and wind up taking no more installation effort when you factor in Pervasive’s broader capabilities in areas such as workflow. However, there’s some doubt as to whether this is apples-to-apples. Cast Iron does include hardware, after all, and as Pervasive itself points out, Cast Iron will bundle some professional services into a sale if you ask nicely.

Two things are new.

March 21, 2008

Cast Iron Systems focuses on SaaS data integration

When I wrote about data integration vendor Cast Iron Systems a year ago, its core message was “simplicity, simplicity, simplicity.” Supporting points included:

  1. An appliance delivery format.
  2. Lots of heuristics for automatic mapping and quick set-up. E.g., Cast Iron claims that 70% of a typical SAP-Salesforce.com connection can be done straight out of the box.
  3. The absence of data cleaning/transformation features that might complicate things.

Cast Iron still believes in all that.

Even so, its messaging has changed a bit. Cast Iron now bills itself, in the first sentence of its press release boilerplate, as “the fastest growing SaaS integration appliance vendor.” And when I talked with marketing chief Simon Peel today, the only use cases we discussed were connections between SaaS and on-premises apps.

February 19, 2008

Kalido — CASE for complex data warehouses

Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:

But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.


February 8, 2008

Load speeds and related issues in columnar DBMS

Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.

I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zlodnik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:

  1. Can columnar DBMS do operational BI?
  2. Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL)?
  3. Are columnar DBMS’ load speeds a problem other than in issues #1 and #2?


November 16, 2007

OK, now I get it — the guys at Ab Initio have something to spin or hide

According to the comments on this blog post, Ab Initio has been throwing analysts out of their trade show booths and being otherwise rude for at least two years, and probably a lot longer. That goes beyond marketing strategy or quirkiness. It means Ab Initio has some secrets it desperately doesn’t want to have found out, or at least that it wants to conceal unless there are Ab Initio salespeople present to spin the prospects’ response to the news.

October 20, 2007

Wrinkles in the Informatica versus Business Objects patent litigation

Business Objects recently lost a patent lawsuit to Informatica in the area of data integration. While I was at the Business Objects conference, I asked about it, and was told in effect “It’s no big deal. In fact, the monetary award was reduced. Anyhow, we shipped a non-infringing version within 12 days after the decision, and sales are rolling along.” I then reflected that answer back to Informatica’s stellar analyst relations guy Chas Kielt. He checked with corporate counsel, and sent back the detailed clarification below. Since I got my Business Objects answers from a couple of caught-off-guard non-lawyer French guys, while Chas got a careful explanation of an American court’s judgment from an American lawyer, I’m inclined to think that in any details where they might conflict, Chas’ version is more likely to be accurate.

There’s a more substantive disagreement as to whether the features deleted from BOBJ’s product due to the injunction are actually important in the marketplace. I’m looking into that subject, and hope to post about it in the near future.

October 12, 2007

Oracle and BEA — sometimes I am waaaay early

Back in December, 2002, I wrote up the rationale for an Oracle acquisition of BEA. The deal finally seems like it may be happening. Oddly, when I proposed it then, I was accused by Oracle’s analyst relations department of being “unprofessional” for having the temerity to suggest it. And while the specific individual who threw that tantrum is long gone, I haven’t talked all that much with Oracle’s core server groups since … but I digress.

Actually, the logic of an Oracle/BEA deal now isn’t much different from what it was way back then. One exception is that in the intervening half-decade Oracle has acquired a formidable amount of experience in integrating large and/or technically overlapping acquisitions. Technically, however, the story remains pretty much the same. Oracle’s app server and BEA WebLogic do pretty similar things, more or less compliant with standards, only with different add-on functionality. And BEA’s most important add-ons are in an area (integration with outside applications) where Oracle has long needed to improve.

July 26, 2007

FileMaker for composite application development

It’s not accurate to judge a product by its most obnoxious or least clueful partisans. Hence, even though some insult-spewers take umbrage at an accurate description of FileMaker’s capabilities,* it wouldn’t be fair to write the product off entirely.

*Mercifully, none of said insult-spewers seems to actually work at the company. I must confess that this makes it easier for me to take the (somewhat) high road here.

Possibly due to an actual understanding of enterprise technology, Tim Dietrich has weighed in on the discussion from a different angle. Here’s a quote in which he gives an example of very successful FileMaker use:

April 26, 2007

More on Cast Iron Systems

I chatted again recently with Simon Peel of Cast Iron Systems, and this time I got a better understanding of Cast Iron’s simplicity claim. It refers largely to a drag-and-drop interface that furthermore provides default mappings between pairs of application suites. Simon bristled a bit when I referred to this as mapping “like to like,” because he’s proud that it’s a little smarter than that. Still, “like to like” seems to be what it typically amounts to: customers go to customers, customer addresses go to customer addresses, and so on.
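As a rough illustration of what a “like to like” default mapping might amount to, here is a minimal sketch that pairs fields whose names match after normalizing case and punctuation. It is not Cast Iron's method, which is described above as a little smarter than this; the function name and the field names are hypothetical.

```python
def default_mapping(source_fields, target_fields):
    """Pair each source field with the target field of the same normalized name.

    Returns (mapped, unmapped): a dict of source -> target names, and a list
    of source fields that found no like-named target.
    """
    def norm(name):
        # Ignore case, spaces, underscores, and other punctuation.
        return "".join(ch for ch in name.lower() if ch.isalnum())

    targets = {norm(t): t for t in target_fields}
    mapped = {s: targets[norm(s)] for s in source_fields if norm(s) in targets}
    unmapped = [s for s in source_fields if norm(s) not in targets]
    return mapped, unmapped

# Hypothetical field lists from two application suites.
mapped, unmapped = default_mapping(
    ["Customer_Name", "customer address", "CreditLimit"],
    ["customerName", "CustomerAddress", "Region"],
)
print(mapped)
print(unmapped)
```

Anything left in `unmapped` is where a person, or a smarter heuristic, would have to step in.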

March 17, 2007

The boom in Salesforce.com integration

SaaS integration is in the air.

But of course this makes sense. Without good data integration, SaaS applications would be pretty useless, at least at large and medium-sized enterprises.

January 4, 2007

Data integration appliance vendor Cast Iron Systems

I’ve been doing a lot of research lately into computing appliances – not just data warehouse appliances, but security, anti-spam and other appliance types as well. Today I added Cast Iron Systems to the list.

Essentially, they offer data integration without the common add-ons. I.e., there’s little or nothing in the way of data cleansing, composite apps, business process management, and/or business activity monitoring. Data just gets imported, extracted, and/or synchronized, whether between pairs of transactional systems, or between a transactional system and a reporting database. A particularly hot area of application for them seems to be SaaS/on-demand app integration (Salesforce.com, NetSuite, etc.). In particular, they boast both Lawson and Salesforce.com as internal users, and at least at Lawson they are used for a Salesforce/Lawson integration.

The big advantage to this strategy is that their integrator is simple enough for appliance deployment.


September 27, 2006

Oracle and Microsoft in data warehousing

Most of my recent data warehouse engine research has been with the specialists. But over the past couple of days I caught up with Oracle and Microsoft (IBM is scheduled for Friday). In at least three ways, it makes sense to lump those vendors together, and contrast them with the newer data warehouse appliance startups:

  1. Shared-everything architecture
  2. End-to-end solution story
  3. OLTP industrial-strengthness carried over to data warehousing

In other ways, of course, their positions are greatly different. Oracle may have a full order-of-magnitude lead on Microsoft in warehouse sizes, for example, and has a broad range of advanced features that Microsoft either hasn’t matched yet, or else just released in SQL Server 2005. Microsoft was earlier in pushing DBA ease as a major product design emphasis, although Oracle has played vigorous catch-up in Oracle10g.


September 24, 2006

More on data warehouse architecture choices

The very name of this blog comes from the kind of “horses for courses” data store strategy implied by my recent post on different kinds of data warehouse uses. A number of other commentators have recently made similar points, although they may not agree with every detail. For example, William McKnight pretty much makes the pure DBMS2 argument, pointing out that a partially virtual warehouse is often superior to a fully centralized physical one. And Andy Hayler of Kalido says pretty much the same thing, although he strongly calls out his difference in emphasis from William’s view.

A tip of the hat to Mark Rittman for pointing me to those two and others.

August 17, 2006

Business Objects on EIM, ETL, etc.

I chatted with some Business Objects ETL/EIM (Enterprise Information Management) folks today, in a call that was a direct response to what I heard from and posted about Informatica. The core of the Business Objects story can be summarized (albeit brutally!) like this:


August 8, 2006

eBay’s version of DBMS2

Every sufficiently large or agile enterprise needs to follow the DBMS2 approach. The following is from an article on eBay’s version:

“eBay has built a software-based Integration Tier. This contains both a data access layer (DAL) and a services framework. The Integration Tier acts as an abstraction layer for software engineers to work with many disparate back-end data sources through a consistent set of abstractions.”
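The idea of a data access layer that hides disparate back ends behind one consistent abstraction can be sketched as follows. Nothing here is taken from eBay's implementation: the class names, the `get_item` method, and the sample data are all assumptions made for illustration, with an in-memory SQLite database and a plain dictionary standing in for two different back-end sources.

```python
import sqlite3
from abc import ABC, abstractmethod

class DataSource(ABC):
    """The consistent abstraction that application code programs against."""
    @abstractmethod
    def get_item(self, item_id):
        """Return a dict for the item, or None if this source lacks it."""

class SqliteSource(DataSource):
    """A relational back end (in-memory SQLite as a stand-in)."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
        self.conn.execute("INSERT INTO items VALUES (1, 'lamp')")

    def get_item(self, item_id):
        row = self.conn.execute(
            "SELECT id, title FROM items WHERE id = ?", (item_id,)).fetchone()
        return {"id": row[0], "title": row[1]} if row else None

class InMemorySource(DataSource):
    """A non-relational back end (a dict as a stand-in)."""
    def __init__(self):
        self.items = {2: "chair"}

    def get_item(self, item_id):
        title = self.items.get(item_id)
        return {"id": item_id, "title": title} if title is not None else None

class DataAccessLayer:
    """Routes each request to whichever back end holds the data."""
    def __init__(self, sources):
        self.sources = sources

    def get_item(self, item_id):
        for source in self.sources:
            item = source.get_item(item_id)
            if item is not None:
                return item
        return None

dal = DataAccessLayer([SqliteSource(), InMemorySource()])
print(dal.get_item(1), dal.get_item(2), dal.get_item(3))
```

Callers see only `dal.get_item`, so a back end can be swapped or added without touching application code.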
