November 14, 2005

So how robust is Ingres?

CA is spinning off Ingres, more or less, to an investment fund led by Terry Garnett, who will also be interim CEO. Now, I’ve given Terry a lot of grief over the decades. It started by accident, when I bashed his presentation of Lightyear at a 1984 party in Rosann Stach’s house (where we also used Jerry Kaplan as a subject for the Mindprober psychological analysis product — those were the days of goofy software!). Years later, I didn’t even recall that had been Terry until I was reminded. But in the early 1990s, when Terry and Jerry Baker were dueling at Oracle, I was firmly in the Jerry Baker camp, and to this day I believe I was right. Still — be all that as it may, Terry knows DBMS and knows promotion, and if the company falls flat it won’t be because he screwed it up. He’s no dunce, and he’s been around DBMS a loooong time.

But how stands the product? Let’s flash back a decade, to when CA bought it. Ingres was a solid general-purpose RDBMS. But it was beginning to fall behind the technology power curve, especially on the data warehousing side. (For more detail, see my Ingres history post over in the Software Memories blog.) And then product development slowed to a crawl. Tony Gaughan, who ran the product for CA before the latest move, claims that they’ve actually done a good job of advancing the product on the OLTP side, perhaps to the point of comparability with Oracle9i, and certainly ahead of MySQL 5.0. I’m inclined to believe him, after applying some reasonable discount factor for expected puffery, in part because this wasn’t a high hurdle to cross. Over the past decade, the main action in high-end DBMS product enhancement has been in data warehousing and nontabular datatypes, not in OLTP.

Where Ingres definitely seems to lag is in data warehousing. E.g., there are no materialized views, and I bet that even if they have some of the specialized warehousing features such as bitmap indexes, star-schema optimizations, etc., the implementation, optimizer support, administrative support, and so on lag far behind those of Oracle and IBM. So again, the proper comparison for Ingres isn’t Oracle and IBM; it’s fellow open source vendor MySQL. Only — deserved or not, MySQL has a ton of momentum for such a small company, including an attractive product plan partially fueled by SAP.

Appliance vendor DATAllegro makes a plausibility argument that Ingres can be adapted for nontrivial data warehouse uses as well. But while that’s cool, and might even become persuasive once DATAllegro has some happy, disclosed customers, it’s not the same as saying you want to put a big data warehouse into off-the-shelf Ingres.

So basically, I’m afraid that Ingres is going to appeal mainly to users who either already are making major use of it, or else have a huge problem with paying the license fees demanded by other vendors. I wish them well, and hope they kindle a spark somehow; but right now I don’t see where it would be coming from.

November 14, 2005

Defining and surveying “Memory-centric data management”

I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.

I’ll give a quick summary of the vendors and products I’m focusing on in this newly named category, and it should then be clearer what I mean:

So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.
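To make that concrete, here’s a toy sketch (mine alone, resembling no particular vendor’s internals) of the difference between page-oriented access and a truly memory-centric alternative:

```python
# Toy illustration, not any vendor's actual code. A disk-style system
# keeps rows in fixed-size pages and reaches them through an index,
# even when everything happens to be cached in RAM.
pages = [[(1, "a"), (2, "b"), (3, "c"), (4, "d")],
         [(5, "e"), (6, "f"), (7, "g"), (8, "h")]]
page_index = {1: 0, 5: 1}  # lowest key on each page -> page number

def page_oriented_lookup(key):
    # Walk the index to a page, then scan within the page -- the same
    # steps a cached disk page would require.
    page_no = page_index[max(lo for lo in page_index if lo <= key)]
    for k, v in pages[page_no]:
        if k == key:
            return v

# A memory-centric system, knowing the data never round-trips to disk,
# can use a plain hash table (or direct pointers) and skip all that.
by_key = {k: v for page in pages for k, v in page}

def memory_centric_lookup(key):
    return by_key[key]

assert page_oriented_lookup(6) == memory_centric_lookup(6) == "f"
```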

November 13, 2005

Breaking through the disk speed barrier

Most aspects of computer performance and capacity grow at Moore’s Law kinds of speeds. Doubling times may be anywhere from 9 months to 2 years, but in any case speeds and storage capacities grow exponentially quickly. Not so, however, with disk rotation speeds. The very first disk drives, almost 50 years ago, rotated 1,200 times per minute. Today’s top disk rotation speed is around 15,000 RPM. Indeed, while I recall seeing a reference to one at 15,600 RPM, I can’t now go back and find it. Yes, folks; disk rotational speed in the entire history of computing has increased by a measly factor of 12.5.

Why does this matter to DBMS design? Simply put, disk rotation speed is an absolute limit on the speed of random disk-based data access. Today’s fastest disks take 4 milliseconds to rotate once. Thus, multiple heads aside, getting something from a known but random location on the disk will take, on average, 2 milliseconds of rotational delay alone, before seek time is even considered. And a naive data management algorithm will, for a single query, result in dozens or even hundreds of random accesses.
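If you want to check my arithmetic, here’s a quick back-of-envelope sketch (my own, in Python):

```python
# Back-of-envelope arithmetic for rotational latency.
def rotational_latency_ms(rpm):
    # Time for one full rotation, and the expected (half-rotation)
    # wait before a random sector passes under the head.
    full_rotation_ms = 60000.0 / rpm  # 60,000 ms per minute
    return full_rotation_ms, full_rotation_ms / 2

print(rotational_latency_ms(1200))   # 1950s-era drive: (50.0, 25.0)
print(rotational_latency_ms(15000))  # today's fastest: (4.0, 2.0)

# So a naive plan doing 100 scattered single-row reads pays roughly
# 100 * 2 ms = 200 ms in rotational delay alone, before seek time.
```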

Thus, for a DBMS to run at acceptable speed, it needs to get data off disk not randomly, but rather a page at a time (i.e., in large blocks of predetermined size) or, better yet, sequentially (i.e., in continuous streams of indeterminate size). The indexes needed to meet these goals had best be sized to fit entirely in RAM. Clustering also plays an increasingly large role, so that data needed at the same time is likely to be on the same page, or at least in the same part of the disk.
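Here’s a crude cost model of why that matters; the constants are numbers I made up for illustration, not measurements of any product or disk:

```python
# Crude, illustrative cost model; the constants are assumptions, not
# benchmarks of any real disk or DBMS.
AVG_RANDOM_ACCESS_MS = 6.0     # assumed seek + rotational latency
SEQUENTIAL_MB_PER_SEC = 50.0   # assumed streaming transfer rate
ROW_BYTES = 200
ROWS_PER_PAGE = 40             # roughly an 8 KB page

def random_reads_ms(rows):
    return rows * AVG_RANDOM_ACCESS_MS       # one disk hit per row

def paged_reads_ms(rows):
    num_pages = -(-rows // ROWS_PER_PAGE)    # ceiling division
    return num_pages * AVG_RANDOM_ACCESS_MS  # one disk hit per page

def sequential_scan_ms(rows):
    mb = rows * ROW_BYTES / 1e6
    return AVG_RANDOM_ACCESS_MS + 1000.0 * mb / SEQUENTIAL_MB_PER_SEC

rows = 100000
print(random_reads_ms(rows))     # 600000.0 ms -- ten minutes
print(paged_reads_ms(rows))      # 15000.0 ms  -- fifteen seconds
print(sequential_scan_ms(rows))  # 406.0 ms    -- well under a second
```

The exact constants don’t matter much; the orders-of-magnitude spread among the three approaches is the point.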

Right there I’ve described some of the toughest ongoing challenges facing DBMS engineers. The big vendors all do a great job at meeting them (if they didn’t, they’d be out of business). Even so, some small companies find themselves able to beat the big guys, by some egregious cheating.

Data warehouse appliance vendors such as Netezza and especially DATAllegro optimize their systems to stream data sequentially off disk. In doing so, they go deeper into the operating systems, hardware, etc. than Oracle could ever allow itself to do. And the results seem pretty good. But I’ll write about that another time. Instead, I’m focusing right now on memory-centric data management; please see my other posts in that topic category.

November 13, 2005

Gartner on “The Death of the Database”

Gartner had a recent conference session on “The Death of the Database,” as described in David Berlind’s and Kathy Somebodyorother’s blogs. The core idea was that in the future data might be stored close to where it needs to be used, which might not be in a traditional DBMS.

Before getting to the real meat of that, let me push back against some of the extremist boobirds. First, I doubt the analysts really talked about “the intersection of a row and a tuple”; it’s much more likely that that is a misquote due to a reporting error. Second, their claim that BI will switch from being an “application” to a “service” is not at all unreasonable. BI should never have been viewed as an application; it’s much more a collection of application-enabling technologies. And the analysts explicitly said that DBMS will continue to be useful for analytics. As for their claim that some data needs to be only briefly persistent — they’re absolutely right, but let me defer that point to a separate post on memory-centric OLTP.

All that said — while a lot of their points ring true, it sounds as if they overstated their case in one important area. They’re making it sound as if some of today’s OLTP databases will no longer be needed, and as if tomorrow’s new kinds of OLTP data won’t need to be at least partly persisted to conventional DBMS. Wrong and wrong. Every important transaction needs to wind up in a DBMS. Those DBMS may not be as centralized as previously thought. The data may be copied to non-DBMS data stores (or, more likely, kept in a lightweight local DBMS and copied from there to a serious OLTP database). These DBMS may use native XML rather than traditional tabular data structures. But at the end of the day, transactional databases will continue to be needed for all the reasons they’ve been necessary in the past.

November 12, 2005

TransRelational(TM) — The final debunking

In prior posts, I’ve mentioned the essential dishonesty behind the hoohah around TransRelational(TM) technology from Required Technologies, Inc., and Chris Date’s highly regrettable promotion of same. Now I’ve been able to get more detail from another former executive of the company. Unsurprisingly, it corroborates what I wrote before, and utterly contradicts some of the myths spread by Date and his acolytes. This executive, while requesting that his name be withheld because of the acrimony between the CEO and just about every other company insider, otherwise gave me permission to report fully on what he told me. Read more

October 29, 2005

Oh, dear — Chris Date is displeased with me

Chris Date is quite annoyed with me, and has taken issue with various things I’ve written. Some of his reasoning is hard to follow. For example, he said something to the effect that it would be silly for him to ever say anything misleading, because he’d immediately be caught out. Uh, Chris – you’re the guy who’s decrying the terrible level of education and understanding in a field for which YOU WROTE THE DEFINITIVE TEXTBOOK (which has sold “over 700,000 copies”). If your readers can’t even understand the correct things you say in your book, why should they be able to instantly spot the errors? Read more

October 18, 2005

EII marketing soup

In the comments to another thread, the subject of EII (Enterprise Information Integration) came up. It’s a tricky one, for several reasons.

First, it’s a marketing construction — a blend of ETL (Extract, Transform, Load) and EAI (Enterprise Application Integration). It’s a legitimate category; all those things are getting smushed together as near-real-time apps become more prominent. Still, it’s also an attempt to grab marketing turf.

Second, it’s commonly associated with a marketing overreach — the claim that an EII “platform” or “suite” will do everything a DBMS does (almost), but fully and heterogeneously distributed as well. Yeah, right.

Third, two of the sharpest proponents have been acquired by behemoths that tend to obscure their acquirees’ marketing pitches — Ascential by IBM and SeeBeyond by Sun.

Fourth, some of the grandest integrated EII suites (at least the ones that started as ETL, which is the side I’m more familiar with) aren’t complete yet. So vendors don’t want to be too clear, for fear of freezing current sales. I’m referring here mainly to Ascential and Informatica. They’ve told analysts of their grand plans, but they haven’t been so eager to openly publicize the full details.

Fifth, the area is getting integrated with development tools for composite applications. Good examples there are SeeBeyond and InterSystems’ Caché.

Sixth, no EII vendor’s plans fully work without full relational and XML integration, and nobody has really been doing a great job on that; vendors are typically strong in one area or the other.

Obviously, this is an area I have to research actively; EII is the neuromuscular system that holds DBMS2 together. But all the research in the world won’t change the fact that as of now it’s the weak spot in the story. There’s lots of great database management technology, and lots of excellent reasons to use a variety of kinds of that technology in your enterprise. But the tools to knit the resulting heterogeneous databases together are still sadly deficient.

October 13, 2005

It’s not about a single database

Critics of the DBMS2 idea generally are focused on the design of a single database. That’s somewhat missing the point.

Here are some excerpts and paraphrases from a discussion over on TDAN.

October 10, 2005

Limitations of the Relational Model

In my October Computerworld column, I tried to explain some of the reasons why I don’t think the pure Relational Model should be as absolutely dominant as its most fervent proponents assert.

The key points were:

1. Logical and physical modeling will never be completely separable.
2. “True relational” DBMSs are very unlikely ever to be practically useful, except perhaps in narrow niches.
3. Enterprises don’t fully control their data models.
4. Duplicated data is not inherently bad.
5. Saying that the relational model (RM) is based on mathematics proves almost nothing.
6. IT isn’t just concerned with facts.

For details see the link above.

And while I’m at it, here’s a link to my September Computerworld column, on three life-and-death apps that won’t get built with a relational architecture.

October 10, 2005

TransRelational(TM) nonsense

Database guru Christopher J. Date is apparently accepting money from attendees to his seminars on TransRelational(TM) database architecture, so that he can tell them about an as-yet-unreleased product from Required Technologies, Inc.

This is regrettable on multiple levels.

1. Required Technologies shut down product development in 2002, after running through $30 million; there’s great acrimony between investors and the CEO; and lawsuits are likely.

2. Required’s product never did most of what Date seems to be claiming it now does. It was a read-oriented columnar data store, much like Sybase IQ or a number of other products from younger companies. Read more
