Theory and architecture

Analysis of design choices in databases and database management systems. Related subjects include:

Any subcategory
Database diversity
Explicit support for specific data types
(in Text Technologies) Text search

December 9, 2005

SAP’s version of DBMS2

I just spent a couple of days at SAP’s analyst meeting, and realized something I’d somewhat forgotten – much of the DBMS2 concept was inspired by SAP’s technical strategy. That’s not to say that SAP’s techies necessarily agree with me on every last point. But I do think it is interesting to review SAP’s version of DBMS2, to the extent I understand it.

1. SAP’s Enterprise Services Architecture (ESA) is meant to be, among other things, an abstraction layer over relational DBMS. The mantra is that they’re moving to a “message-based architecture” as opposed to a “database architecture.” These messages are in the context of a standards-based SOA, with a strong commitment to remaining open and standards-based, at least on the data and messaging levels. (The main limitation on openness that I’ve detected is that they don’t think much of standards such as BPEL in the business process definition area, which aren’t powerful enough for them.)

2. One big benefit they see to this strategy is that it reduces the need to have grand integrated databases. If one application manages data for an entity that is also important to another application, the two applications can exchange messages about the entity. Anyhow, many of their comments make it clear that, between partner company databases (a bit of a future) and legacy app databases (a very big factor in the present day), SAP is constantly aware of situations in which a single integrated database in infeasible.

3. SAP is still deeply suspicious of redundant transactional data. They feel that with redundant data you can’t have a really clean model – unless, of course, you code up really rigorous synchronization. However, if for some reason synchronization is preferred – e.g., for performance reasons — it can be hidden from users and most developers.

4. One area where SAP definitely favors redundancy and synchronization is data warehousing. Indeed, they have an ever more elaborate staging system to move data from operational to analytic systems.

5. In general, they are far from being relational purists. For example, Shai Agassi referred to doing things that you can’t do in a pure relational approach. And Peter Zencke reminded me that this attitude is nothing new. SAP has long had complex business objects, and even done some of its own memory management to make them performant, when they were structured in a manner that RDBMS weren’t well suited for. (I presume he was referring largely to BAPI.)

6. That said, they’re of course using relational data stores today for most things. One exception is text/content, which they prefer to store in their own text indexing/management system TREX. Another example is their historical support for MOLAP, although they seem to be edging as far away from that as they can without offending the MOLAP-loving part of their customer base.

Incidentally, the whole TREX strategy is subject to considerable doubt too. It’s not a state-of-the-art product, and they currently don’t plan to make it into one. In particular, they have a prejudice against semi-automated ontology creation, and that has clearly become a requirement for top-tier text technologies.

7. One thing that Peter said which confused me a bit is when we were talking about nonrelational data retrieval. The example he used was retrieving information on all of a specific sales reps’ customers, or perhaps on several sales reps’ customers. I got the feeling he was talking about the ability to text search on multiple columns and/or multiple tables/objects/whatever at once, but I can’t honestly claim that I connected all the dots.

And of course, the memory-centric ROLAP tool BI Accelerator — technology that’s based on TREX — is just another example of how SAP is willing to go beyond passively connecting to a single RDBMS. And while their sponsorship of MaxDB isn’t really an example of that, it is another example of how SAP’s strategy is not one to gladden the hearts of the top-tier DBMS vendors.

Categories: EAI, EII, ETL, ELT, ETLT, Memory-centric data management, MOLAP, OLTP, SAP AG, Theory and architecture

9 Comments

December 9, 2005

Relational DBMS versus text data

There seems to be tremendous confusion about “search,” “meaning,” “semantics,” the suitability of relational DBMS to manage text data, and similar subjects. Here are some observations that may help sort some of that out.

1. Relational database theorists like to talk about the “meaning” or “semantics” of data as being in the database (specifically its metadata, and more specifically its constraints). This is at best a very limited use of the words “meaning” or “semantics,” and has little to do with understanding the meaning of plain English (or other language) phrases, sentences, paragraphs, etc. that may be stored in the database. Hugh Darwen is right and his fellow relational theorists are confused.

2. The standard way to manage text is via a full-text index, designed like this: For hundreds of thousands of words, the index maintains a list of which documents the word appears in, and at what positions in the document it appears. This is a columnar, memory-centric approach, that doesn’t work well with the architecture of mainstream relational products. Oracle pulled off a decent single-server integration nonetheless, although performance concerns linger to this day. Others, like Sybase, which attempted a Verity integration, couldn’t make it work reasonably at all. Microsoft, which started from the Sybase architecture, didn’t even try, or if they tried it wasn’t for long; Microsoft’s text search strategy has been multi-server more or less from the getgo.

3. Notwithstanding point #2, Oracle, IBM, Microsoft, and others have SQL DBMS extended to handle text via the SQL3 (or SQL/MM ) standard. (Truth be told, I get the names and sequencing of the SQL standard versions mixed up.) From this standpoint, the full text of a document is in a single column, and one can write WHERE clauses on that column using a rich set of text search operators.

But while such SQL statements formally fit into the relational predicate logic model, the fit is pretty awkward. Text search functions aren’t two-valued binary yes/no types of things; rather, they give scores, e.g. with 101 possible values (the integers from 0 – 100). Compounding them into a two-valued function typically throws away information, especially since that compounding isn’t well understood (which is why it’s so hard to usefully federate text searches across different corpuses).

4. Something even trickier is going on. Text search can be carried out against many different kinds of things. One increasingly useful target is the tables of a relational database. Where a standard SQL query might have trouble finding all the references in a whole database to a particular customer organization or product line or whatever, a text search can do a better job. This kind of use is becoming increasingly frequent. And while it works OK against relational products, it doesn’t fit into the formal relational model at all (at least not without a tremendous amount of contortion).

5. Relational DBMS typically manage the data they index. Text search systems often don’t. But that difference is almost a small one compared with some of the others mentioned above, especially since it’s a checkmark item for leading RDBMS to have some sort of formal federation capability.

Categories: Data types, Database diversity, Memory-centric data management, Text, Theory and architecture

13 Comments

December 9, 2005

More flame war stupidity

Robert Seiner (publisher of TDAN) and Fabian Pascal are now claiming that Computerworld approached Bob and asked him to do something about the false charge that I personally engaged in censorship. To the best of my knowledge, they’re both lying. It was just me, and me alone, who approached Bob, which is exactly what one would think, if for some odd reason one cared about the matter at all. I don’t have the faintest idea why they fabricated this story, or what they think it demonstrates — but they did.

Seiner also picked a title for an article of mine he published, then published one by Fabian attacking me for the title. Classy.

Bob also made two promises in the matter which he didn’t keep. Nor did he have the courtesy to inform me that he’d changed his mind, nor did he really address it when I called him on it.

I wondered why Seiner kept on publishing Pascal’s stuff, even for free, when most of Fabian’s other publishers have dropped him. Now I have a better idea. They’re soulmates.

A pity. Partway through our discussions, Bob sounded eminently reasonable. That’s why I jumped at his suggestion I write an article for him. Oh well; live and learn.

And for the record — no, I won’t respond to Pascal’s critiques point by point. He typically attacks straw men, rather than restricting his barbs to my actual opinions. In those areas where we do actually disagree, I haven’t hesitated to publish follow-on arguments, repeatedly and at length, here and elsewhere. I’ve given that relative nonentity much more attention than he deserves.

Also for the record — even though I don’t respond to every nasty shot Pascal and his associates take at me, I’m of course not conceding that his other libels and opinions are actually correct. I just think that by and large he’s a waste of bandwidth, because even his coherent ideas are quickly sidetracked by highly illogical fulminations. Even in articles where he’s otherwise making enough sense to respond to, he usually goes off on some extremist semantics-related kick that doesn’t mesh well with his own imperfect command of the English language.

(I really want to respond to his film contracts example from a three-year-old anti-XML diatribe. But the article gets bogged down with various “definitions” that are not easily reconciled to normal usage of the words, and it’s too much trouble to sort through them all. Maybe I’ll respond to the idea without linking to the article itself, when I get around to it.)

Exception to the above slam at Pascal — he recently posted a good interchange he had with Hugh Darwen, which I’m referencing in another post in this blog. His side was wrong, but both sides were well-presented.

Categories: About this blog, Data models and architecture

2 Comments

November 18, 2005

Two purely theoretical problems with TransRelational(TM)

There’s a vigorous discussion of TransRelational over on Alf Pedersen’s blog (Edit: Link died), although it’s completely polluted by some usual-suspects flame war BS.

Alf did poke through the dreck, however, to make a reasonable challenge, which can be paraphrased as:

OK. Suppose you’re right that no implementation has ever provided evidence of TransRelational’s usefulness for building a True Relational DBMS. It’s still theoretically fascinating.

My response was as follows:

Here are two big problems with TransRelational that are perfectly theoretical.

First, it assumes that values can be concisely stated, presumably as numbers or character strings. That isn’t a good match to complex datatypes such as, say, documents that should be full-text indexed.

Second, it assumes that there’s a natural sort order. That could be a bit of a problem even for, say, geospatial. One would think there’s a workaround in the geospatial case, e.g. like Oracle’s old hhencode. But hhencode was a fiasco, I think because it didn’t actually measure proximity very effectively.

Admittedly, both of my objections also apply to good old b-trees. Still, they speak against the potential of a TransRelational implementation to achieve the kind of generality I think modern applications do and will increasingly demand.

Basically, I think a “True Relational” DBMS that was only useful for columns with natural sort orders wouldn’t be particularly interesting. And “The Third Manifesto” notwithstanding, that’s the only kind anybody seems to have even hinted at trying to bring to market.

Categories: Data types, GIS and geospatial, Theory and architecture, TransRelational

1 Comment

November 13, 2005

Gartner on “The Death of the Database”

Gartner had a recent conference session on “The Death of the Database,” as described in David Berlind’s and Kathy Somebodyorother’s blogs. The core idea was that data in the future might be stored closest to where it would need to be used, which might not be in a traditional DBMS.

Before getting to the real meat of that, let me push back at some of the extremist boobirds. First, I doubt the analysts really talked about “the intersection of a row and a tuple”; it’s much more likely that that is a misquote due to reporting error. Second, their claim that BI will switch from being an “application” to a “service” is not at all unreasonable. BI should never have been viewed as an application; it’s much more a collection of application-enabling technologies. And the analysts explicitly said that DBMS will continue to be useful for analytics. As for their claim that some data needs to be only briefly persistent — they’re absolutely right, but let me defer that point to a separate post on memory-centric OLTP.

All that said — while a lot of their points ring true, it sounds as if they overstated their case in one important area. They’re making it sound as if some of today’s OLTP databases will no longer be needed, and as if tomorrow’s new kinds of OLTP data won’t need to be at least partly persisted to conventional DBMS. Wrong and wrong. Every important transaction needs to wind up in a DBMS. Those DBMS may not be as centralized as previously thought. The data may be copied to non-DBMS data stores (or, more likely, kept in a lightweight local DBMS and copied from there to serioius OLTP database). These DBMS may use native XML rather than traditional tabular data structures. But at the end of the day, transactional databases will continue to be needed for all the reasons they’ve been necessary in the past.

Categories: Business intelligence, Database diversity, Structured documents, Theory and architecture

Oh, dear — Chris Date is displeased with me

Chris Date is quite annoyed with me, and has taken issue with various things I’ve written. Some of his reasoning is hard to follow. For example, he said something to the effect that it would be silly for him to ever say anything misleading, because he’d immediately be caught out. Uh, Chris – you’re the guy who’s berating the terrible level of education and understanding in a field for which YOU WROTE THE DEFINITIVE TEXTBOOK (which has sold “over 700,000 copies”). If your readers can’t even understand the correct things you say in your book, why should they be able to instantly spot the errors? Read more

Categories: Theory and architecture, TransRelational

26 Comments

October 13, 2005

It’s not about a single database

Critics of the DBMS2 idea generally are focused on the design of a single database. That’s somewhat missing the point.

Here are some excerpts and paraphrases from a discussion over on TDAN.

“DBMS2” is NOT primarily a blueprint for how to design a single database or a single DBMS. That said, it does give guidance as to what kinds of DBMS and data architecture choices you should consider and favor for each cluster of applications.
Text, presence, authentication, customer profiling – I don’t think any of these will wind up being handled relationally, although at least in the case of customer profiling that’s currently a minority viewpoint.
In particular, text processing is poised to explode as a fraction of the overall IT burden.
XML is pretty clearly going to be the basis of text data management.
Unless all your apps are built by the same company, and perhaps not even then, there’s no way you’ll have a single integrated database. The way your different databases will talk to each other is XML.
It’s clear that a typical large enterprise’s data structure will evolve to part relational and part XML, and possibly some other data models as well, all tied together by XML. None of the relational-über-alles arguments can or should change that. At best, they are reasons to make the relational piece bigger and more tightly integrated.

Categories: Database diversity, Theory and architecture

3 Comments

October 10, 2005

Limitations of the Relational Model

In my October Computerworld column, I tried to explain some of the reasons why I don’t think the pure Relational Model should be as absolutely dominant as its most fervent proponents assert.

The key points were:

1. Logical and physical modeling will never be completely separable.
2. “True relational” DBMSs are very unlikely ever to be practically useful, except perhaps in narrow niches.
3. Enterprises don’t fully control their data models.
4. Duplicated data is not inherently bad.
5. Saying that the relational model (RM) is based on mathematics proves almost nothing.
6. IT isn’t just concerned with facts.

For details see the link above.

And while I’m at it, here’s a link to my September Computerworld column, on three life-and-death apps that won’t get built with a relational architecture.

Categories: Theory and architecture

30 Comments

October 10, 2005

TransRelational(TM) nonsense

Database guru Christopher J. Date is apparently accepting money from attendees to his seminars on TransRelational(TM) database archicture, so that he can tell them about an as-yet unreleased product from Required Technologies, Inc.

This is regrettable on multiple levels.

1. Required Technologies shut down product development in 2002, after running through $30 million; there’s great acrimony between investors and the CEO; and lawsuits are likely.

2. Required’s product never did most of what Date seems to be claiming it now does. It was a read-oriented columnar data store, much like Sybase IQ or a number of other products from younger companies. Read more

Categories: Benchmarks and POCs, Columnar database management, Memory-centric data management, TransRelational

65 Comments

October 10, 2005

The Amazon.com bookstore is a huge, modern OLTP app. So is it relational?

I don’t know for a fact that the Amazon.com bookstore is the world’s biggest OLTP application — but if it isn’t, it’s close.

And the thing is — that’s never been an entirely relational application. Oh, the ordering part surely is. But the inventory lookup is currently driven by an OODBMS (from Progress). The personalization used to be done in Red Brick (I knew which software replaced it, but I’m forgetting at the moment — it may even be one of the relational warehouse appliance vendors). And of course the full-text search is a custom in-house system.

Categories: Amazon and its cloud, Cache, Data types, Database diversity, Memory-centric data management, Object, OLTP, Progress, Apama, and DataDirect, Specific users

4 Comments

← Previous Page — Next Page →

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

Theory and architecture

SAP’s version of DBMS2

Relational DBMS versus text data

More flame war stupidity

Two purely theoretical problems with TransRelational(TM)

Gartner on “The Death of the Database”

Oh, dear — Chris Date is displeased with me

It’s not about a single database

Limitations of the Relational Model

TransRelational(TM) nonsense

The Amazon.com bookstore is a huge, modern OLTP app. So is it relational?

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin