November 21, 2005

Is Oracle losing its edge?

Over in the Monash Report, I posed the question: Is Oracle losing its edge in DBMS? Here are some of the data points that make me suspect it is. A number of these points also apply to the other large mainstream DBMS vendors; a number, however, do not.

That’s a lot of evidence, even without mentioning threats from the open sourcers and the data warehouse appliance guys. So why am I not wholly convinced yet? Well, reasons include a variety of scalability features, extensibility features that are rivaled only by IBM’s, market share dominance on Linux, and Andy Mendelsohn. That’s a pretty compelling list too. Still, the Oracle colossus is teetering a little bit, and it’s not beyond imagination that some future earthquake could bring it crashing down.

November 18, 2005

Two purely theoretical problems with TransRelational(TM)

There’s a vigorous discussion of TransRelational over on Alf Pedersen’s blog (Edit: Link died), although it’s completely polluted by some usual-suspects flame war BS.

Alf did poke through the dreck, however, to make a reasonable challenge, which can be paraphrased as:

OK.  Suppose you’re right that no implementation has ever provided evidence of TransRelational’s usefulness for building a True Relational DBMS. It’s still theoretically fascinating.

My response was as follows:

Here are two big problems with TransRelational that are purely theoretical.

First, it assumes that values can be concisely stated, presumably as numbers or character strings. That isn’t a good match to complex datatypes such as, say, documents that should be full-text indexed.

Second, it assumes that there’s a natural sort order. That could be a bit of a problem even for, say, geospatial data. One would think there’s a workaround in the geospatial case, e.g., something along the lines of Oracle’s old hhencode. But hhencode was a fiasco, I think because it didn’t actually measure proximity very effectively.
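To make the sort-order objection concrete, here’s a minimal Python sketch of one standard linearization trick, Morton (Z-order) bit interleaving. To be clear, I’m not claiming this is how hhencode actually worked; it’s just an illustration of why any single sort key struggles to capture two-dimensional proximity.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of x and y into one sortable integer (Z-order)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x contributes the even bits
        key |= ((y >> i) & 1) << (2 * i + 1)  # y contributes the odd bits
    return key

# Two points that are physically adjacent...
a = (32767, 0)
b = (32768, 0)
# ...end up enormously far apart in the sort order, because the
# high-order interleaved bit flips between them.
print(morton_key(*a), morton_key(*b))   # 357913941 vs. 1073741824
```

Any scheme that collapses two or more dimensions into a single total order has seams like that somewhere, which is consistent with hhencode’s reported failure to measure proximity well.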

Admittedly, both of my objections also apply to good old b-trees. Still, they speak against the potential of a TransRelational implementation to achieve the kind of generality that I think modern applications demand now and will increasingly demand in the future.

Basically, I think a “True Relational” DBMS that was only useful for columns with natural sort orders wouldn’t be particularly interesting. And “The Third Manifesto” notwithstanding, that’s the only kind anybody seems to have even hinted at trying to bring to market.

November 17, 2005

Native XML Storage, Part 2 (apps)

The introduction and technical-implementation part of this discussion was in Part 1.

It seems likely that widespread adoption of native XML storage is, at best, several years off, if for no other reason than that the DML (Data Manipulation Language) situation is still rather primitive. But looking beyond that nontrivial problem, it does seem as if there are broad classes of application that might go better in native XML. Here’s a survey.

First of all, there’s what might be called custom document composition – technical publishing, customized technical manuals, etc. If you make complex products, or sell information, this is obviously an important specialty application for you. Otherwise, it probably is rather peripheral, at least for now. If you do have an interest in this area, by the way, you shouldn’t look only at the big guys’ XML offerings; you should also talk to specialists like Mark Logic. (Mark Logic sells an XML-only DBMS with a strong text-search orientation.)

Second, there are complex documents with low update rates. Medical records are a prime example – and, by the way, many of those are stored in InterSystems’ OODBMS Cache rather than in a relational system. Other examples might include insurance claims, media assets, etc. – basically, the areas that have been thought of as the purview of document management systems. In many cases, these apps ain’t broke and shouldn’t be fixed, such as when they exist mainly to satisfy slow-changing regulatory requirements. Besides, it’s not obvious that native XML is particularly useful for these apps anyway. Often, the information is in a DBMS for three main reasons: general manageability (e.g., backup), ad-hoc searchability, and management of metadata. If the metadata is simple enough to fit comfortably into a tabular structure, extended-relational DBMS may be satisfactory as underpinnings for these apps indefinitely.

Third, and here’s where it really begins to get interesting, is complex transactional documents. One of the flagship apps in Viper’s alpha test was financial derivatives trading, with complex, number-laden, term-laden contracts being processed very quickly, and it’s easy to envision that kind of functionality spreading across the trading sector. Governments – wisely or not – may want to require new complex forms to be filled out, or to make older ones easier to process. (E.g., tax returns, or applications for various kinds of permits.) If privacy concerns allow, medical information might be collected and processed centrally by governments or large insurance providers. Complex service-level agreements could be negotiated for a broad variety of product and service categories. Customers might demand radically faster processing of insurance claims than has historically been the norm. Indeed, it’s hard to think of an industry sector where complex transactional documents might not gain a foothold. And if you’re looking for high-performance access to portions of documents, native XML may well be the best storage choice.

Finally, there’s a fourth category, which I’ll give the trendy-looking name Profiles 2.0, in imitation of Web 2.0, Identity 2.0, and so on. Here’s what I mean by it. A number of the hottest buzzconcepts in computing focus on collecting, organizing, and using information about individual people – presence, identity, personalization/customization, narrowcasting/market-of-one, data mining/predictive analytics, weblog analysis, social software, and so on. Put all those together, and you have a humongous hairball of a user profile that no current systems come close to handling properly.

Let’s think about some characteristics of this data. Some of it is transient. Some of it is unreliable. Some of it indeed is guesswork – albeit educated guesswork – rather than fact (e.g., the results of data mining analyses). Much of it exists for some profilees but not others. Much of it is naturally tree- or graph-shaped (e.g., information about website traversal, product category interests, relationship networks, role-based authorizations, etc.). There are many kinds of it; pulling it all together relationally can lead to Joins From Hell.
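To make the hairball a little more tangible, here’s a tiny Python sketch of what one profilee’s record might look like. Every field name, value, and confidence score below is invented for illustration; the point is just that the attributes are sparse, nested, and partly guesswork, which is exactly what makes a fixed row-and-column decomposition painful.

```python
# Hypothetical profile for one individual. Almost every field is optional,
# several are nested trees or graphs, and some values carry confidence
# scores because they are mined guesses rather than facts.
profile = {
    "id": "user-48213",
    "identity": {"email": "j.doe@example.com", "verified": False},
    "presence": {"last_seen": "2005-11-16T09:14:00Z", "channel": "web"},
    "interests": {                       # tree-shaped: categories within categories
        "electronics": {"cameras": 0.83, "phones": 0.41},
        "travel": {"europe": 0.67},
    },
    "relationships": [                   # graph-shaped: edges to other profilees
        {"to": "user-11007", "kind": "colleague"},
    ],
    "mined": {                           # educated guesswork, not fact
        "churn_risk": {"score": 0.22, "model": "v3", "as_of": "2005-11-01"},
    },
    # No clickstream section at all for this user; the data is sparse by nature.
}

# Stored relationally, each nested section above typically becomes its own
# table, so reassembling one profile means a join per section.
sections = [k for k in profile if isinstance(profile[k], (dict, list))]
print(f"{len(sections)} joins just to reassemble this one profile:", sections)
```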

And this isn’t just for individuals; similar kinds of stories can be told for information about organizations, battleships, and so on. Those are objects with rich internal structures. True, those can usually be modeled hierarchically – but at each node, some of the complications mentioned in the prior paragraph occur. Profiling an enterprise is even messier than profiling a single individual who shops or works there.

Applications using this kind of information are typically extremely primitive, even though the beginnings of the personalization hype are now 7-8 years in the past. I don’t think we’re going to get these kinds of systems right until we take a true, holistic view of individuals and their profiles – and until we learn how to think about apps whose fundamental objects keep changing in shape. But as hard as the problem is, it has to be worked on immediately, because what I’m talking about here are some of the major classes of competitive-advantage app.

So Profiles 2.0 isn’t something we can just ignore. And when we do pay attention to it, I don’t think we’ll find that it looks very natural dressed in rows and columns.

November 17, 2005

Native XML storage, Part 1 (technology)

IBM’s “Viper” version of DB2 is in open beta test, whatever that means, and Microsoft’s SQL Server 2005, nee Yukon, is in general release. Both have native XML capabilities surpassing Oracle’s – which is interesting in its own right, because it’s rare for either of those vendors to pull ahead of Oracle in an OLTP feature, and almost unprecedented for both to do so at once.

So let’s talk about native XML support, what it is, and who might or should care about it. (Well, the apps part is actually in a separate Part 2 post.) Most of this is based on research that’s several months old, but except for a scarcity of actual user interviews, that shouldn’t matter much.

There are two main non-native ways to put XML into a SQL database such as Oracle – shredding and LOBs (BLOBs or CLOBs – i.e., Binary or Character Large OBjects). Both can perform poorly, for different reasons. Shredding takes XML documents and distributes them among a bunch of tables. So one update in XML can become many updates when shredded, and one lookup in XML can become a complex join from shredded storage. LOB storage obviates those problems, but creates another – even when you’re only looking for part of a document, you have to retrieve and handle the whole thing, and the same goes for updates.
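Here’s a minimal Python sketch of what shredding does to a small document. The document shape and table names are invented purely for illustration; the point is that one logical XML document becomes rows in several tables, so one XML-level operation turns into several relational ones.

```python
import xml.etree.ElementTree as ET

# A toy document; the schema is invented purely for illustration.
doc = ET.fromstring("""
<order id="1001">
  <customer>Acme Corp</customer>
  <line sku="A-17" qty="3"/>
  <line sku="B-02" qty="1"/>
</order>
""")

# "Shredding": the one document is decomposed into rows for separate tables.
orders_row = {"order_id": doc.get("id"), "customer": doc.findtext("customer")}
line_rows = [
    {"order_id": doc.get("id"), "sku": line.get("sku"), "qty": int(line.get("qty"))}
    for line in doc.findall("line")
]
print(orders_row)   # one row in a hypothetical ORDERS table
print(line_rows)    # N rows in a hypothetical ORDER_LINES table

# Consequence: retrieving the whole order back as XML requires a join of the
# two tables, changing one line item is an update against just one of them,
# and adding a nested element the schema didn't anticipate may require DDL.
```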

So native storage can be a good thing when you can’t afford the performance hit of shredding, of LOB storage, or of any available hybrid. It also could be good if getting good performance from non-native storage, while possible, would create undue burdens on application development, or if there’s some other reason one or both of the shredding and LOB approaches isn’t viable.

One nice feature is that native-XML storage has almost no downside, at least if you get it from the high-end DBMS vendors. IBM, Oracle, and Microsoft have all worked out ways to have integrated query parsing and query optimization, while letting storage be more or less separate. More precisely, Oracle actually still sticks everything into one data store (hence the lack of native XML support), but allows near-infinite flexibility in how it is accessed. Microsoft has already had separate servers for tabular data, text, and MOLAP, although like Sybase, it doesn’t have general datatype extensibility that it can expose to customers, or exploit itself to provide a great variety of datatypes. IBM has had Oracle-like extensibility all along, although it hasn’t been quite as aggressive at exploiting it; now it’s introduced a separate-server option for XML. Both Microsoft and IBM claim that their administrative tools are slick enough that the DBA has little more work with their offerings than there would be with a true single-server solution.

So how does the storage actually work? The basic idea is exactly what you’d think. Data is stored in name-value pairs, with pointers connecting parents to children. The secret sauce (and here I have less detail than I’d like) is the extra information that’s stored, either at the nodes directly, or in an overarching index. Obviously, there’s a tradeoff between update and retrieval speed. And equally obviously, I need to learn more of the particulars.
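As a rough illustration of the name-value-pairs-plus-pointers idea, here’s a toy Python sketch. It is my own conceptual mockup, not a description of DB2’s, SQL Server’s, or anyone else’s actual storage format; the path index stands in for whatever “extra information” a real implementation maintains.

```python
# Toy node store: node_id -> (parent_id, name, value).
# Conceptual sketch only, not any vendor's actual on-disk layout.
nodes = {
    1: (None, "order",    None),
    2: (1,    "customer", "Acme Corp"),
    3: (1,    "line",     None),
    4: (3,    "sku",      "A-17"),
    5: (3,    "qty",      "3"),
}

def path(node_id):
    """Reconstruct a node's path by following parent pointers upward."""
    parts = []
    while node_id is not None:
        parent, name, _ = nodes[node_id]
        parts.append(name)
        node_id = parent
    return "/" + "/".join(reversed(parts))

# The "extra information" is typically something like a path index,
# so that /order/line/sku can be found without walking the whole tree.
path_index = {}
for nid in nodes:
    path_index.setdefault(path(nid), []).append(nid)

print(path_index["/order/line/sku"])   # -> [4]
# The tradeoff the post mentions: every node insert or delete must also
# maintain this index, which slows updates in order to speed up retrieval.
```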

And on that somewhat lame note, let me point you at Part 2 of this post, which discusses whether and how this stuff will actually be used. (Preview: It will, big time – I think.)

November 14, 2005

So how robust is Ingres?

CA is spinning off Ingres, more or less, to an investment fund led by Terry Garnett, who will also be interim CEO. Now, I’ve given Terry a lot of grief over the decades. It started by accident, when I bashed his presentation of Lightyear at a 1984 party in Rosann Stach’s house (where we also used Jerry Kaplan as a subject for the Mindprober psychological analysis product — those were the days of goofy software!). Years later, I didn’t even recall that had been Terry until I was reminded. But in the early 1990s, when Terry and Jerry Baker were dueling at Oracle, I was firmly in the Jerry Baker camp, and to this day I believe I was right. Still — be all that as it may, Terry knows DBMS and knows promotion, and if the company falls flat it won’t be because he screwed it up. He’s no dunce, and he’s been around DBMS a loooong time.

But how stands the product? Let’s flash back a decade, to when CA bought it. Ingres was a solid general-purpose RDBMS. But it was beginning to fall behind the technology power curve, especially on the data warehousing side. (For more detail, see my Ingres history post over in the Software Memories blog.) And then product development slowed to a crawl. Tony Gaughan, who ran the product for CA before the latest move, claims that they’ve actually done a good job on advancing the product on the OLTP side, perhaps to the point of comparability with Oracle9i, and certainly ahead of MySQL 5.0. I’m inclined to believe him, after applying some reasonable discount factor for expected puffery, in part because this wasn’t a high hurdle to cross. Over the past decade, the main action in high-end DBMS product enhancement has been in data warehousing and nontabular datatypes, not in OLTP.

Where Ingres definitely seems to lag is in data warehousing. E.g., there are no materialized views, and I bet that even if they have some of the relevant features, such as bitmap indexes, star schema support, etc., the implementation, optimizer support, administrative support, and so on lag far behind those of Oracle and IBM. So again, the proper comparison for Ingres isn’t Oracle and IBM; it’s fellow open source vendor MySQL. Only — deserved or not, MySQL has a ton of momentum for such a small company, including an attractive product plan partially fueled by SAP.

Appliance vendor DATallegro makes a plausibility argument that Ingres can be adapted for nontrivial data warehouse uses as well. But while that’s cool, and might even become persuasive once DATallegro has some happy, disclosed customers, it’s not the same as saying you want to put a big data warehouse into off-the-shelf Ingres.

So basically, I’m afraid that Ingres is going to appeal mainly to users who either already are making major use of it, or else have a huge problem with paying the license fees demanded by other vendors. I wish them well, and hope they kindle a spark somehow; but right now I don’t see where it would be coming from.

November 14, 2005

Defining and surveying “Memory-centric data management”

I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.

I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:

So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.
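Here’s a deliberately oversimplified Python sketch of that distinction; the page size and data are arbitrary. A cache-based DBMS still locates a page image and searches within it even when the page is already in RAM, while a memory-centric system can use a structure designed for RAM in the first place.

```python
import bisect

# Disk-style access, even when cached: records live inside fixed-size "pages",
# and a lookup means locating the right page and searching within its image.
PAGE_SIZE = 4
pages = [sorted(range(i, i + PAGE_SIZE)) for i in range(0, 20, PAGE_SIZE)]

def cached_page_lookup(key):
    page = pages[key // PAGE_SIZE]          # which page holds the key
    idx = bisect.bisect_left(page, key)     # binary search within the page
    return page[idx]

# Memory-centric access: the structure was built for RAM in the first place,
# so the lookup is a single hash probe (or pointer dereference).
by_key = {k: k for k in range(20)}

def memory_centric_lookup(key):
    return by_key[key]

print(cached_page_lookup(13), memory_centric_lookup(13))
```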

November 13, 2005

Breaking through the disk speed barrier

Most aspects of computer performance and capacity grow at Moore’s Law kinds of speeds. Doubling times may be anywhere from 9 months to 2 years, but in any case speeds and storage capacities grow exponentially. Not so, however, with disk rotation speeds. The very first disk drives, over 50 years ago, rotated 1,200 times per minute. Today’s top disk rotation speed is around 15,000 RPM. Indeed, while I recall seeing a reference to one at 15,600 RPM, I can’t now go back and find it. Yes, folks; disk rotational speed in the entire history of computing has increased by just a measly factor of 13.

Why does this matter to DBMS design? Simply put, disk rotation speed is an absolute limit on the speed of random disk-based data access. Today’s fastest disks take 4 milliseconds to rotate once. Thus, multiple heads aside, getting something from a known but random location on the disk will take 2 milliseconds of rotational delay on average, before seek time is even counted. And a naive data management algorithm will, for a single query, result in dozens or even hundreds of random accesses.

Thus, for a DBMS to run at acceptable speed, it needs to get data off disk not randomly, but rather a page at a time (i.e., in large blocks of predetermined size) or better yet sequentially (i.e., in continuous streams of indeterminate size). The indexes needed to assure these goals had best be sized to fit entirely in RAM. Clustering also plays an increasingly large role, so that data needed at the same time is likely to be on the same page, or at least in the same part of the disk.
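The arithmetic behind those numbers, and the payoff from reading pages sequentially rather than randomly, can be sanity-checked in a few lines of Python. Only the 15,000 RPM figure comes from the post; the transfer rate, page size, and page count are illustrative assumptions.

```python
# Back-of-the-envelope disk arithmetic. The transfer rate, page size, and
# workload size below are assumptions for illustration, not measurements.
rpm = 15_000
full_rotation_ms = 60_000 / rpm                    # 4 ms per rotation
avg_rotational_latency_ms = full_rotation_ms / 2   # ~2 ms on average

seq_transfer_mb_per_s = 50      # assumed sustained streaming rate
page_kb = 8                     # assumed page size
random_page_reads = 200         # "dozens or even hundreds" per naive query

# Each random read pays at least the average rotational delay
# (seek time is ignored here, which only understates the gap).
random_ms = random_page_reads * avg_rotational_latency_ms
sequential_ms = (random_page_reads * page_kb / 1024) / seq_transfer_mb_per_s * 1000

print(f"{full_rotation_ms:.1f} ms per rotation, "
      f"{avg_rotational_latency_ms:.1f} ms average rotational delay")
print(f"naive random access: ~{random_ms:.0f} ms; "
      f"same pages read sequentially: ~{sequential_ms:.0f} ms")
```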

Right there I’ve described some of the toughest ongoing challenges facing DBMS engineers. The big vendors all do a great job at meeting them (if they didn’t, they’d be out of business). Even so, some small companies find themselves able to beat the big guys, by some egregious cheating.

Data warehouse appliance vendors such as Netezza and especially Datallegro optimize their systems to stream data sequentially off of disk. In doing so, they go deeper into the operating systems, hardware, etc. than Oracle could ever allow itself to do. And the results seem pretty good. But I’ll write about that another time. Instead, I’m focusing right now on memory-centric data management; please see my other posts in that topic category.

November 13, 2005

Gartner on “The Death of the Database”

Gartner had a recent conference session on “The Death of the Database,” as described in David Berlind’s and Kathy Somebodyorother’s blogs. The core idea was that data in the future might be stored closest to where it would need to be used, which might not be in a traditional DBMS.

Before getting to the real meat of that, let me push back at some of the extremist boobirds. First, I doubt the analysts really talked about “the intersection of a row and a tuple”; it’s much more likely that that is a misquote due to reporting error. Second, their claim that BI will switch from being an “application” to a “service” is not at all unreasonable. BI should never have been viewed as an application; it’s much more a collection of application-enabling technologies. And the analysts explicitly said that DBMS will continue to be useful for analytics. As for their claim that some data needs to be only briefly persistent — they’re absolutely right, but let me defer that point to a separate post on memory-centric OLTP.

All that said — while a lot of their points ring true, it sounds as if they overstated their case in one important area. They’re making it sound as if some of today’s OLTP databases will no longer be needed, and as if tomorrow’s new kinds of OLTP data won’t need to be at least partly persisted to conventional DBMS. Wrong and wrong. Every important transaction needs to wind up in a DBMS. Those DBMS may not be as centralized as previously thought. The data may be copied to non-DBMS data stores (or, more likely, kept in a lightweight local DBMS and copied from there to a serious OLTP database). These DBMS may use native XML rather than traditional tabular data structures. But at the end of the day, transactional databases will continue to be needed for all the reasons they’ve been necessary in the past.

November 12, 2005

TransRelational(TM) — The final debunking

In prior posts, I’ve mentioned the essential dishonesty behind the hoohah around TransRelational(TM) technology from Required Technologies, Inc., and Chris Date’s highly regrettable promotion of same. Now I’ve been able to get more detail from another former executive of the company. Unsurprisingly, it corroborates what I wrote before, and utterly contradicts some of the myths spread by Date and his acolytes. This executive, while requesting that his name be withheld because of the acrimony between the CEO and just about every other company insider, otherwise gave me permission to report fully on what he told me.
