Is Oracle losing its edge?
Over in the Monash Report, I posed the question: Is Oracle losing its edge in DBMS? Here are some of the data points that make me suspect it has. A number of these points also apply to the other large mainstream DBMS vendors; a number, however, do not.
- Oracle lags behind both IBM and Microsoft on native XML. I can’t recall Oracle trailing both those vendors at once on anything as major before.
- I’ve come to believe that memory-centric technology is a really powerful improvement to DBMS. Oracle’s done nothing effective inhouse in this area. Buying TimesTen was a good move, however.
- I’m growing less inclined to excuse the ongoing complexity of Oracle database administration. From the classic Progress DBMS to Solid’s truly zero-DBA system (embed it in telecom equipment without a keyboard or monitor, and it runs for years) to ANTs’ new challenge, there are a whole lot of fairly full-featured RDBMS with a whole lot less knobs and dials. Do they match Oracle’s full functionality? No. But could they handle a lot of the apps which run on Oracle today? In some cases, quite possibly. So why hasn’t Oracle itself come up with a plug-compatible Oracle Lite that truly is lite?
- High-end data warehouse scaling continues to be a problem for Oracle. Yes, the effective upper bound on Oracle database size keeps increasing. But so does the size of the largest data warehouses, and the latter number is definitely greater than the former.
- Microsoft Analysis Services is quite cool. Why hasn’t Oracle done the same thing?
- There are many seemingly major feature advantages that Oracle has failed to exploit. Robust text processing in the DBMS and Virtual Private Database (a cool security feature) are two of the biggest that come to mind. They’ve sold well to customers who were predisposed to want them, but Oracle hasn’t had the stomach to broaden the market for them. Yes, that’s more of a marketing problem than a technical one, but after a decade of squandering your technical advantages, you can find that your people lose the enthusiasm to create more of them. What else has Oracle done in years that is as functionally innovative? Nothing comes to mind.
- The whole apps-vendor-consolidation strategy is often presented along with a huge white flag regarding DBMS market share growth.
- Oracle has never, ever solved its applications product problems. Ditto its BI development problems. Both these fiascos are plausibility arguments for the thesis that the company can NOT reverse bad product philosophy choices, no matter how much market evidence slaps them in the face, and no matter how adaptable the company is in other ways.
- Oracle’s analyst relations have gotten very defensive and insular. To some extent, that happens at most big companies. But I cannot recall a previous time when IBM’s and Microsoft’s DBMS operations were both more accessible than Oracle’s, and I’ve been a DBMS analyst since the industry’s very early days. Often when a company acts this bizarrely, it does so because it has a great deal to hide.
That’s a lot of evidence, even without mentioning threats from the open sourcers and the data warehouse appliance guys. So why am I not wholly convinced yet? Well, reasons include a variety of scalability features, extensibility features that are rivaled only by IBM’s, market share dominance on Linux, and Andy Mendelsohn. That’s a pretty compelling list too. Still, the Oracle colossus is teetering a little bit, and it’s not beyond imagination that some future earthquake could bring it crashing down.
| Categories: Oracle, Oracle TimesTen | 1 Comment |
Two purely theoretical problems with TransRelational(TM)
There’s a vigorous discussion of TransRelational over on Alf Pedersen’s blog (Edit: Link died), although it’s completely polluted by some usual-suspects flame war BS.
Alf did poke through the dreck, however, to make a reasonable challenge, which can be paraphrased as:
OK. Suppose you’re right that no implementation has ever provided evidence of TransRelational’s usefulness for building a True Relational DBMS. It’s still theoretically fascinating.
My response was as follows:
Here are two big problems with TransRelational that are perfectly theoretical.
First, it assumes that values can be concisely stated, presumably as numbers or character strings. That isn’t a good match to complex datatypes such as, say, documents that should be full-text indexed.
Second, it assumes that there’s a natural sort order. That could be a bit of a problem even for, say, geospatial. One would think there’s a workaround in the geospatial case, e.g. like Oracle’s old hhencode. But hhencode was a fiasco, I think because it didn’t actually measure proximity very effectively.
Admittedly, both of my objections also apply to good old b-trees. Still, they speak against the potential of a TransRelational implementation to achieve the kind of generality I think modern applications do and will increasingly demand.
Basically, I think a “True Relational” DBMS that was only useful for columns with natural sort orders wouldn’t be particularly interesting. And “The Third Manifesto” notwithstanding, that’s the only kind anybody seems to have even hinted at trying to bring to market.
| Categories: Data types, GIS and geospatial, Theory and architecture, TransRelational | 1 Comment |
Native XML Storage, Part 2 (apps)
The introduction and technical-implementation part of this discussion was in Part 1.
It seems likely that widespread adoption of native XML storage is, at best, several years off, if for no other reason than that the DML (Data Manipulation Language) situation is still rather primitive. But looking beyond that nontrivial problem, it does seem as if there are broad classes of application that might go better in native XML. Here’s a survey.
First of all, there’s what might be called custom document composition – technical publishing, customized technical manuals, etc. If you make complex products, or sell information, this is obviously an important specialty application for you. Otherwise, it probably is rather peripheral, at least for now. If you do have an interest in this area, by the way, you shouldn’t only look at the big guys’ XML offerings; you should even talk to specialists like Mark Logic. (Mark Logic sells an XML-only DBMS with a strong text-search orientation.)
Second, there are complex documents with low update rates. Medical records are a prime example – and, by the way, may of those are stored in InterSystems’ OODBMS Cache rather than in a relational system. Other examples might include insurance claims, media assets, etc. – basically, the areas that have been thought of as the purview of document management systems. In many cases, these apps ain’t broke and shouldn’t be fixed, such as when they exist mainly to satisfy slow-changing regulatory requirements. Besides, it’s not obvious that native XML is particularly useful for these apps anyway. Often, the information is in a DBMS for three main reasons: General manageability (e.g., backup), ad-hoc searchability, and management of metadata. If the metadata is simple enough to fit comfortably into a tabular structure, extended-relational DBMS may be satisfactory as underpinnings for these apps indefinitely.
Third, and here’s where it really begins to get interesting, is complex transactional documents. One of the flagship apps in Viper’s alpha test was financial derivatives trading, with complex, number-laden, term-laden contracts being processed very quickly, and it’s easy to envision that kind of functionality spreading across the trading sector. Governments – wisely or not – may want to require new complex forms to be filled out, or to make older ones easier to process. (E.g., tax returns, or applications for various kinds of permits.) If privacy concerns allow, medical information might be collected and processed centrally by governments or large insurance providers. Complex service-level agreements could be negotiated for a broad variety of product and service categories. Customers might demand radically faster processing of insurance claims than has historically been necessary. Indeed, it’s hard to think of an industry sector where complex transactional documents might not gain a foothold. And if you’re looking for high performance access to portions of documents, native XML may well be the best storage choice.
Finally, there’s a fourth category, which I’ll give the trendy-looking name Profiles 2.0, in imitation of Web 2.0, Identity 2.0, and so on. Here’s what I mean by it. A number of the hottest buzzconcepts in computing focus on collecting, organizing, and using information about individual people – presence, identity, personalization/customization, narrowcasting/market-of-one, data mining/predictive analytics, weblog analysis, social software, and so on. Put all those together, and you have a humongous hairball of a user profile that no current systems come close to handling properly.
Let’s think about some characteristics of this data. Some of it is transient. Some of it is unreliable. Some of it indeed is guesswork – albeit educated guesswork – rather than fact (e.g., the results of data mining analyses). Much of it exists for some profilees but not others. Much of it is naturally tree- or graph-shaped (e.g., information about website traversal, product category interests, relationship networks, role-based authorizations, etc.) There are many kinds of it; pulling it all together relationally can lead to Joins From Hell.
And this isn’t just for individuals; similar kinds of stories can be told for information about organizations, battleships, and so on. Those are objects with rich internal structures. True, those can usually be modeled hierarchically – but at each node, some of the complications mentioned in the prior paragraph occur. Profiling an enterprise is even messier than profiling a single individual who shops or works there.
Applications using this kind of information are typically extremely primitive, even though the beginnings of the personalization hype are now 7-8 years in the past. I don’t think we’re going to get these systems kind right until we take a true, holistic view of individuals and their profiles – and until we learn how to think about apps whose fundamental objects keep changing in shape. But as hard as the problem is, it has to be worked on immediately, because what I’m talking about here are some of the major classes of competitive-advantage app.
So Profiles 2.0 isn’t something we can just ignore. And when we do pay attention to it, I don’t think we’ll find that it looks very natural dressed in rows and columns.
| Categories: Intersystems and Cache', MarkLogic, Structured documents | 2 Comments |
Native XML storage, Part 1 (technology)
IBM’s “Viper” version of DB2 is in open beta test, whatever that means, and Microsoft’s SQL Server 2005, nee Yukon, is in general release. Both have native XML capabilities surpassing Oracle’s – which is interesting in its own right, because it’s rare for either of those vendors to pull ahead of Oracle in an OLTP feature, and almost unprecedented for both to do so at once.
So let’s talk about native XML support, what it is, and who might or should care about it. (Well, the apps part is actually in a separate Part 2 post.) Most of this is based on research that’s several months old, but except for a scarcity of actual user interviews, that shouldn’t matter much.
There are two main non-native ways to put XML into a SQL database such as Oracle – shredding and LOBs (BLOBs or CLOBs – i.e., Binary or Character Large OBjects). Both can perform poorly, for different reasons. Shredding takes XML documents and distributes them among a bunch of tables. So one update in XML can become many updates when shredded, and one lookup in XML can become a complex join from shredded storage. LOB storage obviates those problems, but creates another – even when you’re only looking for part of a document, you have to retrieve and handle the whole thing, and the same goes for updates.
So native storage can be a good thing when you can afford neither the performance hit of shredding, nor of LOB storage, nor of any available hybrid. It also could be good if getting good performance from non-native storage, while possible, would create undue burdens on application development, or if there’s some other reason one or both of the shredding and LOB approaches isn’t viable.
One nice feature is that native-XML storage has almost no downside, at least if you get it from the high-end DBMS vendors. IBM, Oracle, and Microsoft have all worked out ways to have integrated query parsing and query optimization, while letting storage be more or less separate. More precisely, Oracle actually still sticks everything into one data store (hence the lack of native XML support), but allows near-infinite flexibility in how it is accessed. Microsoft has already had separate servers for tabular data, text, and MOLAP, although like Sybase, it doesn’t have general datatype extensibility that it can expose to customers, or exploit itself to provide a great variety of datatypes. IBM has had Oracle-like extensibility all along, although it hasn’t been quite as aggressive at exploiting it; now it’s introduced a separate-server option for XML. Both Microsoft and IBM claim that their administrative tools are slick enough that the DBA has little extra work from their offerings than would be present in a true single-server solution.
So how does the storage actually work? The basic idea is exactly what you’d think. Data is stored in name-value pairs, with pointers connecting parents to children. The secret sauce (and here I have less detail than I’d like) is the extra information that’s stored, either at the nodes directly, or in an overarching index. Obviously, there’s a tradeoff between update and retrieval speed. And equally obviously, I need to learn more of the particulars.
And on that somewhat lame note, let me point you at Part 2 of this post, which discusses whether and how this stuff will actually be used. (Preview: It will, big time – I think.)
| Categories: IBM and DB2, Microsoft and SQL*Server, Oracle, Structured documents | 9 Comments |
So how robust is Ingres?
CA is spinning off Ingres, more or less, to an investment fund led by Terry Garnett, who will also be interim of CEO. Now, I’ve given Terry a lot of grief over the decades. It started by accident, when I bashed his presentation of Lightyear at a 1984 party in Rosann Stach’s house (where we also used Jerry Kaplan as a subject for the Mindprober psychological analysis product — those were the days of goofy software!). Years later, I didn’t even recall that had been Terry until I was reminded. But in the early 1990s, when Terry and Jerry Baker were dueling at Oracle, I was firmly in the Jerry Baker camp, and believe I was right to this day. Still — be all that as it may, Terry knows DBMS and knows promotion, and if the company falls flat it won’t be because he screwed it up. He’s no dunce, and he’s been around DBMS a loooong time.
But how stands the product? Let’s flash back a decade, to when CA bought it. Ingres was a solid general-purpose RDBMS. But it was beginning to fall behind the technology power curve, especially on the data warehousing side. (For more detail, see my Ingres history post over in the Software Memories blog.) And then product development slowed to a crawl. Tony Gaughan, who ran the product for CA before the latest move, claims that they’ve actually done a good job on advancing the product on the OLTP side, perhaps to the point of comparability with Oracle9i, and certainly ahead of MySQL 5.0. I’m inclined to believe him, after applying some reasonable discount factor for expected puffery, in part because this wasn’t a high hurdle to cross. Over the past decade, the main action in high-end DBMS product enhancement has been in data warehousing and nontabular datatypes, not in OLTP.
Where Ingres definitely seems to lag is in data warehousing. E.g., there are no materialized views, and I bet that even if they have some of the index types such as bitmaps, star schemas, etc., the implementation, optimizer support, administrative support, and so on lag far behind that of Oracle and IBM. So again, the proper comparison for Ingres isn’t Oracle and IBM; it’s fellow open source vendor MySQL. Only — deserved or not, MySQL has a ton of momentum for such a small company, incuding an attractive product plan partially fueled by SAP.
Appliance vendor DATallegro makes a plausibiity argument that Ingres can be adapted for nontrivial data warehouse uses as well. But while that’s cool, and might even become persuasive once DATallegro has some happy, disclosed customers, it’s not the same as saying you want to put a big data warehouse into off-the-shelf Ingres.
So basically, I’m afraid that Ingres is going to appeal mainly to users who either already are making major use of it, or else have a huge problem with paying the license fees demanded by other vendors. I wish them well, and hope they kindle a spark somehow; but right now I don’t see where it would be coming from.
| Categories: Actian and Ingres, Data warehouse appliances, MySQL, Open source | 2 Comments |
Defining and surveying “Memory-centric data management”
I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.
I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:
- TimesTen (now owned by Oracle): TimesTen is the quintessentional “in-memory DBMS.” It’s a fairly full relational DBMS, but if you want to persist memory to disk it has to be handed off to a conventional DBMS. Historically, that has usually been MySQL or Oracle. TimesTen’s biggest market penetration has been in financial trading.
- Solid Information Technology‘s BoostEngine: Solid is a Finnish company (or was — it’s pretty American now) specializing in embedded DBMS sold mainly for telecommunication uses. Big OEM customers include several well-known telecom equipment manufacturers and HP (for OpenView). “Embedded” often means no DBA, no monitor, no keyboard — they box manufacturer installs it and there it stays for the life of the product. Solid has to offer strong replication capabilities, since its products are often used in highly distributed (e.g., multiblade, multibox) environments. So it’s taken the next step and exploited the replication by allowing customers to use some instances of the product disklessly.
- Event-stream products from Streambase and Progress: The canonical application for event-stream products is automating financial trading decisions based on the flow of market information. Mike Stonebraker, the brains behind Streambase, has recently popularized the idea; Progress bought Apama, who actually have been in the business longer. These applications require even more speed than the financial trading apps that TimesTen handles, and they discard most of the information they look at. In-memory is the only way to go.
- Progress’s ObjectStore: ObjectStore comes from the company Object Design, which merged into Excelon, which was acquired by Progress. It’s really a toolkit for building DBMS and similar systems, which is why it’s at various times been marketed as an OODBMS and an XML DBMS, without a lot of success either way. But there have been a few sterling apps built in ObjectStore even so, including a key part of the Amazon bookstore Despite this limited market success, a significant fraction of Progress’s best engineering talent has moved over to the Real-Time Division to focus on ObjectStore and other memory-centric products. The memory-centric aspect of ObjectStore is this: ObjectStore’s big virtue is that it gets objects from disk to memory and vice-versa very efficiently, then distributes and caches them around a network as needed. This was originally invented for client/server processing, but works fine in a multi-server thin client setup as well. And object processing, of course, relies on a whole lot of pointers. And pointer-chasing is pretty much the worst way to deal with the disk speed barrier, unless you do it in main memory.
- Applix‘s TM1: Like many companies in the analytics area, Applix has had trouble deciding whether it sells applications, BI system software, or both. But in any case its core technology is TM1, a memory-centric MOLAP offering. Traditional MOLAP products reside on the horns of a nasty dilemma: They rely on precalculation to give good performance, but that causes ghastly database explosion. Applix gets out of this problem by doing no precalculation whatsoever, loading the data into main memory, and executing all queries on the fly.
- SAP’s BI Accelerator: SAP is building out an elaborate technology stack with NetWeaver, especially in the BI area. One important aspect is that the full data warehouse is logically broken (or copied) into a series of data marts called “InfoCubes.” BI Accelerator takes the logical next step, loading an entire InfoCube into main memory. Almost every query is executed via a full table scan, which would be insane on disk but makes perfect sense when the data is already in RAM.
So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.
Breaking through the disk speed barrier
Most aspects of computer performance and capacity grow at Moore’s Law kinds of speeds. Doubling times may be anywhere from 9 months to 2 years, but in any case speeds and storage capacities grow exponentially quickly. Not so, however, with disk rotation speeds. The very first disk drives, over 50 years ago, rotated 1,200 times per minute. Today’s top disk rotation speed is around 15,000 RPM. Indeed, while I recall seeing a reference to one at 15,600 RPM, I can’t now go back and find it. Yes, folks; disk rotational speed in the entire history of computing has increased just by a measly factor of 13.
Why does this matter to DBMS design? Simply put, disk rotation speed is an absolute limit to the speed of random disk-based data access. Today’s fastest disks take 4 milliseconds to rotate once. Thus, multiple heads aside, getting something from a known but random location on the disk will take at least 2 milliseconds. And a naive data management algorithm will, for a single query, result in dozens or even hundreds of random accesses.
Thus, for a DBMS to run at acceptable speed, it needs to get data off disk not randomly, but rather a page at a time (i.e., in large blocks of predetermined size) or better yet sequentially (i.e., in continuous streams of indeterminate size). The indexes needed to assure these goals had best be sized to fit entirely in RAM. Clustering also plays an increasingly large role, so that data needed at the same time is likely to be on the same page, or at least in the same part of the disk.
Right there I’ve described some of the toughest ongoing challenges facing DBMS engineers. The big vendors all do a great job at meeting them (if they didn’t, they’d be out of business). Even so, some small companies find themselves able to beat the big guys, by some egregious cheating.
Data warehouse appliance vendors such as Netezza and especially Datallegro optimize their systems to stream data sequentially off of disk. In doing so, they go deeper into the operating systems, hardware, etc. than Oracle could ever allow itself to do. And the results seem pretty good. But I’ll write about that another time. Instead, I’m focusing right now on memory-centric data management; please see my other posts in that topic category.
Gartner on “The Death of the Database”
Gartner had a recent conference session on “The Death of the Database,” as described in David Berlind’s and Kathy Somebodyorother’s blogs. The core idea was that data in the future might be stored closest to where it would need to be used, which might not be in a traditional DBMS.
Before getting to the real meat of that, let me push back at some of the extremist boobirds. First, I doubt the analysts really talked about “the intersection of a row and a tuple”; it’s much more likely that that is a misquote due to reporting error. Second, their claim that BI will switch from being an “application” to a “service” is not at all unreasonable. BI should never have been viewed as an application; it’s much more a collection of application-enabling technologies. And the analysts explicitly said that DBMS will continue to be useful for analytics. As for their claim that some data needs to be only briefly persistent — they’re absolutely right, but let me defer that point to a separate post on memory-centric OLTP.
All that said — while a lot of their points ring true, it sounds as if they overstated their case in one important area. They’re making it sound as if some of today’s OLTP databases will no longer be needed, and as if tomorrow’s new kinds of OLTP data won’t need to be at least partly persisted to conventional DBMS. Wrong and wrong. Every important transaction needs to wind up in a DBMS. Those DBMS may not be as centralized as previously thought. The data may be copied to non-DBMS data stores (or, more likely, kept in a lightweight local DBMS and copied from there to serioius OLTP database). These DBMS may use native XML rather than traditional tabular data structures. But at the end of the day, transactional databases will continue to be needed for all the reasons they’ve been necessary in the past.
| Categories: Business intelligence, Database diversity, Structured documents, Theory and architecture | Leave a Comment |
TransRelational(TM) — The final debunking
In prior posts, I’ve mentioned the essential dishonesty behind the hoohah around Transrelational(TM) technology from Required Technologies, Inc., and Chris Date’s highly regrettable promotion of same. Now I’ve been able to get more detail from another former executive of the company. Unsurprisingly, it corroborates what I wrote before, and utterly contradicts some of the myths spread by Date and his acolytes. This executive, while requesting that his name be withheld because of the acrimony between the CEO and just about every other company insider, otherwise gave me permission to report fully on what he told me. Read more
| Categories: Memory-centric data management, TransRelational | 5 Comments |
