MOLAP

Analysis of MOLAP (Multidimensional OnLine Analytic Processing) products and vendors.

September 24, 2006

Data warehouse and mart uses – a tentative taxonomy

I’ve been posting a lot recently about the diverse database technologies used to support data warehousing. With the marketplace supporting such a broad range of architectures, it seems clear that a lot of those architectures actually deserve to thrive, presumably each in a different kind of usage scenario. So in this post I’ll take a pass at dividing up use cases for data warehouses, and suggesting which kinds of data warehouse management technologies might do the best job of supporting them. To start with, I’ve divided things into a number of buckets:

Read more

May 10, 2006

White paper on memory-centric data management — excerpt

Here’s an excerpt from the introduction to my new white paper on memory-centric data management. I don’t know why WordPress insists on showing the table gridlines, but I won’t try to fix that now. Anyhow, if you’re interested enough to read most of this excerpt, I strongly suggest downloading the full paper.

Introduction

Conventional DBMS don’t always perform adequately.

Ideally, IT managers would never need to think about the details of data management technology. Market-leading, general-purpose DBMS (DataBase Management Systems) would do a great job of meeting all information management needs. But we don’t live in an ideal world. Even after decades of great technical advances, conventional DBMS still can’t give your users all the information they need, when and where they need it, at acceptable cost. As a result, specialty data management products continue to be needed, filling the gaps where more general DBMS don’t do an adequate job.

Memory-centric technology is a powerful alternative.

One category on the upswing is memory-centric data management technology. While conventional DBMS are designed to get data on and off disk quickly, memory-centric products (which may or may not be full DBMS) assume all the data is in RAM in the first place. The implications of this design choice can be profound. RAM access speeds are up to 1,000,000 times faster than random reads on disk. Consequently, whole new classes of data access methods can be used when the disk speed bottleneck is ignored. Sequential access is much faster in RAM, too, allowing yet another group of efficient data access approaches to be implemented.
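
To make the million-fold figure concrete, here is a back-of-envelope sketch in Python. The latencies are illustrative assumptions (roughly 10 ms for a random disk read, roughly 10 ns for a RAM access), not measurements from the white paper, and the function names are hypothetical.

```python
# Assumed, illustrative latencies in nanoseconds; not figures from the paper.
DISK_RANDOM_READ_NS = 10_000_000  # about 10 ms per random disk read
RAM_ACCESS_NS = 10                # about 10 ns per RAM access


def speedup(slow_ns: int, fast_ns: int) -> int:
    """How many times faster the fast access is than the slow one."""
    return slow_ns // fast_ns


def total_seconds(lookups: int, latency_ns: int) -> float:
    """Time to do `lookups` accesses one after another at the given latency."""
    return lookups * latency_ns / 1_000_000_000


ratio = speedup(DISK_RANDOM_READ_NS, RAM_ACCESS_NS)
print(f"RAM is about {ratio:,}x faster under these assumptions")
print(f"1,000,000 random disk reads: {total_seconds(1_000_000, DISK_RANDOM_READ_NS):,.0f} s")
print(f"1,000,000 RAM accesses:      {total_seconds(1_000_000, RAM_ACCESS_NS):.2f} s")
```

Under these assumed numbers, a million serial random lookups drop from hours to a fraction of a second, which is why access methods that would be absurd on disk become reasonable in RAM.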

It does things disk-based systems can’t.

If you want to query a used-book database a million times a minute, that’s hard to do in a standard relational DBMS. But Progress’ ObjectStore gets it done for Amazon. If you want to recalculate a set of OLAP (OnLine Analytic Processing) cubes in real-time, don’t look to a disk-based system of any kind. But Applix’s TM1 can do just that. And if you want to stick DBMS instances on 99 nodes of a telecom network, all persisting data to a 100th node, a disk-centric system isn’t your best choice – but Solid’s BoostEngine should get the job done.

Memory-centric data managers fill the gap, in various guises.

Those products are some leading examples of a diverse group of specialist memory-centric data management products. Such products can be optimized for OLAP or OLTP (OnLine Transaction Processing) or event-stream processing. They may be positioned as DBMS, quasi-DBMS, BI (Business Intelligence) features, or some utterly new kind of middleware. They may come from top-tier software vendors or from the rawest of startups. But they all share a common design philosophy: Optimize the use of ever-faster semiconductors, rather than focusing on (relatively) slow-spinning disks.

They have a rich variety of benefits.

For any technology that radically improves price/performance (or any other measure of IT efficiency), the benefits can be found in three main categories:

  • Doing the same things you did before, only more cheaply;
  • Doing the same things you did before, only better and/or faster;
  • Doing things that weren’t technically or economically feasible before at all.

For memory-centric data management, the “things that you couldn’t do before at all” are concentrated in areas that are highly real-time or that use non-relational data structures. Conversely, for many relational and/or OLTP apps, memory-centric technology is essentially a much cheaper/better/faster way of doing what you were already struggling through all along.

Memory-centric technology has many applications.

Through both OEM and direct purchases, many enterprises have already adopted memory-centric technology. For example:

  • Financial services vendors use memory-centric data management throughout their trading systems.
  • Telecom service vendors use memory-centric data management in multiple provisioning, billing, and routing applications.
  • Memory-centric data management is used to accelerate web transactions, including in what may be the most demanding OLTP app of all — Amazon.com’s online bookstore.
  • Memory-centric data management technology is OEMed in a variety of major enterprise network management products, including HP OpenView.
  • Memory-centric data management is used to accelerate analytics across a broad variety of industries, especially in such areas as planning, scenarios, customer analytics, and profitability analysis.

May 8, 2006

Memory-centric data management whitepaper

I have finally finished and uploaded the long-awaited white paper on memory-centric data management.

This is the project for which I originally coined the term “memory-centric data management,” after realizing that the prevalent “in-memory DBMS” creates all sorts of confusion about how and whether data persists on disk. The white paper clarifies and updates points I have been making about memory-centric data management since last summer. Sponsors included:

If there’s one area in my research I’m not 100% satisfied with, it may be the question of where the true hardware bottlenecks to memory-centric data management lie (it’s obvious that the bottleneck to disk-centric data management is random disk access). Is it processor interconnect (around 1 GB/sec)? Is it processor-to-cache connections (around 5 GB/sec)? My prior pronouncements, the main body of the white paper, and the Intel Q&A appendix to the white paper may actually have slightly different spins on these points.

And by the way — the current hard limit on RAM/board isn’t 2^64 bytes, but a “mere” 2^40. But don’t worry; it will be up to 2^48 long before anybody actually puts 256 gigabytes under the control of a single processor.
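
For readers who want the powers of two spelled out, here is a small Python sketch of the arithmetic. It only converts byte counts to binary units; the helper name is hypothetical.

```python
def to_unit(n_bytes: int, unit_power: int) -> int:
    """Convert a byte count to a binary unit, e.g. unit_power=40 for TiB."""
    return n_bytes // 2**unit_power


GIB, TIB = 30, 40  # exponents: 2^30 bytes = 1 GiB, 2^40 bytes = 1 TiB

for bits in (38, 40, 48, 64):
    size = 2**bits
    print(f"2^{bits} bytes = {to_unit(size, GIB):,} GiB = {to_unit(size, TIB):,} TiB")
```

So 2^40 bytes is 1 TiB, 2^48 bytes is 256 TiB, and 2^64 bytes is 16,777,216 TiB (16 EiB); for comparison, 256 GiB is 2^38 bytes.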

January 27, 2006

Why I use the word “MOLAP”

“MOLAP” stands for “Multidimensional OLAP.” It’s almost exactly what Ted Codd was referring to in the white paper where he introduced the term “OLAP.” Relational advocates correctly point out that relational tables are NOT “two-dimensional;” rather, every column in a table represents a dimension.

(If that’s not obvious, think of rows in a table as n-tuples, and n-tuples as akin to vectors. Then think back to the linear algebra segment at the beginning of your Calculus of Several Variables class. Vector spaces? Dimensions? I rest my case.)

Despite all that, I’m comfortable with the “M” in MOLAP, because a dimension in a MOLAP hypercube is a lot more complex than a dimension in a relational table. The latter is itself — well, if there’s a sort order, it’s typically one dimensional. But the analog in a MOLAP cube can be a whole rich and complex hierarchy.

So yes, MOLAP is inherently more multidimensional than ROLAP, although one can of course do something equivalent to a single hypercube by creating a whole lot of different tables.
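
Here is a minimal Python sketch of the distinction, under stated assumptions: the geography hierarchy, the fact rows, and the function names are all made up for illustration, and no real MOLAP product stores its cubes this way. A relational row is just a flat tuple, while a cube dimension carries a hierarchy, so every fact also contributes to each ancestor of its leaf member.

```python
from collections import defaultdict

# Hypothetical hierarchy for one dimension: each member maps to its parent.
GEO_PARENT = {
    "Boston": "Massachusetts",
    "Worcester": "Massachusetts",
    "Austin": "Texas",
    "Massachusetts": "USA",
    "Texas": "USA",
}

# Flat, relational-style facts: (city, product, amount) tuples.
FACTS = [
    ("Boston", "books", 10),
    ("Worcester", "books", 5),
    ("Austin", "books", 7),
    ("Boston", "music", 3),
]


def ancestors(member, parent):
    """Yield the member itself, then each ancestor up to the root."""
    while member is not None:
        yield member
        member = parent.get(member)


def build_cube(facts, parent):
    """Aggregate leaf-level facts into cells at every level of the hierarchy."""
    cube = defaultdict(int)
    for city, product, amount in facts:
        for geo in ancestors(city, parent):
            cube[(geo, product)] += amount
    return dict(cube)


cube = build_cube(FACTS, GEO_PARENT)
print(cube[("Boston", "books")])
print(cube[("Massachusetts", "books")])
print(cube[("USA", "books")])
```

Getting the same city, state, and country totals out of plain tables means either extra lookup tables plus joins or one summary table per level, which is the "whole lot of different tables" approach.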

January 27, 2006

Detailed webinar on memory-centric technology

I did a webinar on memory-centric data management for Applix. It was the standard hour in length, but they had me do the vast majority of the talking, so I laid out my ideas in some detail.

In line with their business focus, I emphasized OLAP in general and MOLAP in particular. But I did have a chance to lay out pretty much the whole story.

There’s a lot of material in it I haven’t published yet in written form, and some nuances I may never get around to writing down. So if you’re sufficiently interested in the area, I recommend watching the webinar.

January 13, 2006

Memory-centric research — hear the latest!

What I’ve written so far in this blog (and in Computerworld) about memory-centric data management technology is just the tip of the iceberg. A detailed white paper is forthcoming, sponsored by most of the industry leaders: Applix, Progress, SAP, Intel (in association with SAP), and Solid. (But for some odd reason Oracle declined to participate …)

A lot of the material will be rolled out publicly for the first time in a webinar on Wednesday, January 25, at 11 EST. Applix is the host. To participate, please follow this link.

I’m also holding forth online, in webinars and even video, on other subjects these days. More details may be found over in the Monash Report.

December 9, 2005

SAP’s version of DBMS2

I just spent a couple of days at SAP’s analyst meeting, and realized something I’d somewhat forgotten – much of the DBMS2 concept was inspired by SAP’s technical strategy. That’s not to say that SAP’s techies necessarily agree with me on every last point. But I do think it is interesting to review SAP’s version of DBMS2, to the extent I understand it.

1. SAP’s Enterprise Services Architecture (ESA) is meant to be, among other things, an abstraction layer over relational DBMS. The mantra is that they’re moving to a “message-based architecture” as opposed to a “database architecture.” These messages are in the context of a standards-based SOA, with a strong commitment to remaining open and standards-based, at least on the data and messaging levels. (The main limitation on openness that I’ve detected is that they don’t think much of standards such as BPEL in the business process definition area, which aren’t powerful enough for them.)

2. One big benefit they see to this strategy is that it reduces the need to have grand integrated databases. If one application manages data for an entity that is also important to another application, the two applications can exchange messages about the entity. Anyhow, many of their comments make it clear that, between partner company databases (a bit of a future) and legacy app databases (a very big factor in the present day), SAP is constantly aware of situations in which a single integrated database is infeasible.

3. SAP is still deeply suspicious of redundant transactional data. They feel that with redundant data you can’t have a really clean model – unless, of course, you code up really rigorous synchronization. However, if for some reason synchronization is preferred – e.g., for performance reasons — it can be hidden from users and most developers.

4. One area where SAP definitely favors redundancy and synchronization is data warehousing. Indeed, they have an ever more elaborate staging system to move data from operational to analytic systems.

5. In general, they are far from being relational purists. For example, Shai Agassi referred to doing things that you can’t do in a pure relational approach. And Peter Zencke reminded me that this attitude is nothing new. SAP has long had complex business objects, and even done some of its own memory management to make them performant, when they were structured in a manner that RDBMS weren’t well suited for. (I presume he was referring largely to BAPI.)

6. That said, they’re of course using relational data stores today for most things. One exception is text/content, which they prefer to store in their own text indexing/management system TREX. Another example is their historical support for MOLAP, although they seem to be edging as far away from that as they can without offending the MOLAP-loving part of their customer base.

Incidentally, the whole TREX strategy is subject to considerable doubt too. It’s not a state-of-the-art product, and they currently don’t plan to make it into one. In particular, they have a prejudice against semi-automated ontology creation, and that has clearly become a requirement for top-tier text technologies.

7. One thing Peter said that confused me a bit came up when we were talking about nonrelational data retrieval. The example he used was retrieving information on all of a specific sales rep’s customers, or perhaps on several sales reps’ customers. I got the feeling he was talking about the ability to text search on multiple columns and/or multiple tables/objects/whatever at once, but I can’t honestly claim that I connected all the dots.

And of course, the memory-centric ROLAP tool BI Accelerator — technology that’s based on TREX — is just another example of how SAP is willing to go beyond passively connecting to a single RDBMS. And while their sponsorship of MaxDB isn’t really an example of that, it is another example of how SAP’s strategy is not one to gladden the hearts of the top-tier DBMS vendors.

November 14, 2005

Defining and surveying “Memory-centric data management”

I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.

I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:

So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.

August 13, 2005

The end of the single-server DBMS vendor

For all practical purposes, there are no DBMS vendors left advocating single-server strategies. Oracle was the last one, but it just acquired in-memory data management vendor TimesTen, which will be used as a cache in front of high-performance Oracle databases. (It will also continue to be sold for stand-alone uses, especially in the financial trading and defense/intelligence markets.)

IBM’s Viper is a server-and-a-half story, with lots of integration over a dual-server (one relational, one native XML) base. IBM also is moving aggressively in data integration/federation, with Ascential and many other acquisitions. It also sells a broad range of database products itself, including two DB2s, several Informix products, and so on.

Microsoft also has a multi-server strategy. In its case, relational, text, and MOLAP storage are more separate than in Oracle’s or even IBM’s products; again, there’s a thick layer of technology on top integrating them. An eventual move to native XML storage will, one must imagine, be handled in the same way.

Smaller vendors Sybase and Progress also offer multiple DBMS each.

Teradata is a pretty big player with only one DBMS — but it’s specialized for data warehousing. Teradata is the first to tell you you should use something else for your classical transaction processing.

The Grand Unified Integrated Database theory is, so far as I can tell, quite dead. Some people just refuse to admit that fact.

August 8, 2005

Down with database consolidation!

As with all changes in information technology, the move to DBMS2 will largely be one of evolution. But it does have a couple of revolutionary aspects.

Short-term, the biggest change is a renunciation of database and DBMS vendor consolidation. Consolidation never has worked, it never will work, and as data integration technologies keep improving it’s not that important anyway.

IBM and Oracle offer really great, brilliantly complex data warehousing technology. But if you want the most bang for the buck, forget about them, and go instead with a specialty vendor. Depending on the specifics of your situation, Teradata, Netezza, DATAllegro, WhiteCross, or SAP may offer the best choice, and that list could be even longer.

Similarly, for generic OLTP data management, cheap and/or open source options are getting ever more attractive. Microsoft is a serious contender for applications that previously only Oracle and IBM could handle, while MySQL and maybe Ingres are moving up the food chain right behind.

In many cases, these alternative technologies are lower-cost across the board: Lower purchase price, lower ongoing maintenance fees, and lower administrative costs.

So what, again, is the case for consolidation?
