September 6th, 2007 Curt Monash
If I weren’t on a snorkeling vacation,* this might be a good time to write about why I once called Cognos “The Gang That Couldn’t Shoot Straight,” how Ron Zambonini used that label to help him gain the company’s top spot, why he’s such a big fan of mine, why I got my highest ever per-minute speaking fee to attend a Cognos sales kickoff event, why I went for a midnight touristing stroll in downtown Ottawa in zero degree Fahrenheit weather, or how I managed, while attending the aforementioned Cognos sales kickoff, to get snowed in for three days in, of all places, Dallas, Texas. But the wrasses and jacks await, so I’ll get straight to the point.
*Albeit fairly snorkel-free so far, thanks to Hurricane Felix.
As I discussed at considerable length in a white paper, Applix’s core technology is fully-featured, memory-centric MOLAP. This is certainly cool technology, and I think it is actually unique. That it’s historically been positioned as the engine for a mid-range set of performance management tools is a travesty, a shame, the result of a prior merger – and also the quite understandable consequence of RAM limitations. However, RAM is ever cheaper and Applix’s technology is now 64-bit, so the RAM barriers have been relaxed. Cognos can take Applix’s TM1 engine high-end if it wants to. And boy, should Cognos ever want to. Indeed, there are three different great ways Cognos could package and position TM1:
- As a no-data-warehouse-design quick-start analytics engine analogous to QlikView (the fastest-growing and most important newish BI suite, open source perhaps excepted);
- As the most sophisticated and versatile planning tool this side of SAP’s APO (and while APO’s sophistication is not in dispute, its versatility is questionable anyway);
- As the processing hub for dashboards-done-right.
Read the rest of this entry »
Posted in Analytics and analytic technologies, Business intelligence, Cognos and Applix TM1, MOLAP, Memory-centric data management | 4 Comments »
March 1st, 2007 Curt Monash
Oracle is evidently buying Hyperion Software. Much like Gaul, Hyperion can be divided into three parts:
- Budgeting and consolidation applications, descended from the original Hyperion and Pillar.
- Essbase, the definitive MOLAP engine, descended from Arbor Software.
- A business intelligence suite, descended from Brio.
The most important part is budgeting/planning, because it could help Oracle change the rules for application software. But Essbase could be just the nudge Oracle needs to finally renounce its one-server-fits-all dogma.
Read the rest of this entry »
Posted in Data warehousing, Hierarchies, networks, graphs, and trees, MOLAP, Microsoft and SQL*Server, Oracle | 2 Comments »
October 4th, 2006 Curt Monash
SAS has its own data store, called SAS Intelligence Storage. It’s a relational system running on SMP boxes, whose unique feature is that it has fixed-length records and hence is a perfect array, for speedy lookup. This is highly analogous to classical MOLAP systems. However, SAS reports that customers store up to several hundred terabytes of data in SAS Intelligence Storage, which is definitely not very analogous to what goes on in the MOLAP world.
It sounds as if the product is optimized for data mining and generic OLAP alike. Indeed, SAS Intelligence Storage is used to power both SAS’s data mining and other advanced analytics, and also its more conventional BI suite.
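To make the fixed-length-record point concrete: when every record has the same byte length, record n starts at byte n times the record size, so a lookup is a single seek rather than a search. What follows is only a minimal sketch of that general idea, not SAS's actual storage format. The record layout and the names `RECORD`, `pack_records`, and `read_record` are assumptions invented for illustration.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte int id, 8-byte float amount, 8-byte code.
RECORD = struct.Struct("<id8s")

def pack_records(rows):
    """Serialize rows into one contiguous buffer of equal-sized records."""
    return b"".join(
        RECORD.pack(rec_id, amount, code.encode().ljust(8))
        for rec_id, amount, code in rows
    )

def read_record(buf, n):
    """Jump straight to record n: offset = n * record size, no scanning."""
    buf.seek(n * RECORD.size)
    rec_id, amount, code = RECORD.unpack(buf.read(RECORD.size))
    return rec_id, amount, code.decode().strip()

rows = [(1, 9.5, "east"), (2, 12.0, "west"), (3, 7.25, "north")]
store = io.BytesIO(pack_records(rows))
print(read_record(store, 2))  # the third record, fetched by offset alone
```

The same offset arithmetic is what lets an array-style MOLAP store address a cell directly from its coordinates.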
Posted in Data warehousing, MOLAP, Relational database management systems, SAS Institute | 1 Comment »
September 24th, 2006 Curt Monash
I’ve been posting a lot recently about the diverse database technologies used to support data warehousing. With the marketplace supporting such a broad range of architectures, it seems clear that a lot of those architectures actually deserve to thrive, presumably each in a different kind of usage scenario. So in this post I’ll take a pass at dividing up use cases for data warehouses, and suggesting which kinds of data warehouse management technologies might do the best job of supporting them. To start with, I’ve divided things into a number of buckets:
- Pinpoint data lookup
- Constrained query and reporting
- Cube-filling calculations
- Hardcore tabular data crunching
- Text and media search
- Specialty areas, such as relationship analytics
Read the rest of this entry »
Posted in DATAllegro, Data warehouse appliances, Data warehousing, IBM and DB2, MOLAP, Netezza, Relational database management systems, Teradata | 1 Comment »
May 10th, 2006 Curt Monash
Here’s an excerpt from the introduction to my new white paper on memory-centric data management. I don’t know why Wordpress insists on showing the table gridlines, but I won’t try to fix that now. Anyhow, if you’re interested enough to read most of this excerpt, I strongly suggest downloading the full paper.
Introduction
Conventional DBMS don’t always perform adequately.
Ideally, IT managers would never need to think about the details of data management technology. Market-leading, general-purpose DBMS (DataBase Management Systems) would do a great job of meeting all information management needs. But we don’t live in an ideal world. Even after decades of great technical advances, conventional DBMS still can’t give your users all the information they need, when and where they need it, at acceptable cost. As a result, specialty data management products continue to be needed, filling the gaps where more general DBMS don’t do an adequate job.
Memory-centric technology is a powerful alternative.
One category on the upswing is memory-centric data management technology. While conventional DBMS are designed to get data on and off disk quickly, memory-centric products (which may or may not be full DBMS) assume all the data is in RAM in the first place. The implications of this design choice can be profound. RAM access speeds are up to 1,000,000 times faster than random reads on disk. Consequently, whole new classes of data access methods can be used when the disk speed bottleneck is ignored. Sequential access is much faster in RAM, too, allowing yet another group of efficient data access approaches to be implemented.
It does things disk-based systems can’t.
If you want to query a used-book database a million times a minute, that’s hard to do in a standard relational DBMS. But Progress’ ObjectStore gets it done for Amazon. If you want to recalculate a set of OLAP (OnLine Analytic Processing) cubes in real-time, don’t look to a disk-based system of any kind. But Applix’s TM1 can do just that. And if you want to stick DBMS instances on 99 nodes of a telecom network, all persisting data to a 100th node, a disk-centric system isn’t your best choice – but Solid’s BoostEngine should get the job done.
Memory-centric data managers fill the gap, in various guises.
Those products are some leading examples of a diverse group of specialist memory-centric data management products. Such products can be optimized for OLAP or OLTP (OnLine Transaction Processing) or event-stream processing. They may be positioned as DBMS, quasi-DBMS, BI (Business Intelligence) features, or some utterly new kind of middleware. They may come from top-tier software vendors or from the rawest of startups. But they all share a common design philosophy: Optimize the use of ever-faster semiconductors, rather than focusing on (relatively) slow-spinning disks.
They have a rich variety of benefits.
For any technology that radically improves price/performance (or any other measure of IT efficiency), the benefits can be found in three main categories:
- Doing the same things you did before, only more cheaply;
- Doing the same things you did before, only better and/or faster;
- Doing things that weren’t technically or economically feasible before at all.
For memory-centric data management, the “things that you couldn’t do before at all” are concentrated in areas that are highly real-time or that use non-relational data structures. Conversely, for many relational and/or OLTP apps, memory-centric technology is essentially a much cheaper/better/faster way of doing what you were already struggling through all along.
Memory-centric technology has many applications.
Through both OEM and direct purchases, many enterprises have already adopted memory-centric technology. For example:
- Financial services vendors use memory-centric data management throughout their trading systems.
- Telecom service vendors use memory-centric data management in multiple provisioning, billing, and routing applications.
- Memory-centric data management is used to accelerate web transactions, including in what may be the most demanding OLTP app of all — Amazon.com’s online bookstore.
- Memory-centric data management technology is OEMed in a variety of major enterprise network management products, including HP OpenView.
- Memory-centric data management is used to accelerate analytics across a broad variety of industries, especially in such areas as planning, scenarios, customer analytics, and profitability analysis.
Posted in Data types, Hierarchies, networks, graphs, and trees, MOLAP, Memory-centric data management, OLTP database management, Objects, Open source RDBMS, Progress, Apama, and DataDirect, Relational database management systems | 3 Comments »
May 8th, 2006 Curt Monash
I have finally finished and uploaded the long-awaited white paper on memory-centric data management.
This is the project for which I originally coined the term “memory-centric data management,” after realizing that the prevalent “in-memory DBMS” creates all sorts of confusion about how and whether data persists on disk. The white paper clarifies and updates points I have been making about memory-centric data management since last summer. Sponsors included:
- Applix, vendors of in-memory/memory-centric MOLAP tool TM1
- Progress Software, vendors of ObjectStore, an OODBMS that has more impressive references in-memory or otherwise memory-centric than it does in classical disk-based configurations, and also of the Apama stream processing products
- SAP, vendors of the BI Accelerator functionality of SAP NetWeaver, or whatever tortured name they want to give it this month — basically, that’s a very cool in-memory columnar data mart technology
- Solid Information Technology, vendor of hybrid in-memory/disk-based OLTP RDBMS. Historically focused on the embedded systems market, especially telecom and networking, they’ve recently been in the news because of a deal with MySQL that is designed to extend their reach.
- Intel, makers of the processors used to run a lot of the other sponsors’ products (including all BI Accelerator installations to date).
If there’s one area in my research I’m not 100% satisfied with, it may be the question of where the true hardware bottlenecks to memory-centric data management lie (it’s obvious that the bottleneck to disk-centric data management is random disk access). Is it processor interconnect (around 1 GB/sec)? Is it processor-to-cache connections (around 5 GB/sec)? My prior pronouncements, the main body of the white paper, and the Intel Q&A appendix to the white paper may actually have slightly different spins on these points.
And by the way — the current hard limit on RAM/board isn’t 2^64 bytes, but a “mere” 2^40. But don’t worry; it will be up to 2^48 long before anybody actually puts 256 gigabytes under the control of a single processor.
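For scale, the powers of two in that paragraph work out as follows. This is plain arithmetic in binary units (1 GiB = 2^30 bytes); the helper name `human` is just an illustrative choice.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def human(n_bytes):
    """Express a power-of-two byte count in the largest binary unit that fits."""
    idx = 0
    while n_bytes >= 1024 and idx < len(UNITS) - 1:
        n_bytes //= 1024
        idx += 1
    return f"{n_bytes} {UNITS[idx]}"

for bits in (38, 40, 48, 64):
    print(f"2^{bits} bytes = {human(2 ** bits)}")
```

That is, 2^40 bytes is 1 TiB, 2^48 bytes is 256 TiB, 2^64 bytes is 16 EiB, and 256 gigabytes corresponds to 2^38 bytes.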
Posted in Cognos and Applix TM1, Intel, MOLAP, Memory-centric data management, Open source RDBMS, Products and vendors, Progress, Apama, and DataDirect, Relational database management systems, SAP, BI Accelerator, and MaxDB, solidDB | 1 Comment »
January 27th, 2006 Curt Monash
“MOLAP” stands for “Multidimensional OLAP.” It’s almost exactly what Ted Codd was referring to in the white paper where he introduced the term “OLAP.” Relational advocates correctly point out that relational tables are NOT “two-dimensional;” rather, every column in a table represents a dimension.
(If that’s not obvious, think of rows in a table as n-tuples, and n-tuples as akin to vectors. Then think back to the linear algebra segment at the beginning of your Calculus of Several Variables class. Vector spaces? Dimensions? I rest my case.)
Despite all that, I’m comfortable with the “M” in MOLAP, because a dimension in a MOLAP hypercube is a lot more complex than a dimension in a relational table. The latter is itself — well, if there’s a sort order, it’s typically one dimensional. But the analog in a MOLAP cube can be a whole rich and complex hierarchy.
So yes, MOLAP is inherently more multidimensional than ROLAP, although one can of course do something equivalent to a single hypercube by creating a whole lot of different tables.
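One way to picture the difference: a relational column holds a flat set of values, while a MOLAP dimension can carry a hierarchy along which values roll up. The sketch below is a toy illustration with made-up data, not any vendor's data model; `PARENT`, `ancestors`, and `rollup` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical geography dimension: each member points at its parent.
PARENT = {
    "Boston": "Massachusetts", "Worcester": "Massachusetts",
    "Austin": "Texas", "Dallas": "Texas",
    "Massachusetts": "USA", "Texas": "USA",
    "USA": None,
}

def ancestors(member):
    """Yield the member itself, then every level above it in the hierarchy."""
    while member is not None:
        yield member
        member = PARENT[member]

def rollup(facts):
    """Aggregate leaf-level facts to every level of the hierarchy."""
    totals = defaultdict(float)
    for city, sales in facts:
        for level in ancestors(city):
            totals[level] += sales
    return dict(totals)

facts = [("Boston", 10.0), ("Worcester", 5.0), ("Austin", 7.0), ("Dallas", 3.0)]
print(rollup(facts))
```

In a relational schema the same hierarchy would typically live in separate city, state, and country columns or tables, which is the "whole lot of different tables" route.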
Posted in MOLAP | No Comments »
January 27th, 2006 Curt Monash
I did a webinar on memory-centric data management for Applix. It was the standard hour in length, but they had me do the vast majority of the talking, so I laid out my ideas in some detail.
In line with their business focus, I emphasized OLAP in general and MOLAP in particular. But I did have a chance to lay out pretty much the whole story.
There’s a lot of material in it I haven’t published yet in written form, and some nuances I may never get around to writing down. So if you’re sufficiently interested in the area, I recommend watching the webinar.
Posted in MOLAP, Memory-centric data management | No Comments »
January 13th, 2006 Curt Monash
What I’ve written so far in this blog (and in Computerworld) about memory-centric data management technology is just the tip of the iceberg. A detailed white paper is forthcoming, sponsored by most of the industry leaders: Applix, Progress, SAP, Intel (in association with SAP), and Solid. (But for some odd reason Oracle declined to participate …)
A lot of the material will be rolled out publicly for the first time in a webinar on Wednesday, January 25, at 11 EST. Applix is the host. To participate, please follow this link.
I’m also holding forth online, in webinars and even video, on other subjects these days. More details may be found over in the Monash Report.
Posted in MOLAP, Memory-centric data management | No Comments »
December 9th, 2005 Curt Monash
I just spent a couple of days at SAP’s analyst meeting, and realized something I’d somewhat forgotten – much of the DBMS2 concept was inspired by SAP’s technical strategy. That’s not to say that SAP’s techies necessarily agree with me on every last point. But I do think it is interesting to review SAP’s version of DBMS2, to the extent I understand it.
1. SAP’s Enterprise Services Architecture (ESA) is meant to be, among other things, an abstraction layer over relational DBMS. The mantra is that they’re moving to a “message-based architecture” as opposed to a “database architecture.” These messages are in the context of a standards-based SOA, with a strong commitment to remaining open and standards-based, at least on the data and messaging levels. (The main limitation on openness that I’ve detected is that they don’t think much of standards such as BPEL in the business process definition area, which aren’t powerful enough for them.)
2. One big benefit they see to this strategy is that it reduces the need to have grand integrated databases. If one application manages data for an entity that is also important to another application, the two applications can exchange messages about the entity. Anyhow, many of their comments make it clear that, between partner company databases (a bit of a future) and legacy app databases (a very big factor in the present day), SAP is constantly aware of situations in which a single integrated database is infeasible.
3. SAP is still deeply suspicious of redundant transactional data. They feel that with redundant data you can’t have a really clean model – unless, of course, you code up really rigorous synchronization. However, if for some reason synchronization is preferred – e.g., for performance reasons — it can be hidden from users and most developers.
4. One area where SAP definitely favors redundancy and synchronization is data warehousing. Indeed, they have an ever more elaborate staging system to move data from operational to analytic systems.
5. In general, they are far from being relational purists. For example, Shai Agassi referred to doing things that you can’t do in a pure relational approach. And Peter Zencke reminded me that this attitude is nothing new. SAP has long had complex business objects, and even done some of its own memory management to make them performant, when they were structured in a manner that RDBMS weren’t well suited for. (I presume he was referring largely to BAPI.)
6. That said, they’re of course using relational data stores today for most things. One exception is text/content, which they prefer to store in their own text indexing/management system TREX. Another example is their historical support for MOLAP, although they seem to be edging as far away from that as they can without offending the MOLAP-loving part of their customer base.
Incidentally, the whole TREX strategy is subject to considerable doubt too. It’s not a state-of-the-art product, and they currently don’t plan to make it into one. In particular, they have a prejudice against semi-automated ontology creation, and that has clearly become a requirement for top-tier text technologies.
7. One thing Peter said that confused me a bit came up when we were talking about nonrelational data retrieval. The example he used was retrieving information on all of a specific sales rep’s customers, or perhaps on several sales reps’ customers. I got the feeling he was talking about the ability to text search on multiple columns and/or multiple tables/objects/whatever at once, but I can’t honestly claim that I connected all the dots.
And of course, the memory-centric ROLAP tool BI Accelerator — technology that’s based on TREX — is just another example of how SAP is willing to go beyond passively connecting to a single RDBMS. And while their sponsorship of MaxDB isn’t really an example of that, it is another example of how SAP’s strategy is not one to gladden the hearts of the top-tier DBMS vendors.
Posted in Database theory and practice, EII, ETL, and/or EAI, Hierarchies, networks, graphs, and trees, MOLAP, OLTP database management, Relational database management systems, SAP, BI Accelerator, and MaxDB | 7 Comments »
November 14th, 2005 Curt Monash
I’m writing more and more about memory-centric data management technology these days, including in my latest Computerworld column. You may be wondering what that term refers to. Well, I’ve basically renamed what are commonly called “in-memory DBMS,” for what I think is a very good reason: Most of the products in the category aren’t true DBMS, aren’t wholly in-memory, or both! Indeed, if you catch me in a grouchy mood I might argue that “in-memory DBMS” is actually a contradiction in terms.
I’ll give a quick summary of the vendors and products I am focusing on in this newly-named category, and it should be clearer what I mean:
- TimesTen (now owned by Oracle): TimesTen is the quintessential “in-memory DBMS.” It’s a fairly full relational DBMS, but if you want to persist memory to disk it has to be handed off to a conventional DBMS. Historically, that has usually been MySQL or Oracle. TimesTen’s biggest market penetration has been in financial trading.
- Solid Information Technology’s BoostEngine: Solid is a Finnish company (or was; it’s pretty American now) specializing in embedded DBMS sold mainly for telecommunication uses. Big OEM customers include several well-known telecom equipment manufacturers and HP (for OpenView). “Embedded” often means no DBA, no monitor, no keyboard: the box manufacturer installs it and there it stays for the life of the product. Solid has to offer strong replication capabilities, since its products are often used in highly distributed (e.g., multiblade, multibox) environments. So it’s taken the next step and exploited the replication by allowing customers to use some instances of the product disklessly.
- Event-stream products from Streambase and Progress: The canonical application for event-stream products is automating financial trading decisions based on the flow of market information. Mike Stonebraker, the brains behind Streambase, has recently popularized the idea; Progress bought Apama, who actually have been in the business longer. These applications require even more speed than the financial trading apps that TimesTen handles, and they discard most of the information they look at. In-memory is the only way to go.
- Progress’s ObjectStore: ObjectStore comes from the company Object Design, which merged into Excelon, which was acquired by Progress. It’s really a toolkit for building DBMS and similar systems, which is why it’s at various times been marketed as an OODBMS and an XML DBMS, without a lot of success either way. But there have been a few sterling apps built in ObjectStore even so, including a key part of the Amazon bookstore. Despite this limited market success, a significant fraction of Progress’s best engineering talent has moved over to the Real-Time Division to focus on ObjectStore and other memory-centric products. The memory-centric aspect of ObjectStore is this: ObjectStore’s big virtue is that it gets objects from disk to memory and vice-versa very efficiently, then distributes and caches them around a network as needed. This was originally invented for client/server processing, but works fine in a multi-server thin client setup as well. And object processing, of course, relies on a whole lot of pointers. And pointer-chasing is pretty much the worst way to deal with the disk speed barrier, unless you do it in main memory.
- Applix’s TM1: Like many companies in the analytics area, Applix has had trouble deciding whether it sells applications, BI system software, or both. But in any case its core technology is TM1, a memory-centric MOLAP offering. Traditional MOLAP products reside on the horns of a nasty dilemma: They rely on precalculation to give good performance, but that causes ghastly database explosion. Applix gets out of this problem by doing no precalculation whatsoever, loading the data into main memory, and executing all queries on the fly.
- SAP’s BI Accelerator: SAP is building out an elaborate technology stack with NetWeaver, especially in the BI area. One important aspect is that the full data warehouse is logically broken (or copied) into a series of data marts called “InfoCubes.” BI Accelerator takes the logical next step, loading an entire InfoCube into main memory. Almost every query is executed via a full table scan, which would be insane on disk but makes perfect sense when the data is already in RAM.
So there you have it. There are a whole lot of technologies out there that manage data in RAM, in ways that would make little or no sense if disks were more intimately involved. Conventional DBMS also try to exploit RAM and limit disk access, via caching; but generally the data access methods they use in RAM are pretty similar to those they use when going out to disk. So memory-centric systems can have a major advantage.
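The precalculation dilemma described in the TM1 item, and the scan-everything approach in the BI Accelerator item, can be sketched in a few lines. This is a toy model under stated assumptions, not either product's actual algorithm: the cube contents and the names `cells`, `query`, and `precalculated_cell_count` are all invented for illustration.

```python
# Hypothetical toy cube: only the populated leaf cells are held in memory.
cells = {
    ("2005", "east", "widgets"): 4.0,
    ("2005", "west", "widgets"): 6.0,
    ("2006", "east", "gadgets"): 5.0,
}

def query(year=None, region=None, item=None):
    """Answer an aggregate on the fly by scanning every in-memory cell.
    None means 'all members' of that dimension."""
    want = (year, region, item)
    return sum(
        value for key, value in cells.items()
        if all(w is None or w == k for w, k in zip(want, key))
    )

def precalculated_cell_count(members_per_dim):
    """Slots a full precalculation would materialize: every combination of
    members, with one extra 'all' member per dimension."""
    n = 1
    for m in members_per_dim:
        n *= m + 1
    return n

print(query(region="east"))                 # scans the cells, sums the 'east' ones
print(precalculated_cell_count([2, 2, 2]))  # slots needed to precompute everything
```

With two members per dimension, three stored cells would fan out into 27 precomputed slots, and the ratio grows multiplicatively with each added dimension. Scanning on demand avoids that growth, at the cost of touching every cell per query, which is only tolerable when the cells are already in RAM.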
Posted in Cognos and Applix TM1, Complex event/stream processing (CEP), Data types, MOLAP, Memory-centric data management, OLTP database management, Objects, Oracle TimesTen, Progress, Apama, and DataDirect, SAP, BI Accelerator, and MaxDB, solidDB | 2 Comments »
August 13th, 2005 Curt Monash
For all practical purposes, there are no DBMS vendors left advocating single-server strategies. Oracle was the last one, but it just acquired in-memory data management vendor TimesTen, which will be used as a cache in front of high-performance Oracle databases. (It will also continue to be sold for stand-alone uses, especially in the financial trading and defense/intelligence markets.)
IBM’s Viper is a server-and-a-half story, with lots of integration over a dual-server (one relational, one native XML) base. IBM also is moving aggressively in data integration/federation, with Ascential and many other acquisitions. It also sells a broad range of database products itself, including two DB2s, several Informix products, and so on.
Microsoft also has a multi-server strategy. In its case, relational, text, and MOLAP storage are more separate than in Oracle’s or even IBM’s products; again, there’s a thick layer of technology on top integrating them. An eventual move to native XML storage will, one must imagine, be handled in the same way.
Smaller vendors Sybase and Progress also offer multiple DBMS each.
Teradata is a pretty big player with only one DBMS — but it’s specialized for data warehousing. Teradata is the first to tell you you should use something else for your classical transaction processing.
The Grand Unified Integrated Database theory is, so far as I can tell, quite dead. Some people just refuse to admit that fact.
Technorati Tags: DBMS, DBMS2, database, Oracle, DB2, Microsoft, In-memory data management, Dead parrot
Posted in Database diversity, Database theory and practice, IBM and DB2, MOLAP, Memory-centric data management, Microsoft and SQL*Server, Oracle, Progress, Apama, and DataDirect, Relational database management systems | No Comments »