Analysis of MOLAP (Multidimensional OnLine Analytic Processing) products and vendors. Related subjects include:
As is the case for most important categories of technology, discussions of BI can get confused. I’ve remarked in the past that there are numerous kinds of BI, and that the very origin of the term “business intelligence” can’t even be pinned down to the nearest century. But the most fundamental confusion of all is that business intelligence technology really is two different things, which in simplest terms may be categorized as user interface (UI) and platform* technology. And so:
- The UI aspect is why BI tends to be sold to business departments; the platform aspect is why it also makes sense to sell BI to IT shops attempting to establish enterprise standards.
- The UI aspect is why it makes sense to sell and market BI much as one would applications; the platform aspect is why it makes sense to sell and market BI much as one would database technology.
- The UI aspect is why vendors want to integrate BI with transaction-processing applications; the platform aspect is, I suppose, why they have so much trouble making the integration work.
- The UI aspect is why BI is judged on … well, on snazzy UIs and demos. The platform aspect is a big reason why the snazziest UI doesn’t always win.
*I wanted to say “server” or “server-side” instead of “platform”, as I dislike the latter word. But it’s too inaccurate, for example in the case of the original Cognos PowerPlay, and also in various thin-client scenarios.
Key aspects of BI platform technology can include:
- Query and data management. That’s the area I most commonly write about, for example in the cases of Platfora, QlikView, or Metamarkets. It goes back to the 1990s — notably the Business Objects semantic layer and Cognos PowerPlay MOLAP (MultiDimensional OnLine Analytic Processing) engine — and indeed before that to the report writers and fourth-generation languages of the 1970s. This overlaps somewhat with …
- … data integration and metadata management. Business Objects, Qlik, and other BI vendors have bought data integration vendors. Arguably, there was a period when Information Builders’ main business was data connectivity and integration. And sometimes the main value proposition for a BI deal is “We need some way to get at all that data and bring it together.”
- Security and access control – authentication, authorization, and all the additional As.
- Scheduling and delivery. When 10s of 1000s of desktops are being served, these aren’t entirely trivial. Ditto when dealing with occasionally-connected mobile devices.
|Categories: Business intelligence, Business Objects, ClearStory Data, Cognos, Data warehousing, Endeca, Information Builders, Metamarkets and Druid, MOLAP, Platfora, Predictive modeling and advanced analytics, QlikTech and QlikView||11 Comments|
It’s hard to make data easy to analyze. While everybody seems to realize this — a few marketeers perhaps aside — some remarks might be useful even so.
Many different technologies purport to make data easy, or easier, to an analyze; so many, in fact, that cataloguing them all is forbiddingly hard. Major claims, and some technologies that make them, include:
- “We get data into a form in which it can be analyzed.” This is the story behind, among others:
- Most of the data integration and ETL (Extract/Transform/Load) industries, software vendors and consulting firms alike.
- Many things that purport to be “analytic applications” or data warehouse “quick starts”.
- “Data reduction” use cases in event processing.*
- Text analytics tools.
- “Forget all that transformation foofarah — just load (or write) data into our thing and start analyzing it immediately.” This at various times has been much of the story behind:
- Relational DBMS, according to their inventor E. F. Codd.
- MOLAP (Multidimensional OnLine Analytic Processing), also according to RDBMS inventor E. F. Codd.
- Any kind of analytic DBMS, or general purpose DBMS used for data warehousing.
- Newer kinds of analytic DBMS that are faster than older kinds.
- The “data mart spin-out” feature of certain analytic DBMS.
- In-memory analytic data stores.
- NoSQL DBMS that have a few analytic features.
- TokuDB, similarly.
- Electronic spreadsheets, from VisiCalc to Datameer.
- “Our tools help you with specific kinds of analyses or analytic displays.” This is the story underlying, among others:
- The business intelligence industry.
- The predictive analytics industry.
- Algorithmic trading use cases in complex event processing.*
- Some analytic applications.
*Complex event/stream processing terminology is always problematic.
My thoughts on all this start: Read more
I can think of seven major reasons not to use an analytic RDBMS. One is good; but the other six seem pretty questionable, niche circumstances excepted, especially at this time.
The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:
- SaaS (Software as a Service).
- A low-volume static website.
- A network focused on office software.
- A single cheap server, likely running a single instance of a general-purpose RDBMS.
Those enterprises, however, are generally not who I write for or about.
The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:
- A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your problem is that big, it’s likely that a specialized analytic RDBMS will be more cost-effective and generally easier to deal with.
- MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.
- In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
- Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.
- Tighter integration of analytics and OLTP (OnLine Transaction Processing). Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.
- Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.
|Categories: Business intelligence, Data warehousing, Games and virtual worlds, Hadoop, In-memory DBMS, MOLAP||10 Comments|
I write a lot about whether or not to use relational DBMS. For example:
- In May I surveyed relational vs. non-relational pros and cons at some length.
- Last November I mused about when it might be OK to do without joins.
- The question is implicit in a variety of posts about, say, document-oriented or object-oriented DBMS.
Before going further in that vein, I’d like to do a quick review of what E. F. “Ted” Codd was getting at with the relational model in the first place. Read more
In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more
There are plenty of viable alternatives to relational database management systems. For short-request processing, both document stores and fully object-oriented DBMS can make sense. Text search engines have an important role to play. E. F. “Ted” Codd himself once suggested that relational DBMS weren’t best for analytics.* Analysis of machine-generated log data doesn’t always have a naturally relational aspect. And I could go on with more examples yet.
*Actually, he didn’t admit that what he was advocating was a different kind of DBMS, namely a MOLAP one — but he was. And he was wrong anyway about the necessity for MOLAP. But let’s overlook those details.
Nonetheless, relational DBMS dominate the market. As I see it, the reasons for relational dominance cluster into four areas (which of course overlap):
- Data re-use. Ted Codd’s famed original paper referred to shared data banks for a reason.
- The benefits of normalization, which include:
- You only have to do programming work of writing something once …
- … and you don’t have to do the programming work of keeping multiple versions of the information consistent.
- You only have to do processing work of writing something once.
- You only have to buy storage to hold each fact once.
- Separation of concerns.
- Different people can worry about programming and “database stuff.”
- Indeed, even performance optimization can sometimes be separated from programming (i.e., when all you have to do to get speed is implement the correct indexes).
- Maturity and momentum, as reflected in the availability of:
- A broad variety of mature relational DBMS.
- Vast amounts of packaged software that “talks” SQL.
Generally speaking, I find the reasons for sticking with relational technology compelling in cases such as: Read more
|Categories: Analytic technologies, Data models and architecture, Database diversity, MOLAP, NoSQL, Object, Theory and architecture||18 Comments|
It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. Read more
|Categories: Analytic technologies, Data types, Data warehousing, DBMS product categories, MOLAP, Theory and architecture||6 Comments|
Ray Wang made a terrific post based on SAP’s annual influencer love-in, an event which I no longer attend. Ray believes SAP has been in a “crisis”, and sums up his views as
The Bottom Line – SAP’s Turning The Corner
Credit must be given to SAP for charting a new course. A shift in the management philosophy and product direction will take years to realize, however, its not too late for change. SAP must remember its roots and become more German and less American. The renewed focus must put customer requests and priorities ahead of SAP’s bureaucracy. The emphasis must focus on the relationship. When that reemerges in how SAP works with customers, partners, influencers, and its own employees, SAP will be back in good graces. In the meantime, its time to get to work and deliver. Oracle’s Fusions Apps are coming soon and competitors such as IBM, Microsoft, Epicor, IFS, and SalesForce.com will not relent.
I recall the 1980s, when SAP’s main differentiator, at least in the English-speaking US, was a total commitment to customer success, and when it could be taken for granted that SAP would do business ethically. Things change, and not always for the better.
Anyhow, the reason I’m highlighting Ray’s post is that he makes reference to a number of interesting SAP-cetric technology trends or initiatives. Read more
|Categories: Analytic technologies, Business intelligence, Memory-centric data management, MOLAP, SAP AG, Solid-state memory||1 Comment|
An enterprise user wrote in with a question that boils down to:
What are reasonable MDX performance expectations?
MDX doesn’t come up in my life very much, and I don’t have much intuition about it. E.g., I don’t know whether one can slap an MDX-to-SQL converter on top of a fast analytic RDBMS and go to town. What’s more, I’m heading off on vacation and don’t feel like researching the matter myself in the immediate future.
So here’s the long form of the question. Any thoughts?
I have a general question on assessing the performance of an OLAP technology using a set of MDX queries. I would be interested to know if there are any benchmark MDX performance tests/results comparing different OLAP technologies (which may be based on different underlying DBMS’s if appropriate) on similar hardware setup, or even comparisons of complete appliance solutions. More generally, I want to determine what performance limits I could reasonably expect on what I think are fairly standard servers.
In my own work, I have set up a star schema model centered on a Fact table of 100 million rows (approx 60 columns), with dimensions ranging in cardinality from 5 to 10,000. In ad hoc analytics, is it expected that any query against such a dataset should return a result within a minute or two (i.e. before a user gets impatient), regardless of whether that query returns 100 cells or 50,000 cells (without relying on any aggregate table or caching mechanism)? Or is that level of performance only expected with a high end massively parallel software/hardware solution? The server specs I’m testing with are: 32-bit 4 core, 4GB RAM, 7.2k RPM SATA drive, running Windows Server 2003; 64-bit 8 core, 32GB RAM, 3 Gb/s SAS drive, running Windows Server 2003 (x64).
I realise that caching of query results and pre-aggregation mechanisms can significantly improve performance, but I’m coming from the viewpoint that in purely exploratory analytics, it is not possible to have all combinations of dimensions calculated in advance, in addition to being maintained.