Analysis of MOLAP (Multidimensional OnLine Analytic Processing) products and vendors. Related subjects include:
I’m skeptical of data federation. I’m skeptical of all-things-to-all-people claims about logical data layers, and in particular of Gartner’s years-premature “Logical Data Warehouse” buzzphrase. Still, a reasonable number of my clients are stealthily trying to do some kind of data layer middleware, as are other vendors more openly, and I don’t think they’re all crazy.
Here are some thoughts as to why, and also as to challenges that need to be overcome.
There are many things a logical data layer might be trying to facilitate — writing, querying, batch data integration, real-time data integration and more. That said:
- When you’re writing data, you want it to be banged into a sufficiently-durable-to-acknowledge condition fast. If acknowledgements are slow, performance nightmares can ensue. So writing is the last place you want an extra layer, perhaps unless you’re content with the durability provided by an in-memory data grid.
- Queries are important. Also, they formally are present in other tasks, such as data transformation and movement. That’s why data manipulation packages (originally Pig, now Hive and fuller SQL) are so central to Hadoop.
After visiting California recently, I made a flurry of posts, several of which generated considerable discussion.
- My claim that Spark will replace Hadoop MapReduce got much Twitter attention — including some high-profile endorsements — and also some responses here.
- My MemSQL post led to a vigorous comparison of MemSQL vs. VoltDB.
- My post on hardware and storage spawned a lively discussion of Hadoop hardware pricing; even Cloudera wound up disagreeing with what I reported Cloudera as having said. Sadly, there was less response to the part about the partial (!) end of Moore’s Law.
- My Cloudera/SQL/Impala/Hive apparently was well-balanced, in that it got attacked from multiple sides via Twitter & email. Apparently, I was too hard on Impala, I was too hard on Hive, and I was too hard on boxes full of cardboard file cards as well.
- My post on the Intel/Cloudera deal garnered a comment reminding us Dell had pushed the Intel distro.
- My CitusDB post picked up a few clarifying comments.
Here is a catch-all post to complete the set. Read more
As is the case for most important categories of technology, discussions of BI can get confused. I’ve remarked in the past that there are numerous kinds of BI, and that the very origin of the term “business intelligence” can’t even be pinned down to the nearest century. But the most fundamental confusion of all is that business intelligence technology really is two different things, which in simplest terms may be categorized as user interface (UI) and platform* technology. And so:
- The UI aspect is why BI tends to be sold to business departments; the platform aspect is why it also makes sense to sell BI to IT shops attempting to establish enterprise standards.
- The UI aspect is why it makes sense to sell and market BI much as one would applications; the platform aspect is why it makes sense to sell and market BI much as one would database technology.
- The UI aspect is why vendors want to integrate BI with transaction-processing applications; the platform aspect is, I suppose, why they have so much trouble making the integration work.
- The UI aspect is why BI is judged on … well, on snazzy UIs and demos. The platform aspect is a big reason why the snazziest UI doesn’t always win.
*I wanted to say “server” or “server-side” instead of “platform”, as I dislike the latter word. But it’s too inaccurate, for example in the case of the original Cognos PowerPlay, and also in various thin-client scenarios.
Key aspects of BI platform technology can include:
- Query and data management. That’s the area I most commonly write about, for example in the cases of Platfora, QlikView, or Metamarkets. It goes back to the 1990s — notably the Business Objects semantic layer and Cognos PowerPlay MOLAP (MultiDimensional OnLine Analytic Processing) engine — and indeed before that to the report writers and fourth-generation languages of the 1970s. This overlaps somewhat with …
- … data integration and metadata management. Business Objects, Qlik, and other BI vendors have bought data integration vendors. Arguably, there was a period when Information Builders’ main business was data connectivity and integration. And sometimes the main value proposition for a BI deal is “We need some way to get at all that data and bring it together.”
- Security and access control – authentication, authorization, and all the additional As.
- Scheduling and delivery. When 10s of 1000s of desktops are being served, these aren’t entirely trivial. Ditto when dealing with occasionally-connected mobile devices.
|Categories: Business intelligence, Business Objects, ClearStory Data, Cognos, Data warehousing, Endeca, Information Builders, Metamarkets and Druid, MOLAP, Platfora, Predictive modeling and advanced analytics, QlikTech and QlikView||11 Comments|
It’s hard to make data easy to analyze. While everybody seems to realize this — a few marketeers perhaps aside — some remarks might be useful even so.
Many different technologies purport to make data easy, or easier, to an analyze; so many, in fact, that cataloguing them all is forbiddingly hard. Major claims, and some technologies that make them, include:
- “We get data into a form in which it can be analyzed.” This is the story behind, among others:
- Most of the data integration and ETL (Extract/Transform/Load) industries, software vendors and consulting firms alike.
- Many things that purport to be “analytic applications” or data warehouse “quick starts”.
- “Data reduction” use cases in event processing.*
- Text analytics tools.
- “Forget all that transformation foofarah — just load (or write) data into our thing and start analyzing it immediately.” This at various times has been much of the story behind:
- Relational DBMS, according to their inventor E. F. Codd.
- MOLAP (Multidimensional OnLine Analytic Processing), also according to RDBMS inventor E. F. Codd.
- Any kind of analytic DBMS, or general purpose DBMS used for data warehousing.
- Newer kinds of analytic DBMS that are faster than older kinds.
- The “data mart spin-out” feature of certain analytic DBMS.
- In-memory analytic data stores.
- NoSQL DBMS that have a few analytic features.
- TokuDB, similarly.
- Electronic spreadsheets, from VisiCalc to Datameer.
- “Our tools help you with specific kinds of analyses or analytic displays.” This is the story underlying, among others:
- The business intelligence industry.
- The predictive analytics industry.
- Algorithmic trading use cases in complex event processing.*
- Some analytic applications.
*Complex event/stream processing terminology is always problematic.
My thoughts on all this start: Read more
I can think of seven major reasons not to use an analytic RDBMS. One is good; but the other six seem pretty questionable, niche circumstances excepted, especially at this time.
The good reason to not have an analytic RDBMS is that most organizations can run perfectly well on some combination of:
- SaaS (Software as a Service).
- A low-volume static website.
- A network focused on office software.
- A single cheap server, likely running a single instance of a general-purpose RDBMS.
Those enterprises, however, are generally not who I write for or about.
The six bad reasons to not have an analytic RDBMS all take the form “Can’t some other technology do the job better?”, namely:
- A data warehouse that’s just another instance of your OLTP (OnLine Transaction Processing) RDBMS. If your problem is that big, it’s likely that a specialized analytic RDBMS will be more cost-effective and generally easier to deal with.
- MOLAP (Multi-Dimensional OnLine Analytic Processing). That ship has sailed … and foundered … and been towed to drydock.
- In-memory BI. QlikView, SAP HANA, Oracle Exalytics, and Platfora are just four examples of many. But few enterprises will want to confine their analytics to such data as fits affordably in RAM.
- Non-tabular* approaches to investigative analytics. There are many examples in the Hadoop world — including the recent wave of SQL add-ons to Hadoop — and some in the graph area as well. But those choices will rarely suffice for the whole job, as most enterprises will want better analytic SQL performance for (big) parts of their workloads.
- Tighter integration of analytics and OLTP (OnLine Transaction Processing). Workday worklets illustrate that business intelligence/OLTP integration is a really good idea. And it’s an idea that Oracle and SAP can be expected to push heavily, when they finally get their product acts together. But again, that’s hardly all the analytics you’re going to want to do.
- Tighter integration of analytics and other short-request processing. An example would be maintaining a casual game’s leaderboard via a NoSQL write-optimized database. Yet again, that’s hardly all the analytics a typical enterprise will want to do.
|Categories: Business intelligence, Data warehousing, Games and virtual worlds, Hadoop, In-memory DBMS, MOLAP||10 Comments|
I write a lot about whether or not to use relational DBMS. For example:
- In May I surveyed relational vs. non-relational pros and cons at some length.
- Last November I mused about when it might be OK to do without joins.
- The question is implicit in a variety of posts about, say, document-oriented or object-oriented DBMS.
Before going further in that vein, I’d like to do a quick review of what E. F. “Ted” Codd was getting at with the relational model in the first place. Read more
In Part 1 of this two-part series, I outlined four variants on the traditional enterprise data warehouse/data mart dichotomy, and suggested what kinds of DBMS products you might use for each. In Part 2 I’ll cover four more kinds of analytic database — even newer, for the most part, with a use case/product short list match that is even less clear. Read more
Analytic data management technology has blossomed, leading to many questions along the lines of “So which products should I use for which category of problem?” The old EDW/data mart dichotomy is hopelessly outdated for that purpose, and adding a third category for “big data” is little help.
Let’s try eight categories instead. While no categorization is ever perfect, these each have at least some degree of technical homogeneity. Figuring out which types of analytic database you have or need — and in most cases you’ll need several — is a great early step in your analytic technology planning. Read more
There are plenty of viable alternatives to relational database management systems. For short-request processing, both document stores and fully object-oriented DBMS can make sense. Text search engines have an important role to play. E. F. “Ted” Codd himself once suggested that relational DBMS weren’t best for analytics.* Analysis of machine-generated log data doesn’t always have a naturally relational aspect. And I could go on with more examples yet.
*Actually, he didn’t admit that what he was advocating was a different kind of DBMS, namely a MOLAP one — but he was. And he was wrong anyway about the necessity for MOLAP. But let’s overlook those details.
Nonetheless, relational DBMS dominate the market. As I see it, the reasons for relational dominance cluster into four areas (which of course overlap):
- Data re-use. Ted Codd’s famed original paper referred to shared data banks for a reason.
- The benefits of normalization, which include:
- You only have to do programming work of writing something once …
- … and you don’t have to do the programming work of keeping multiple versions of the information consistent.
- You only have to do processing work of writing something once.
- You only have to buy storage to hold each fact once.
- Separation of concerns.
- Different people can worry about programming and “database stuff.”
- Indeed, even performance optimization can sometimes be separated from programming (i.e., when all you have to do to get speed is implement the correct indexes).
- Maturity and momentum, as reflected in the availability of:
- A broad variety of mature relational DBMS.
- Vast amounts of packaged software that “talks” SQL.
Generally speaking, I find the reasons for sticking with relational technology compelling in cases such as: Read more
|Categories: Analytic technologies, Data models and architecture, Database diversity, MOLAP, NoSQL, Object, Theory and architecture||20 Comments|
It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. Read more
|Categories: Analytic technologies, Data types, Data warehousing, DBMS product categories, MOLAP, Theory and architecture||6 Comments|