Analysis of MOLAP (Multidimensional OnLine Analytic Processing) products and vendors. Related subjects include:
There are plenty of viable alternatives to relational database management systems. For short-request processing, both document stores and fully object-oriented DBMS can make sense. Text search engines have an important role to play. E. F. “Ted” Codd himself once suggested that relational DBMS weren’t best for analytics.* Analysis of machine-generated log data doesn’t always have a naturally relational aspect. And I could go on with more examples yet.
*Actually, he didn’t admit that what he was advocating was a different kind of DBMS, namely a MOLAP one — but he was. And he was wrong anyway about the necessity for MOLAP. But let’s overlook those details.
Nonetheless, relational DBMS dominate the market. As I see it, the reasons for relational dominance cluster into four areas (which of course overlap):
- Data re-use. Ted Codd’s famed original paper referred to shared data banks for a reason.
- The benefits of normalization, which include:
- You only have to do programming work of writing something once …
- … and you don’t have to do the programming work of keeping multiple versions of the information consistent.
- You only have to do processing work of writing something once.
- You only have to buy storage to hold each fact once.
- Separation of concerns.
- Different people can worry about programming and “database stuff.”
- Indeed, even performance optimization can sometimes be separated from programming (i.e., when all you have to do to get speed is implement the correct indexes).
- Maturity and momentum, as reflected in the availability of:
- A broad variety of mature relational DBMS.
- Vast amounts of packaged software that “talks” SQL.
Generally speaking, I find the reasons for sticking with relational technology compelling in cases such as: Read more
|Categories: Analytic technologies, Data models and architecture, Database diversity, MOLAP, NoSQL, Object, Theory and architecture||21 Comments|
It seems my prediction of a limited blogging schedule in December came emphatically true. I shall re-start with a collection of quick thoughts, clearing the decks for more detailed posts to follow. Read more
|Categories: Analytic technologies, Data types, Data warehousing, DBMS product categories, MOLAP, Theory and architecture||6 Comments|
Ray Wang made a terrific post based on SAP’s annual influencer love-in, an event which I no longer attend. Ray believes SAP has been in a “crisis”, and sums up his views as
The Bottom Line – SAP’s Turning The Corner
Credit must be given to SAP for charting a new course. A shift in the management philosophy and product direction will take years to realize, however, its not too late for change. SAP must remember its roots and become more German and less American. The renewed focus must put customer requests and priorities ahead of SAP’s bureaucracy. The emphasis must focus on the relationship. When that reemerges in how SAP works with customers, partners, influencers, and its own employees, SAP will be back in good graces. In the meantime, its time to get to work and deliver. Oracle’s Fusions Apps are coming soon and competitors such as IBM, Microsoft, Epicor, IFS, and SalesForce.com will not relent.
I recall the 1980s, when SAP’s main differentiator, at least in the English-speaking US, was a total commitment to customer success, and when it could be taken for granted that SAP would do business ethically. Things change, and not always for the better.
Anyhow, the reason I’m highlighting Ray’s post is that he makes reference to a number of interesting SAP-cetric technology trends or initiatives. Read more
|Categories: Analytic technologies, Business intelligence, Memory-centric data management, MOLAP, SAP AG, Solid-state memory||1 Comment|
An enterprise user wrote in with a question that boils down to:
What are reasonable MDX performance expectations?
MDX doesn’t come up in my life very much, and I don’t have much intuition about it. E.g., I don’t know whether one can slap an MDX-to-SQL converter on top of a fast analytic RDBMS and go to town. What’s more, I’m heading off on vacation and don’t feel like researching the matter myself in the immediate future.
So here’s the long form of the question. Any thoughts?
I have a general question on assessing the performance of an OLAP technology using a set of MDX queries. I would be interested to know if there are any benchmark MDX performance tests/results comparing different OLAP technologies (which may be based on different underlying DBMS’s if appropriate) on similar hardware setup, or even comparisons of complete appliance solutions. More generally, I want to determine what performance limits I could reasonably expect on what I think are fairly standard servers.
In my own work, I have set up a star schema model centered on a Fact table of 100 million rows (approx 60 columns), with dimensions ranging in cardinality from 5 to 10,000. In ad hoc analytics, is it expected that any query against such a dataset should return a result within a minute or two (i.e. before a user gets impatient), regardless of whether that query returns 100 cells or 50,000 cells (without relying on any aggregate table or caching mechanism)? Or is that level of performance only expected with a high end massively parallel software/hardware solution? The server specs I’m testing with are: 32-bit 4 core, 4GB RAM, 7.2k RPM SATA drive, running Windows Server 2003; 64-bit 8 core, 32GB RAM, 3 Gb/s SAS drive, running Windows Server 2003 (x64).
I realise that caching of query results and pre-aggregation mechanisms can significantly improve performance, but I’m coming from the viewpoint that in purely exploratory analytics, it is not possible to have all combinations of dimensions calculated in advance, in addition to being maintained.
I have a large number of posts still in backlog. For starters, there are ones based on recent visits with Aster, Greenplum, Sybase, Vertica, and a Very Large User. I suspect I’ll write more soon on Oracle as well. Plus there’s my whole future-of-online-media area. And quite a bit more will grow out of planned research.
So there are a whole lot of other worthy subjects I doubt I’ll be getting to any time soon. In some cases, of course, other people are doing great jobs of writing about same. Here are pointers to a few links that I am glad to recommend:
- I wrote recently that I’ve discovered a number of different in-memory OLAP engines. Cindi Howson far outdid that, writing at length for Intelligent Enterprise on in-memory analytics, in an article that seems to itself be a teaser for a longer, free white paper on the subject.
- CouchDB posted an eye-catching, risque slide presentation promoting CouchDB and, more generally, key-value stores, at least for internet applications. And yes, they’ve integrated MapReduce.
- Merv Adrian posted favorably about Birst, with special reference to its OEM efforts. As previously noted, I was highly unimpressed with Birst’s end-user BI story at the time of its September roll-out, and Jerome Pineau’s recent examination did nothing to reassure me. But perhaps OEM is a different matter.
- Merv also offers an interesting post about data integration upstart Expressor, and a highly favorable one about “visualization” vendor Tableau.
- Ann All interviewed Nigel Pendse, who grumped that BI features are overrated, and what end users really want is great query performance. I’m not so sure about the features side of that, but I’m hugely in agreement about the performance. That’s a big part of why the analytic DBMS industry is so vibrant. It’s also why in-memory OLAP is suddenly so hot.
|Categories: Analytic technologies, Business intelligence, CouchDB, Data warehousing, EAI, EII, ETL, ELT, ETLT, Expressor, MapReduce, Memory-centric data management, MOLAP, Presentations, Tableau Software, Theory and architecture||Leave a Comment|
My skeptical remarks on the Aleri/Coral8 merger generated some pushback. Today I actually got around to talking with John Morell, who was marketing chief at Coral8 and has remained with the combined company. First, some quick metrics:
- The combined Aleri has around 100 employees, 60-40 from Aleri vs. Coral8.
- The combined Aleri has around 80 customers. All of Aleri’s, with one sort-of exception at Banks.com, were in financial services. A large minority of Coral8’s were in financial services too.
- However, half of Aleri’s marketing spend going forward is budgeted outside the financial services markets. Not unreasonably, John presents this as a proof point Aleri is serious about selling to other markets.
- Aleri had 12-14 people in the UK pre-merger. Coral8 had none in Europe.
- Coral8 had 15 OEMs pre-merger, some actually generating revenue. Aleri had substantially none.
- Coral8 had been closing a “couple” of customers/quarter in online commerce. But recently, that rate ramped up to a “few.”
- Aleri’s engine is used to handle “many” hundreds of thousands of messages per second. Coral8’s highest-throughput user processes 100-150,000 messages/second.
John is sticking by the company line that there will be an integrated Aleri/Coral8 engine in around 12 months, with all the performance optimization of Aleri and flexibility of Coral8, that compiles and runs code from any of the development tools either Aleri or Coral8 now has. While this is a lot faster than, say, the Informix/Illustra or Oracle/IRI Express integrations, John insists that integrating CEP engines is a lot easier. We’ll see.
I focused most of the conversation on Aleri’s forthcoming efforts outside the financial services market. John sees these as being focused around Coral8’s old “Continuous (Business) Intelligence” message, enhanced by Aleri’s Live OLAP. Aleri Live OLAP is an in-memory OLAP engine, real-time/event-driven, fed by CEP. Queries can be submitted via ODBO/MDX today. XMLA is coming. John reports that quite a few Coral8 customers are interested in Live OLAP, and positions the capability as one Coral8 would have had to develop had the company remained independent. Read more
|Categories: Aleri and Coral8, Analytic technologies, Application areas, Complex event processing (CEP), Games and virtual worlds, Investment research and trading, MOLAP, Web analytics||4 Comments|
I chatted yesterday with the general business side (as opposed to the trading operation) of a household-name brokerage firm, one that’s in no immediate financial peril. It seems their #1 analytic-technology priority right now is changing planning from an annual to a monthly cycle.* That’s a smart idea. While it’s especially important in their business, larger enterprises of all kinds should consider following suit. Read more
|Categories: Analytic technologies, Application areas, Business intelligence, Cognos, Data warehousing, IBM and DB2, MOLAP||Leave a Comment|
I just ran across a December 10 blog post by Chuck Hollis outlining some of EMC’s — or at least Chuck’s — views on data warehousing and business intelligence. It’s worth scanning, a certain “Where you stand depends upon where you sit” flavor to it notwithstanding. In a contrast to my usual blogging style, Chuck’s post is excerpted at length below, with comments from me interspersed. Read more
|Categories: Analytic technologies, Data warehousing, EMC, MOLAP, Solid-state memory, Storage||2 Comments|
When I went to Oracle in October, the main purpose of the visit was to discuss Exadata. And so my initial post based on the visit was focused accordingly. But there were a number of other interesting points I’ve never gotten around to writing up. Let me now remedy that, at least in part. Read more
|Categories: Complex event processing (CEP), Data types, Data warehousing, Database compression, GIS and geospatial, MOLAP, Oracle, SAP AG, Theory and architecture, Web analytics||9 Comments|
- Frankly, I’ve come to think that disk-based OLAP cubes and materialized views are both cop-outs, indicative of a relational data warehouse architecture that can’t answer queries quickly enough straight-up. But if you disagree, then you might like Oracle’s new OLAP cube materialized views, which sound like a worthy competitor to Microsoft Analysis Services. (Further confusing things, I’ve seen reports that Oracle is increasing its commitment to Essbase, a separate MOLAP engine. I hope those are incorrect.)
- A few weeks ago, I came to realize that Oracle’s data mining database features actually mattered — perhaps not quite as much as Charlie Berger might think, but to say that is to praise with faint damns. SPSS seems to be getting large performance gains from leveraging the scoring part, and perhaps the transformation part as well. I haven’t focused on getting my details right yet, so I haven’t been writing about it. But heck, with all the other Oracle data warehousing discussion, it seems right to at least mention this part too.