SAS Institute – DBMS 2 : DataBase Management System Services

Notes on analytic technology, May 13, 2015

Curt Monash — Thu, 14 May 2015 02:38:50 +0000

1. There are multiple ways in which analytics is inherently modular. For example:

Business intelligence tools can reasonably be viewed as application development tools. But the “applications” may be developed one report at a time.
The point of a predictive modeling exercise may be to develop a single scoring function that is then integrated into a pre-existing operational application.
Conversely, a recommendation-driven website may be developed a few pages — and hence also a few recommendations — at a time.

Also, analytics is inherently iterative.

Everything I just called “modular” can reasonably be called “iterative” as well.
So can any work process of the nature “OK, we got an insight. Let’s pursue it and get more accuracy.”

If I’m right that analytics is or at least should be modular and iterative, it’s easy to see why people hate multi-year data warehouse creation projects. Perhaps it’s also easy to see why I like the idea of schema-on-need.

2. In 2011, I wrote, in the context of agile predictive analytics, that

… the “business analyst” role should be expanded beyond BI and planning to include lightweight predictive analytics as well.

I gather that a similar point is at the heart of Gartner’s new term citizen data scientist. I am told that the term resonates with at least some enterprises.

3. Speaking of Gartner, Mark Beyer tweeted

In data management’s future “hybrid” becomes a useless term. Data management is mutable, location agnostic and services oriented.

I replied

And that’s why I launched DBMS2 a decade ago, for “DataBase Management System SERVICES”.

A post earlier this year offers a strong clue as to why Mark’s tweet was at least directionally correct: The best structures for writing data are the worst for query, and vice-versa.

4. The foregoing notwithstanding, I continue to believe that there’s a large place in the world for “full-stack” analytics. Of course, some stacks are fuller than others, with SaaS (Software as a Service) offerings probably being the only true complete-stack products.

5. Speaking of full-stack vendors, some of the thoughts in this post were sparked by a recent conversation with Platfora. Platfora, of course, is full-stack except for the Hadoop underneath. They’ve taken to saying “data lake” instead of Hadoop, because they believe:

It’s a more benefits-oriented than geek-oriented term.
It seems to be more popular than the roughly equivalent terms “data hub” or “data reservoir”.

6. Platfora is coy about metrics, but does boast of high growth, and had >100 employees earlier this year. However, they are refreshingly precise about competition, saying they primarily see four competitors — Tableau, SAS Visual Analytics, Datameer (“sometimes”), and Oracle Data Discovery (who they view as flatteringly imitative of them).

Platfora seems to have a classic BI “land-and-expand” kind of model, with initial installations commonly being a few servers and a few terabytes. Applications cited were the usual suspects — customer analytics, clickstream, and compliance/governance. But they do have some big customer/big database stories as well, including:

100s of terabytes or more (but with a “lens” typically being 5 TB or less).
4-5 customers who pressed them to break a previous cap of 2 billion discrete values.

7. Another full-stack vendor, ScalingData, has been renamed to Rocana, for “root cause analysis”. I’m hearing broader support for their ideas about BI/predictive modeling integration. For example, Platfora has something similar on its roadmap.

Related links

I did a kind of analytics overview last month, which had a whole lot of links in it. This post is meant to be additive to that one.

What matters in investigative analytics?

Curt Monash — Sun, 06 Oct 2013 12:10:21 +0000

In a general pontification on positioning, I wrote:

every product in a category is positioned along the same set of attributes,

and went on to suggest that summary attributes were more important than picky detailed ones. So how does that play out for investigative analytics?

First, summary attributes that matter for almost any kind of enterprise software include:

Performance and scalability. I write about analytic performance and scalability a lot. Usually that’s in the context of analytic DBMS, but it also arises in analytic stacks such as Platfora, Metamarkets or even QlikView, and also in the challenges of making predictive modeling scale.
Reliability, availability and security.* This is more crucial for short-request applications than analytic ones, but even your analytic systems shouldn’t leak data or crash.
Goodness of fit with legacy systems. I hate that one, because enterprises often sacrifice way too much in favor of that benefit.
Price. Duh.

*I picked up that phrase when — abbreviated as RAS — it was used to characterize the emphasis for Oracle 8. I like it better than a general and ambiguous concept of “enterprise-ready”.

The reason I’m writing this post, however, is to call out two summary attributes of special importance in investigative analytics — which regrettably often conflict with each other — namely:

Agility. People don’t want to submit requests for reports or statistical analyses; they want to get answers as soon as the questions come to mind.
Completeness of feature set — for a particular use case, that is. There’s no such thing as an investigative analytics offering with a feature set that’s close to complete for all purposes; even SAS, IBM and other behemoths fall short.

Much of what I work on boils down to those two subjects. For example:

I recently suggested that navigation is a huge part of business intelligence differentiation. That’s because good navigation pretty much equates to BI agility. With luck, a BI tool that has the right navigation on the right data will get you to the result set you want, all within a few minutes.
There’s an obvious demand for agile predictive analytics. But if agility were all that mattered, KXEN — which excels in agility — would probably have done a lot better; KXEN’s problem was that it didn’t offer enough algorithmic breadth to meet enough users’ demands or needs.
Conversely, SAS has an exceptionally broad feature set. But few parts of the SAS product line offer much in the way of agility.
I’ve argued that analytic apps need to be continually customized, which is about as strong a pitch for agility as one can make. And that’s one of the major reasons that packaged analytic apps can’t really be feature-complete.
On the other hand, if you view incomplete predictive modeling apps as agility-enhancing application quick-starts — well, you’ve just described some of the most agile and also some of the most important parts of the SAS product line.
From an agility standpoint, the integration of predictive modeling into business intelligence would seem like pure goodness. Unfortunately, the most natural ways to do such integration would have very limited predictive features.
My clients at Teradata Aster probably see things differently — Edit: indeed they do — but I don’t think their library of pre-built analytic packages has been a big success. The same goes for other analytic platform vendors who have done similar (generally lesser) things. I believe that this is because such limited libraries don’t do enough of what users want.
I noted in July that complex, multi-stage predictive modeling is increasingly in vogue. Well, if predictive modeling is much more complicated than before, then things have to happen to make each step — or at least the average step — a lot easier and faster. I think that’s a core part of the value proposition for startups such as Ayasdi.

And finally: It is easier to be feature-complete — or at least feature-rich — for particular markets than across-the-board. That’s why I’ve steered a number of full-stack BI or predictive modeling technology clients toward vertical strategies.

Trends in predictive modeling

Curt Monash — Fri, 20 Sep 2013 12:10:36 +0000

I talked with Teradata about a bunch of stuff yesterday, including this week’s announcements in in-database predictive modeling. The specific news was about partnerships with Fuzzy Logix and Revolution Analytics. But what I found more interesting was the surrounding discussion. In a nutshell:

Teradata is finally seeing substantial interest in in-database modeling, rather than just in-database scoring (which has been important for years) and in-database data preparation (which is a lot like ELT — Extract/Load/Transform).
Teradata is seeing substantial interest in R.
It seems as if similar groups of customers are interested in both parts of that, such as:
- Usual-suspect consumer marketing sectors (telecom, credit card, retail).*
- Semiconductor manufacturing.**
Parallelized SAS modeling on Teradata seems to be limited by the small number of algorithms that are parallelized. (SAS scoring, I presume, is a different matter.)

This is the strongest statement of perceived demand for in-database modeling I’ve heard. (Compare Point #3 of my July predictive modeling post.) And it fits with what I’ve been hearing about R.

*That’s very similar to the list of sectors for SAS HPA.

**To support their extremely high focus on product quality, semiconductor manufacturers have been using state-of-the-art analytic tools for at least 30 years.

In-database modeling is a performance feature, and performance can have several kinds of benefit, which may be summarized as “cheaper”, “better”, and “previously impractical”. My impression is that in-database modeling is pretty far toward the “previously impractical” end of the spectrum; enterprises don’t adopt a new way of predictive modeling until they want to create models that the old way can’t get done.

Basically, I think that models are increasingly:

Richer and more diverse than before. (See, for example, Point #5 of my July predictive modeling post.)
Developed in a more experimental and quickly-iterative way than before.

I think the first point pretty much implies the second, but the converse isn’t as clear; one can tweak old-style models in quick-turnaround fashion even more easily than one can develop the more complex newer styles.

And finally: I’m not hearing that modeling — even when it’s parallel and in-database fast — is commonly done on a complete many-terabyte dataset. It’s not a question I always remember to ask; for example, I didn’t bring it up with Teradata. But when I do, I rarely hear of models being trained on more than a few terabytes of data each.

SAP is buying KXEN

Curt Monash — Wed, 11 Sep 2013 16:30:17 +0000

First, some quick history.

I first heard of KXEN 7-8 years ago from Roman Bukary, then of SAP. He positioned KXEN as an easy-to-embed predictive modeling tool, which was getting various interesting partnerships and OEM deals.
Returning to those near-roots, KXEN is being bought (Q4 expected close) by SAP.
I say “near roots” because KXEN’s original story had something to do with SVMs (Support Vector Machines).
But that was already old news back in 2006, and KXEN had pivoted to a simpler and more automated modeling approach. Presumably, this ease of modeling was part of the reason for KXEN’s OEM/partnership appeal.

However, I don’t want to give the impression that KXEN is the second coming of Crystal Reports. Most of what I heard about KXEN’s partnership chops, after Roman’s original heads-up, came from Teradata. Even KXEN itself didn’t seem to see that as a major part of their strategy.

And by the way, KXEN is yet another example of my observation that fancy math rarely drives great enterprise software success.

KXEN’s most recent strategies are perhaps best described by contrasting it to the vastly larger SAS.

SAS is built around a programming language for statisticians. KXEN tries to automate away many of the steps that SAS experts would program.
This goes to the extreme that statistically-astute businesspeople are supposed to be able to use KXEN themselves. (However, it’s a general rule — dating back to the 1970s — that marketing claims of “programmers/technologists/experts aren’t needed” tend to be more aspirational than accurate.)
SAS tries to offer every statistical and machine learning algorithm under the sun. KXEN is pretty focused on a single statistical approach.
KXEN has followed SAS into offering applications. (It’s also a general rule that predictive modeling “apps” tend to be more in the way of quick-starts than complete products.)
KXEN has recently tried to sell into markets where SAS isn’t strong, for example internet companies.

That all sounds a bit like a disruption narrative, but KXEN CEO John Ball never gave me the impression he thought strongly in those terms. And indeed KXEN never disrupted much of anything.

So what will SAP do with KXEN? Integrating predictive modeling and business intelligence is both important and difficult. So I imagine they’ll try, but I won’t hold my breath for great short-term success.

The bigger win could come on the application side. I’m skeptical about “analytic applications”, because it’s so tough to build complete ones. But let’s imagine an application that had elements of:

Database query and update.
Workflow.
Reporting and perhaps other BI.
Predictive modeling.

That would seem more plausible, because it allows the analytic aspects to be smaller and more circumscribed.

As for which specific application areas could use predictive components, the usual suspects are:

Above all, marketing and CRM (Customer Relationship Management).
Fraud.
Risk, although KXEN is nowhere near handling hardcore Basel III compliance, Monte Carlo techniques, or anything like that.
Quality, especially if we include maintenance as part of quality.

I imagine SAP will start trying to integrate KXEN in some of those areas.

Cloudera Hadoop strategy and usage notes

Curt Monash — Sun, 25 Aug 2013 15:40:07 +0000

When we scheduled a call to talk about Sentry, Cloudera’s Charles Zedlewski and I found time to discuss other stuff as well. One interesting part of our discussion was around the processing “frameworks” Cloudera sees as most important.

The four biggies are:
- MapReduce. Duh.
- SQL, specifically Impala. This is as opposed to the uneasy Hive/MapReduce layering.
- Search.
- “Math”, which seems to mainly be through partnerships with SAS and Revolution Analytics. I don’t know a lot about how these work, but I presume they bypass MapReduce, in which case I could imagine them greatly outperforming Mahout.
Stream processing (Storm) is next in line.
Graph — e.g. Giraph — rises to at least the proof-of-concept level. Again, the hope would be that this well outperforms graph-on-MapReduce.
Charles is also seeing at least POC interest in Spark.
But MPI (Message Passing Interface) on Hadoop isn’t going anywhere fast, except to the extent it’s baked into SAS or other “math” frameworks. Generic MPI use cases evidently turn out to be a bad fit for Hadoop, due to factors such as:
- Low data volumes.
- Latencies in various parts of the system.

HBase was artificially omitted from this “frameworks” discussion because Cloudera sees it as a little bit more of a “storage” system than a processing one.

Another good subject was offloading work to Hadoop, in a couple different senses of “offload”:

From general-purpose data stores, mainly RDBMS, analytic or otherwise. This sounds similar to Hortonworks’ views about efficiency-oriented offloading; batch work can be moved to Hadoop, saving costs and/or getting more mileage from costs that are already sunk into expensive legacy installations. The top targets here are large, centralized systems, with Teradata being a clear #1 and IBM mainframes a probable #2, but anything from Oracle to newer parallel analytic RDBMS is fair game.
From the specialized data stores associated with fuller technology stacks. The example I had in mind was Splunk; Charles added Palantir, HP Arcsight and, in the past, Endeca. The idea here is that Hadoop is used to organize and/or index data the way those products’ native data stores would, but in higher volumes than they are (cost-)effective for.

On a pickier note, I encouraged Charles to push back against Hortonworks’ arguments for ORC vs. Parquet. His first claim was that ORC at this time only works under Hive, while Parquet can also be used for Hive, MapReduce, etc. (Edit: But see Arun Murthy’s comment below.) I suspect this is a case where Hortonworks and Cloudera should just get over themselves, and either agree on a file format or wind up each supporting both of them. There’s a lot of DBMS-like tooling in Hadoop’s future, and I have to think it will work better — or at least run faster — if it can make reliable assumptions about how data is actually stored.

Related links

In connection with its 0.1 version, Jakob Homan of LinkedIn contrasted Giraph to MapReduce-based graph processing.
I wrote a series about graph processing in May, 2012.
MPI used to be a higher Hadoop priority (August, 2011). That’s why I’ve kept bringing it up.

More notes on predictive modeling

Curt Monash — Fri, 12 Jul 2013 08:37:25 +0000

My July 2 comments on predictive modeling were far from my best work. Let’s try again.

1. Predictive analytics has two very different aspects.

Developing models, aka “modeling”:

Is a big part of investigative analytics.
May or may not be difficult to parallelize and/or integrate into an analytic RDBMS.
May or may not require use of your whole database.
Generally is done by humans.
Often is done by people with special skills, e.g. “statisticians” or “data scientists”.

More precisely, some modeling algorithms are straightforward to parallelize and/or integrate into RDBMS, but many are not.

Using models, most commonly:

Is done by machines …
… that “score” data according to the models.
May be done in batch or at run-time.
Is embarrassingly parallel, and is much more commonly integrated into analytic RDBMS than modeling is.
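To make the "embarrassingly parallel" point concrete, here is a minimal sketch of batch scoring. Everything specific in it is an assumption for illustration: the logistic form, the feature names, the weights, and the helper names are all made up. The only point is that each row's score depends on that row and the fixed model, so partitions can be scored without talking to each other.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical model: logistic-regression coefficients produced by an
# earlier modeling step. The feature names and weights are made up.
WEIGHTS = {"recency_days": -0.02, "orders_last_year": 0.30}
INTERCEPT = -1.0

def score(row):
    """Score one row; depends only on that row and the fixed model."""
    z = INTERCEPT + sum(w * row[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def score_partition(rows):
    """Score one partition of the data independently of all others."""
    return [score(r) for r in rows]

def score_in_parallel(partitions, workers=4):
    """Score partitions concurrently; no partition needs another's data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_partition, partitions))

partitions = [
    [{"recency_days": 10, "orders_last_year": 5}],
    [{"recency_days": 200, "orders_last_year": 0},
     {"recency_days": 30, "orders_last_year": 2}],
]
scores = score_in_parallel(partitions)
```

Threads are used here only to show the independence of the partitions. In a real deployment the same per-partition function would more plausibly run inside each node of a parallel RDBMS, which is presumably why scoring was integrated into such systems well before modeling was.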

2. Some people think that all a modeler needs are a few basic algorithms. (That’s why, for example, analytic RDBMS vendors are proud of integrating a few specific modeling routines.) Other people think that’s ridiculous. Depending on use case, either group can be right.

3. If adoption of DBMS-integrated modeling is high, I haven’t noticed.

4. The term predictive analytics was invented or at least popularized by SPSS, some years before IBM bought the company. The industry eventually adopted the term. I prefer predictive modeling. It is fair to say that predictive modeling subsumes both statistical modeling and machine learning.

Nobody really knows exactly what data mining does or doesn’t include — the term is a poster child for Monash’s Third Law — but whatever it is, it seems central to the SAS and SPSS product lines. Simply using “data mining” as a synonym for “predictive modeling” won’t lead you too far astray.

5. In that July 2 post I wrote:

I think the predictive modeling state of the art has become:

Cluster in some way.

Model separately on each cluster.

“Cluster in some way” can actually mean several things, for example:

K-means or whatever.
Ayasdi’s exotic approach to (quasi-)clustering.
Decision trees.

The one thing it doesn’t mean is “scale out”, and I apologize for the ambiguity to whoever read it the wrong way.
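The two-step pattern can be sketched in a few lines. This is a toy under stated assumptions, not anybody's production method: it picks k-means (just one of the clustering options listed above), works on a single feature, and fits a plain least-squares line inside each cluster. The function names and the data are hypothetical.

```python
from statistics import mean

def kmeans_1d(xs, k, iterations=20):
    """Plain 1-D k-means; returns a cluster index for each value."""
    lo, hi = min(xs), max(xs)
    # Spread the initial centers evenly across the data range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = mean(members)
    return labels

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on one cluster."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b

def cluster_then_model(xs, ys, k):
    """Cluster on x, then fit a separate model inside each cluster."""
    labels = kmeans_1d(xs, k)
    models = {}
    for c in set(labels):
        cx = [x for x, lab in zip(xs, labels) if lab == c]
        cy = [y for y, lab in zip(ys, labels) if lab == c]
        models[c] = fit_line(cx, cy)
    return labels, models

# Toy data: two regimes with different slopes.
xs = [1, 2, 3, 4, 101, 102, 103, 104]
ys = [2, 4, 6, 8, 50, 49, 48, 47]
labels, models = cluster_then_model(xs, ys, k=2)
```

On this toy data a single global line would fit both regimes badly, while the per-cluster fits recover an upward slope in one group and a downward slope in the other. That is the whole argument for modeling separately on each cluster.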

This is often called ensemble modeling, except that I think — what a shock! — different people use the term somewhat differently.

6. Much of the difficulty and delay-to-value in predictive modeling comes from data preparation/feature selection — not so much the scripting of the ETL (Extract/Transform/Load), but rather choices about which variables to model on and, often, how to describe them. So it’s unsurprising that vendors sometimes tell me “Our tool is great because the data preparation is automagically handled”; I’ve heard that from companies as big as KXEN and as small as Simularity.

Typically, what’s going on is that they’ve come up with a particular approach to modeling that, among other virtues, has the short-time-to-value benefit. Well:

Users may not want to replace their current modeling tools and associated business processes with a new one-trick pony.
On the other hand, the new tools could accelerate the use of the old, just because of what they provide in feature selection and data prep. I.e., you do the best you can with the new tool, and that tells you what data to put into your old one.

I think some KXEN users follow just such an approach.

7. I’ve spent a few hours talking with Ayasdi, and I’m still confused. But here are a few notes as best I understand things.

Company basics include:

Innovative approach to predictive modeling, based on some advanced mathematics.
~50 people.
~20 paying customers.
Verticals of financial services, bio/pharma, oil/gas (!), and government.
Downtown Palo Alto. (I parked next to ClearStory and walked over to Ayasdi for my meeting.)

Buzz says Ayasdi has a heavy component of professional service in what it does. Ayasdi disputes this. Buzz also says Ayasdi is hot. I doubt Ayasdi disputes that part.

There’s some serious math involved in Ayasdi, but I’m skeptical about that aspect, for several reasons:

I haven’t understood it yet.
Ayasdi occasionally says things that are mathematically incorrect. (No, a topology does NOT assume an underlying metric space.)
Advanced math and software rarely mix well. Even when the company does OK, the original advanced math claim tends to fade into the background. (E.g., support vector machines at KXEN, rough sets at Infobright.)

So I’ll just summarize Ayasdi’s math, as best I understand it, this way:

Ayasdi uses a variety of different scoring functions to group your data into buckets.
Ayasdi looks at which data points wind up in the same bucket several times, or in nearby ones.
Users are encouraged to model separately — in most cases to date using tools and techniques from outside Ayasdi — on the most interesting of those sets of especially friendly data points.
The whole thing could be viewed as inducing a covering of, say, the real line. Pretty pictures ensue based on the nerve of that covering and/or a kind of Reeb graph of any one of the scoring functions. (One of the many things I don’t understand about the math of Ayasdi is how those two possibilities dovetail together.)
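For readers who want the "covering" and "nerve" language pinned down, here is a toy reading of that summary. It is emphatically not Ayasdi's implementation, which I have not seen. It assumes a single scoring function with values in a known range, a hand-picked cover by overlapping intervals, and hypothetical helper names. Buckets are the sets of points whose scores land in each interval, and two buckets are linked when they share a point.

```python
def overlapping_intervals(lo, hi, n, overlap):
    """Cover [lo, hi] with n equal intervals, each widened by `overlap`
    (a fraction of the base width) on both sides."""
    width = (hi - lo) / n
    pad = width * overlap
    return [(lo + i * width - pad, lo + (i + 1) * width + pad) for i in range(n)]

def buckets_from_cover(values, intervals):
    """Bucket i holds the indexes of the points whose score lies in interval i."""
    return [{idx for idx, v in enumerate(values) if a <= v <= b}
            for a, b in intervals]

def nerve_edges(buckets):
    """1-skeleton of the nerve: link two buckets that share at least one point."""
    return [(i, j)
            for i in range(len(buckets))
            for j in range(i + 1, len(buckets))
            if buckets[i] & buckets[j]]

# Hypothetical scores for five data points under one scoring function.
scores = [0.05, 0.30, 0.35, 0.62, 0.90]
intervals = overlapping_intervals(0.0, 1.0, n=3, overlap=0.25)
buckets = buckets_from_cover(scores, intervals)
edges = nerve_edges(buckets)
```

Because the intervals overlap, a point can sit in two neighboring buckets, and those shared points are what produce the edges in the picture. With several scoring functions one would repeat this per function and look for points that keep landing together, which is the "same bucket several times" idea in the summary.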

8. I’m hearing a few more mentions of Mahout than I used to.

9. Skytree is accumulating some resources (money, people), but I haven’t talked with them.

Notes on Teradata systems

Curt Monash — Mon, 15 Apr 2013 06:53:39 +0000

Teradata is announcing its new high-end systems, the Teradata 6700 series. Notes on that include:

Teradata tends to get 35-55% (roughly speaking) annual performance improvements, as measured by its internal blended measure Tperf. A big part of this is exploiting new-generation Intel processors.
This year the figure is around 40%.
The 6700 is based on Intel’s Sandy Bridge.
Teradata previously told me that Ivy Bridge — the next one after Sandy Bridge — could offer a performance “discontinuity”. So, while this is just a guess, I expect that next year’s Teradata performance improvement will beat this year’s.
Teradata has now largely switched over to InfiniBand.
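As a back-of-the-envelope check on what those annual figures mean, the sketch below compounds them. It assumes the improvement rate holds constant from year to year, which is my simplification and not something Teradata claimed. The function names are made up, and the percentages are treated as plain multipliers rather than as actual Tperf values.

```python
import math

def doubling_time_years(annual_improvement):
    """Years for performance to double at a constant annual improvement rate."""
    return math.log(2) / math.log(1 + annual_improvement)

def cumulative_gain(annual_improvement, years):
    """Total performance multiple after compounding for `years` years."""
    return (1 + annual_improvement) ** years

for rate in (0.35, 0.40, 0.55):
    print(f"{rate:.0%}/yr: doubles in {doubling_time_years(rate):.1f} years, "
          f"{cumulative_gain(rate, 5):.1f}x after 5 years")
```

At this year's roughly 40%, performance would double about every two years under that assumption.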

Teradata is also talking about data integration and best-of-breed systems, with buzzwords such as:

Teradata Unified Data Architecture.
Fabric-based computing, even though this isn’t really about storage.
Teradata SQL-H.

The upshot is that Teradata has at least 6 kinds of rack or cabinet it wants to sell you — along with software to connect them — of which it really thinks you should get at least 3:

The 4 main Teradata-software appliances:
- Active Enterprise Data Warehouse (the new 6700). Teradata thinks every sufficiently large enterprise should have one of these.
- Extreme Performance Appliance (Teradata 4xxx), based on solid-state drives (which are also used in the 6xxx systems). At least I think so; the 4xxx wasn’t in the most recent slide deck I saw.
- Data Warehouse Appliance (Teradata 2700).
- Extreme Data Appliance (Teradata 1650).
The Teradata Aster Big Analytics Appliance, running Aster and Hadoop software. Teradata basically thinks everybody should have one of these too.
A separate cabinet for special-purpose “Teradata Managed Servers”. While there’s some space for Managed Servers in other Teradata appliances, Teradata now offers so many such capabilities that it thinks you will likely need a separate rack for those as well. These include (partial list):
- Viewpoint system management.
- Backup.
- Teradata Unity.
- Data movement, which is not the same thing as Teradata Unity.
- Data loading, which is yet something else.
- Generic compute (notably, to run SAS).

Even that doesn’t exhaust the possibilities:

If the 36 InfiniBand ports Teradata can fit into a cabinet aren’t enough, it suggests, and presumably will sell you, free-standing Mellanox switches as an alternative.
That slide deck split the Big Analytics Appliance back out into Aster and Hadoop options.
There also seems to be a SAS-specific modeling appliance.

And you can have — or in some cases must have — Teradata Managed Server nodes in other kinds of Teradata appliance.

Finally, Teradata also offers a stand-alone single- or several-node Teradata 670 Data Mart Appliance, notes on which include:

The Teradata 670’s entry price is under $1/2 million, if you want to use it as your first Teradata system (something that evidently is happening, mainly outside the Americas).
Another use for the Teradata 670 is for physical — as opposed to virtual — data mart spin-out.
The primary use for the Teradata Data Mart Appliance, however, seems to be test/development for larger Teradata systems.
The Teradata Data Mart Appliance is one of the options for placing in a separate managed-server Teradata rack.

Related links

My recent musings on the variety of clusters and appliances an enterprise could have.
A March, 2012 post on various vendors’ admissions that multiple analytic database systems are needed.

The 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms — company-by-company comments

Curt Monash — Tue, 21 Feb 2012 12:38:07 +0000

This is one of a series of posts on business intelligence and related analytic technology subjects, keying off the 2011/2012 version of the Gartner Magic Quadrant for Business Intelligence Platforms. The four posts in the series cover:

Overview comments about the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms, as well as a link to the actual document.
Business intelligence industry trends — some of Gartner’s thoughts but mainly my own.
(This post) Company-by-company comments based on the 2011/2012 Gartner Magic Quadrant for Business Intelligence Platforms.
Third-party analytics, pulling together and expanding on some points I made in the first three posts.

The heart of Gartner Group’s 2011/2012 Magic Quadrant for Business Intelligence Platforms was the company comments. I shall expound upon some, roughly in declining order of Gartner’s “Completeness of Vision” scores, dubious though those rankings may be.

IBM/Cognos
- Gartner gives IBM credit for its broad variety of analytics-related product lines and services, whether or not they currently have much to do with each other. Examples include Cognos, SPSS, Netezza, and apparently DataStage.
- Gartner basically says that Cognos 8 users are unhappy and Cognos 10 users are much happier.
- Gartner apparently has drunk the IBM Watson Kool-Aid, notwithstanding IBM’s difficulties in articulating how the Watson Jeopardy-playing machine has any more to do with real world problems than the Deep Blue chess-playing machine did. (Deep Gene is nice, but I don’t see it as a major enterprise computing direction.)
SAP/Business Objects
- Gartner says that SAP’s BI “customer experience” rating is the worst in the survey for the fifth consecutive year.
- Gartner further criticizes SAP for BI functionality (below average in every one of 14 categories), performance, cost, and roadmap confusion.
- Gartner ranks SAP extremely high for BI “Completeness of Vision”, apparently because it is a big company that talks a great game.
- Gartner ranks SAP much lower for “Ability to Execute”.
Microsoft
- Gartner praises Microsoft’s business intelligence product line for being cheap and in some respects well-integrated.
- Gartner criticizes Microsoft’s BI products for in other respects being poorly-integrated.
- Gartner speaks well of Microsoft’s in-memory BI approaches, and ties that into the success of SQL Server Analysis Services.
- Gartner criticizes Microsoft business intelligence efforts for being focused on tabular databases and on traditional (as opposed to mobile) clients.
MicroStrategy
- Gartner praises MicroStrategy for its mobile offerings.
- Gartner praises MicroStrategy for its integrated architecture, then criticizes it for the consequences of its architectural choices.
- Gartner criticizes MicroStrategy for not being an all-things-to-all-people megavendor.
Oracle
- Gartner says Oracle’s business intelligence customers give it below-average ranks in functionality, support, and product quality.
- Even so, Gartner says that the top reason for selecting Oracle BI is functionality.
- Gartner says that ease of use and cost do not factor significantly into the process when Oracle BI is chosen.
SAS Institute
- Gartner has little good to say about SAS’ BI, which it ranks dead last in ease of implementation, second to last in ease of use for business users, and dead last in dashboard capabilities.
- Gartner also cites SAS’ cost as being high.
Information Builders
- Gartner praises Information Builders’ BI for its functionality, integration, and scalability.
- Nonetheless, Gartner positions Information Builders BI as having relatively simple use cases (lots of parameterized reports and drilldown in same).
- Gartner cites Information Builders as not just losing market share, but actually having declining revenue.

The other three vendors in the Gartner Magic Quadrant for Business Intelligence Platforms “Leaders” or “Challengers” quadrants are advanced visualization poster children QlikTech, Tibco Spotfire, and Tableau. That’s in decreasing order of “Completeness of Vision” — i.e., of sales/marketing maturity — but increasing order of “Ability to Execute”. Gartner gushes over Tableau:

For the third year in a row, Tableau is the “sweetheart” of the Magic Quadrant, with customers even more enamored with it this year than in the last two. It gained overwhelmingly positive customer survey feedback across the board in all measures in the survey, including ease of use, functionality, product quality, product performance, support, customer relationship, success, achievement of business benefits and view of the vendor’s future. Indeed, it earned a top or near top score in most of these key categories — even with its high revenue growth (94% in 2011), when growing pains might be expected. These stellar results in part contributed to Tableau’s strong Ability to Execute position, despite its relatively small size.

but is more measured about QlikView:

QlikTech’s growing pains are more evident. The note of realism that first appeared in 2010 grew in 2011 and became a genuine concern for 2012. For the first time, QlikTech’s customers reported having a poor overall customer experience (of the vendors on the Magic Quadrant only SAP, IBM, Targit and Microsoft fared worse), and below average ratings for product quality and support. Furthermore, more QlikTech customers than for any other vendor (with the exception of Oracle) said that QlikView became less successful in the previous year (that is, the product is being used by fewer users, or is being replaced by other tools).

and makes a good point about the whole category in a Tibco bullet:

Tibco Spotfire has among the highest complexity of user analysis score than any vendor on the Magic Quadrant, while at the same time customers rate it above average on ease of use, particularly for end users. This paradox typifies why data discovery tools in general, and Tibco Spotfire in particular, are so compelling and are proliferating.

Beyond that:

Gartner says QlikTech’s customers find QlikView very functional and useful.
Gartner notes that other kinds of BI vendors, seeing the success of interactive visualization vendors QlikTech, Tibco Spotfire, and Tableau, want to attack that market.
Gartner points out various missing pieces in those vendors’ product lines and sales coverage.
Gartner is a bigger fan of analytic-application stories than I am.

Gartner places 11 other BI vendors in the “Niche” quadrant. Of those, the ones I know most about are the three open source vendors. Gartner’s views on Jaspersoft seem to boil down to:

Strong, successful choice for OEM reporting (the old Crystal Reports positioning), including in cloud/SaaS deployments and in what I would call stakeholder-facing analytics.
Used by enterprises mainly for reporting, at which Jaspersoft is low cost but not particularly functional.
Early in supporting non-relational (Hadoop, NoSQL) data sets.

Somehow this adds up to the lowest “Ability to Execute” rank in the whole Gartner BI Magic Quadrant.

Gartner sees Pentaho similarly to how it sees Jaspersoft, and gives it the second-lowest “Ability to Execute” score in the quadrant. Actuate is placed higher in the Quadrant than Jaspersoft or Pentaho, but it’s tough to see why from Gartner’s comments, unless Gartner was giving credit for the best aspects of each of Actuate’s two alternative reporting software product lines.

Applications of an analytic kind

Curt Monash — Sun, 12 Feb 2012 01:32:17 +0000

The most straightforward approach to the applications business is:

Take general-purpose technology and think through how to apply it to a specific application domain.
Produce packaged application software accordingly.

However, this strategy is not as successful in analytics as in the transactional world, for two main reasons:

Analytic applications of that kind are rarely complete.
Incomplete applications rarely sell well.

I first realized all this about a decade ago, after Henry Morris coined the term analytic applications and business intelligence companies thought it was their future. In particular, when Dave Kellogg ran marketing for Business Objects, he rattled off an argument to the effect that Business Objects had generated more analytic app revenue over the lifetime of the company than Cognos had. I retorted, with only mild hyperbole, that the lifetime numbers he was citing amounted to “a bad week for SAP”. Somewhat hoist by his own petard, Dave quickly conceded that he agreed with my skepticism, and we changed the subject accordingly.

Reasons that analytic applications are commonly less complete than the transactional kind include:

Transactional apps often serve to automate rigid business processes. Analytic technology use is inherently more flexible and varied.
Transactional apps are often used by cheaper/lower-status people. Analytic technology may be used by managers who treasure the right of individualized decision making.

There are indeed scenarios in which incomplete analytic applications can be useful. For example:

If a user has sufficiently simple needs, cookie-cutter analytic apps, perhaps offered on a SaaS (Software as a Service) basis, might suffice.
Small teams of technical workers can kick-start their analytic efforts with pre-built booster kits. Two examples come to mind:
- SAS Institute has done quite well with statistical “applications” that really are just accelerators for custom statistical work of the usual kind.
- Starter-kit data models for data warehousing have some value as well.

But otherwise, I think the best opportunities for application-specific analytic technology aren’t really classical “analytic apps”. Rather, they arise in three sometimes-overlapping areas, adjacent to the analytic application core:

Operational applications enhanced with some analytics so as to improve routine business processes.
Information services enhanced with some analytic technology that retrieves (and perhaps also helps analyze) the information.
Analytic-application-specific “platform” technology.

Operational applications have been enhanced with analytics for as long as we have had reports. Indeed, meeting that reporting need was the core business for Crystal Reports, the only business intelligence company ever to build a large OEM/VAR business (it was eventually merged into Business Objects). Analytic enhancement is also a major direction for application behemoths Oracle and SAP, but I won’t address that aspect in this post.

If you offer a service whose essence is tabular-structured information — e.g. a third-party data source or some stakeholder-facing analytics — then you also need to provide business intelligence capability to the information’s consumers. Too often, however, those BI capabilities are unimpressive, and there’s an “easy” improvement in upgrading them that should happen before more serious analytic-app capabilities are addressed.

What I’m most excited about right now is analytic-application-specific “platform” technology, an area in which I’ve sensed a groundswell of interest over the past 6-12 months. It’s at the heart of a significant fraction of the new startup ideas I’m hearing, and rightly so; on the other hand, it’s also been going on for decades. Here is a grab-bag of examples.

Simulation and optimization have been around since the 1970s, if not before. One cool effort was by River Logic, which in the 1990s developed a visual programming language especially geared to profitability/logistics kinds of simulations. (While still around, the company unfortunately doesn’t seem to have done much for the past 1 to 1 1/2 decades.)
Much more established is SAP’s APO (Advanced Planner and Optimizer), dating back to at least the 1990s. Given the magnitude of the mixed-integer programming problems it tackles, I would conjecture it includes some built-in domain-specific heuristics you might not find in a generic set of mathematical packages.
The financial services industry has long featured domain-specific technology. From the 1800s through the 1970s, this was focused on communications, from stock tickers (one of Thomas Edison’s first important inventions) to networks of stock quote machines. In the 1980s, that expanded to include what we’d recognize even today as real-time business intelligence tools, and then also to complex security-valuation analytics.
What’s more, the whole area of CEP/streaming has traditionally been focused on financial trading, for reasons including low latency, time series orientation, and the opportunity to parameterize queries across a broad set of ticker symbols.
Despite a lot of application potential, general-purpose text analytics technology has floundered. But when text analytics technology is specifically extended for marketing applications, it does better. Indeed, marketing applications don’t use general-purpose text mining technology to its fullest power. But they do add the relatively new analytic technique of sentiment analysis. They further add capabilities to analyze short, ungrammatical “verbatims”, such as text messages.
My clients at Metamarkets — Mike Driscoll et al. — have built a pretty cool technology stack focused on real-time/in-memory BI, well-suited for digital advertising and similar markets. I question whether it has much applicability outside of that space, however, because every industry that I can think of that needs real-time BI needs something rather different.
WibiData is focused in a similar area, but on actually personalizing things rather than on monitoring personalization’s effects. WibiData believes this requires aggressive use of derived data and the associated schema evolution.
Log analyzer Sumo Logic probably doesn’t rely on an off-the-shelf machine learning engine.
Other apparent examples showed up in the comment thread to my November, 2011 post on agile predictive analytics.
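
To make the CEP/streaming point above a bit more concrete, here is a minimal Python sketch of one query template parameterized across a set of ticker symbols. Everything in it is an assumption for illustration: the names MovingAverageQuery and run_queries, the choice of a moving average as the query, and the (symbol, price) tick format are hypothetical, and none of it is drawn from any actual CEP product.

```python
from collections import deque


class MovingAverageQuery:
    """Hypothetical standing query: moving average of the last `window` prices for one symbol."""

    def __init__(self, symbol, window):
        self.symbol = symbol
        # A bounded deque discards the oldest price once `window` prices are held.
        self.prices = deque(maxlen=window)

    def on_tick(self, price):
        """Record a new price and return the current moving average."""
        self.prices.append(price)
        return sum(self.prices) / len(self.prices)


def run_queries(ticks, symbols, window=3):
    """Instantiate the same query template once per symbol and feed it a tick stream.

    `ticks` is an iterable of (symbol, price) pairs, an assumed format.
    Returns the latest moving average for each symbol that received ticks.
    """
    queries = {s: MovingAverageQuery(s, window) for s in symbols}
    latest = {}
    for symbol, price in ticks:
        query = queries.get(symbol)
        if query is not None:  # ticks for unregistered symbols are ignored
            latest[symbol] = query.on_tick(price)
    return latest


if __name__ == "__main__":
    stream = [("IBM", 10.0), ("ORCL", 20.0), ("IBM", 20.0),
              ("SAP", 5.0), ("IBM", 30.0), ("IBM", 40.0)]
    print(run_queries(stream, ["IBM", "ORCL"]))
```

The query logic is written once and the symbol is just a parameter, so covering more symbols means adding names to a list rather than writing new queries. A real engine would of course also have to handle latency, ordering, and time-based windows, none of which this sketch attempts.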

It will be fascinating to see how this all plays out.

Comments on SAS

Curt Monash — Wed, 08 Feb 2012 22:51:11 +0000

A reporter interviewed me via IM about how CIOs should view SAS Institute and its products. Naturally, I have edited my comments (lightly) into a blog post. They turned out to be clustered into three groups, as follows:

SAS faces a number of challenges, not unlike those faced by other high-priced legacy technology vendors.
- It is used by organizations that have large budgets to pay for the product and to pay people to be expert on the product’s intricacies.
- SAS has not integrated with scale-out analytic DBMS technologies as well or as quickly as had been hoped, or as earlier marketing suggested was likely.
- SAS has not been strong in helping its users do agile predictive analytics.
SAS’ strengths are concentrated in product breadth:
- Lots of statistical algorithms.
- Various vertical products that make the modeling techniques more accessible in specific application domains.
- Various approaches to engineering for scalability — no one of those has been a table-thumping success to date, but SAS has the resources to keep trying.
- Some level of integration with its own business intelligence and text analytics products.
For any particular use case, the burden of proof is on SAS alternatives to show that they have enough pieces in the toolkit to meet the needs.
- SPSS (now owned by IBM) also has legacy issues.
- KXEN is focused on marketing use cases.
- Mahout has been one of the less successful Hadoop-related open source projects.
- R-based technology is still maturing.
- The modeling capabilities (as opposed to just scoring) that are bundled into RDBMS and well-parallelized tend to be pretty limited. Apparent exceptions tend to just be R repackaged.