Predictive modeling and advanced analytics

Discussion of technologies and vendors in the overlapping areas of predictive analytics, predictive modeling, data mining, machine learning, Monte Carlo analysis, and other “advanced” analytics.

April 13, 2017

Analyzing the right data

0. A huge fraction of what’s important in analytics amounts to making sure that you are analyzing the right data. To a large extent, “the right data” means “the right subset of your data”.

1. In line with that theme:

2. Business intelligence interfaces today don’t look that different from what we had in the 1980s or 1990s. The biggest visible* changes, in my opinion, have been in the realm of better drilldown, à la QlikView and then Tableau. Drilldown, of course, is the main UI for business analysts and end users to subset data themselves.

*I used the word “visible” on purpose. The advances at the back end have been enormous, and much of that redounds to the benefit of BI.
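To make “drilldown as subsetting” concrete, here is a minimal sketch in pandas of what a BI front end effectively does when a user clicks into a region. The data and column names are invented for illustration; a real tool would generate equivalent queries against the back end.

```python
import pandas as pd

# Invented sales data; columns are illustrative, not from any real BI tool.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100.0, 150.0, 80.0, 120.0, 90.0],
})

# Top-level view: revenue aggregated by region.
print(sales.groupby("region")["revenue"].sum())

# "Drilldown": clicking on West subsets the data, then re-aggregates
# at a finer grain. The subsetting is the whole trick.
west = sales[sales["region"] == "West"]
print(west.groupby("product")["revenue"].sum())
```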

3. I wrote 2 1/2 years ago that sophisticated predictive modeling commonly fit the template:

That continues to be tough work. Attempts to productize shortcuts have not caught fire.


March 26, 2017

Monitoring

A huge fraction of analytics is about monitoring. People rarely want to frame things in those terms; evidently they think “monitoring” sounds boring or uncool. One cost of that silence is that it’s hard to get good discussions going about how monitoring should be done. But I’m going to try anyway, yet again. :)

Business intelligence is largely about monitoring, and the same was true of predecessor technologies such as green paper reports or even pre-computer techniques. Two of the top uses of reporting technology can be squarely described as monitoring, namely:

Yes, monitoring-oriented BI needs investigative drilldown, or else it can be rather lame. Yes, purely investigative BI is very important too. But monitoring is still the heart of most BI desktop installations.

Predictive modeling is often about monitoring too. It is common to use statistics or machine learning to help you detect and diagnose problems, and many such applications have a strong monitoring element.

I.e., you’re predicting trouble before it happens, when there’s still time to head it off.
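To illustrate that predictive flavor of monitoring, here’s a deliberately simple sketch: extrapolate a metric’s recent trend and raise an alert before it crosses a threshold. The function, metric, and numbers are all invented; real systems use much richer models, but the shape of the logic is similar.

```python
import numpy as np

def predict_breach(history, threshold, horizon):
    """Fit a linear trend to a metric's recent history and estimate
    whether it will cross `threshold` within `horizon` future steps.
    Returns the predicted step of the breach, or None."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, 1)  # least-squares line
    for step in range(1, horizon + 1):
        if slope * (len(history) + step - 1) + intercept >= threshold:
            return step
    return None

# Illustrative only: disk usage percentages sampled hourly.
disk_usage = [71, 72, 74, 75, 77, 79, 80, 82]
breach = predict_breach(disk_usage, threshold=90, horizon=24)
if breach is not None:
    print(f"Projected to cross 90% in ~{breach} hours; raise an alert now.")
```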

As for incident response, in areas such as security — any incident you respond to has to be noticed first. Often, it’s noticed through analytic monitoring.

Hopefully, that’s enough of a reminder to establish the great importance of analytics-based monitoring. So how can the practice be improved? At least three ways come to mind, and only one of those three is getting enough current attention.


March 19, 2017

Cloudera’s Data Science Workbench

0. Matt Brandwein of Cloudera briefed me on the new Cloudera Data Science Workbench. The problem it purports to solve is:

Cloudera’s idea for a third way is:

In theory, that’s pure goodness … assuming that the automagic works sufficiently well. I gather that Cloudera Data Science Workbench has been beta tested by five large organizations and many tens of users. We’ll see what is or isn’t missing as more customers take it for a spin.


February 28, 2017

Coordination, the underused “C” word

I’d like to argue that a single frame can be used to view a lot of the issues that we think about. Specifically, I’m referring to coordination, which I think is a clearer way of characterizing much of what we commonly call communication or collaboration.

It’s easy to argue that computing, to an overwhelming extent, is really about communication. Most obviously:

Indeed, it’s reasonable to claim:

A little less obvious is that much of this communication could alternatively be described as coordination. Some communication has pure consumer value, such as when we talk/email/Facebook/Snapchat/FaceTime with loved ones. But much of the rest is for the purpose of coordinating business or technical processes.

Among the technical categories that boil down to coordination are:

That’s a lot of the value in “platform” IT right there.

October 10, 2016

Notes on anomaly management

Then felt I like some watcher of the skies
When a new planet swims into his ken

— John Keats, “On First Looking Into Chapman’s Homer”

1. In June I wrote about why anomaly management is hard. Well, not only is it hard to do; it’s hard to talk about as well. One reason, I think, is that it’s hard to define what an anomaly is. And that’s a structural problem, not just a semantic one — if something is well enough understood to be easily described, then how much of an anomaly is it after all?

Artificial intelligence is famously hard to define for similar reasons.

“Anomaly management” and similar terms are not yet in the software marketing mainstream, and may never be. But naming aside, the actual subject matter is important.

2. Anomaly analysis is clearly at the heart of several sectors, including:

Each of those areas features one or both of the frameworks:

So if you want to identify, understand, avert and/or remediate bad stuff, data anomalies are the first place to look.
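As a toy illustration of that point, here’s a minimal sketch of anomaly flagging using a robust z-score, so that the anomalies themselves don’t distort the baseline. Names and numbers are invented; production anomaly detection is of course far more elaborate.

```python
import numpy as np

def flag_anomalies(values, cutoff=3.5):
    """Flag points whose robust z-score exceeds `cutoff`. Median and MAD
    are used instead of mean and standard deviation, since they resist
    distortion by the very anomalies we are trying to find."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))  # median absolute deviation
    if mad == 0:
        return []
    z = 0.6745 * (values - median) / mad  # 0.6745 rescales MAD to ~std dev
    return [i for i, score in enumerate(z) if abs(score) > cutoff]

# Illustrative: response times in milliseconds, with one obvious spike.
latencies = [52, 49, 51, 48, 50, 400, 53, 47]
print(flag_anomalies(latencies))  # -> [5]
```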

3. The “insights” promised by many analytics vendors — especially those who sell to marketing departments — are also often heralded by anomalies. Already in the 1970s, Walmart observed that red clothing sold particularly well in Omaha, while orange flew off the shelves in Syracuse. And so, in large college towns, they stocked their stores to the gills with clothing in the colors of the local football team. They also noticed that fancy dresses for little girls sold especially well in Hispanic communities … specifically for girls at the age of First Communion.


September 6, 2016

“Real-time” is getting real

I’ve been an analyst for 35 years, and debates about “real-time” technology have run through my whole career. Some of those debates are by now pretty much settled. In particular:

A big issue that does remain open is: How fresh does data need to be? My preferred summary answer is: As fresh as is needed to support the best decision-making. I think that formulation starts with several advantages:
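In code, the principle reduces to something like the following sketch, in which each dataset’s staleness budget is derived from the decisions it supports rather than from what the pipeline happens to deliver. The datasets and budgets here are entirely hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical staleness budgets, each justified by a decision it supports.
FRESHNESS_SLA = {
    "fraud_scores":     timedelta(seconds=5),   # block transactions in flight
    "inventory_levels": timedelta(minutes=15),  # reorder decisions
    "exec_dashboard":   timedelta(hours=24),    # strategic review
}

def is_fresh_enough(dataset, last_updated, now=None):
    """True if the data is fresh enough for its decision-making purpose.
    Timestamps are assumed to be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= FRESHNESS_SLA[dataset]

# Example: data last refreshed 30 minutes ago fails the inventory budget.
stale = datetime.now(timezone.utc) - timedelta(minutes=30)
print(is_fresh_enough("inventory_levels", stale))  # -> False
```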

Straightforward applications of this principle include:

August 28, 2016

Are analytic RDBMS and data warehouse appliances obsolete?

I used to spend most of my time — blogging and consulting alike — on data warehouse appliances and analytic DBMS. Now I’m barely involved with them. The most obvious reason is that there have been drastic changes in industry structure:

Simply reciting all that, however, leaves open the question of whether one should still care about analytic RDBMS at all.

My answer, in a nutshell, is:

Analytic RDBMS — whether on-premises software, data warehouse appliances, or cloud services — are still great for hard-core business intelligence, where “hard-core” can refer to ad-hoc query complexity, reporting/dashboard concurrency, or both. But they aren’t good for much else.


August 21, 2016

More about Databricks and Spark

Databricks CEO Ali Ghodsi checked in because he disagreed with part of my recent post about Databricks. Ali’s take on Databricks’ position in the Spark world includes:

Ali also walked me through customer use cases and adoption in wonderful detail. In general:

The story on those sectors, per Ali, is:

July 31, 2016

Notes on Spark and Databricks — technology

During my recent visit to Databricks, I of course talked a lot about technology — largely with Reynold Xin, but a bit with Ion Stoica as well. Spark 2.0 is just coming out now, and of course has a lot of enhancements. At a high level:

The majority of Databricks’ development efforts, however, are specific to its cloud service, rather than being donated to Apache for the Spark project. Some of the details are under NDA, but it seems fair to mention at least:

Two of the technical initiatives Reynold told me about seemed particularly cool.

July 31, 2016

Notes on Spark and Databricks — generalities

I visited Databricks in early July to chat with Ion Stoica and Reynold Xin. Spark also comes up in a large fraction of the conversations I have. So let’s do some catch-up on Databricks and Spark. In a nutshell:

I shall explain below. I’m also posting separately about Spark evolution, especially Spark 2.0, and in that post I’ll talk a bit about Databricks’ proprietary/closed-source technology.

Spark is the replacement for Hadoop MapReduce.

This point is so obvious that I don’t know what to say in its support. The trend is happening, as originally decreed by Cloudera (and me), among others. People are rightly fed up with the limitations of MapReduce, and — niches perhaps aside — there are no serious alternatives other than Spark.

The greatest use for Spark seems to be the same as the canonical first use for MapReduce: data transformation. (A minimal sketch appears below.) Also in line with the Spark/MapReduce analogy:
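To picture the data transformation use concretely, here is a minimal PySpark sketch of the kind of job that once would have been a chain of MapReduce steps. The paths, fields, and aggregation are all invented for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Hypothetical raw event log; in the MapReduce era this pipeline would
# have been several separate map and reduce jobs.
raw = spark.read.json("hdfs:///logs/events/")  # path is illustrative

cleaned = (
    raw
    .filter(F.col("user_id").isNotNull())       # drop malformed rows
    .withColumn("day", F.to_date("timestamp"))  # normalize the time grain
    .groupBy("day", "event_type")
    .agg(F.count("*").alias("events"))
)

cleaned.write.mode("overwrite").parquet("hdfs:///warehouse/daily_events/")
```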
