Web analytics

Discussion of how data warehousing and analytic technologies are applied to clickstream analysis and other web analytics challenges. Related subjects include:

October 21, 2016

Rapid analytics

“Real-time” technology excites people, and has for decades. Yet the actual, useful technology to meet “real-time” requirements remains immature, especially in cases which call for rapid human decision-making. Here are some notes on that conundrum.

1. I recently posted that “real-time” is getting real. But there are multiple technology challenges involved, including:

2. In early 2011, I coined the phrase investigative analytics, about which I said three main things: Read more

October 10, 2016

Notes on anomaly management

Then felt I like some watcher of the skies
When a new planet swims into his ken

— John Keats, “On First Looking Into Chapman’s Homer”

1. In June I wrote about why anomaly management is hard. Well, not only is it hard to do; it’s hard to talk about as well. One reason, I think, is that it’s hard to define what an anomaly is. And that’s a structural problem, not just a semantic one — if something is well enough understood to be easily described, then how much of an anomaly is it after all?

Artificial intelligence is famously hard to define for similar reasons.

“Anomaly management” and similar terms are not yet in the software marketing mainstream, and may never be. But naming aside, the actual subject matter is important.

2. Anomaly analysis is clearly at the heart of several sectors, including:

Each of those areas features one or both of the frameworks:

So if you want to identify, understand, avert and/or remediate bad stuff, data anomalies are the first place to look.

3. The “insights” promised by many analytics vendors — especially those who sell to marketing departments — are also often heralded by anomalies. Already in the 1970s, Walmart observed that red clothing sold particularly well in Omaha, while orange flew off the shelves in Syracuse. And so, in large college towns, they stocked their stores to the gills with clothing in the colors of the local football team. They also noticed that fancy dresses for little girls sold especially well in Hispanic communities … specifically for girls at the age of First Communion.

Read more

October 3, 2016

Notes on the transition to the cloud

1. The cloud is super-hot. Duh. And so, like any hot buzzword, “cloud” means different things to different marketers. Four of the biggest things that have been called “cloud” are:

Further, there’s always the idea of hybrid cloud, in which a vendor peddles private cloud systems (usually appliances) running similar technology stacks to what they run in their proprietary public clouds. A number of vendors have backed away from such stories, but a few are still pushing it, including Oracle and Microsoft.

This is a good example of Monash’s Laws of Commercial Semantics.

2. Due to economies of scale, only a few companies should operate their own data centers, aka true on-prem(ises). The rest should use some combination of colo, SaaS, and public cloud.

This fact now seems to be widely understood.

Read more

September 6, 2016

“Real-time” is getting real

I’ve been an analyst for 35 years, and debates about “real-time” technology have run through my whole career. Some of those debates are by now pretty much settled. In particular:

A big issue that does remain open is: How fresh does data need to be? My preferred summary answer is: As fresh as is needed to support the best decision-making. I think that formulation starts with several advantages:

Straightforward applications of this principle include: Read more

August 21, 2016

More about Databricks and Spark

Databricks CEO Ali Ghodsi checked in because he disagreed with part of my recent post about Databricks. Ali’s take on Databricks’ position in the Spark world includes:

Ali also walked me through customer use cases and adoption in wonderful detail. In general:

The story on those sectors, per Ali, is:  Read more

May 30, 2016

Adversarial analytics and other topics

Five years ago, in a taxonomy of analytic business benefits, I wrote:

A large fraction of all analytic efforts ultimately serve one or more of three purposes:

  • Marketing
  • Problem and anomaly detection and diagnosis
  • Planning and optimization

That continues to be true today. Now let’s add a bit of spin.

1. A large fraction of analytics is adversarial. In particular: Read more

May 18, 2016

Governments vs. tech companies — it’s complicated

Numerous tussles fit the template:

As a general rule, what’s best for any kind of company is — pricing and so on aside — whatever is best or most pleasing for their customers or users. This would suggest that it is in tech companies’ best interest to favor privacy, but there are two important quasi-exceptions: Read more

October 15, 2015

Couchbase 4.0 and related subjects

I last wrote about Couchbase in November, 2012, around the time of Couchbase 2.0. One of the many new features I mentioned then was secondary indexing. Ravi Mayuram just checked in to tell me about Couchbase 4.0. One of the important new features he mentioned was what I think he said was Couchbase’s “first version” of secondary indexing. Obviously, I’m confused.

Now that you’re duly warned, let me remind you of aspects of Couchbase timeline.

Technical notes on Couchbase 4.0 — and related riffs :) — start: Read more

September 17, 2015

Rocana’s world

For starters:

Rocana portrays itself as offering next-generation IT operations monitoring software. As you might expect, this has two main use cases:

Rocana’s differentiation claims boil down to fast and accurate anomaly detection on large amounts of log data, including but not limited to:

Read more

August 3, 2015

Data messes

A lot of what I hear and talk about boils down to “data is a mess”. Below is a very partial list of examples.

To a first approximation, one would expect operational data to be rather clean. After all, it drives and/or records business transactions. So if something goes awry, the result can be lost money, disappointed customers, or worse, and those are outcomes to be strenuously avoided. Up to a point, that’s indeed true, at least at businesses large enough to be properly automated. (Unlike, for example — :) — mine.)

Even so, operational data has some canonical problems. First, it could be inaccurate; somebody can just misspell or otherwise botch an entry. Further, there are multiple ways data can be unreachable, typically because it’s:

Inconsistency can take multiple forms, including:  Read more

Next Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.