Log analysis

Discussion of how data warehousing and analytic technologies are applied to logfile analysis.

April 25, 2013

Analytic application themes

I talk with a lot of companies, and repeatedly hear some of the same application themes. This post is my attempt to collect some of those ideas in one place.

1. So far, the buzzword of the year is “real-time analytics”, generally with “operational” or “big data” included as well. I hear variants of that positioning from NewSQL vendors (e.g. MemSQL), NoSQL vendors (e.g. Aerospike), BI stack vendors (e.g. Platfora), application-stack vendors (e.g. WibiData), log analysis vendors (led by Splunk), data management vendors (e.g. Cloudera), and of course the CEP industry.

Yeah, yeah, I know — not all the named companies are in exactly the right market category. But that’s hard to avoid.

Why this gold rush? On the demand side, there’s a real or imagined need for speed. On the supply side, I’d say:

2. More generally, most of the applications I hear about are analytic, or have a strong analytic aspect. The three biggest areas — and these overlap — are:

Also arising fairly frequently are:

I’m hearing less about quality, defect tracking, and equipment maintenance than I used to, but those application areas have anyway been ebbing and flowing for decades.


April 23, 2013

MemSQL scales out

The third of my three MySQL-oriented clients I alluded to yesterday is MemSQL. When I wrote about MemSQL last June, the product was an in-memory single-server MySQL workalike. Now scale-out has been added, with general availability today.

MemSQL’s flagship reference is Zynga, across 100s of servers. Beyond that, the company claims (to quote a late draft of the press release):

Enterprises are already using distributed MemSQL in production for operational analytics, network security, real-time recommendations, and risk management.

All four of those use cases fit MemSQL’s positioning in “real-time analytics”. Besides Zynga, MemSQL cites penetration into traditional low-latency markets — financial services (various subsectors) and ad-tech.

Highlights of MemSQL’s new distributed architecture start: Read more

January 28, 2013

Attack of the Frankenschemas

In typical debates, the extremists on both sides are wrong. “SQL vs. NoSQL” is an example of that rule. For many traditional categories of database or application, it is reasonable to say:

Reasons to abandon SQL in any given area usually start:

Some would further say that NoSQL is cheaper, scales better, is cooler or whatever, but given the range of NewSQL alternatives, those claims are often overstated.

Sectors where these reasons kick in include but are not limited to: Read more

November 13, 2012

The future of dashboards, if any

Business intelligence dashboards are frequently bashed. I slammed them back in 2006 and 2007. Mark Smith dropped the hammer last August. EIS, the most dashboard-like pre-1990s analytic technology, was also the most reviled. There are reasons for this disdain, but even so dashboards shouldn’t be dismissed entirely.

In essence, I’d say:

In particular: Read more

November 5, 2012

Real-time confusion

I recently proposed a 2×2 matrix of BI use cases:

Let me now introduce another 2×2 matrix of analytic scenarios:

My point is that there are at least three different cool things people might think about when they want their analytics to be very fast:

There’s also one slightly boring one that however drives a lot of important applications: Read more

August 24, 2012

Hadoop notes: Informatica, Splunk, and IBM

Informatica, Splunk, and IBM are all public companies, and correspondingly reluctant to talk about product futures. Hence, anything I might suggest about product futures from any of them won’t be terribly detailed, and even the vague generalities are “the Good Lord willin’ an’ the creek don’ rise”.

Never let a rising creek overflow your safe harbor.


1. Hadoop can be an awesome ETL (Extract/Transform/Load) execution engine; it can handle huge jobs and perform a great variety of transformations. (Indeed, MapReduce was invented to run giant ETL jobs.) Thus, if one offers a development-plus-execution stack for ETL processes, it might seem appealing to make Hadoop an ETL execution option. And so:

Informatica told me about other interesting Hadoop-related plans as well, but I’m not sure my frieNDA allows me to mention them at all.

IBM, however, is standing aside. Specifically, IBM told me that it doesn’t see the point of doing the same thing, as its ETL engine — presumably derived from the old Ascential product line — is already parallel and performant enough.

2. Last year, I suggested that Splunk and Hadoop are competitors in managing machine-generated data. That’s still true, but Splunk is also preparing a Hadoop co-opetition strategy. To a first approximation, it’s just Hadoop import/export. However, suppose you view Splunk as offering a three-layer stack: Read more

July 24, 2012

Notes on Datameer

In a short October, 2011 post about Datameer, I wrote:

Datameer is designed to let you do simple stuff on large amounts of data, where “large amounts of data” typically means data in Hadoop, and “simple stuff” includes basic versions of a spreadsheet, of BI, and of EtL (Extract/Transform/Load, without much in the way of T).

That’s all still mainly true, although with the recent Datameer 2.0:

In essence, Datameer has two positionings.


June 16, 2012

Introduction to Metamarkets and Druid

I previously dropped a few hints about my clients at Metamarkets, mentioning that they:

But while they’re a joy to talk with, writing about Metamarkets has been frustrating, with many hours and pages of wasted effort. Even so, I’m trying again, in a three-post series:

Much like Workday, Inc., Metamarkets is a SaaS (Software as a Service) company, with numerous tiers of servers and an affinity for doing things in RAM. That’s where most of the similarities end, however, as Metamarkets is a much smaller company than Workday, doing very different things.

Metamarkets’ business is SaaS business intelligence, on large data sets, with low latency in both senses (fresh data can be queried, and the queries happen at RAM speed). As you might imagine, Metamarkets is used by digital marketers and other kinds of internet companies, whose data typically wants to be in the cloud anyway. Approximate metrics for Metamarkets (and it may well have exceeded these by now) include 10 customers, 100,000 queries/day, 80 billion 100-byte events/month (before summarization), 20 employees, 1 popular CEO, and a metric ton of venture capital.
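To put those approximate metrics in perspective, here is a rough back-of-envelope sketch in Python. The event count, event size, and query count come from the figures above; the 30-day month and the assumption of an even load across the day are mine, so treat the results as order-of-magnitude estimates only.

```python
# Back-of-envelope scale estimates from the approximate metrics quoted above.
# The constants below come from the post, except DAYS_PER_MONTH, which is an
# assumption made purely for illustration.

EVENTS_PER_MONTH = 80_000_000_000  # 80 billion events/month, before summarization
BYTES_PER_EVENT = 100              # 100-byte events
QUERIES_PER_DAY = 100_000          # 100,000 queries/day

DAYS_PER_MONTH = 30                # assumption: a 30-day month
SECONDS_PER_DAY = 86_400


def raw_bytes_per_month() -> int:
    """Raw (pre-summarization) event volume per month, in bytes."""
    return EVENTS_PER_MONTH * BYTES_PER_EVENT


def events_per_second() -> float:
    """Average ingest rate, assuming events arrive evenly over the month."""
    return EVENTS_PER_MONTH / (DAYS_PER_MONTH * SECONDS_PER_DAY)


def queries_per_second() -> float:
    """Average query rate, assuming queries are spread evenly over the day."""
    return QUERIES_PER_DAY / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"Raw volume:   {raw_bytes_per_month() / 1e12:.1f} TB/month")
    print(f"Ingest rate:  {events_per_second():,.0f} events/second")
    print(f"Query rate:   {queries_per_second():.2f} queries/second")
```

Under those assumptions, that works out to roughly 8 TB of raw events per month, an average of about 31,000 events per second, and a little over one query per second. Real traffic is presumably burstier than these averages suggest.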

To understand how Metamarkets’ technology works, it probably helps to start by realizing: Read more

March 21, 2012

DataStax Enterprise 2.0

Edit: Multiple errors in the post below have been corrected in a follow-on post about DataStax Enterprise and Cassandra.

My client DataStax is announcing DataStax Enterprise 2.0. The big point of the release is that there’s a bunch of stuff integrated together, including at least:

DataStax stresses that all this runs on the same cluster, with the same administrative tools and so on. For example, on a single cluster:


February 11, 2012

Applications of an analytic kind

The most straightforward approach to the applications business is:

However, this strategy is not as successful in analytics as in the transactional world, for two main reasons:

I first realized all this about a decade ago, after Henry Morris coined the term analytic applications and business intelligence companies thought it was their future. In particular, when Dave Kellogg ran marketing for Business Objects, he rattled off an argument to the effect that Business Objects had generated more analytic app revenue over the lifetime of the company than Cognos had. I retorted, with only mild hyperbole, that the lifetime numbers he was citing amounted to “a bad week for SAP”. Somewhat hoist by his own petard, Dave quickly conceded that he agreed with my skepticism, and we changed the subject accordingly.

Reasons that analytic applications are commonly less complete than the transactional kind include: Read more
