March 6, 2014

Splunk and inverted-list indexing

Some technical background about Splunk

In an October, 2009 technical introduction to Splunk, I wrote (emphasis added):

Splunk software both reads logs and indexes them. The same code runs both on the nodes that do the indexing and on machines that simply emit logs.

It turns out that the bolded part was changed several years ago. However, I don’t have further details, so let’s move on to Splunk’s DBMS-like aspects.

I also wrote:

The fundamental thing that Splunk looks at is an increment to a log – i.e., whatever has been added to the log since Splunk last looked at it.

That remains true. Confusingly, Splunk refers to these log increments as “rows”, even though they’re really structured and queried more like documents.

I further wrote:

Splunk has a simple ILM (Information Lifecycle management) story based on time. I didn’t probe for details.

Splunk’s ILM story turns out to be simple indeed.

Finally, I wrote:

I get the impression that most Splunk entity extraction is done at search time, not at indexing time. Splunk says that, if a <name, value> pair is clearly marked, its software does a good job of recognizing same. Beyond that, fields seem to be specified by users when they define searches.

and

I have trouble understanding how Splunk could provide flexible and robust reporting unless it tokenized and indexed specific fields more aggressively than I think it now does.

The point of what I in October, 2013 called

a high(er)-performance data store into which you can selectively copy columns of data

and which Splunk enthusiastically calls its “High Performance Analytic Store” is to meet that latter need.

Inverted-list indexing

Inverted list technology is confusing for several reasons, which start:  Read more

October 31, 2013

Specialized business intelligence

A remarkable number of vendors are involved in what might be called “specialized business intelligence”. Some don’t want to call it that, because they think that “BI” is old and passé’, and what they do is new and better. Still, if we define BI technology as, more or less:

then BI is indeed a big part of what they’re doing.

Why would vendors want to specialize their BI technology? The main reason would be to suit it for situations in which even the best general-purpose BI options aren’t good enough. The obvious scenarios are those in which the mismatch is one or both of:

For example, in no particular order: Read more

October 30, 2013

Splunk strengthens its stack

I’m a little shaky on embargo details — but I do know what was in my own quote in a Splunk press release that went out yesterday. :)

Splunk has been rolling out a lot of news. In particular:

I imagine there are some operationally-oriented use cases for which Splunk instantly offers the best Hadoop business intelligence choice available. But what I really think is cool is Splunk’s schema-on-need story, wherein:

That highlights a pretty serious and flexible vertical analytic stack. I like it.

October 30, 2013

Glassbeam instantiates a lot of trends

Glassbeam checked in recently, and they turn out to exemplify quite a few of the themes I’ve been writing about. For starters:

Glassbeam basics include:

All Glassbeam customers except one are SaaS/cloud (Software as a Service), and even that one was only offered a subscription (as oppose to perpetual license) price.

So what does Glassbeam’s technology do? Glassbeam says it is focused on “machine data analytics,” specifically for the “Internet of Things”, which it distinguishes from IT logs.* Specifically, Glassbeam sells to manufacturers of complex devices — IT (most of its sales so far ), medical, automotive (aspirational to date), etc. — and helps them analyze “phone home” data, for both support/customer service and marketing kinds of use cases. As of a recent release, the Glassbeam stack can: Read more

February 13, 2013

It’s hard to make data easy to analyze

It’s hard to make data easy to analyze. While everybody seems to realize this — a few marketeers perhaps aside — some remarks might be useful even so.

Many different technologies purport to make data easy, or easier, to an analyze; so many, in fact, that cataloguing them all is forbiddingly hard. Major claims, and some technologies that make them, include:

*Complex event/stream processing terminology is always problematic.

My thoughts on all this start:  Read more

November 5, 2012

Real-time confusion

I recently proposed a 2×2 matrix of BI use cases:

Let me now introduce another 2×2 matrix of analytic scenarios:

My point is that there are at least three different cool things people might think about when they want their analytics to be very fast:

There’s also one slightly boring one that however drives a lot of important applications: Read more

August 24, 2012

Hadoop notes: Informatica, Splunk, and IBM

Informatica, Splunk, and IBM are all public companies, and correspondingly reticent to talk about product futures. Hence, anything I might suggest about product futures from any of them won’t be terribly detailed, and even the vague generalities are “the Good Lord willin’ an’ the creek don’ rise”.

Never let a rising creek overflow your safe harbor.

Anyhow:

1. Hadoop can be an awesome ETL (Extract/Transform/Load) execution engine; it can handle huge jobs and perform a great variety of transformations. (Indeed, MapReduce was invented to run giant ETL jobs.) Thus, if one offers a development-plus-execution stack for ETL processes, it might seem appealing to make Hadoop an ETL execution option. And so:

Informatica told me about other interesting Hadoop-related plans as well, but I’m not sure my frieNDA allows me to mention them at all.

IBM, however, is standing aside. Specifically, IBM told me that it doesn’t see the point of doing the same thing, as its ETL engine — presumably derived from the old Ascential product line — is already parallel and performant enough.

2. Last year, I suggested that Splunk and Hadoop are competitors in managing machine-generated data. That’s still true, but Splunk is also preparing a Hadoop co-opetition strategy. To a first approximation, it’s just Hadoop import/export. However, suppose you view Splunk as offering a three-layer stack: Read more

May 3, 2012

Big Data hype?

A reporter wrote in to ask whether investor interest in “Big Data” was justified or hype. (More precisely, that’s how I reinterpreted his questions. :) ) His examples were Splunk’s IPO, Teradata’s stock price increase, and Birst’s financing. In a nutshell:

1. A great example of hype is that anybody is calling Birst a “Big Data” or “Big Data analytics” company. If anything, Birst is a “little data” analytics company that claims, as a differentiating feature, that it can handle ordinary-sized data sets as well. Read more

January 10, 2012

Splunk update

Splunk is announcing the Splunk 4.3 point release. Before discussing it, let’s recall a few things about Splunk, starting with:

As in any release, a lot of Splunk 4.3 is about “Oh, you didn’t have that before?” features and Bottleneck Whack-A-Mole performance speed-up. One performance enhancement is Bloom filters, which are a very hot topic these days. More important is a switch from Flash to HTML5, so as to accommodate mobile devices with less server-side rendering. Splunk reports that its users — especially the non-IT ones — really want to get Splunk information on the tablet devices. While this somewhat contradicts what I wrote a few days ago pooh-poohing mobile BI, let me hasten to point out:

That’s pretty much the ideal scenario for mobile BI: Timeliness matters and prettiness doesn’t.

Read more

January 8, 2012

Big data terminology and positioning

Recently, I observed that Big Data terminology is seriously broken. It is reasonable to reduce the subject to two quasi-dimensions:

given that

But the conflation should stop there.

*Low-volume/high-velocity problems are commonly referred to as “event processing” and/or “streaming”.

When people claim that bigness and structure are the same issue, they oversimplify into mush. So I think we need four pieces of terminology, reflective of a 2×2 matrix of possibilities. For want of better alternatives, my suggestions are:

Read more

Next Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.