Sumo Logic and UIs for text-oriented data
I talked with the Sumo Logic folks for an hour Thursday. Highlights included:
- Sumo Logic does SaaS (Software as a Service) log management.
- Sumo Logic is text indexing/Lucene-based. Thus, it is reasonable to think of Sumo Logic as “Splunk-like”. (However, Sumo Logic seems to have a stricter security/troubleshooting orientation than Splunk, which is trying to branch out.)
- Sumo Logic has hacked Lucene for faster indexing, and says 10-30 second latencies are typical.
- Sumo Logic’s main differentiation is automated classification of events.
- There’s some kind of streaming engine in the mix, to update counters and drive alerts.
- Sumo Logic has around 30 “customers,” free (mainly) or paying (around 5) as the case may be.
- A truly typical Sumo Logic customer has single to low double digits of gigabytes of log data per day. However, Sumo Logic seems highly confident in its ability to handle a terabyte per customer per day, give or take a factor of 2.
- When I asked about the implications of shipping that much data to a remote data center, Sumo Logic observed that log data compresses really well.
- Sumo Logic recently raised a bunch of venture capital.
- Sumo Logic’s founders are out of ArcSight, a log management company HP paid a bunch of money for.
- Sumo Logic coined a marketing term “LogReduce”, but it has nothing to do with “MapReduce”. Sumo Logic seems to find this amusing.
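As a rough illustration of the compression point above, here is a minimal sketch that compresses some synthetic log lines with Python's standard zlib module. The log format, field names, and values are all made up for the example; nothing here reflects what Sumo Logic actually ships or how it compresses data, and real logs will compress more or less well depending on how repetitive they are.

```python
import zlib

# Synthetic, made-up log lines (hypothetical format, not real Sumo Logic data).
# Most of each line is repeated structure; only a few fields vary.
lines = [
    f"12:00:{i % 60:02d} INFO web-{i % 4} GET /api/items/{i} 200 {100 + i % 50}ms"
    for i in range(5000)
]

raw = "\n".join(lines).encode("utf-8")

# Compress with zlib at its default-ish level 6.
compressed = zlib.compress(raw, 6)

# Ratio of original size to compressed size.
ratio = len(raw) / len(compressed)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
print(f"ratio:            {ratio:.1f}x")
```

The repeated structure (log level, host names, URL prefix, status code) is what a general-purpose compressor exploits, which is presumably why shipping logs to a remote data center is less painful than the raw volume suggests.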
What interests me about Sumo Logic is that automated classification story. I thought I heard Sumo Logic say:
- It’s largely unsupervised machine learning.
- It’s specific to a particular user/data set.
- It can be up and running and classifying things effectively almost instantly (i.e., on seconds’ or minutes’ worth of data).
- It’s informed by what different users tag as false positives. (Or maybe that is planned for future versions.)
I have a little trouble seeing how all those points fit exactly together, so perhaps I got some details wrong.
The payoff is that machine learning directly informs the Sumo Logic user interface. In particular, large numbers of events are bundled into a small number of categories, hopefully making it much easier for network operations types to scan the UI and pick out what’s important.
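To make the bundling idea concrete, here is a minimal sketch of one simple way to collapse many events into a few categories: mask the parts of each line that vary (IP addresses, numbers) and group lines by the resulting template. This is a swapped-in stand-in technique, not Sumo Logic's method, which I understood to involve unsupervised machine learning. The masking rules, function names, and sample events are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace fields that vary from event to event
# with placeholders. Order matters: IPs are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def signature(line: str) -> str:
    """Return the line with variable fields masked, i.e. its template."""
    for pattern, placeholder in _MASKS:
        line = pattern.sub(placeholder, line)
    return line


def bundle(lines):
    """Count how many events share each template."""
    return Counter(signature(line) for line in lines)


# Made-up sample events for illustration only.
events = [
    "Failed login for user 1041 from 10.0.0.7",
    "Failed login for user 2203 from 10.0.0.9",
    "Disk usage at 91 percent on volume 3",
    "Failed login for user 77 from 192.168.1.20",
    "Disk usage at 95 percent on volume 1",
]

# Show categories, most frequent first.
for template, count in bundle(events).most_common():
    print(count, template)
```

Here five events reduce to two categories. A learned approach would presumably discover the templates from the data rather than rely on hand-written rules, but the effect on the UI is the same: the operator scans a short list of categories instead of a long stream of raw lines.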
In general, the idea of machine learning informing analytic UIs via some sort of classification is common in text-oriented technologies, notably in:
- Good ol’ text search.
- Text mining vendors’ approaches to clustering hits on words or phrases that say substantially the same thing.
But otherwise it seems kind of rare, if we stipulate that ad-serving/general internet personalization isn’t really an analytic UI. That said, I’d love to hear of any interesting examples I’ve overlooked.
Comments
7 Responses to “Sumo Logic and UIs for text-oriented data”
Curt,
What is the unit for “single to low double digits of log data per day”? Is it GB?
Jim,
Ack! Thanks! Yes! Fixed.
[…] analyzer Sumo Logic probably doesn’t rely on an off-the-shelf machine learning […]
“Sumo Logic’s main differentiation is automated classification of events.”
– Is this a comparison to Splunk?
How else does it differ? More dependence on machine learning techniques?
I haven’t talked with Sumo Logic for a while. Their last PR pitch was a generic “Golly gee whiz big data SaaS cloud” piece of nonsense; if they actually enhanced the offering in interesting ways, they did a good job of covering it up.
If you’re interested in Sumo, you can always contact them directly.
Sumo has more differentiators on their backend: “elastic log processing” for scaling without performance implications, machine learning and native anomaly detection technology, dashboards which run off of continuous queries for auto-updating, etc. The cloud marketing “nonsense” has room for improvement.
[…] are apt to backfire instead. Splunk seems to actually have had some limited success intimidating Sumo Logic. But it tried something similar against Rocana, and I was set up to potentially be collateral […]