Comments on: Technical introduction to Splunk

By: Splunk and inverted-list indexing | DBMS 2 : DataBase Management System Services

Thu, 06 Mar 2014 12:55:51 +0000

[…] an October, 2009 technical introduction to Splunk, I wrote (emphasis added): Splunk software both reads logs and indexes them. The same code runs […]

By: Collision of big data analytics and splunk | Splunk Blogs

Collision of big data analytics and splunk | Splunk Blogs — Thu, 08 Dec 2011 21:05:11 +0000

[…] was interesting to see Curt Monash, veteran database analyst and guru, post about splunk. It was a very short introduction to Splunk, but our appearance on his list signals our entry into […]

By: Joshua Rodman

Joshua Rodman — Thu, 12 Nov 2009 23:36:34 +0000

Query performance is normally good, even on very large databases. If you experienced something else, then some troubleshooting was needed.

All data is indexed.

Can’t speak to pricing.

By: Rob

Rob — Mon, 26 Oct 2009 15:01:00 +0000

Splunk is conceptually a great product, but there are a couple of gotchas:

1) Query performance is dismal on even moderately sized data sets. It’s not a database, doesn’t have indexes, etc. I wanted to love Splunk, but the query performance just wasn’t there for exploring data. It’s primarily a batch-mode reporting tool. I could live with that, except…

2) Pricing. Splunk gets expensive fast, and the price is not well-aligned with the amount of value it delivers. I’d say it’s about twice as expensive as it should be.

By: Collision of big data analytics and splunk » erik

Collision of big data analytics and splunk » erik — Fri, 23 Oct 2009 18:53:14 +0000

[…] was interesting to see Curt Monash, veteran database analyst and guru, post about splunk. It was a very short introduction to Splunk, but our appearance on his list signals our entry into […]

By: Erik Swan

Erik Swan — Wed, 21 Oct 2009 20:51:53 +0000

Hi Tom,
Try using a wildcard search like fail*. Also, if you want the conjunction try adding quotes – “failed login”. Usual suspects like NOT, OR work as well.
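
As a rough illustration of the difference between a bare term, a wildcard, a pair of terms, and a quoted phrase, here is a toy matcher in Python. It is only a sketch of the matching behavior discussed in this thread, not Splunk's search implementation; the helper names `has_term` and `has_phrase`, the whitespace tokenization, and the sample log lines are all made up for the example.

```python
from fnmatch import fnmatch

def tokens(event):
    """Split an event into lowercase whitespace-separated tokens (toy tokenizer)."""
    return event.lower().split()

def has_term(event, pattern):
    """True if any token matches the pattern; '*' acts as a wildcard."""
    return any(fnmatch(tok, pattern.lower()) for tok in tokens(event))

def has_phrase(event, phrase):
    """True if the words of the phrase appear adjacent and in order."""
    toks, words = tokens(event), phrase.lower().split()
    return any(toks[i:i + len(words)] == words
               for i in range(len(toks) - len(words) + 1))

# Hypothetical log lines, loosely based on the cases mentioned in the thread.
logs = [
    "failed login for root",
    "login failure on tty1",
    "failed logon from 10.0.0.5",
    "login ok, no earlier attempt failed",
]

exact = [e for e in logs if has_term(e, "fail")]        # bare term: whole tokens only
wild = [e for e in logs if has_term(e, "fail*")]        # wildcard: failed, failure, ...
both = [e for e in logs if has_term(e, "failed") and has_term(e, "login")]
phrase = [e for e in logs if has_phrase(e, "failed login")]
either = [e for e in logs if has_term(e, "login") or has_term(e, "logon")]
```

In this toy version, the bare term `fail` matches nothing, `fail*` matches every line, the two-term search matches any line containing both words in any position, and the quoted phrase matches only the line where the words are adjacent.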

I agree that it helps to know what’s in your logs – but I find the opposite: heterogeneous data is more *interesting*. Splunk doesn’t need any parsing rules or predetermined schema, so you can dump in any data. I index all my logs, all my config files, the output from commands like vmstat, iostat, top, network traffic, as well as mail in my inbox, and so on. It’s most interesting to splunk across all sorts of datasets, as there are often interesting relationships between data. I know people who throw in pitch-by-pitch baseball stats, global windmill power plant output, protein prediction data, and on and on – it’s not just IT data.

One thing we are working on is a Guide to finding stuff in your data. I hope this will help people who pick up Splunk and throw data at it to quickly find interesting information. I’ll re-post when the guide is ready.

Feel free to bug me if you have specific questions on usage and thanks for the comments.

By: Tom Grabowski

Tom Grabowski — Tue, 20 Oct 2009 19:46:35 +0000

Great post. The technology and architecture of Splunk are interesting. It looks like a useful tool for a sysadmin.

I tried the ‘failed login’ report on some of my system logs and it picked up the messages that had both the words ‘failed’ and ‘login’, but it didn’t pick up the applications that had ‘login failure’ or ‘failed logon’. I tried the word ‘fail’ but that didn’t register ‘failed’ or ‘failure’ either.

From what I can tell it is most useful in a homogeneous environment where you are very knowledgeable of the log format and contents before you run the queries.

By: Erik Swan

Erik Swan — Tue, 20 Oct 2009 00:44:36 +0000

Nice post. I thought I’d try to clarify our search results and tabular data.
As you point out, most of the time you interact with Splunk by building and saving searches, usually through a simple and interactive process.

A search can be as simple as “failed login”, which will search our index using keywords much like the way Google will search the web for “failed login”, except that Splunk will return log events, config files, network packets, etc., that contain those terms. Unlike a web search engine, Splunk will turn the results of any search into a table where the columns are either auto-detected or can be specified by a user in advance. Auto-detection works by looking for patterns of data like key=value or key:value, etc. User-defined extractions can be set up in advance by specifying a regex, or a user can use the UI to define a field. I’ll skip all the nice ways users can do this, but it’s usually easy to extract fields if Splunk does not do so automatically.
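
To make the idea of auto-detecting key=value and key:value patterns concrete, here is a minimal Python sketch. It is an illustration only and not Splunk's extraction logic: the regular expression, the function names `extract_fields` and `to_table`, and the sample events are assumptions made up for this example.

```python
import re

# Hypothetical pattern: a key, then '=' or ':', then a quoted or bare value.
FIELD_RE = re.compile(r'(\w+)\s*[=:]\s*("[^"]*"|[^\s,;]+)')

def extract_fields(event):
    """Return a dict of fields found as key=value or key:value pairs."""
    fields = {}
    for key, value in FIELD_RE.findall(event):
        fields[key] = value.strip('"')
    return fields

def to_table(events):
    """Build a table: one row per event, columns are the union of all keys."""
    rows = [extract_fields(e) for e in events]
    columns = sorted({k for row in rows for k in row})
    return columns, [[row.get(c) for c in columns] for row in rows]

# Made-up events from two different "sources" with different fields.
events = [
    "action=failed_login user=alice src=10.0.0.5",
    "status: 200 user: bob",
]
columns, table = to_table(events)
```

Because the two sample events share only the `user` field, most cells in the resulting table are empty (`None`). A pattern this naive would also misread values such as `12:30:45` as a key and a value, which is one reason real extraction rules need to be more careful.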

I use the following example: suppose in Google I could say, “What is the average price of Pad Thai in San Francisco, broken out by zip code, over the past 6 months?” Something like Google would have a hard time doing that, but it is a typical Splunk search – though analyzing Pad Thai prices in Splunk is not common, someone must have tried ;-).

The Splunk search language supports piping from one search command to the next. A table is the output of one command and the input to the next, and the pipeline is executed in our map reduce framework. The above example “failed login” defaults to “| search failed login”, since a search without a “|” defaults to the “search” command. The results of “failed login” return both the raw data, so that users can see their log events, config files, etc., and a table. That table can be extremely sparse if the results are heterogeneous, or dense if they all come from the same source.

Splunk has dozens of useful commands to make reporting easy. For example, we could extend the above to “failed login | top username”: the first table of results is piped through the “top” command, which quickly calculates an aggregate statistic and lists a table of top usernames. Top is just one of many commands that you can easily string together and use to build reporting and analysis for putting on dashboards or using for alerting purposes. We have filtering commands like search, where, dedup. We have enriching commands like eval, extract, lookup, delta, fillnull, etc. We have reporting commands like stats, chart, timechart, rare, etc. And we have other transforming commands for extracting transactions, clustering, sorting, etc. All are very easy to use and work out of the box on any time series data.

Lastly, we are looking at providing a SQL interface to Splunk so that tools that speak ODBC/JDBC can query Splunk.

Not sure that this comment helps any, but it’s important to understand how our search language works out of the box for big data.
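
The piping model, where the table produced by one command feeds the next, can be sketched in a few lines of Python. This is a toy imitation of “failed login | top username” under stated assumptions, not how Splunk executes searches: the `search` and `top` functions here are simplified stand-ins written for this example, and the events are invented.

```python
from collections import Counter

def search(events, *terms):
    """Keep events whose raw text contains every term (case-insensitive)."""
    wanted = [t.lower() for t in terms]
    return [e for e in events if all(t in e["_raw"].lower() for t in wanted)]

def top(events, field, limit=10):
    """Count values of `field` and return rows sorted by descending count."""
    counts = Counter(e[field] for e in events if field in e)
    return [{field: value, "count": n} for value, n in counts.most_common(limit)]

# Invented events: raw text plus an already-extracted username field.
events = [
    {"_raw": "failed login for alice", "username": "alice"},
    {"_raw": "failed login for bob", "username": "bob"},
    {"_raw": "failed login for alice", "username": "alice"},
    {"_raw": "session opened for carol", "username": "carol"},
]

# Rough equivalent of: failed login | top username
result = top(search(events, "failed", "login"), "username")
```

Each stage takes a list of rows and returns a list of rows, so further stages (a filter, a sort, a different aggregate) can be chained in the same way.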

By: Christina Noren

Christina Noren — Sun, 18 Oct 2009 20:01:00 +0000

Hello Curt and Jerome… to clarify the answers to both questions…

Our search execution uses MapReduce for all statistical analysis, whether on-demand when users search against the raw unsummarized index data, or on a scheduled basis into “summary indexes”, our version of materialized views. The latter may be how you picked up that we use MapReduce for indexing.
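
The two modes described here, on-demand analysis over raw data and scheduled analysis into summaries, can be illustrated with a small map/reduce sketch in Python. It shows the general pattern only and makes no claim about Splunk's internals: the hourly chunking, the `host` field, and the function names are assumptions chosen for the example.

```python
from collections import Counter
from functools import reduce

def map_partial(events):
    """Map step: count events per host within one chunk of data."""
    return Counter(e["host"] for e in events)

def reduce_partials(partials):
    """Reduce step: merge per-chunk counts into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())

# Invented raw data, split into hourly chunks.
chunks = {
    "09:00": [{"host": "web1"}, {"host": "web2"}, {"host": "web1"}],
    "10:00": [{"host": "web2"}, {"host": "db1"}],
}

# On demand: run map and reduce over the raw chunks at search time.
on_demand = reduce_partials(map_partial(c) for c in chunks.values())

# Scheduled: store each chunk's partial result ahead of time (the "summary"),
# then answer later searches by reducing the stored partials only.
summary = {hour: map_partial(events) for hour, events in chunks.items()}
from_summary = reduce_partials(summary.values())
```

Both paths give the same counts. The summary path simply moves the map work to an earlier, scheduled run, which is the sense in which it resembles a materialized view.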

Re storage – we’ve built our own indexing technology and datastore – we rely on nothing more than the filesystem.

Hope this clarifies.