<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Technical introduction to Splunk</title>
	<atom:link href="http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Tue, 09 Mar 2010 23:54:53 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Joshua Rodman</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-149035</link>
		<dc:creator>Joshua Rodman</dc:creator>
		<pubDate>Thu, 12 Nov 2009 23:36:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-149035</guid>
		<description>Query performance is normally good, even on very large databases.  If you experienced something else, then there was troubleshooting needed.

All data is indexed.

Can&#039;t speak to pricing.</description>
		<content:encoded><![CDATA[<p>Query performance is normally good, even on very large databases.  If you experienced something else, then there was troubleshooting needed.</p>
<p>All data is indexed.</p>
<p>Can&#8217;t speak to pricing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Rob</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-145921</link>
		<dc:creator>Rob</dc:creator>
		<pubDate>Mon, 26 Oct 2009 15:01:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-145921</guid>
		<description>Splunk is conceptually a great product, but there are a couple of gotchas:

1) Query performance is dismal on even moderately sized data sets. It&#039;s not a database, doesn&#039;t have indexes, etc. I wanted to love Splunk, but the query performance just wasn&#039;t there for exploring data. It&#039;s primarily a batch-mode reporting tool. I could live with that, except...

2) Pricing. Splunk gets expensive fast, and the price is not well-aligned with the amount of value it delivers. I&#039;d say it&#039;s about twice as expensive as it should be.</description>
		<content:encoded><![CDATA[<p>Splunk is conceptually a great product, but there are a couple of gotchas:</p>
<p>1) Query performance is dismal on even moderately sized data sets. It&#8217;s not a database, doesn&#8217;t have indexes, etc. I wanted to love Splunk, but the query performance just wasn&#8217;t there for exploring data. It&#8217;s primarily a batch-mode reporting tool. I could live with that, except&#8230;</p>
<p>2) Pricing. Splunk gets expensive fast, and the price is not well-aligned with the amount of value it delivers. I&#8217;d say it&#8217;s about twice as expensive as it should be.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Collision of big data analytics and splunk &#187; erik</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-145302</link>
		<dc:creator>Collision of big data analytics and splunk &#187; erik</dc:creator>
		<pubDate>Fri, 23 Oct 2009 18:53:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-145302</guid>
		<description>[...] was interesting to see Curt Monash, veteran database analyst and guru, post about splunk. It was a very short introduction to Splunk, but our appearance on his list signals our entry into [...]</description>
		<content:encoded><![CDATA[<p>[...] was interesting to see Curt Monash, veteran database analyst and guru, post about splunk. It was a very short introduction to Splunk, but our appearance on his list signals our entry into [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Swan</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-144933</link>
		<dc:creator>Erik Swan</dc:creator>
		<pubDate>Wed, 21 Oct 2009 20:51:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-144933</guid>
		<description>Hi Tom, 
Try using a wildcard search like fail*. Also, if you want the conjunction try adding quotes - &quot;failed login&quot;. Usual suspects like NOT, OR work as well. 

I agree that it helps to know what&#039;s in your logs - but I find the opposite: heterogeneous data is more *interesting*. Splunk doesn&#039;t need any parsing rules or predetermined schema, so you can dump in any data. I index all my logs, all my config files, the output from commands like vmstat, iostat, top, network traffic, as well as mail in my inbox, and so on. It&#039;s most interesting to splunk across all sorts of datasets, as there are often interesting relationships between data. I know people who throw in pitch-by-pitch baseball stats, global windmill power plant output, protein prediction data, and on and on - it&#039;s not just IT data.

One thing we are working on is a Guide to finding stuff in your data. I hope this will help people who pick up Splunk, throw data at it, and quickly find interesting information. I&#039;ll re-post when the guide is ready.

Feel free to bug me if you have specific questions on usage and thanks for the comments.

e</description>
		<content:encoded><![CDATA[<p>Hi Tom,<br />
Try using a wildcard search like fail*. Also, if you want the conjunction try adding quotes &#8211; &#8220;failed login&#8221;. Usual suspects like NOT, OR work as well. </p>
<p>I agree that it helps to know what&#8217;s in your logs &#8211; but I find the opposite: heterogeneous data is more *interesting*. Splunk doesn&#8217;t need any parsing rules or predetermined schema, so you can dump in any data. I index all my logs, all my config files, the output from commands like vmstat, iostat, top, network traffic, as well as mail in my inbox, and so on. It&#8217;s most interesting to splunk across all sorts of datasets, as there are often interesting relationships between data. I know people who throw in pitch-by-pitch baseball stats, global windmill power plant output, protein prediction data, and on and on &#8211; it&#8217;s not just IT data.</p>
<p>One thing we are working on is a Guide to finding stuff in your data. I hope this will help people who pick up Splunk, throw data at it, and quickly find interesting information. I&#8217;ll re-post when the guide is ready.</p>
<p>Feel free to bug me if you have specific questions on usage and thanks for the comments.</p>
<p>e</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom Grabowski</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-144705</link>
		<dc:creator>Tom Grabowski</dc:creator>
		<pubDate>Tue, 20 Oct 2009 19:46:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-144705</guid>
		<description>Great post.  The technology and architecture of Splunk are interesting.  It looks like a useful tool for a sysadmin.

I tried the &#039;failed login&#039; report on some of my system logs and it picked up the messages that had both the words &#039;failed&#039; and &#039;login&#039;, but it didn&#039;t pick up  the applications that had &#039;login failure&#039; or &#039;failed logon&#039;.  I tried the word &#039;fail&#039; but that didn&#039;t register &#039;failed&#039; or &#039;failure&#039; either. 

From what I can tell it is most useful in a homogeneous environment where you are very knowledgeable of the log format and contents before you run the queries.</description>
		<content:encoded><![CDATA[<p>Great post.  The technology and architecture of Splunk are interesting.  It looks like a useful tool for a sysadmin.</p>
<p>I tried the &#8216;failed login&#8217; report on some of my system logs and it picked up the messages that had both the words &#8216;failed&#8217; and &#8216;login&#8217;, but it didn&#8217;t pick up  the applications that had &#8216;login failure&#8217; or &#8216;failed logon&#8217;.  I tried the word &#8216;fail&#8217; but that didn&#8217;t register &#8216;failed&#8217; or &#8216;failure&#8217; either. </p>
<p>From what I can tell it is most useful in a homogeneous environment where you are very knowledgeable of the log format and contents before you run the queries.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik Swan</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-144617</link>
		<dc:creator>Erik Swan</dc:creator>
		<pubDate>Tue, 20 Oct 2009 00:44:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-144617</guid>
		<description>Nice post. I thought I&#039;d try to clarify our search results and tabular data.
As you point out, most of the time you interact with Splunk by building and saving searches, usually through a simple and interactive process.

A search can be as simple as &quot;failed login&quot;, which will search our index using keywords much like the way Google will search the web for &quot;failed login&quot;, except that Splunk will return log events, config files, network packets, etc., that contain those terms. Unlike a web search engine, Splunk will turn the results of any search into a table where the columns are either auto-detected or can be specified by a user in advance. Auto-detection works by looking for patterns of data like key=value or key:value, etc. User-defined extractions can occur in advance by specifying a regex, or a user can use the UI to define a field.  I&#039;ll skip all the nice ways users can do this, but it&#039;s usually easy to extract fields if Splunk does not do so automatically.

I use the following example: suppose in Google I could say, &quot;What is the average price of Pad Thai in San Francisco, broken out by ZIP code over the past 6 months&quot;. Something like Google would have a hard time doing that, but that is a typical Splunk search - though analyzing Pad Thai prices in Splunk is not common, someone must have tried ;-).

The Splunk search language supports piping from one search command to the next. A table is the output of one command and the input to the next, and is executed in our MapReduce framework. The above example &quot;failed login&quot; defaults to &quot;&#124; search failed login&quot;, since a search without a &quot;&#124;&quot; defaults to the &quot;search&quot; command. The results of &quot;failed login&quot; return both the raw data, so that users can see their log events, config files, etc., as well as a table. That table can be extremely sparse if the results are heterogeneous, or dense if all from the same source. Splunk has dozens of useful commands to make reporting easy - for example, we could add to the above &quot;failed login &#124; top username&quot; and the first table of results is piped through the &quot;top&quot; command, which will quickly calculate an aggregate statistic listing a table of top usernames. Top is just one of many commands that you can easily string together and use to build reporting and analysis for putting on dashboards or using for alerting purposes. We have filtering commands like search, where, dedup. We have enriching commands like eval, extract, lookup, delta, fillnull, etc. We have reporting commands like stats, chart, timechart, rare, etc. And we have other transforming commands for extracting transactions, clustering, sorting, etc.  All are very easy to use and work out of the box on any time series data.

Lastly, we are looking at providing a SQL interface to Splunk so that tools that speak ODBC/JDBC can query Splunk.

Not sure that this comment helps any, but it&#039;s important to understand how our search language works out-of-the-box for big data.</description>
		<content:encoded><![CDATA[<p>Nice post. I thought I&#8217;d try to clarify our search results and tabular data.<br />
As you point out, most of the time you interact with Splunk by building and saving searches, usually through a simple and interactive process.</p>
<p>A search can be as simple as &#8220;failed login&#8221;, which will search our index using keywords much like the way Google will search the web for &#8220;failed login&#8221;, except that Splunk will return log events, config files, network packets, etc., that contain those terms. Unlike a web search engine, Splunk will turn the results of any search into a table where the columns are either auto-detected or can be specified by a user in advance. Auto-detection works by looking for patterns of data like key=value or key:value, etc. User-defined extractions can occur in advance by specifying a regex, or a user can use the UI to define a field.  I&#8217;ll skip all the nice ways users can do this, but it&#8217;s usually easy to extract fields if Splunk does not do so automatically.</p>
<p>I use the following example: suppose in Google I could say, &#8220;What is the average price of Pad Thai in San Francisco, broken out by ZIP code over the past 6 months&#8221;. Something like Google would have a hard time doing that, but that is a typical Splunk search &#8211; though analyzing Pad Thai prices in Splunk is not common, someone must have tried <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> . </p>
<p>The Splunk search language supports piping from one search command to the next. A table is the output of one command and the input to the next, and is executed in our MapReduce framework. The above example &#8220;failed login&#8221; defaults to &#8220;| search failed login&#8221;, since a search without a &#8220;|&#8221; defaults to the &#8220;search&#8221; command. The results of &#8220;failed login&#8221; return both the raw data, so that users can see their log events, config files, etc., as well as a table. That table can be extremely sparse if the results are heterogeneous, or dense if all from the same source. Splunk has dozens of useful commands to make reporting easy &#8211; for example, we could add to the above &#8220;failed login | top username&#8221; and the first table of results is piped through the &#8220;top&#8221; command, which will quickly calculate an aggregate statistic listing a table of top usernames. Top is just one of many commands that you can easily string together and use to build reporting and analysis for putting on dashboards or using for alerting purposes. We have filtering commands like search, where, dedup. We have enriching commands like eval, extract, lookup, delta, fillnull, etc. We have reporting commands like stats, chart, timechart, rare, etc. And we have other transforming commands for extracting transactions, clustering, sorting, etc.  All are very easy to use and work out of the box on any time series data.</p>
<p>Lastly, we are looking at providing a SQL interface to Splunk so that tools that speak ODBC/JDBC can query Splunk.</p>
<p>Not sure that this comment helps any, but it&#8217;s important to understand how our search language works out-of-the-box for big data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Christina Noren</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-144528</link>
		<dc:creator>Christina Noren</dc:creator>
		<pubDate>Sun, 18 Oct 2009 20:01:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-144528</guid>
		<description>Hello Curt and Jerome... to clarify the answers to both questions...

Our search execution uses MapReduce for all statistical analysis, whether on-demand when users search against the raw unsummarized index data, or on a scheduled basis into &quot;summary indexes&quot;, our version of materialized views. The latter may be how you picked up that we use MapReduce for indexing.

Re storage - we&#039;ve built our own indexing technology and datastore - we rely on nothing more than the filesystem.

Hope this clarifies.</description>
		<content:encoded><![CDATA[<p>Hello Curt and Jerome&#8230; to clarify the answers to both questions&#8230;</p>
<p>Our search execution uses MapReduce for all statistical analysis, whether on-demand when users search against the raw unsummarized index data, or on a scheduled basis into &#8220;summary indexes&#8221;, our version of materialized views. The latter may be how you picked up that we use MapReduce for indexing.</p>
<p>Re storage &#8211; we&#8217;ve built our own indexing technology and datastore &#8211; we rely on nothing more than the filesystem.</p>
<p>Hope this clarifies.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jerome Pineau</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-144502</link>
		<dc:creator>Jerome Pineau</dc:creator>
		<pubDate>Sun, 18 Oct 2009 16:41:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-144502</guid>
		<description>Sorry if I missed it but do the spelunkers use a commercial backend or did they roll it all on their own on the storage side?
Thanks.
J.</description>
		<content:encoded><![CDATA[<p>Sorry if I missed it but do the spelunkers use a commercial backend or did they roll it all on their own on the storage side?<br />
Thanks.<br />
J.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: General introduction to Splunk &#124; DBMS2 -- DataBase Management System Services</title>
		<link>http://www.dbms2.com/2009/10/18/technical-introduction-to-splunk/comment-page-1/#comment-144499</link>
		<dc:creator>General introduction to Splunk &#124; DBMS2 -- DataBase Management System Services</dc:creator>
		<pubDate>Sun, 18 Oct 2009 16:02:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.dbms2.com/?p=1124#comment-144499</guid>
		<description>[...] More on those in a separate post. [...]</description>
		<content:encoded><![CDATA[<p>[...] More on those in a separate post. [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

