<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DBMS 2 : DataBase Management System Services &#187; Columnar database management</title>
	<atom:link href="http://www.dbms2.com/category/database-theory-practice/columnar-database-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dbms2.com</link>
	<description>Choices in data management and analysis</description>
	<lastBuildDate>Thu, 02 Sep 2010 09:06:44 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Vertica&#8217;s innovative architecture for Flash, plus more about temp space than you perhaps wanted to know</title>
		<link>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/</link>
		<comments>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 08:07:33 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Solid-state memory]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2788</guid>
		<description><![CDATA[Vertica is announcing:

Technology it already has 	released*, but has not published any reference architectures 	for
A 	Barney partnership**

In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">Vertica is announcing:</p>
<ul>
<li>Technology it already has 	released*, but has not published any reference architectures 	for</li>
<li><span style="font-style: normal;">A 	<a href="http://www.strategicmessaging.com/barney-partnerships/2010/08/12/" onclick="javascript:pageTracker._trackPageview('/www.strategicmessaging.com');">Barney</a> partnership**</span></li>
</ul>
<p style="margin-bottom: 0in;">In other words, Vertica has succumbed to the common delusion that it&#8217;s a good idea to put out half-baked press releases the week of TDWI conferences. But if we look past that kind of all-too-common nonsense, Vertica is highlighting an interesting technical story, about <strong>how the analytic DBMS industry can exploit solid-state memory technology.</strong></p>
<p style="margin-bottom: 0in;"><em>*Upgrades to <a href="../2009/08/04/flexstore-and-the-rest-of-vertica-35/">Vertica FlexStore</a> to handle Flash memory, actually released as part of <a href="../2010/02/22/vertica-4/">Vertica 4.0</a></em></p>
<p style="margin-bottom: 0in;"><em>** With Fusion-io</em></p>
<p style="margin-bottom: 0in;">To set the context, let&#8217;s recall a few points I&#8217;ve noted in the past:</p>
<ul>
<li><a href="../2010/01/31/flash-pcmsolid-state-memory-disk/">Solid-state 	memory&#8217;s price/throughput tradeoffs obviously make it the future of 	database storage</a>.</li>
<li><a href="../2010/06/25/flash-is-coming-well/">The 	Flash future is coming soon</a>, in part because Flash&#8217;s propensity 	to wear out is overstated. This is especially true in the case of 	modern analytic DBMS, which tend to write to blocks all at once, and 	most particularly the case for append-only systems such as Vertica.</li>
<li><a href="../2010/08/12/teradata-future-product-strategy/">Being 	able to intelligently split databases among various cost tiers of 	storage – e.g. Flash and disk – makes a whole lot of sense</a>.</li>
</ul>
<p style="margin-bottom: 0in;">Taken together, those points tell us:</p>
<p style="margin-bottom: 0in;"><strong>For optimal price/performance, analytic DBMS should support databases that run part on Flash, part on disk.</strong></p>
<p style="margin-bottom: 0in;">While all this is a future for some other analytic DBMS vendors, Vertica is shipping it today.* What&#8217;s more, three aspects of Vertica&#8217;s architecture make it particularly well-suited for hybrid Flash/disk storage, in each case for a similar reason – you can get most of the performance benefit of all-Flash for a relatively low actual investment in Flash chips:  <span id="more-2788"></span></p>
<ul>
<li><strong>Vertica lets you split tables 	by column, </strong><span style="font-weight: normal;">and Vertica 	FlexStore is versatile enough to let you put only the most-used 	columns in Flash. (Vertica offers a figure that 85% of usage calls 	on only 15% of columns, but I don&#8217;t know how rigorously grounded 	those numbers are.)</span></li>
<li>To the extent that Vertica data is <a href="../2008/09/24/vertica-finally-spells-out-its-compression-claims/">more compressed</a> than many of Vertica&#8217;s competitors&#8217; (which it probably is, debates over the magnitude of Vertica&#8217;s advantage notwithstanding), the total storage-hardware cost of sticking stuff in Flash is less when you use Vertica than with other systems.</li>
<li>Vertica has <span style="font-weight: normal;">relatively 	less need for </span><strong>temp space</strong> than some other systems. 	(Vertica uses figures of &lt;20% of total storage, vs. 30%+ for some 	other systems.) If you want to use Flash for temp space, so as to 	accelerate your toughest queries, that can save you some cash …</li>
<li>… and by the way, <strong>temp space 	is an especially good use of Flash, </strong>because <strong>temp space is 	accessed in a less sequential manner than data storage is.</strong></li>
</ul>
<p style="margin-bottom: 0in;">The least obvious of those points are about temp space; I only understood the particulars when Vertica development chief Shilpa Lawande explained them to me Thursday.</p>
<p style="margin-bottom: 0in;"><em>* At least in theory; customer adoption may be a different matter.</em></p>
<p style="margin-bottom: 0in;">But before drilling down on temp space, let me first note that there&#8217;s one offsetting factor to all those “We need somewhat less Flash than the other guys” Vertica advantages. Like all serious databases, a Vertica installation keeps two or more copies of all data, so that there&#8217;s no single point of failure in storage. In a flexible system like Vertica, you can put one copy on Flash and one on disk. But if you do that in Vertica, you forgo fully exploiting one possible benefit of Vertica&#8217;s architecture – the ability to store different copies of a column in different orders, which are beneficial for accelerating different groups of queries.*</p>
<p style="margin-bottom: 0in;"><em>*More precisely, you don&#8217;t get the full benefits of Flash acceleration for every query touching those columns.</em></p>
<p style="margin-bottom: 0in;">OK. Back to temp space. There are four kinds of things you can put in storage if you&#8217;re running a database management system:</p>
<ul>
<li>The <strong>software</strong> itself.</li>
<li><span style="font-weight: normal;">Persistent </span><strong>data. </strong><span style="font-weight: normal;">(I.e., tables, 	if the DBMS you&#8217;re running is relational.)</span></li>
<li><strong>Metadata,</strong> especially the 	kind that lets you find data &#8211;<strong> indexes,</strong> zone maps, catalogs, 	etc.</li>
<li><strong>Temporary data constructs</strong> built as part of, say, a sort-merge join. These, by definition, are what populate temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Just to be clear, those constructs are NOT temporary tables of the sort created by, say, MicroStrategy; such tables are handled like any other data. Rather, they are ephemeral creations and, so far as I can tell, not tables at all.</p>
<p style="margin-bottom: 0in;">Vertica offered two theories as to why its DBMS requires less temp space than competitors do:</p>
<ul>
<li>To the extent data is decompressed before being operated on in memory by the DBMS, that decompression would of course apply to temp space as well. Vertica prides itself on <strong>keeping data compressed</strong> all the way through, and seems to get away with smaller temp space allocations as a benefit.</li>
<li>Since Vertica can store columns in 	expedient sort orders, it does less sorting overall, and sorting is 	a big use of temp space.</li>
</ul>
<p style="margin-bottom: 0in;">Obviously, no matter which DBMS you use, the amount of temp space you need is surely workload-dependent. Even so, Vertica&#8217;s claim to something of an advantage seems legit.</p>
<p style="margin-bottom: 0in;"><em>Truth be told, I&#8217;m not convinced the savings involved are great enough to </em>matter<em> a whole lot – but it&#8217;s a fun subject to think through. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </em></p>
<p style="margin-bottom: 0in;">And finally: One of my biggest surprises since starting to look at analytic-DBMS-on-Flash has been the centrality of temp space. Talking to Vertica Thursday, I finally uncovered a key reason why: <strong>Temp space tends to be accessed via multiple streams of data at once.</strong> I&#8217;m still struggling with WHY that is true, with two reasons suggested being:</p>
<ul>
<li>Temp space can be accessed by 	multiple operations at once. (But isn&#8217;t that also true of the rest 	of storage?)</li>
<li>Merge sorts, a common use of temp 	space, read multiple streams of data. (Couldn&#8217;t you tweak your 	software to make that not be true?)</li>
</ul>
<p style="margin-bottom: 0in;">But if we grant that temp space naturally is accessed in multiple places at once – well, that&#8217;s a lot like random I/O, and <a href="../2005/11/13/breaking-the-disk-speed-barrier/">if you&#8217;re doing a lot of random reads, you&#8217;d love to use something other than spinning disk</a>.</p>
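<p style="margin-bottom: 0in;">To make the merge-sort point concrete, here is a minimal Python sketch of the merge phase of an external sort. It is my own illustration, not Vertica&#8217;s code; the function name <code>merge_runs</code> and the sample runs are assumptions. Each sorted run stands in for a spill file in temp space, and the merge has to advance through all of them at once.</p>
<pre>
```python
import heapq


def merge_runs(runs):
    """Merge already-sorted runs into one sorted list.

    Also records which run each output value was read from, to show that
    the reads hop among runs rather than draining one run at a time.
    """
    # Tag every value with the index of the run it comes from.
    tagged = [[(value, run_id) for value in run] for run_id, run in enumerate(runs)]
    merged, read_order = [], []
    for value, run_id in heapq.merge(*tagged):
        merged.append(value)
        read_order.append(run_id)
    return merged, read_order


# Three hypothetical sorted runs, standing in for spill files in temp space.
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
merged, read_order = merge_runs(runs)
print(merged)      # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(read_order)  # [0, 1, 2, 0, 1, 2, 0, 1, 2]
```
</pre>
<p style="margin-bottom: 0in;">The read order shows the access pattern jumping from run to run on every step. On spinning disk that would mean a seek per jump unless each run is buffered in large chunks, which is presumably the kind of software tweak the question above is asking about.</p>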
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/08/16/vertica-flash-temp-space/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>What kinds of data warehouse load latency are practical?</title>
		<link>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/</link>
		<comments>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 12:15:17 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Aster Data]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[IBM and DB2]]></category>
		<category><![CDATA[Netezza]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2319</guid>
		<description><![CDATA[I took advantage of my recent conversations with Netezza and IBM to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:

Subsecond load latency is 	substantially impossible. Doing that amounts to OLTP.
5 seconds or so is doable with 	aggressive investment and tuning.
Several minute load latency is 	pretty easy.
10-15 [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I took advantage of my recent conversations with <a href="http://www.dbms2.com/2010/06/21/netezza-database-software-technology-overview/" >Netezza</a> and <a href="http://www.dbms2.com/2010/06/21/netezza-ibm-db2-compression/" >IBM</a> to discuss what kinds of data warehouse load latency were practical. In both cases I got the impression:</p>
<ul>
<li>Subsecond load latency is 	substantially impossible. Doing that amounts to OLTP.</li>
<li>5 seconds or so is doable with 	aggressive investment and tuning.</li>
<li>Several-minute load latency is pretty easy.</li>
<li>10-15 minute latency or longer is 	now very routine.</li>
</ul>
<p style="margin-bottom: 0in;">There&#8217;s generally a throughput/latency tradeoff, so if you want very low latency with good throughput, you may have to throw a lot of hardware at the problem.</p>
<p style="margin-bottom: 0in;">I&#8217;d expect to hear similar things from any other vendor with reasonably mature analytic DBMS technology. Low-latency load is a problem for columnar systems, but both <a href="http://www.dbms2.com/2008/08/12/vertica-paraccel-exasol/" >Vertica <span style="font-style: normal;">and</span> ParAccel</a> designed in workarounds from the getgo. Aster Data probably didn&#8217;t meet these criteria until <a href="http://www.dbms2.com/2009/10/30/aster-data-application-server-ncluster/" >Version 4.0</a>, its old “<a href="http://www.dbms2.com/2008/10/22/aster-data-systems-ncluster/" >frontline</a>” positioning notwithstanding, but I think it does now.</p>
<p style="margin-bottom: 0in;"><em><strong>Related link</strong></em></p>
<ul>
<li>
<p style="margin-bottom: 0in;"><a href="http://www.dbms2.com/2009/09/10/analytic-speed-latency/" >Just what is your need for speed</a> anyway?</p>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/21/data-warehouse-load-latency/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Ingres VectorWise technical highlights</title>
		<link>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/</link>
		<comments>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 11:28:18 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Benchmarks and POCs]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Ingres]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[VectorWise]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2261</guid>
		<description><![CDATA[After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what I&#8217;d written in the past about VectorWise had been and remained accurate, so I focused on filling in the gaps. Highlights included:  [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">After working through problems w/ travel, cell phones, and so on, Peter Boncz of VectorWise finally caught up with me for a regrettably brief call. Peter gave me the strong impression that what <a href="http://www.dbms2.com/2009/08/04/vectorwise-ingres-and-monetdb/" >I&#8217;d written in the past about VectorWise</a> had been and remained accurate, so I focused on filling in the gaps. Highlights included:  <span id="more-2261"></span></p>
<ul>
<li>VectorWise is indeed a 	shared-everything analytic DBMS.</li>
<li>The VectorWise front-end is 	Ingres. Ingres VectorWise supports almost all SQL that Ingres does (there 	are a few edge-case exceptions).</li>
<li>Conversely, Ingres VectorWise 	doesn&#8217;t support any SQL Ingres doesn&#8217;t, most notably SQL-99 	Analytics. Naturally, SQL-99 Analytics is a roadmap item for 	Ingres/VectorWise.</li>
<li>Ingres VectorWise 1.0 is pretty 	purely columnar. There&#8217;s a bit of <a href="http://www.dbms2.com/2009/08/04/pax-analytica-row-and-column-stores-begin-to-come-together/" >PAX</a>, but it&#8217;s mainly 	automagic/under the covers. The one user-controlled exception I 	understood was that one can ensure that composite keys are stored 	together.</li>
<li>The main Ingres VectorWise 	performance secret sauce ingredients we touched on were:
<ul>
<li>Vectorization of operations (hence VectorWise&#8217;s name).</li>
<li>Compression that is tuned for 	speed rather than to minimize storage utilization.</li>
</ul>
</li>
<li>We unfortunately didn&#8217;t have time 	to revisit the other big part of the Ingres VectorWise performance 	story, namely clever design for modern microprocessor architectures. 	High-level generalities about that do pervade <a href="http://www.dbms2.com/2010/06/10/vectorwise-press-release/" >the Ingres 	VectorWise press release</a>,<span style="font-style: normal;"> but – 	well, they&#8217;re very high level.</span></li>
<li>Unlike Vertica but like most other 	columnar DBMS vendors, Ingres VectorWise wants you to store your 	data once. You can index-organize the data. You can also organize 	multiple tables in the same order, to make joins among them fast.</li>
<li>Support for actual join indexes is an Ingres VectorWise roadmap item.</li>
<li>As do ever more analytic DBMS, 	Ingres VectorWise has something akin to <a href="http://www.dbms2.com/2006/09/20/netezza-vs-conventional-data-warehousing-rdbms/" >Netezza zone maps</a>.</li>
<li>When I asked Peter what had changed most from the initial VectorWise development plan, other than the above, he basically said that their performance priorities had shifted a bit. Specifically, he said:
<ul>
<li>They had 	originally been “blinded” (his word) by the TPC-H benchmark, but 	figured out that they were overly focused on it. (<a href="http://www.dbms2.com/2009/06/22/the-tpc-h-benchmark-is-a-blight-upon-the-industry/" >Well, duh</a>.)</li>
<li>They learned 	about the importance of other things such as data loading speeds.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/06/11/ingres-vectorwise-technical-highlights/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More on Sybase IQ, including Version 15.2</title>
		<link>http://www.dbms2.com/2010/05/23/sybase-iq-15/</link>
		<comments>http://www.dbms2.com/2010/05/23/sybase-iq-15/#comments</comments>
		<pubDate>Sun, 23 May 2010 08:34:28 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Application areas]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Investment research and trading]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Petabyte-scale data management]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Telecommunications]]></category>
		<category><![CDATA[Text]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2186</guid>
		<description><![CDATA[Back in March, Sybase was kind enough to give me permission to post a slide deck about Sybase IQ. Well, I&#8217;m finally getting around to doing so. Highlights include but are not limited to:

Slide 2 has some market success figures and so on. (&#62;3100 copies at &#62;1800 users, &#62;200 sales last year)
Slides 6-11 give more [...]]]></description>
			<content:encoded><![CDATA[<p>Back in March, Sybase was kind enough to give me permission to post <a href="http://www.monash.com/uploads/Sybase-IQ-slides-March-2010.pdf" onclick="javascript:pageTracker._trackPageview('/www.monash.com');">a slide deck about Sybase IQ</a>. Well, I&#8217;m finally getting around to doing so. Highlights include but are not limited to:</p>
<ul>
<li>Slide 2 has some market success figures and so on. (&gt;3100 copies at &gt;1800 users, &gt;200 sales last year)</li>
<li>Slides 6-11 give more detail on Sybase&#8217;s indexing and data access methods than I put into my recent <a href="http://www.dbms2.com/2010/05/17/technical-basics-of-sybase-iq/" >technical basics of Sybase IQ</a> post.</li>
<li>Slide 16 reminds us that Sybase IQ&#8217;s in-database data mining is quite competitive with what <a href="http://www.dbms2.com/2010/05/15/further-clarifying-in-database-mpp-sas/" >SAS has actually delivered with its DBMS partners</a>, even if it doesn&#8217;t have the nice architectural approach of <a href="http://www.dbms2.com/2010/02/22/netezza-twinfin/" >Aster or Netezza</a>. (I.e., Sybase IQ&#8217;s more-than-SQL advanced analytics story relies on C++ UDFs &#8212; User Defined Functions &#8212; running in-process with the DBMS.) In particular, there&#8217;s a data mining/predictive analytics library &#8212; modeling and scoring both &#8212; licensed from a small third party.</li>
<li>A number of the other later slides also have quite a bit of technical crunch. (More on some of those points below too.)</li>
</ul>
<p>Sybase IQ may have a bit of a funky architecture (e.g., no MPP), but the age of the product and the substantial revenue it generates have allowed Sybase to put in a bunch of product features that newer vendors haven&#8217;t gotten around to yet.</p>
<p>More recently, Sybase volunteered permission for me to preannounce <strong>Sybase IQ Version 15.2</strong> by a few days (it&#8217;s scheduled to come out this week). <span id="more-2186"></span>Sybase IQ seems to be focused in large part on the government/intelligence market, with three major features being:</p>
<ul>
<li>A kind of <strong>data federation,</strong> querying external databases, that makes sense mainly in the context of rigorous security rules. (I find that confusing, since Sybase IQ&#8217;s indexes tend to hold all the information in the database, but I didn&#8217;t push the point.)</li>
<li>An upgrade to Sybase IQ&#8217;s built-in <strong>text indexing.</strong> I doubt anybody would confuse this with best-of-breed text search, but evidently the intelligence community is satisfied with less. Even before 15.2, Sybase IQ could do both LIKE and WHERE CONTAINS searching.</li>
<li>Improved LOB (Large OBject) management.</li>
</ul>
<p>One part of my Sybase IQ conversations I haven&#8217;t blogged yet in much detail is <strong>scale-out, concurrency, </strong>and<strong> &#8220;multiplexing.&#8221;</strong></p>
<ul>
<li>Sybase feels that Sybase IQ&#8217;s competitive sweet spot, especially in terms of performance, is reached when there are 20 or more concurrent queries.</li>
<li>In general, Sybase asserts that a shared-everything architecture is great for concurrency &#8212; just run different queries on different boxes, all against the same data.</li>
<li>The ability to use a bunch of boxes to run Sybase IQ is called &#8220;multiplexing.&#8221; This is a chargeable option, without which one is limited to a single SMP box.</li>
<li>Just under 20% of the top 250 Sybase IQ customers have multi-node scale-out configurations (vs. single-node SMP scale-up). Across all customers, the figure is around 8%.</li>
<li>Sybase IQ nodes can be heterogeneous (e.g., in compute power).</li>
<li>Sybase IQ nodes can be dedicated to be read-only, or can be read-write. Indeed, Sybase IQ nodes can change roles dynamically, for example becoming write-only during nightly batch load. (I didn&#8217;t clarify whether all this applies just to nodes-as-boxes, or if some parts apply to specific processors or cores within the same box.)</li>
<li>Sybase noted that data mart outsourcers can offer differentiated SLAs (Service Level Agreements) depending upon which nodes they give which customers access to.</li>
<li>Most Sybase IQ installations start at 8 cores or more. The Sybase IQ Small Business Edition, limited to 4 cores, is not a big seller.</li>
<li>Sybase IQ has a straightforward round-robin load-balancing story via third-party technology.</li>
</ul>
<p>Finally, along the way in the discussions I picked up various tidbits about the Sybase IQ user base. Unfortunately, Sybase is pretty vague in discussing database sizes &#8212; are they user data? Are they compressed? What do the numbers mean? With that huge caveat:</p>
<ul>
<li>By some metric or other, a couple of classified customers are approaching petabyte scale.</li>
<li>The largest commercial Sybase IQ customer &#8212; a credit card company &#8212; has a couple hundred terabytes or so.</li>
<li>The largest financial services Sybase IQ databases are 50-70 terabytes. This sounds low, frankly, so maybe those are compressed figures, with user data being 200+ terabytes. But I&#8217;m just speculating there.</li>
<li>Sybase IQ has a little fewer than 100 customers in the &#8220;data aggregator&#8221; market, which is a lot like what I call &#8220;data mart outsourcer.&#8221;</li>
<li><a href="http://www.dbms2.com/2009/08/25/sybase-iq-technical-highlights/" >Sybase IQ&#8217;s ILM technology</a> is a chargeable option, with Sybase being &#8220;cautious&#8221; about sales. Compliance is a big market driver for it.</li>
<li>Sybase IQ&#8217;s #1 vertical market is financial services. Other biggies are government, telecom, marketing services, and to some extent retail.</li>
<li>As of February, there were 40-45 production users of Sybase IQ 15.0 and 15.1.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/23/sybase-iq-15/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Technical basics of Sybase IQ</title>
		<link>http://www.dbms2.com/2010/05/17/technical-basics-of-sybase-iq/</link>
		<comments>http://www.dbms2.com/2010/05/17/technical-basics-of-sybase-iq/#comments</comments>
		<pubDate>Mon, 17 May 2010 05:18:44 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Database compression]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2163</guid>
		<description><![CDATA[The Sybase IQ folks had been rather slow about briefing me, at least with respect to crunch. They finally fixed that in February. Since then, I&#8217;ve been slow about posting based on those briefings. But what with Sybase being acquired by SAP, Sybase having an analyst meeting this week, and other reasons – well, this [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">The Sybase IQ folks had been rather slow about briefing me, at least with respect to crunch. They finally fixed that in February. Since then, I&#8217;ve been slow about posting based on those briefings. But what with <a href="http://www.dbms2.com/2010/05/13/sap-sybase-reactions/" >Sybase being acquired by SAP</a>, Sybase having an analyst meeting this week, and other reasons – well, this seems like a good time to post about Sybase IQ. <img src='http://www.dbms2.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p style="margin-bottom: 0in;">For starters, Sybase IQ is not just a bitmapped system, but it&#8217;s also not all that closely akin to C-Store or Vertica. In particular,</p>
<ul>
<li>Sybase IQ stores data in <strong>columns</strong> – like, for example, Vertica.</li>
<li>Sybase IQ relies on <strong>indexes</strong> to retrieve data – unlike, for example, Vertica, in which the 	column pretty much is the index.</li>
<li>However, columns themselves can be 	used as indexes in the usual <a href="http://www.dbms2.com/2007/01/22/are-row-oriented-rdbms-obsolete/" >Vertica</a>-like way.</li>
<li>Most of Sybase IQ&#8217;s indexes are <strong>bitmaps</strong>, or a lot like bitmaps, à la the original IQ product.</li>
<li>Some of Sybase IQ&#8217;s indexes are 	not at all like bitmaps, but more like <strong>B-trees.</strong></li>
<li>In general, 	Sybase recommends that you put multiple indexes on each column 	because &#8212; what the heck – each one of them is pretty small. (In 	particular, the bitmap-like indexes are highly compressible.) 	Together, indexes tend to take up &lt;10% of Sybase IQ storage 	space.</li>
</ul>
<p style="margin-bottom: 0in;"><span id="more-2163"></span>Sybase IQ is not immune to <a href="http://www.dbms2.com/2010/02/25/sybase-adaptive-server-enterprise-as/" >Sybase&#8217;s confusing choices in version numbering</a>. Thus:</p>
<ul>
<li>Sybase IQ Version 15.2 will be 	announced and released soon.</li>
<li>Sybase IQ Version 15.1 was a set 	of “binary replacements” rather than an “upgrade release” 	for Sybase IQ Version 15.0.</li>
<li>Sybase IQ Version 15.0 was 	launched in February, 2009 and released for general availability 	some time thereafter.</li>
<li>The prior version of Sybase IQ was 	12.7.</li>
<li>GA isn&#8217;t always GA, and some 	language localizations and so on weren&#8217;t ready for a while. 	Consequently, a lot of Sybase IQ sales continued to be of Version 	12.7 even in the second half of 2009.</li>
</ul>
<p style="margin-bottom: 0in;">Now let&#8217;s get down to some technical particulars.</p>
<p style="margin-bottom: 0in;"><strong>Sybase IQ columns are always stored in RowID order.</strong> However, RowIDs are logical and not physical, and hence take up little disk space. A small amount of per-page metadata lets you find the specific cell you want. (Cells are commonly fixed-width, in which case finding the cell of choice is a simple calculation.) So RowIDs are not much of an I/O overhead issue, although I&#8217;m not sure at what point they get unpacked and start needing to be carried around as the data travels through silicon.</p>
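<p style="margin-bottom: 0in;">For the fixed-width case, the &#8220;simple calculation&#8221; might look like the Python sketch below. This is only an illustration under stated assumptions: the function name <code>locate_cell</code>, the zero-based RowIDs, and the page and cell sizes are all mine, and the sketch ignores the per-page metadata mentioned above.</p>
<pre>
```python
def locate_cell(row_id, cell_width, page_size):
    """Return (page_number, byte_offset) for a fixed-width cell.

    Assumes RowIDs start at 0 and that a page holds only whole cells.
    """
    cells_per_page = page_size // cell_width
    page_number, slot = divmod(row_id, cells_per_page)
    return page_number, slot * cell_width


# Hypothetical numbers: 4-byte cells on 64 KB pages, so 16384 cells per page.
print(locate_cell(100_000, cell_width=4, page_size=65_536))  # (6, 6784)
```
</pre>
<p style="margin-bottom: 0in;">Because the position falls out of arithmetic on the RowID, nothing about the RowID itself has to be stored alongside the cell.</p>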
<p style="margin-bottom: 0in;">Sybase IQ has 9 or so kinds of indexes. <strong>The choice of index has a lot to do with cardinality.</strong> In the extreme low-cardinality case, a simple bitmap might do. With intermediate cardinality, you might go to a modified kind of bitmap – e.g., if there are 2^16 possible values, you can represent a value in 16 bits, and bitmap operations are approximately 16 times as costly as if the number of possible values were only 2^1. For very high cardinality, there&#8217;s a B-tree-like index called “High Group”.</p>
<p style="margin-bottom: 0in;"><em>Note: Surely every Sybase index name, at some time, made sense to at least one engineer.</em></p>
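<p style="margin-bottom: 0in;">One way to read the &#8220;modified kind of bitmap&#8221; is as a generic bit-sliced index: one bitmap per bit of the value, rather than one per distinct value. The Python sketch below takes that reading as an assumption; it is not necessarily how Sybase IQ implements it, and the function names and sample column are hypothetical. Python integers serve as the bitmaps.</p>
<pre>
```python
def build_bit_slices(values, bits):
    """Build one bitmap per bit position; bit r of slices[b] is bit b of values[r]."""
    slices = [0] * bits
    for row, value in enumerate(values):
        for b in range(bits):
            if (value >> b) & 1:
                slices[b] |= 2 ** row
    return slices


def rows_equal_to(slices, target, row_count):
    """Return matching row numbers, using one bitmap operation per slice."""
    all_rows = 2 ** row_count - 1
    result = all_rows
    for b, bitmap in enumerate(slices):
        if (target >> b) & 1:
            result &= bitmap               # keep rows where bit b is 1
        else:
            result &= all_rows ^ bitmap    # keep rows where bit b is 0
    return [row for row in range(row_count) if (result >> row) & 1]


values = [5, 3, 5, 0, 7, 5]                # hypothetical column, values below 2**3
slices = build_bit_slices(values, bits=3)
print(rows_equal_to(slices, 5, len(values)))  # [0, 2, 5]
```
</pre>
<p style="margin-bottom: 0in;">An equality test here touches 3 bitmaps. With 16-bit values it would touch 16, which matches the &#8220;approximately 16 times as costly&#8221; figure above.</p>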
<p style="margin-bottom: 0in;">Sybase IQ&#8217;s <strong>execution engine</strong> does seem to rely quite a bit on bitmaps. E.g., intermediate query results are stored as bitmaps, which helps them play nicely with each other and with many of the indexes. Sybase claims that Sybase IQ&#8217;s bitmap orientation often makes WHERE clauses execute very quickly. Sybase IQ reoptimizes queries after WHERE clauses are evaluated. Complex expressions are, when possible, evaluated once per unique value, not once per row.</p>
<p style="margin-bottom: 0in;">Speaking of unique values – Sybase IQ&#8217;s <strong>compression</strong> story doesn&#8217;t currently match that of some other columnar products, but it seems to stack up pretty well against row-based systems. In particular:</p>
<ul>
<li>Sybase says IQ compression is most 	commonly 50-70%.</li>
<li>Sybase further says that, in most 	cases, compression falls into the range 40-85%.</li>
<li>Page-level LZ compression is 	decompressed upon read (duh).</li>
<li>Dictionary/token compression may 	be decompressed later. For example, GROUP BYs are commonly done on 	tokens, and JOINs sometimes are.</li>
</ul>
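<p style="margin-bottom: 0in;">The last bullet, about doing GROUP BYs on tokens and decompressing later, can be illustrated with a toy dictionary encoding. This is a generic sketch; the names are invented for the example and nothing here reflects Sybase IQ&#8217;s actual token format.</p>

```python
from collections import Counter


def dictionary_encode(column):
    """Replace each value with a small integer token; return (dictionary, tokens)."""
    dictionary = []
    token_of = {}
    tokens = []
    for value in column:
        if value not in token_of:
            token_of[value] = len(dictionary)
            dictionary.append(value)
        tokens.append(token_of[value])
    return dictionary, tokens


def group_count_on_tokens(dictionary, tokens):
    """GROUP BY ... COUNT(*) computed on tokens, decoding once per group."""
    counts = Counter(tokens)  # grouping only touches small integers
    return {dictionary[token]: n for token, n in counts.items()}


cities = ["Boston", "Austin", "Boston", "Denver", "Boston"]
dictionary, tokens = dictionary_encode(cities)
print(tokens)                                     # [0, 1, 0, 2, 0]
print(group_count_on_tokens(dictionary, tokens))  # {'Boston': 3, 'Austin': 1, 'Denver': 1}
```

<p style="margin-bottom: 0in;">The original strings are looked up only once per group at the end, not once per row.</p>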
<p style="margin-bottom: 0in;">Sybase IQ boasts <strong>pipelining,</strong> in two senses. First, IQ tries to read pages for multiple queries at the same time. Second, Sybase IQ tries to <strong>prefetch</strong> pages into cache before they&#8217;re needed. Sybase points out that these prefetched pages have the WHERE clauses already executed, and that no extra baggage is being dragged into cache that doesn&#8217;t need to be there.</p>
<p style="margin-bottom: 0in;">Highlights of Sybase IQ&#8217;s update and load story include:</p>
<ul>
<li>Sybase IQ is optimized for large 	bulk loads. No surprise there.</li>
<li>Sybase IQ has several options for 	microbatching and/or trickle feeds.
<ul>
<li>The coolest is <a href="http://www.dbms2.com/2010/02/05/sybase-aleri-rap/" >Sybase RAP</a>.</li>
<li>More generally, microbatching is 	based on Change Data Capture. Sybase has various ETL/replication 	technologies, creating a confusing array of options in that regard.</li>
<li>Sybase says that one customer is 	microbatching 1000s of rows with 1 minute latency.</li>
</ul>
</li>
<li>There&#8217;s something about 	snapshotting and hence loads not interfering with queries. I&#8217;m not 	clear on the details.</li>
<li>Assuming you have enough 	parallelism, you can dedicate some nodes to queries while others are 	dedicated to load. (Recall that Sybase IQ is shared-disk.)</li>
</ul>
<p style="margin-bottom: 0in;">I&#8217;ve lost track a little bit as to which “advanced analytics” functionality is in Sybase IQ 15.1, which will be in 15.2, and what&#8217;s a future beyond that, which is a great excuse for me to leave it out of what has already become a rather long post. But anyhow, except perhaps for the future stuff and/or some time series functionality, none of it seems terribly advanced. Sybase IQ does have two stored procedure languages, namely the ones for Sybase ASE (T-SQL) and for Sybase Anywhere or Adaptive Server Anywhere or whatever it&#8217;s called this week (Watcom SQL, which Sybase asserts is similar to the ANSI SQL stored procedure language).</p>
<p style="margin-bottom: 0in;">Similarly, I&#8217;ll leave a lot of other stuff out as well, and for now stop here.</p>
<p style="margin-bottom: 0in;"><em><strong>Related links</strong></em></p>
<ul>
<li>I haven&#8217;t repeated every detail 	here from my <a href="../2009/08/25/sybase-iq-technical-highlights/">August, 	2009 technical post about Sybase IQ</a></li>
<li>And here&#8217;s <a href="http://www.dbms2.com/2010/05/23/sybase-iq-15/" >more about Sybase IQ</a>, including some Sybase IQ 15.2 features, some market penetration info, and a slide deck</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/17/technical-basics-of-sybase-iq/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Further quick SAP/Sybase reactions</title>
		<link>http://www.dbms2.com/2010/05/13/sap-sybase-reactions/</link>
		<comments>http://www.dbms2.com/2010/05/13/sap-sybase-reactions/#comments</comments>
		<pubDate>Thu, 13 May 2010 15:30:46 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Aleri and Coral8]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Complex event processing (CEP)]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[Mid-range]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Theory and architecture]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2128</guid>
		<description><![CDATA[Raj Nathan of Sybase has been calling around to chat quickly about the SAP/Sybase deal and related matters. Talking with Raj didn&#8217;t change any of my initial reactions to SAP&#8217;s acquisition of Sybase. I also didn&#8217;t bother Raj with too many hard questions, as he was clearly in call-and-reassure mode, reaching out to customers and [...]]]></description>
			<content:encoded><![CDATA[<p>Raj Nathan of Sybase has been calling around to chat quickly about the SAP/Sybase deal and related matters. Talking with Raj didn&#8217;t change any of <a href="http://www.dbms2.com/2010/05/12/sap-acquire-sybase/" >my initial reactions to SAP&#8217;s acquisition of Sybase</a>. I also didn&#8217;t bother Raj with too many hard questions, as he was clearly in call-and-reassure mode, reaching out to customers and influencers alike.</p>
<p>That said,   <span id="more-2128"></span></p>
<ul>
<li>Raj said that Sybase&#8217;s Aleri acquisition was, if anything, tracking ahead of expectations.</li>
<li>Raj didn&#8217;t seem the slightest bit focused on the Coral8/Aleri CEP-based BI strategy that John Morell had long championed.</li>
<li>Raj reminded me that Sybase SQL Anywhere has numerous OEMs, not just on the true desktop/laptop or smaller, but also in a return to its server/workgroup roots. Sybase SQL Anywhere even added geospatial indexing recently.</li>
</ul>
<p>Raj also spoke glowingly of SAP&#8217;s in-memory database technology and the potential for Sybase of same &#8212; until I asked a follow-up question. At that point, he confessed that he didn&#8217;t really know much about SAP&#8217;s in-memory database technology yet. As I said before, I believe SAP is fairly sincere about its belief that its in-memory database technology will conquer the world &#8212; but this is a naive and poorly-founded opinion even so.</p>
<p>One tidbit I did get is that SAP&#8217;s in-memory database technology is not just <a href="http://www.dbms2.com/2006/09/20/saps-bi-accelerator/" >son-of-T-REX</a>. A Korean (Raj thinks) company SAP had acquired is also in the mix. Raj also had the impression SAP&#8217;s in-memory technology can do rows, columns, or hybrid structures. On the one hand, that makes sense. On the other, it&#8217;s not a perfect fit with <a href="http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/" >what Hasso Plattner said last year</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/13/sap-sybase-reactions/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Quick reactions to SAP acquiring Sybase</title>
		<link>http://www.dbms2.com/2010/05/12/sap-acquire-sybase/</link>
		<comments>http://www.dbms2.com/2010/05/12/sap-acquire-sybase/#comments</comments>
		<pubDate>Wed, 12 May 2010 23:48:37 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[ANTs Software]]></category>
		<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Business Objects]]></category>
		<category><![CDATA[Business intelligence]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[In-memory DBMS]]></category>
		<category><![CDATA[Memory-centric data management]]></category>
		<category><![CDATA[OLTP]]></category>
		<category><![CDATA[ParAccel]]></category>
		<category><![CDATA[SAP AG]]></category>
		<category><![CDATA[Sybase]]></category>
		<category><![CDATA[Theory and architecture]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=2105</guid>
		<description><![CDATA[SAP is acquiring Sybase. On the conference call SAP said Sybase would be run as a separate division of SAP (no surprise). Most of the focus was on Sybase&#8217;s mobile technology, which is forecast at &#62;$400 million in 2010 revenues (which would be 30%ish of the total). My quick reactions include: 

Sybase&#8217;s main businesses are:

Classic [...]]]></description>
			<content:encoded><![CDATA[<p>SAP is acquiring Sybase. On the conference call SAP said Sybase would be run as a separate division of SAP (no surprise). Most of the focus was on Sybase&#8217;s mobile technology, which is forecast at &gt;$400 million in 2010 revenues (which would be 30%ish of the total). My quick reactions include: <span id="more-2105"></span></p>
<ul>
<li>Sybase&#8217;s main businesses are:
<ul>
<li><strong>Classic OLTP DBMS</strong> (Sybase ASE, for Adaptive Server Enterprise, unless I&#8217;ve missed yet another name change).</li>
<li><strong>Analytic technology</strong> &#8212; mainly <strong>Sybase IQ,</strong> but more generally <a href="http://www.dbms2.com/2010/02/05/sybase-aleri-rap/" >Sybase RAP</a>.</li>
<li><strong>Mobile technology. </strong>(The frequently renamed small DBMS SQL Anywhere was the foundational product of and still is included in the mobile division.)</li>
</ul>
</li>
<li><a href="http://www.dbms2.com/2009/07/07/hasso-plattner-calls-for-in-memory-oltp-column-stores/" >SAP&#8217;s thoughts on in-memory database management</a> are interesting. However, I think SAP&#8217;s oft-repeated claim that it has a lot of important in-memory database technology to bring to Sybase (or for that matter SAP customers) is mainly smoke and mirrors. <strong>Cool data access methods, good niche database products, and broadly applicable multi-domain DBMS innovations are three different things.</strong> Granting that SAP probably has the first and thinks it has the second is not the same as giving it much credence for having the third.</li>
<li>SAP claims that, 15 years after its refusal to support Sybase turned Sybase into a DBMS also-ran, it by now is &#8220;relatively simple&#8221; to port SAP&#8217;s apps to Sybase ASE, and that they will make that happen. I actually believe that <strong>SAP&#8217;s apps will soon run on Sybase ASE,</strong> where by &#8220;soon&#8221; I mean &#8220;in a couple of years for no-apologies general availability.&#8221; (Certifying a DBMS for SAP is a long process.) The main missing features &#8212; e.g., row-level locking &#8212; were already put into Sybase back in the last millennium. Nor could there be fundamental architectural problems that keep SAP from supporting Sybase ASE, or else SAP couldn&#8217;t have supported Microsoft SQL Server (which, long ago, was a Sybase fork).</li>
<li><strong>I don&#8217;t see any market or competitive dynamics that would lead the SAP acquisition to hurt Sybase&#8217;s ASE or mobile businesses. </strong>General merger management mishegas is, of course, always a possibility.</li>
<li>SAP Business Objects partners with Sybase IQ&#8217;s competitors. That could be a problem. However, <strong>coopetition is pretty strong in the business intelligence market</strong>. I don&#8217;t think any of SAP Business Objects, IBM Cognos, or Oracle Business Intelligence are much held back from partnering by competitive dislike of their parent companies.</li>
<li><strong>The rest of SAP might be able to drum up some extra business for Sybase IQ.</strong></li>
<li><strong>It would be natural for IBM/Cognos to now buy a columnar DBMS of its own.</strong> Vertica is an obvious first choice. ParAccel would surely come much cheaper. Since ParAccel has little chance of surviving as an independent company &#8212; <a href="http://www.dbms2.com/2010/04/16/story-of-an-analytic-dbms-evaluation/" >too immature</a> and too little differentiation to overcome that &#8212; I&#8217;d expect ParAccel&#8217;s board to jump at the chance to sell out.</li>
<li>It would be interesting if SAP Business Objects would revive the <a href="http://www.dbms2.com/2009/03/25/aleri-update/" >CEP-based BI</a> idea.</li>
<li>I gather Sybase&#8217;s AnswersAnywhere concept network/object model-based natural language/speech recognition technology never went anywhere. Unsurprising (it seemed like it needed too much hand-building to scale semantically), but regrettable even so.</li>
<li>I don&#8217;t see anything in this acquisition that would revive PowerBuilder (Sybase&#8217;s Visual Basic competitor), Sybase&#8217;s CASE (Computer-Aided Software Engineering) tools, and so on.</li>
<li>And on the personal side &#8212; I&#8217;ll probably lose Sybase as a customer due to this merger, but it could have been worse. A lot of vendors smaller than Sybase are bigger customers for Monash Research.</li>
</ul>
<p><em>Edit: Right after I posted this, I saw email from Sybase clarifying that Sybase&#8217;s in-memory technology, while slightly influenced by some ANTs IP Sybase bought non-exclusive rights to, is essentially home-grown. That&#8217;s what I thought, but the call sounded like it was saying something different.</em></p>
<p><strong><em>Further coverage of SAP/Sybase:</em></strong></p>
<ul>
<li><a href="http://www.dbms2.com/2010/05/13/sap-database-proliferation/" >SAP believes in database proliferation</a></li>
<li><a href="http://www.dbms2.com/2010/05/13/sap-sybase-reactions/" >More quick reactions to SAP/Sybase</a></li>
<li><a href="http://www.dbms2.com/2010/05/17/technical-basics-of-sybase-iq/" >Technical basics of Sybase IQ</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/05/12/sap-acquire-sybase/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Vertica update</title>
		<link>http://www.dbms2.com/2010/04/29/vertica-zynga/</link>
		<comments>http://www.dbms2.com/2010/04/29/vertica-zynga/#comments</comments>
		<pubDate>Fri, 30 Apr 2010 03:44:59 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Games and virtual worlds]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Specific users]]></category>
		<category><![CDATA[Vertica Systems]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1973</guid>
		<description><![CDATA[Last month, Vertica&#8217;s CEO Ralph Breslauer quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He&#8217;s a guy I&#8217;ve never heard of before named Chris Lynch, apparently quite the sales machine builder. The most substance I&#8217;ve found is a pair [...]]]></description>
			<content:encoded><![CDATA[<p>Last month, <a href="http://www.dbms2.com/2010/03/19/vertica-update-4/" >Vertica&#8217;s CEO Ralph Breslauer</a> quit,* and Vertica made it sound like there would be a new CEO late in April. And indeed, as of April 29, there was. He&#8217;s a guy I&#8217;ve never heard of before named <a href="http://www.vertica.com/company/news/Vertica-appoints-Christopher-Lynch-new-president-and-CEO" onclick="javascript:pageTracker._trackPageview('/www.vertica.com');">Chris Lynch</a>, apparently quite the sales machine builder. The most substance I&#8217;ve found is a pair of <a href="http://www.masshightech.com/stories/2010/04/26/daily40-Vertica-names-Acopia-vet-Lynch-to-CEO-post.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">Mass High Tech</a> <a href="http://www.masshightech.com/stories/2010/04/26/daily42-New-Vertica-CEO-Lynch-talks-of-plans-to-hire.html" onclick="javascript:pageTracker._trackPageview('/www.masshightech.com');">articles</a> &#8212; the latter exceedingly typo-ridden &#8212; to the general effect that:</p>
<ul>
<li>Vertica plans to build a massive, world-conquering sales force.</li>
<li>If Vertica dips back into negative cash flow to do that and has to raise more venture capital, so be it.</li>
<li>&#8220;Triple-digit&#8221; revenue growth is expected for this year.</li>
</ul>
<p><em><span id="more-1973"></span>*I&#8217;ve since heard more both from Ralph and his former colleagues, and I&#8217;m comfortable taking the move more or less at face value &#8212; for some reasons he doesn&#8217;t want to spell out, Ralph really wanted to move back home to South Africa.</em></p>
<p>While they were at it, Vertica also put out a press release reporting very good <a href="http://www.vertica.com/company/news/worlds-top-social-gaming-companies-tap-Vertica" onclick="javascript:pageTracker._trackPageview('/www.vertica.com');">success in the social gaming market</a>. The biggest and best known of the bunch is Zynga. Three months ago, <a href="http://tdwi.org/Blogs/WayneEckerson/2010/02/Zynga.aspx" onclick="javascript:pageTracker._trackPageview('/tdwi.org');">Wayne Eckerson</a> had figures of 3 TB/day added to the database, 200 nodes, and &gt;40 million users. Now Zynga is using a figure of &gt;65 million daily users and 230 nodes. More precisely, at Zynga:</p>
<ul>
<li>There are two Vertica databases with identical data.</li>
<li>Each Zynga Vertica database runs on 115 nodes.</li>
<li>Zynga&#8217;s two Vertica database clusters are used for different applications.</li>
<li>It&#8217;s undisclosed exactly what Zynga runs on what Vertica cluster. But best practice would be to put mission-critical, fast-response stuff on one cluster, and use the other for longer-running or less-critical queries &#8212; plus have it be available as hot standby &#8212; given that I don&#8217;t see much reason to put data geographically close to users around the world for reasons of latency or whatever.</li>
<li>The full daily data feed, which Wayne earlier estimated at 3 TB, is added to each of Zynga&#8217;s Vertica databases. The exact current amount is undisclosed.</li>
</ul>
<p>In other news, Vertica now states its customer count as being &gt;130.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/04/29/vertica-zynga/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Infobright blog update</title>
		<link>http://www.dbms2.com/2010/03/19/infobright-blog-update/</link>
		<comments>http://www.dbms2.com/2010/03/19/infobright-blog-update/#comments</comments>
		<pubDate>Fri, 19 Mar 2010 13:42:01 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data mart outsourcing]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Infobright]]></category>
		<category><![CDATA[Log analysis]]></category>
		<category><![CDATA[Market share]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[Web analytics]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1733</guid>
		<description><![CDATA[I often offer that, if a company puts up a sufficiently good blog post, I&#8217;ll link to it. Well, I just noticed that Infobright CEO Mark Burton (somewhere along the way he seems to have dropped the “interim”) put up an excellent post last month.
Highlights on the market share/sector side include:

Infobright’s customer base grew 500% [...]]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom: 0in;">I often offer that, if a company puts up a sufficiently good blog post, I&#8217;ll link to it. Well, I just noticed that Infobright CEO Mark Burton <span style="text-decoration: line-through;">(somewhere along the way he seems to have dropped the “interim”)</span> put up <a href="http://www.infobright.com/Blog/Entry/infobright_strategy_and_plans" onclick="javascript:pageTracker._trackPageview('/www.infobright.com');">an excellent post</a> last month.</p>
<p style="margin-bottom: 0in;">Highlights on the market share/sector side include:<span id="more-1733"></span></p>
<ul>
<li>Infobright’s customer base grew 500% over the past year, to 	120 paying customers.</li>
<li>This included end users (60%), as well as ISVs and SaaS 	providers (40%) who embed Infobright&#8217;s DBMS in their application.</li>
<li>During the same period, Infobright&#8217;s open source software was 	downloaded 35,000 times.</li>
<li>The end user applications were heavily clustered around web 	and online analytics tracking, with a focus on understanding 	customer behavior on the web.</li>
<li>Infobright also continues to see the growth of 	application-specific data marts.</li>
<li>There is also continued interest and growth in using 	Infobright technology to analyze IT logs and telecom CDR (Call 	Detail Record) data, to identify fraud or security issues, to 	understand and improve network performance, and other purposes.</li>
</ul>
<p>Product highlights include:</p>
<ul>
<li>Infobright will be much more transparent in 2010 about its plans.</li>
<li>Infobright will start posting and commenting on future 	releases and themes in March of this year. (However, they haven&#8217;t 	run much of that by me yet, and we&#8217;re past the middle of March.)</li>
<li>Infobright expects to drop 3-4 interim releases for every 	major release, with at least two major releases in 2010.</li>
<li>Some of Infobright&#8217;s major improvements this year will be:
<ul>
<li>Continued SMP performance improvements “without the need 	for complex hardware configurations or administrative effort”.</li>
<li>Extending the “hit rate” of the Knowledge Grid, which is 	central to Infobright&#8217;s performance story.</li>
<li>Better international support with UTF-8 extensions.</li>
</ul>
</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/03/19/infobright-blog-update/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Vertica 4.0</title>
		<link>http://www.dbms2.com/2010/02/22/vertica-4/</link>
		<comments>http://www.dbms2.com/2010/02/22/vertica-4/#comments</comments>
		<pubDate>Mon, 22 Feb 2010 08:19:00 +0000</pubDate>
		<dc:creator>Curt Monash</dc:creator>
				<category><![CDATA[Analytic technologies]]></category>
		<category><![CDATA[Columnar database management]]></category>
		<category><![CDATA[Data warehousing]]></category>
		<category><![CDATA[Vertica Systems]]></category>

		<guid isPermaLink="false">http://www.dbms2.com/?p=1607</guid>
		<description><![CDATA[Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.
For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, [...]]]></description>
			<content:encoded><![CDATA[<p>Vertica briefed me last month on its forthcoming Vertica 4.0 release. I think it&#8217;s fair to say that Vertica 4.0 is mainly a cleanup/catchup release, washing away some of the tradeoffs Vertica had previously made in support of its innovative DBMS architecture.</p>
<p>For starters, there&#8217;s a lot of new analytic functionality. This isn&#8217;t Aster/Netezza-style ambitious. Rather, there&#8217;s a lot more SQL-99 functionality, plus some time series extensions of the sort that financial services firms – an important market for Vertica – need and love. Vertica did suggest a couple of these time series extensions are innovative, but I haven&#8217;t yet gotten detail about those.</p>
<p>Perhaps even more important, Vertica is cleaning up a lot of its previous SQL optimization and execution weirdnesses. In no particular order, I was told:<span id="more-1607"></span></p>
<ul>
<li>Vertica&#8217;s delete performance is up “literally” 30-100X, at least in the case of “large” deletes. Performance for “large” updates has been enhanced as well.</li>
<li>Vertica has finally cleaned up all vestiges of its prior <a href="http://www.dbms2.com/2007/10/23/vertica-star-snowflake-schema/" >bias to star schemas</a>. For example, Vertica concedes that its product previously would sometimes force a star execution plan that wasn&#8217;t really appropriate.</li>
<li>It is no longer the case that you need to define projections before you load a table into Vertica. This is now fully automatic.</li>
<li>Vertica 4.0 automatically redesigns the database when new nodes are added to the system.</li>
<li>When a database designer does hand-tune projections – and there&#8217;s no shame in this still being a possibility in Vertica 4.0 – that hand-tuning is now pulled back into the automatic generation/recommendation/whatever wizards for further projections. I.e., there&#8217;s a kind of DBA round-trip engineering going on.</li>
<li>Vertica used to require that tables being joined be identically “segmented” (I think this means distributed across nodes). That is no longer the case in 4.0.</li>
<li>In connection with this new-found flexibility, Vertica now supports full outer joins directly, rather than requiring the left outer join/right outer join/UNION kluge.</li>
<li>The Vertica 4.0 optimizer is smarter than its predecessor about things like predicate pushdown into subqueries, or exploiting commonality between predicates and partition keys.</li>
<li>There&#8217;s a fundamental change that I don&#8217;t understand very well in the Vertica execution engine basic unit of work. It sounds as if in the past all the disk-based data containers the query needed got opened at once and read into memory, whether or not there was enough RAM and CPU cores to handle them, and this problem has now been fixed.</li>
<li>Vertica always seemed to say that you could query immediately on new data, because even if it hadn&#8217;t hit disk yet – the ROS (Read-Optimized Store) – it was available in memory – the WOS (Write-Optimized Store). And queries were in essence federated between the ROS and WOS. But apparently it&#8217;s a new feature in Vertica 4.0 that you can read totally fresh data without locking. I confess to not understanding this very well either. (It has something to do with what  Vertica calls “Epochs”.)</li>
<li>Temporary tables can now be created in Vertica on a local/session basis without any DDL. Making temporary tables easier and more performant is important for a variety of reasons:
<ul>
<li>Microstrategy, Company V* et al. use lots of temp tables. E.g., Company V on Vertica has 3,000 permanent tables and 5,000-7,000 temporary ones.</li>
<li>Vertica rightly points out that temporary tables are also important for ELT (Extract/Load/Transform).</li>
<li>Vertica further says that single-node OEMs such as security appliance vendors use lots of temp tables.</li>
</ul>
</li>
</ul>
<p><em>*Company V = one of the more prominent vertical-market application providers.</em></p>
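<p>For readers who haven&#8217;t had to write it, the left outer join/right outer join/UNION kluge that direct full outer join support replaces looks roughly like the sketch below. It runs against SQLite through Python&#8217;s standard <code>sqlite3</code> module purely for illustration, not against Vertica; the tables <code>a</code> and <code>b</code> are made up. Two deliberate substitutions: the right outer join is written as a left join with the tables swapped, since older SQLite versions lack RIGHT JOIN, and the second branch keeps only unmatched rows so that UNION ALL can be used without collapsing genuine duplicates.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, x TEXT);
    CREATE TABLE b (id INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# Emulated FULL OUTER JOIN: every row of a with its match (if any),
# plus the rows of b that found no match in a.
KLUGE = """
    SELECT a.id, a.x, b.id, b.y
    FROM a LEFT JOIN b ON a.id = b.id
    UNION ALL
    SELECT a.id, a.x, b.id, b.y
    FROM b LEFT JOIN a ON a.id = b.id
    WHERE a.id IS NULL
"""

rows = conn.execute(KLUGE).fetchall()
for row in rows:
    print(row)
```

<p>Each side&#8217;s unmatched rows come back padded with NULLs, which is what a native full outer join returns in a single pass.</p>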
<p>In other Vertica highlights:</p>
<ul>
<li>It sounds as if 4.0 is the first Vertica release with what I would regard as serious workload management.</li>
<li>While Vertica has stored and retrieved Unicode since Vertica 3.5 or so, 4.0 will be the first Vertica release in which Unicode is sorted and collated properly.</li>
<li>Stored-procedure-like functionality is still a future for Vertica.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dbms2.com/2010/02/22/vertica-4/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
