Columnar database management

Analysis of products and issues in column-oriented database management systems. Related subjects include:

October 24, 2012

Quick notes on Impala

Edit: There is now a follow-up post on Cloudera Impala with substantially more detail.

In my world it’s possible to have a hasty 2-hour conversation, and that’s exactly what I had with Cloudera last week. We touched on hardware and general adoption, but much of the conversation was about Cloudera Impala, announced today. Like Hive, Impala turns Hadoop into a basic analytic RDBMS, with similar SQL/Hadoop integration benefits to those of Hadapt. In particular:

Beyond that: Read more

October 23, 2012

Introduction to Platfora

When I wrote last week that I have at least 5 clients claiming they’re uniquely positioned to support BI over Hadoop (most of whom partner with a 6th client, Tableau) the non-partnering exception I had in mind was Platfora, Ben Werther’s oh-so-stealthy startup that is finally de-stealthing today. Platfora combines:

The whole thing sounds like a perhaps more general and certainly non-SaaS version of what Metamarkets has been offering for a while.

The Platfora technical story starts: Read more

October 16, 2012

Hadapt Version 2

My clients at Hadapt are coming out with a Version 2 to be available in Q1 2013, and perhaps slipstreaming some of the features before then. At that point, it will be reasonable to regard Hadapt as offering:

Solr is in the mix as well.

Hadapt+Hadoop is positioned much more as “better than Hadoop” than “a better scale-out RDBMS”– and rightly so, due to its limitations when viewed strictly from an analytic RDBMS standpoint. I.e., Hadapt is meant for enterprises that want to do several of:

Hadapt has 6 or so production customers, a dozen or so more coming online soon, 35 or so employees (mainly in Cambridge or Poland), reasonable amounts of venture capital, and the involvement of a variety of industry luminaries. Hadapt’s biggest installation seems to have 10s of terabytes of relational data and 100s of TBs of multi-structured; Hadapt is very confident in its ability to scale an order of magnitude beyond that with the Version 2 product, and reasonably confident it could go even further.

At the highest level, Hadapt works like this: Read more

October 1, 2012

Notes on the Oracle OpenWorld Sunday keynote

I’m not at Oracle OpenWorld, but as usual that won’t keep me from commenting. My bottom line on the first night’s announcements is:

In particular:

1. At the highest level, my view of Oracle’s strategy is the same as it’s been for several years:

Clayton Christensen’s The Innovator’s Solution teaches us that Oracle should focus on selling a thick stack of technology to its highest-end customers, and that’s exactly what Oracle does focus on.

2. Tonight’s news is closely in line with what Oracle’s Juan Loaiza told me three years ago, especially:

  • Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
  • Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.

3. Oracle is confusing people with its comments on multi-tenancy. I suspect:

4. SaaS (Software as a Service) vendors don’t want to use Oracle, because they don’t want to pay for it.* This limits the potential impact of Oracle’s true multi-tenancy features. Even so: Read more

September 27, 2012

Hoping for true columnar storage in Oracle12c

I was asked to clarify one of my July comments on Oracle12c,

I wonder whether Oracle will finally introduce a true columnar storage option, a year behind Teradata. That would be the obvious enhancement on the data warehousing side, if they can pull it off. If they can’t, it’s a damning commentary on the core Oracle codebase.

by somebody smart who however seemed to have half-forgotten my post comparing (hybrid) columnar compression to (hybrid) columnar storage.

In simplest terms:

August 7, 2012

Notes on some basic database terminology

In a call Monday with a prominent company, I was told:

That, to put it mildly, is not accurate. So I shall try, yet again, to set the record straight.

In an industry where people often call a DBMS just a “database” — so that a database is something that manages a database! — one may wonder why I bother. Anyhow …

1. The products commonly known as Oracle, Exadata, DB2, Sybase, SQL Server, Teradata, Sybase IQ, Netezza, Vertica, Greenplum, Aster, Infobright, SAND, ParAccel, Exasol, Kognitio et al. all either are or incorporate relational database management systems, aka RDBMS or relational DBMS.

2. In principle, there can be difficulties in judging whether or not a DBMS is “relational”. In practice, those difficulties don’t arise — yet. Every significant DBMS still falls into one of two categories:

*I expect the distinction to get more confusing soon, at which point I’ll adopt terms more precise than “relational things” and “relational stuff”.

3. There are two chief kinds of relational DBMS: Read more

July 28, 2012

Some Vertica 6 features

Vertica 6 was recently announced, and so it seemed like a good time to catch up on Vertica features. The main topics I want to address are:

Also:

In general, the main themes of Vertica 6 appear to be:

Let’s do the analytic functionality first. Notes on that include:

I’ll also take this opportunity to expand on something I wrote about a few vendors — including Vertica — at the end of my post on approximate query results. When I probed how customers of Vertica and other RDBMS-based analytic platform vendors used vendor-proprietary advanced analytic SQL and other analytic capabilities, answers included: Read more

July 25, 2012

Thoughts on the next releases of Oracle and Exadata

A reporter asked me to speculate about the next releases of Oracle and Exadata. He and I agreed:

My answers mixed together thoughts on what Oracle should and will emphasize (which aren’t the same thing but hopefully bear some relationship to each other ;)). They were (lightly edited):

July 25, 2012

The eternal bogosity of performance marketing

Chris Kanaracus uncovered a case of Oracle actually pulling an ad after having been found “guilty” of false advertising. The essence seems to be that Oracle claimed 20X hardware performance vs. IBM, based on a comparison done against 6 year old hardware running an earlier version of the Oracle DBMS. My quotes in the article were:

Another example of Oracle exaggeration was around the Exadata replacement of Teradata at Softbank. But the bogosity flows both ways. Netezza used to make a flat claim of 50X better performance than Oracle, while Vertica’s standard press release boilerplate long boasted

50x-1000x faster performance at 30% the cost of traditional solutions

Of course, reality is a lot more complicated. Even if you assume apples-to-apples comparisons in terms of hardware and software versions, performance comparisons can vary greatly depending upon queries, databases, or use cases. For example:

And so, vendor marketing claims about across-the-board performance should be viewed with the utmost of suspicion.

Related links

July 19, 2012

A data distribution idea at Vertica and Clustrix

Yesterday I wrote:

Clustrix has one cool idea I haven’t heard from anybody else, which I’m calling index distribution. The idea is that each index can be distributed differently across the cluster …  i.e. on different distribution keys. Clustrix thinks that paying special attention to index distribution and movement is helpful to the performance of distributed joins.

While that’s true, I thought I’d heard something similar from Vertica; so I checked, and indeed I had. Vertica famously lets you store columns in different sort orders, in both reasonable senses:

It turns out those columns can also be distributed on different keys as well.

Related link

← Previous PageNext Page →

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.