Columnar database management
Analysis of products and issues in column-oriented database management systems.
Many kinds of memory-centric data management
I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start with these points:
- The desire for human real-time interactive response naturally leads to keeping data in RAM.
- Many databases will be ever cheaper to put into RAM over time, thanks to Moore’s Law. (Most) traditional databases will eventually wind up in RAM.
- However, there will be exceptions, mainly on the machine-generated side. Where data creation and RAM data storage are getting cheaper at similar rates … well, the overall cost of RAM storage may not significantly decline.
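The cost argument in the last point is compound-growth arithmetic, and a back-of-the-envelope sketch makes it concrete. All figures below (a 1 TB database, $10/GB RAM, the growth and price-decline rates) are hypothetical and chosen only for illustration: when data volume grows 50% a year while RAM prices fall by a third a year, the two factors cancel (1.5 × 2/3 = 1) and the total RAM bill stays flat.

```python
def ram_cost(size_gb, price_per_gb, data_growth, price_decline, years):
    """Total cost of holding a dataset in RAM after `years` years.

    data_growth: annual fractional growth in data volume (0.5 means +50%/year).
    price_decline: annual fractional drop in RAM price per GB (0.3 means -30%/year).
    """
    # Data volume compounds upward while the per-GB price compounds downward.
    size = size_gb * (1 + data_growth) ** years
    price = price_per_gb * (1 - price_decline) ** years
    return size * price

# Hypothetical starting point: a 1 TB database and $10/GB RAM, i.e. $10,000 today.
traditional = ram_cost(1000, 10.0, data_growth=0.10, price_decline=0.30, years=5)
machine_generated = ram_cost(1000, 10.0, data_growth=0.50, price_decline=1 / 3, years=5)
print(f"traditional after 5 years: ${traditional:,.0f}")
print(f"machine-generated after 5 years: ${machine_generated:,.0f}")
```

Under these assumptions the slow-growing "traditional" database gets much cheaper to keep in RAM, while the machine-generated one costs about the same as it does today.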
Getting more specific than that is hard, however, because:
- The possibilities for in-memory data storage are as numerous and varied as those for disk.
- The individual technologies and products for in-memory storage are much less mature than those for disk.
- Solid-state options such as flash just confuse things further.
Consider, for example, some of the in-memory data management ideas kicking around.
Comments on the analytic DBMS industry and Gartner’s Magic Quadrant for same
This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is out.* I shall now comment, just as I did on the 2010, 2009, 2008, 2007, and 2006 Gartner Data Warehouse Database Management System Magic Quadrants, to varying extents. To frame the discussion, let me start by saying:
- In general, I regard Gartner Magic Quadrants as a bad use of good research.
- Illustrating the uselessness of — or at least poor execution on — the overall quadrant metaphor, a large majority of the vendors covered are lined up near the line x = y, each outpacing the one below in both of the quadrant’s dimensions.
- I find fewer specifics to disagree with in this Gartner Magic Quadrant than in previous years’ versions. Two factors jump to mind as possible reasons:
  - This year’s Gartner Magic Quadrant for Data Warehouse Database Management Systems is somewhat less ambitious than others; while it gives as much company detail as its predecessors, it doesn’t add as much discussion of overall trends. So there’s less to (potentially) disagree with.
  - Merv Adrian is now at Gartner.
- Whatever the problems may be with Gartner’s approach, the whole thing comes out better than do Forrester’s failed imitations.
*As of February, 2012 — and surely for many months thereafter — Teradata is graciously paying for a link to the report.
Specific company comments, loosely following Gartner’s rough single-dimensional rank ordering, include:
Some big-vendor execution questions, and why they matter
When I drafted a list of key analytics-sector issues in honor of look-ahead season, the first item was “execution of various big vendors’ ambitious initiatives”. By “execute” I mean mainly:
- “Deliver products that really meet customers’ desires and needs.”
- “Successfully convince them that you’re doing so …”
- “… at an attractive overall cost.”
Vendors mentioned here are Oracle, SAP, HP, and IBM. Anybody smaller got left out due to the length of this post. Among the bigger omissions were:
- salesforce.com (multiple subjects).
- SAS HPA.
- The evolution of Hadoop.
Clarifying SAND’s customer metrics, positioning and technical story
Talking with my clients at SAND can be confusing. That said:
- I need to revise my figures for SAND’s customer count way downward.
- SAND finally has a reasonably clear positioning.
- SAND’s product actually seems to have a lot of features.
A few months ago, I wrote:
SAND Technology reported >600 total customers, including >100 direct.
Upon talking with the company, I need to revise that figure downward, from >600 to 15.
Exasol update
I last wrote about Exasol in 2008. After talking with the team Friday, I’m fixing that now.
The general theme was as you’d expect: Since last we talked, Exasol has added some new management, put some effort into sales and marketing, got some customers, kept enhancing the product and so on.
Top-level points included:
- Exasol’s technical philosophy is substantially the same as before, albeit not with as extreme a focus on fitting everything in RAM.
- Exasol believes its flagship DBMS EXASolution has great performance on a load-and-go basis.
- Exasol has 25 EXASolution customers, all in Germany.*
- 5 of those are “cloud” customers, at hosting providers engaged by Exasol.
- EXASolution database sizes now range from the low 100s of gigabytes up to 30 terabytes.
- Pretty much the whole company is in Nuremberg.
Oracle is buying Endeca
Oracle is buying Endeca. The official talking points for the deal aren’t a perfect match for Endeca’s actual technology, but so be it.
In an earlier post about Endeca, I wrote:
… the Endeca paradigm is really to help you make your way through a structured database, where different portions of the database have different structures. Thus, at various points in your journey, it automagically provides you a list of choices as to where you could go next.
That kind of thing could help Oracle with apps like the wireless telco product catalog deal MongoDB got.
Going back to the Endeca-post quote well, Endeca itself said:
Inside the MDEX Engine there is no overarching schema; each data record carries its own metadata. This enables the rapid combination of a wide range of structured and unstructured content into Latitude’s unified data model. Once inside, the MDEX Engine derives common dimensions and metrics from the available metadata, instantly exposing each for high-performance refinement and analysis in the Discovery Framework. Have a new data source? Simply add it and the MDEX Engine will create new relationships where possible. Changes in source data schema? No problem, adjustments on the fly are easy.
And I pointed out that the MDEX engine was a columnar DBMS.
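The "no overarching schema" idea, where each record carries its own attributes and the engine derives navigable dimensions from whatever it finds, can be sketched in a few lines. This is a toy illustration assuming records are plain Python dicts; it is not Endeca’s MDEX API, and `refine`, `facets`, and the sample catalog are hypothetical. Each refinement narrows the record set, and the facet counts over what remains are the "list of choices as to where you could go next."

```python
from collections import Counter, defaultdict

def refine(records, selections):
    """Keep only records matching every (attribute, value) selection so far."""
    return [r for r in records
            if all(r.get(attr) == value for attr, value in selections.items())]

def facets(records):
    """Derive dimensions from whatever attributes the records happen to carry.

    There is no schema: every key seen in any record becomes a dimension,
    with value counts the user could refine on next.
    """
    dims = defaultdict(Counter)
    for r in records:
        for attr, value in r.items():
            dims[attr][value] += 1
    return dims

# Hypothetical product catalog in which records have differing structures.
catalog = [
    {"type": "phone", "brand": "Acme", "color": "black"},
    {"type": "phone", "brand": "Zeta", "storage_gb": 64},
    {"type": "plan", "carrier": "Acme", "minutes": 500},
    {"type": "plan", "carrier": "Beta", "data_gb": 2},
]

current = refine(catalog, {"type": "phone"})
for attr, counts in sorted(facets(current).items()):
    print(attr, dict(counts))
```

After choosing "phone", the sketch offers brand, color, and storage as next steps, while plan-only attributes such as carrier disappear, because no remaining record carries them.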
Meanwhile, Oracle’s own columnar DBMS efforts have been disappointing. Endeca could be an intended answer to that. However, while Oracle’s track record with standalone DBMS acquisitions is admirable (DEC Rdb, MySQL, etc.), Oracle’s track record of integrating DBMS acquisitions into the Oracle product itself is not so good. (Express? Essbase? The text product line? None of that has gone particularly well.)
So while I would expect Endeca’s flagship e-commerce shopping engine products to flourish under Oracle’s ownership, I would be cautious about the integration of Endeca’s core technology into the Oracle product line.
Categories: Columnar database management, Endeca, Oracle
Hybrid-columnar soundbites
Busy couple of days talking with reporters. A few notes on hybrid-columnar analytic DBMS, all backed up by yesterday’s post on Teradata columnar:
- Oracle does not actually offer columnar I/O; the other three systems do. But see the “I won’t be surprised” part in yesterday’s Teradata post.
- Aster does not offer columnar compression; the other three do.
- EMC Greenplum and Teradata offer different ways to mix column and row storage in the same table; each has its advantages.
- Teradata generally has a more mature and capable offering than EMC Greenplum, for most purposes, whichever way you choose to organize your tables.
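Why columnar I/O matters, and what mixing row and column storage in one table buys you, can be shown with a deliberately crude cost model. This is a hypothetical sketch (ten 8-byte columns, a million rows, no compression, caching, or page structure) and not how any of these four products actually plans I/O. A layout is modeled as a list of column groups: a pure row store is one group holding every column, a pure column store gives each column its own group, and a hybrid sits in between. A query must read every group that contains a column it needs.

```python
# Hypothetical table: 10 columns of 8 bytes each, 1,000,000 rows.
ROWS = 1_000_000
WIDTHS = {f"c{i}": 8 for i in range(10)}

def bytes_scanned(rows, widths, groups, needed):
    """Bytes read when any group containing a needed column is read in full."""
    touched = [g for g in groups if any(c in g for c in needed)]
    return rows * sum(widths[c] for g in touched for c in g)

columns = list(WIDTHS)
row_layout = [columns]                    # one group: classic row storage
column_layout = [[c] for c in columns]    # one group per column: pure columnar
hybrid_layout = [["c0", "c1", "c2"], ["c3"], columns[4:]]  # a mix

needed = ["c0", "c3"]  # a query touching two of the ten columns
for name, layout in [("row", row_layout), ("column", column_layout),
                     ("hybrid", hybrid_layout)]:
    print(name, bytes_scanned(ROWS, WIDTHS, layout, needed))
```

For this query the row layout reads 80 MB, the columnar layout 16 MB, and the hybrid 32 MB. The hybrid’s advantage shows up elsewhere: columns that are usually read together can be kept in one group, so fetching whole rows stays cheap.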
Edit: The Wall Street Journal got this wrong, writing that Teradata was the first-ever hybrid columnar system. Specifically, they wrote:
While columnar technology has been around for years, Teradata says its product is unique because it allows users to include both columns and rows in the same database.
Googling on “Teradata To Unveil New Analytics Product To Speed Business Adoption” might get you around the paywall to see the offending piece.
Categories: Aster Data, Columnar database management, Data warehousing, Database compression, Greenplum, Teradata
Teradata Columnar and Teradata 14 compression
Teradata is pre-announcing Teradata 14, for delivery by the end of this year, where by “Teradata 14” I mean the latest version of the DBMS that drives the classic Teradata product line. Teradata 14’s flagship feature is Teradata Columnar, a hybrid-columnar offering that follows in the footsteps of Greenplum (now part of EMC) and Aster Data (now part of Teradata).
The basic idea of Teradata Columnar is:
- Each table can be stored in Teradata in row format, column format, or a mix.
- You can do almost anything with a Teradata columnar table that you can do with a row-based one.
- If you choose column storage, you also get some new compression choices.
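One reason column storage opens up new compression choices is that a column’s values sit next to each other, so repeated values form runs. Run-length encoding (RLE) is one common columnar technique and is used here only as an illustration; which specific compression options Teradata 14 offers is not spelled out above, and the functions and sample data below are hypothetical.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# A hypothetical 'state' column, stored sorted as a column store might keep it.
states = ["CA"] * 4 + ["NY"] * 3 + ["TX"] * 2
encoded = rle_encode(states)
print(encoded)  # [('CA', 4), ('NY', 3), ('TX', 2)]

# Stored row-wise, the same values are interleaved with other columns,
# so adjacent values rarely repeat and RLE finds nothing to collapse.
rows = [("CA", 101), ("CA", 102), ("NY", 103)]
row_stream = [v for row in rows for v in row]
print(len(rle_encode(row_stream)) == len(row_stream))  # True
```

Nine column values shrink to three pairs, while the row-ordered stream gains nothing, which is the basic intuition behind offering extra compression options only for column-stored data.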
Categories: Archiving and information preservation, Columnar database management, Data warehousing, Database compression, Oracle, Rainstor, Teradata
Vertica projections — an overview
Partially at my suggestion, Vertica has blogged a three-part series explaining the “projections” that are central to a Vertica database. This is important, because in Vertica projections play the roles that in many analytic DBMS might be filled by base tables, indexes, AND materialized views. Highlights include:
- A Vertica projection can contain:
  - All the columns in a table.
  - Some of the columns in a table.
  - A prejoin among tables.
- Vertica projections are updated and maintained just as base tables are. (I.e., there’s no kind of batch lag.)
- You can import the same logical schema you use elsewhere. Vertica puts no constraints on your logical schema. Note: Vertica has been claiming good support for all logical schemas since Vertica 4.0 came out in early 2010.
- Vertica (the product) will automatically generate a physical schema for you — i.e. a set of projections — that Vertica (the company) thinks will do a great job for you. Note: That also dates back to Vertica 4.0.
- Vertica claims that queries are very fast even when you haven’t created projections explicitly for them. Note: While the extent to which this is true may be a matter of dispute, competitors clearly overreach when they make assertions like “every major Vertica query needs a projection prebuilt for it.”
- On the other hand, it is advisable to build projections (automatically or manually) that optimize performance of certain parts of your query load.
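The core idea, that a projection is a copy of some or all of a table’s columns kept in its own sort order and updated along with the base data, can be modeled in a few lines. This is a toy sketch, not Vertica’s implementation: it ignores compression, on-disk layout, segmentation across nodes, and prejoins, and the `Projection` and `Table` classes are hypothetical. The point it illustrates is that every insert goes to every projection immediately, so there is no batch refresh lag, and each projection can answer range queries on its own sort key.

```python
import bisect

class Projection:
    """Hypothetical model: a subset of a table's columns, kept sorted on one key."""

    def __init__(self, columns, sort_key):
        self.columns = columns
        self.sort_key = sort_key
        self.rows = []   # tuples of the projected columns, in sort-key order
        self._keys = []  # parallel list of sort-key values, for binary search

    def insert(self, record):
        row = tuple(record[c] for c in self.columns)
        key = record[self.sort_key]
        i = bisect.bisect_right(self._keys, key)  # keep the sort order on insert
        self._keys.insert(i, key)
        self.rows.insert(i, row)

    def range(self, low, high):
        """Rows whose sort key lies in [low, high], found by binary search."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self.rows[lo:hi]

class Table:
    def __init__(self, projections):
        self.projections = projections

    def insert(self, record):
        # Every projection is maintained with the base data: no batch lag.
        for p in self.projections:
            p.insert(record)

full = Projection(["date", "store", "amount"], sort_key="date")  # all columns
by_store = Projection(["store", "amount"], sort_key="store")     # some columns
sales = Table([full, by_store])
for rec in [
    {"date": "2011-09-02", "store": "B", "amount": 20},
    {"date": "2011-09-01", "store": "A", "amount": 10},
    {"date": "2011-09-03", "store": "B", "amount": 5},
]:
    sales.insert(rec)

print(full.rows)
print(sum(amount for _, amount in by_store.range("B", "B")))  # 25
```

The design choice being modeled is redundancy in exchange for speed: the same data is stored more than once, each copy sorted for a different part of the query load.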
The blog posts contain a lot more than that, of course, both rah-rah and technical detail, including reminders of other Vertica advantages (compression, no logging, etc.). If you’re interested in analytic DBMS, they’re worth a look.
Virtual data marts in Sybase IQ
I made a few remarks about Sybase IQ 15.3 when it became generally available in July. Now that I’ve had a current briefing, I’ll make a few more.
The key enhancement in Sybase IQ 15.3 is distributed query — what others might call parallel query — aka PlexQ. A Sybase IQ query can now be distributed among many nodes, all talking to the same SAN (Storage-Area Network). Any Sybase IQ node can take the responsibility of being the “leader” for that particular query.
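The shape of shared-storage distributed query (a leader splits one query’s work across nodes that can all read the same storage, then combines partial results) can be sketched with threads standing in for nodes. This is a hypothetical illustration; how PlexQ actually divides work is not described here, and `SHARED_STORAGE`, `node_work`, and `leader_avg` are invented for the example. Note that the partials are (sum, count) pairs rather than per-node averages, because averaging averages over unevenly sized shares would give the wrong answer.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared storage: 8 pages of 1,000 values each. As with a SAN,
# every node can read every page; nothing is partitioned by node.
SHARED_STORAGE = [list(range(i * 1000, (i + 1) * 1000)) for i in range(8)]

def node_work(page_ids):
    """One worker node computes a partial aggregate over its assigned pages."""
    total = 0
    count = 0
    for pid in page_ids:
        page = SHARED_STORAGE[pid]
        total += sum(page)
        count += len(page)
    return total, count

def leader_avg(num_nodes):
    """The query's leader assigns pages round-robin and merges the partials."""
    assignments = [list(range(n, len(SHARED_STORAGE), num_nodes))
                   for n in range(num_nodes)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_work, assignments))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(leader_avg(3))
```

Because storage is shared, any node could play the leader role for a given query; in this sketch the leader is simply whichever code calls `leader_avg`.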
In itself, this isn’t that impressive; all the same things could have been said about pre-Exadata Oracle.* But PlexQ goes somewhat further than just removing a bottleneck from Sybase IQ. Notably, Sybase has rolled out a virtual data mart capability. Highlights of the Sybase IQ virtual data mart story include:
