Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- Text search (in Text Technologies)
Database management system choices – 4 categories of relational DBMS
This is the second of a five-part series on database management system choices. For the first post in the series, please click here.
For the most part, relational database management systems divide into four major classes:
- High-end OLTP (OnLine Transaction Processing) relational DBMS. Oracle is the flagship for this category, followed by DB2.
- Specialty data warehouse DBMS. Teradata is the leader here, followed by Netezza, DATAllegro, ParAccel, Vertica, Infobright, Greenplum, Kognitio, Sybase IQ, and a host of others.
- Mid-range relational database management systems. Most of the contenders here fall into one or more of three categories: Open-source-based relational DBMS (MySQL, PostgreSQL, EnterpriseDB); reseller-focused relational DBMS (Progress OpenEdge, Pervasive PSQL); or crippled “editions” of high-end systems. Microsoft SQL Server was once a clear mid-range system, but now is better classified as high-end OLTP.
- Embedded relational database management systems. The leader of this category is Sybase’s SQL Anywhere. Also significant are memory-centric products Oracle TimesTen and solidDB.
Database management system choices — overview
This is the first in a five-part series of posts on data management product choices. By pre-arrangement, Mike Stonebraker is responding on The Database Column, starting with his own taxonomy of DBMS types.
In the 1990s, most database management experts believed that a single general-purpose DBMS could meet substantially all needs. If you just kept adding in enough datatypes and data access methods (e.g., specialized indexes), your DBMS could eventually do a good job of meeting almost any requirement. And so, from the late 1990s into the beginning of this decade, it seemed that technology was supporting business trends, and the DBMS industry was inexorably consolidating. There was an oligopoly of high-end vendors, who sold increasingly similar super-sophisticated database management systems. Nothing else in database management seemed to matter.
Well, we were wrong. The big thing we overlooked is that database optimizations go down to the level of actual storage.
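To make that storage point concrete, here's a toy sketch of my own (not any vendor's code) showing why physical layout matters: the same little table stored row-wise and column-wise, where an aggregate over one column touches far less data in the columnar layout.

```python
# A toy illustration (mine, not any vendor's) of why physical layout matters:
# the same table stored row-wise and column-wise.

# Row-oriented storage: each record's fields sit together.
rows = [
    ("acme", "2007-12-01", 1200.0),
    ("biglots", "2007-12-01", 340.5),
    ("acme", "2007-12-02", 980.0),
]

# Column-oriented storage: each attribute's values sit together.
columns = {
    "customer": ["acme", "biglots", "acme"],
    "date": ["2007-12-01", "2007-12-01", "2007-12-02"],
    "amount": [1200.0, 340.5, 980.0],
}

# Aggregating one column forces the row layout to touch every field of every
# record, while the column layout reads only the "amount" values.
assert sum(r[2] for r in rows) == sum(columns["amount"])
```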
Load speeds and related issues in columnar DBMS
Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.
I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zdonik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:
- Can columnar DBMS do operational BI?
- Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL; the distinction is sketched below)?
- Are columnar DBMS’ load speeds a problem other than in the scenarios of Questions #1 and #2?
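To pin down the ETL/ELT terminology in Question #2, here is a schematic contrast in Python. Everything in it (the toy warehouse class and all function names) is a hypothetical placeholder, not any product's API.

```python
# Schematic contrast of ETL and ELT. Everything here (the toy warehouse class
# and all function names) is a hypothetical placeholder, not any product's API.

class ToyWarehouse:
    """Stand-in for a DBMS: tables are just named lists of rows."""
    def __init__(self):
        self.tables = {}

    def bulk_load(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)

def transform(row):
    # Whatever cleansing/conforming the pipeline needs; trivial here.
    return {**row, "amount": round(row["amount"], 2)}

def etl(source_rows, wh):
    # ETL: transform in an external tool, then load only finished rows.
    wh.bulk_load("facts", [transform(r) for r in source_rows])

def elt(source_rows, wh):
    # ELT: bulk-load the raw rows first, then transform inside the warehouse
    # (in a real system, via SQL that exploits the DBMS's parallelism).
    wh.bulk_load("staging", source_rows)
    wh.bulk_load("facts", [transform(r) for r in wh.tables["staging"]])

wh = ToyWarehouse()
elt([{"amount": 3.14159}], wh)
print(wh.tables["facts"])  # [{'amount': 3.14}]
```

The practical question for a columnar system is whether the extra load traffic and in-database rewriting that ELT implies are cheap enough to be workable.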
Vertica update
I chatted with Andy Ellicott and Mike Stonebraker of Vertica today. Some of the content is embargoed until February 19 (for TDWI), but here are some highlights of the rest.
- Vertica now is “approaching” 50 paid customers, up from 15 or so in early November. (Compared to most of Vertica’s fellow data warehouse specialists, that’s a lot.) Many — perhaps most — of these customers are hedge funds or telcos.
- Vertica’s typical lag from sale to deployment is about one quarter.
- Vertica’s typical initial selling price is $250K. Or maybe it’s $100-150K. The Vertica guys are generally pretty forthcoming, but pricing is an exception. Whatever they charge, it’s strictly per terabyte of user data. They think they are competitive with other software vendors, and cheaper, all-in, than appliance vendors.
- One subject on which they’re totally non-forthcoming (lawyers’ orders) is the recent patent lawsuit filed by Sybase. They wouldn’t even say whether they thought it was bogus because they didn’t infringe, or whether they thought it was bogus because the patent shouldn’t have been granted.
- Average Vertica database size is a little under 10 terabytes of user data, with many examples in the 15-20 TB range. Lots of customers plan to expand to 50-100 TB.
- Vertica claims sustainable load speeds of 3-5 megabytes/sec/node, irrespective of database size. Data is sucked into RAM uncompressed, then written out a gigabyte per node at a time, compressed. Gigabyte chunks are then merged on disk (at about 30 megabytes/second), which is superfast as it doesn’t involve sorting. Mike insists this doesn’t compromise compression. (A toy sketch of this load path appears below.)
We also addressed the subject of Vertica’s schema assumptions, but I’ll leave that to another post.
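Here's that load path as a toy sketch, based purely on my reading of the description above. This is not Vertica code; the "gigabyte" chunk is shrunk to a few rows, and the point is only that merging already-sorted chunks requires no further sorting.

```python
# Toy sketch of the load path described above, based purely on my reading of
# it (this is not Vertica code; the "gigabyte" chunk is shrunk to a few rows).
import heapq
import pickle
import zlib

CHUNK_ROWS = 4  # stand-in for "a gigabyte per node"

def spill(buffer, chunks):
    # Sort once while the data is in RAM, then write one compressed chunk.
    chunks.append(zlib.compress(pickle.dumps(sorted(buffer))))
    buffer.clear()

def load(stream):
    chunks, buffer = [], []
    for row in stream:        # data arrives uncompressed into RAM
        buffer.append(row)
        if len(buffer) >= CHUNK_ROWS:
            spill(buffer, chunks)
    if buffer:
        spill(buffer, chunks)
    # Because every chunk is already sorted, combining them is a k-way merge,
    # sequential and cheap, with no further sorting.
    runs = [pickle.loads(zlib.decompress(c)) for c in chunks]
    return list(heapq.merge(*runs))

print(load([5, 3, 9, 1, 7, 2, 8, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```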
Dan Weinreb on ObjectStore
Dan Weinreb was one of the key techies at Object Design, the company that made the object-oriented database management system ObjectStore. (Object Design later merged into eXcelon, which was eventually sold to Progress, which has deemphasized but still supports ObjectStore.) Recently he wrote a pair of long and fascinating articles* about Object Design, ObjectStore, and OODBMS, the first of which makes the case that “object-oriented database management systems succeeded.”
CouchDB — lazy database design taken to excess?
I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. More precisely, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.
Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point.
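To illustrate the document-plus-views idea, here's a sketch in Python rather than in CouchDB's own map-function syntax; the documents and the view are invented for illustration.

```python
# Sketch of the document-plus-views idea, in Python rather than CouchDB's own
# map-function syntax; the documents and the view are invented for illustration.

docs = [
    {"_id": "1", "type": "post", "title": "Hello", "tags": ["db"]},
    {"_id": "2", "type": "comment", "post": "1", "author": "damien"},
    {"_id": "3", "type": "post", "title": "Views"},  # no tags, and that's fine
]

def posts_by_tag(doc):
    # A map function emits (key, value) pairs; documents lacking the relevant
    # fields simply emit nothing, so no schema is needed up front.
    for tag in doc.get("tags", []):
        yield (tag, doc["title"])

view = sorted(kv for doc in docs for kv in posts_by_tag(doc))
print(view)  # [('db', 'Hello')]
```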
A passionate defense of MapReduce
Mark Chu-Carroll has weighed in with a passionate defense of MapReduce. I only see one thing he got wrong, which was to overlook the great shared-nothing parallelism of today’s data warehouse appliances and specialty data warehouse DBMS. But that doesn’t detract from his overall point, which is that MapReduce is designed to help with parallel computing in general, not database querying in particular.
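For anyone who hasn't seen the model, here's a minimal single-process sketch of MapReduce's canonical word-count example; a real implementation distributes the map, shuffle, and reduce phases across many machines, which is exactly the general-purpose parallelism Mark is defending.

```python
# Minimal single-process sketch of the MapReduce programming model (word
# count, the canonical example); a real implementation distributes the map,
# shuffle, and reduce phases across many machines.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    return (word, sum(counts))

def mapreduce(documents):
    groups = defaultdict(list)      # the "shuffle": group pairs by key
    for doc in documents:
        for key, value in map_phase(doc):
            groups[key].append(value)
    return [reduce_phase(k, v) for k, v in groups.items()]

print(mapreduce(["the hammer", "the screw the nail"]))
# [('the', 3), ('hammer', 1), ('screw', 1), ('nail', 1)]
```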
He also has the best version I know of an old observation, namely:
… [relational database] people have found the most beautiful, wonderful, perfect hammer in the whole world. It’s perfectly balanced – not too heavy, not too light, and swings just right to pound in a nail just right every time. The grip is custom-made, fitted to the shape of the owner’s hand, so that they can use it all day without getting any blisters. It’s also beautifully decorated – encrusted with gemstones and gold filigree – but only in places that won’t detract from how well it works as a hammer. It really is the greatest hammer ever. Relational database guys love their hammer. It’s just such a wonderful tool! And when they make something with it, it really comes out great. In fact, they like it so much that they think it’s the only tool they need. If you give them a screw, they’ll just pound it in like it’s a nail. And when you point out to them that dammit, it’s a screw, not a nail, they’ll say “I know that. But you can’t expect me to use a crappy little screwdriver when I have a magnificent hammer like this!”
A sane article from a strict relational advocate
Anybody who cites — with approval — both Fabian Pascal and Joe Celko can’t be all bad. “Why Programmers Don’t Like Relational Databases” is a bit polemical, but on the whole it’s a good reminder of why relational-bashing often is overdone.
Personally, I think the applications for which traditional schema-heavy relational/SQL programming makes sense are less interesting than those for which it doesn’t — but the world is indeed chock full of less interesting tasks.
Netezza targets 1 petabyte
Netezza is promising petabyte-scale appliances later this year, up from a current maximum of 100 terabytes. That’s user data (I checked), and it assumes 2-3X compression, which is a little less than they think is actually likely. I.e., they’re describing their capacity in the same kinds of terms other responsible vendors do. They haven’t actually built and tested any 1-petabyte systems internally yet, but they’ve gone over 100 terabytes.
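Spelling out the arithmetic behind that claim (my own back-of-the-envelope numbers, using the midpoint of the stated compression range):

```python
# Back-of-the-envelope check of the petabyte claim; the 2.5X midpoint is my
# illustrative assumption, not a Netezza figure.
user_data_tb = 1000                       # 1 petabyte of user data
compression_ratio = 2.5                   # midpoint of the assumed 2-3X
print(user_data_tb / compression_ratio)   # 400.0 TB of compressed data on disk
```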
Basically, this leaves Netezza’s high-end capability about 10X below Teradata’s. On the other hand, it should leave them capable of handling pretty much every Teradata database in existence.
Amazon SimpleDB – when less is, supposedly, enough
I’ve posted several times about Amazon as an innovative, super-high-end user — doing transactional object caching with ObjectStore, building an in-house less-than-DBMS called Dynamo, or just generally adopting a very DBMS2-like approach to data management. Now Amazon is bringing the Dynamo idea to the public, via a SaaS offering called SimpleDB. (Hat tip to Tim Anderson.)
SimpleDB is obviously meant to be a data server for online applications. There are no joins, and queries are cut off after 5 seconds of running time, so serious analytics are out of the question. Domains are limited to 10GB for now, so extreme media file serving also isn’t what’s intended; indeed, Amazon encourages one to use SimpleDB to store pointers to larger objects stored as files in Amazon S3.
On the other hand, if you think of SimpleDB as an OLTP DBMS, your head might explode. There’s no notion of a transaction, no mechanisms to help with integrity, no way to do arithmetic, and indeed no assurance that writes will be immediately reflected in reads.
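To show the usage pattern Amazon is suggesting, here's a toy in-memory stand-in for the item store. This is not the real SimpleDB API; the names and values are invented, and the point is just the data-modeling split: small attribute sets plus a pointer in the item store, with the big object itself living in S3.

```python
# Toy in-memory stand-in for the pattern Amazon suggests (this is not the real
# SimpleDB API): small attribute sets plus an S3 pointer in the item store,
# with the big object itself living in S3. Names and values are invented.

simpledb = {}  # domain -> item name -> attributes

def put_item(domain, name, **attributes):
    simpledb.setdefault(domain, {})[name] = attributes

def get_item(domain, name):
    # Caveat from the post: under eventual consistency, a read issued right
    # after a write is not guaranteed to see that write.
    return simpledb.get(domain, {}).get(name)

put_item(
    "videos", "cat-video-42",
    title="Cat video",
    duration_sec="93",  # values kept as strings in this sketch
    s3_pointer="s3://my-bucket/videos/cat-video-42.mpg",  # the blob lives in S3
)
print(get_item("videos", "cat-video-42"))
```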