Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Database diversity
- Explicit support for specific data types
- Text search (in Text Technologies)
Database management system choices – 4 categories of relational DBMS
This is the second of a five-part series on database management system choices. For the first post in the series, please click here.
For the most part, relational database management systems divide into four major classes:
- High-end OLTP (OnLine Transaction Processing) relational DBMS. Oracle is the flagship for this category, followed by DB2.
- Specialty data warehouse DBMS. Teradata is the leader here, followed by Netezza, DATAllegro, ParAccel, Vertica, Infobright, Greenplum, Kognitio, Sybase IQ, and a host of others.
- Mid-range relational database management systems. Most of the contenders here fall into one or more of three categories: Open-source-based relational DBMS (MySQL, PostgreSQL, EnterpriseDB); reseller-focused relational DBMS (Progress OpenEdge, Pervasive PSQL); or crippled “editions” of high-end systems. Microsoft SQL Server was once a clear mid-range system, but now is better classified as high-end OLTP.
- Embedded relational database management systems. The leader of this category is Sybase’s SQL Anywhere. Also significant are memory-centric products Oracle TimesTen and solidDB.
Database management system choices — overview
This is the first in a five-part series of posts on data management product choices. By pre-arrangement, Mike Stonebraker is responding on The Database Column, starting with his own taxonomy of DBMS types.
In the 1990s, most database management experts believed that a single general-purpose DBMS could meet substantially all needs. If you just kept adding in enough datatypes and data access methods (e.g., specialized indexes), your DBMS could eventually do a good job of meeting almost any requirement. And so, from the late 1990s into the beginning of this decade, it seemed that technology was supporting business trends, and the DBMS industry was inexorably consolidating. There was an oligopoly of high-end vendors, who sold increasingly similar super-sophisticated database management systems. Nothing else in database management seemed to matter.
Well, we were wrong. The big thing we overlooked is that database optimizations go down to the level of actual storage.
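To make that storage point concrete, here's a toy sketch of my own (not any vendor's code) showing why physical layout matters: the same little table stored row-wise and column-wise, where an aggregate over one column touches far less data in the columnar layout.

```python
# A toy illustration (mine, not any vendor's) of why physical layout matters:
# the same table stored row-wise and column-wise.

# Row-oriented storage: each record's fields sit together.
rows = [
    ("acme", "2007-12-01", 1200.0),
    ("biglots", "2007-12-01", 340.5),
    ("acme", "2007-12-02", 980.0),
]

# Column-oriented storage: each attribute's values sit together.
columns = {
    "customer": ["acme", "biglots", "acme"],
    "date": ["2007-12-01", "2007-12-01", "2007-12-02"],
    "amount": [1200.0, 340.5, 980.0],
}

# Aggregating one column forces the row layout to touch every field of every
# record, while the column layout reads only the "amount" values.
assert sum(r[2] for r in rows) == sum(columns["amount"])
```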
Load speeds and related issues in columnar DBMS
Please do not rely on the parts of the post below that are about ParAccel. See our February 18 post about ParAccel instead.
I’ve already posted about a chat I had with Mike Stonebraker regarding Vertica yesterday. I naturally raised the subject of load speed, unaware that Mike’s colleague Stan Zdonik had posted at length about load speed the day before. Given that post, it seems timely to go into a bit more detail, and in particular to address three questions:
- Can columnar DBMS do operational BI?
- Can columnar DBMS do ELT (Extract-Load-Transform, as opposed to ETL; the distinction is sketched below)?
- Are columnar DBMS’ load speeds a problem other than in the scenarios of Questions #1 and #2?
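To pin down the ETL/ELT terminology in Question #2, here is a schematic contrast in Python. Everything in it (the toy warehouse class and all function names) is a hypothetical placeholder, not any product's API.

```python
# Schematic contrast of ETL and ELT. Everything here (the toy warehouse class
# and all function names) is a hypothetical placeholder, not any product's API.

class ToyWarehouse:
    """Stand-in for a DBMS: tables are just named lists of rows."""
    def __init__(self):
        self.tables = {}

    def bulk_load(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)

def transform(row):
    # Whatever cleansing/conforming the pipeline needs; trivial here.
    return {**row, "amount": round(row["amount"], 2)}

def etl(source_rows, wh):
    # ETL: transform in an external tool, then load only finished rows.
    wh.bulk_load("facts", [transform(r) for r in source_rows])

def elt(source_rows, wh):
    # ELT: bulk-load the raw rows first, then transform inside the warehouse
    # (in a real system, via SQL that exploits the DBMS's parallelism).
    wh.bulk_load("staging", source_rows)
    wh.bulk_load("facts", [transform(r) for r in wh.tables["staging"]])

wh = ToyWarehouse()
elt([{"amount": 3.14159}], wh)
print(wh.tables["facts"])  # [{'amount': 3.14}]
```

The practical question for a columnar system is whether the extra load traffic and in-database rewriting that ELT implies are cheap enough to be workable.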
Vertica update
I chatted with Andy Ellicott and Mike Stonebraker of Vertica today. Some of the content is embargoed until February 19 (for TDWI), but here are some highlights of the rest.
- Vertica now is “approaching” 50 paid customers, up from 15 or so in early November. (Compared to most of Vertica’s fellow data warehouse specialists, that’s a lot.) Many — perhaps most — of these customers are hedge funds or telcos.
- Vertica’s typical lag from sale to deployment is about one quarter.
- Vertica’s typical initial selling price is $250K. Or maybe it’s $100-150K. The Vertica guys are generally pretty forthcoming, but pricing is an exception. Whatever they charge, it’s strictly per terabyte of user data. They think they are competitive with other software vendors, and cheaper, all-in, than appliance vendors.
- One subject on which they’re totally non-forthcoming (lawyers’ orders) is the recent patent lawsuit filed by Sybase. They wouldn’t even say whether they thought it was bogus because they didn’t infringe, or whether they thought it was bogus because the patent shouldn’t have been granted.
- Average Vertica database size is a little under 10 terabytes of user data, with many examples in the 15-20 TB range. Lots of customers plan to expand to 50-100 TB.
- Vertica claims sustainable load speeds of 3-5 megabytes/sec/node, irrespective of database size. Data is sucked into RAM uncompressed, then written out a gigabyte per node at a time, compressed. Gigabyte chunks are then merged on disk (at about 30 megabytes/second), which is superfast as it doesn’t involve sorting. Mike insists this doesn’t compromise compression. (A toy sketch of this load path appears below.)
We also addressed the subject of Vertica’s schema assumptions, but I’ll leave that to another post.
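Here's that load path as a toy sketch, based purely on my reading of the description above. This is not Vertica code; the "gigabyte" chunk is shrunk to a few rows, and the point is only that merging already-sorted chunks requires no further sorting.

```python
# Toy sketch of the load path described above, based purely on my reading of
# it (this is not Vertica code; the "gigabyte" chunk is shrunk to a few rows).
import heapq
import pickle
import zlib

CHUNK_ROWS = 4  # stand-in for "a gigabyte per node"

def spill(buffer, chunks):
    # Sort once while the data is in RAM, then write one compressed chunk.
    chunks.append(zlib.compress(pickle.dumps(sorted(buffer))))
    buffer.clear()

def load(stream):
    chunks, buffer = [], []
    for row in stream:        # data arrives uncompressed into RAM
        buffer.append(row)
        if len(buffer) >= CHUNK_ROWS:
            spill(buffer, chunks)
    if buffer:
        spill(buffer, chunks)
    # Because every chunk is already sorted, combining them is a k-way merge,
    # sequential and cheap, with no further sorting.
    runs = [pickle.loads(zlib.decompress(c)) for c in chunks]
    return list(heapq.merge(*runs))

print(load([5, 3, 9, 1, 7, 2, 8, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```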
Dan Weinreb on ObjectStore
Dan Weinreb was one of the key techies at Object Design, the company that made the object-oriented database management system ObjectStore. (Object Design later merged into eXcelon, which was eventually sold to Progress, which has deemphasized but still supports ObjectStore.) Recently he wrote a pair of long and fascinating articles* about Object Design, ObjectStore, and OODBMS, the first of which makes the case that “object-oriented database management systems succeeded.”
CouchDB — lazy database design taken to excess?
I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. More precisely, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.
Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point.
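To illustrate the document-plus-views idea, here's a sketch in Python rather than in CouchDB's own map-function syntax; the documents and the view are invented for illustration.

```python
# Sketch of the document-plus-views idea, in Python rather than CouchDB's own
# map-function syntax; the documents and the view are invented for illustration.

docs = [
    {"_id": "1", "type": "post", "title": "Hello", "tags": ["db"]},
    {"_id": "2", "type": "comment", "post": "1", "author": "damien"},
    {"_id": "3", "type": "post", "title": "Views"},  # no tags, and that's fine
]

def posts_by_tag(doc):
    # A map function emits (key, value) pairs; documents lacking the relevant
    # fields simply emit nothing, so no schema is needed up front.
    for tag in doc.get("tags", []):
        yield (tag, doc["title"])

view = sorted(kv for doc in docs for kv in posts_by_tag(doc))
print(view)  # [('db', 'Hello')]
```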
A passionate defense of MapReduce
Mark Chu-Carroll has weighed in with a passionate defense of MapReduce. I only see one thing he got wrong, which was to overlook the great shared-nothing parallelism of today’s data warehouse appliances and specialty data warehouse DBMS. But that doesn’t detract from his overall point, which is that MapReduce is designed to help with parallel computing in general, not database querying in particular.
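For anyone who hasn't seen the model, here's a minimal single-process sketch of MapReduce's canonical word-count example; a real implementation distributes the map, shuffle, and reduce phases across many machines, which is exactly the general-purpose parallelism Mark is defending.

```python
# Minimal single-process sketch of the MapReduce programming model (word
# count, the canonical example); a real implementation distributes the map,
# shuffle, and reduce phases across many machines.
from collections import defaultdict

def map_phase(document):
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    return (word, sum(counts))

def mapreduce(documents):
    groups = defaultdict(list)      # the "shuffle": group pairs by key
    for doc in documents:
        for key, value in map_phase(doc):
            groups[key].append(value)
    return [reduce_phase(k, v) for k, v in groups.items()]

print(mapreduce(["the hammer", "the screw the nail"]))
# [('the', 3), ('hammer', 1), ('screw', 1), ('nail', 1)]
```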
He also has the best version I know of an old observation, namely:
… [relational database] people have found the most beautiful, wonderful, perfect hammer in the whole world. It’s perfectly balanced – not too heavy, not too light, and swings just right to pound in a nail just right every time. The grip is custom-made, fitted to the shape of the owner’s hand, so that they can use it all day without getting any blisters. It’s also beautifully decorated – encrusted with gemstones and gold filigree – but only in places that won’t detract from how well it works as a hammer. It really is the greatest hammer ever. Relational database guys love their hammer. It’s just such a wonderful tool! And when they make something with it, it really comes out great. In fact, they like it so much that they think it’s the only tool they need. If you give them a screw, they’ll just pound it in like it’s a nail. And when you point out to them that dammit, it’s a screw, not a nail, they’ll say “I know that. But you can’t expect me to use a crappy little screwdriver when I have a magnificent hammer like this!”
A sane article from a strict relational advocate
Anybody who cites — with approval — both Fabian Pascal and Joe Celko can’t be all bad. “Why Programmers Don’t Like Relational Databases” is a bit polemical, but on the whole it’s a good reminder of why relational-bashing often is overdone.
Personally, I think the applications for which traditional schema-heavy relational/SQL programming makes sense are less interesting than those for which it doesn’t — but the world is indeed chock full of less interesting tasks.
Netezza targets 1 petabyte
Netezza is promising petabyte-scale appliances later this year, up from a current maximum of 100 terabytes. That’s user data (I checked), and it assumes 2-3X compression, which is a little less than they think is actually likely. I.e., they’re describing their capacity in the same kinds of terms other responsible vendors do. They haven’t actually built and tested any 1-petabyte systems internally yet, but they’ve gone over 100 terabytes.
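Spelling out the arithmetic behind that claim (my own back-of-the-envelope numbers, using the midpoint of the stated compression range):

```python
# Back-of-the-envelope check of the petabyte claim; the 2.5X midpoint is my
# illustrative assumption, not a Netezza figure.
user_data_tb = 1000                       # 1 petabyte of user data
compression_ratio = 2.5                   # midpoint of the assumed 2-3X
print(user_data_tb / compression_ratio)   # 400.0 TB of compressed data on disk
```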
Basically, this leaves Netezza’s high-end capability about 10X below Teradata’s. On the other hand, it should leave them capable of handling pretty much every Teradata database in existence.
Amazon SimpleDB – when less is, supposedly, enough
I’ve posted several times about Amazon as an innovative, super-high-end user — doing transactional object caching with ObjectStore, building an in-house less-than-DBMS called Dynamo, or just generally adopting a very DBMS2-like approach to data management. Now Amazon is bringing the Dynamo idea to the public, via a SaaS offering called SimpleDB. (Hat tip to Tim Anderson.)
SimpleDB is obviously meant to be a data server for online applications. There are no joins, and queries are cut off after 5 seconds of running time, so serious analytics are out of the question. Domains are limited to 10GB for now, so extreme media file serving also isn’t what’s intended; indeed, Amazon encourages one to use SimpleDB to store pointers to larger objects stored as files in Amazon S3.
On the other hand, if you think of SimpleDB as an OLTP DBMS, your head might explode. There’s no notion of a transaction, no mechanisms to help with integrity, no way to do arithmetic, and indeed no assurance that writes will be immediately reflected in reads.
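To show the usage pattern Amazon is suggesting, here's a toy in-memory stand-in for the item store. This is not the real SimpleDB API; the names and values are invented, and the point is just the data-modeling split: small attribute sets plus a pointer in the item store, with the big object itself living in S3.

```python
# Toy in-memory stand-in for the pattern Amazon suggests (this is not the real
# SimpleDB API): small attribute sets plus an S3 pointer in the item store,
# with the big object itself living in S3. Names and values are invented.

simpledb = {}  # domain -> item name -> attributes

def put_item(domain, name, **attributes):
    simpledb.setdefault(domain, {})[name] = attributes

def get_item(domain, name):
    # Caveat from the post: under eventual consistency, a read issued right
    # after a write is not guaranteed to see that write.
    return simpledb.get(domain, {}).get(name)

put_item(
    "videos", "cat-video-42",
    title="Cat video",
    duration_sec="93",  # values kept as strings in this sketch
    s3_pointer="s3://my-bucket/videos/cat-video-42.mpg",  # the blob lives in S3
)
print(get_item("videos", "cat-video-42"))
```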