Data models and architecture

Discussion of issues in data modeling, and whether databases should be consolidated or loosely coupled. Related subjects include:

July 21, 2008

Project Cassandra — Facebook’s open sourced quasi-DBMS

Facebook has open-sourced Project Cassandra, an imitation of Google’s BigTable.  Actual public information about Facebook’s Cassandra seems to reside in a few links that may be found on the Cassandra Project’s Google code page.  All the discussion I’ve seen seems to be based solely on some slides from a SIGMOD presentation. In particular, Dare Obasanjo offers an excellent overview of Cassandra.  To wit: Read more

April 13, 2008

ScaleDB presents The Revenge of the Pointer

The MySQL user conference is upon us, and hence so are MySQL-related product announcements, including storage engines. One such is Kickfire. ScaleDB — smaller and earlier-stage — is another.

In a nutshell, ScaleDB’s proposition is:

Like many software companies with non-US roots, ScaleDB seems to have started with a single custom project, using a Patricia trie indexing system. Then they decided Patricia tries might be really useful for relational OLTP as well. The ScaleDB team now features four developers, plus half-time or so “Chief Architect” involvement from Vern Watts. Watts seems to pretty much have been Mr. IMS for the past four decades, and thus surely knows a whole lot about pointer-based database management systems; presumably, he’s responsible for the generic DBMS design features that are being added to the innovative indexing scheme. On ScaleDB’s advisory board is PeopleSoft veteran Rick Berquist, about whom I’ve had fond thoughts ever since he talked me into focusing on consulting as the core of my business.*

*More precisely, Rick pretty much tricked me into doing a day of consulting for $15K, then revealed that’s what he’d done, expressing the thought that he’d very much gotten his money’s worth. But I digress …

ScaleDB has no customers to date, but hopes to be in beta by the end of this year. Angels and a small VC firm have provided bridge loans; otherwise, ScaleDB has no outside investment. ScaleDB’s business model thoughts include:

Read more

February 19, 2008

Kalido — CASE for complex data warehouses

Kalido briefed me last week, under pre-TDWI embargo. To a first approximation, their story is confusingly buzzword-laden, as is evident from their product names. The Kalido suite is called the Kalido Information Engine, and it comprises:

But those mouthfuls aside, Kalido has some pretty interesting things to say about data warehouse schema complexity and change.

Read more

February 1, 2008

CouchDB — lazy database design taken to excess?

I’ve run into a research/alpha/whatever project called CouchDB a couple of times now. It’s yet another “Who needs relational databases? Who needs schemas?” kind of idea. Rather, CouchDB is for taking random documents and banging them into databases, then calculating views on the fly as needed. It’s REST-friendly. Lucene and a web server are built in.

Damien Katz seems to be the driving force behind CouchDB, and his discussion of document-oriented development seems to be a good starting point. Read more
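To make the "documents in, views calculated on the fly" idea concrete, here is a minimal sketch in plain Python. It is not CouchDB's actual API; the document fields, `posts_by_author`, and `compute_view` are all hypothetical names chosen for illustration. The point is only that documents need not share a schema, and that a view is just a function applied to whatever documents happen to be there.

```python
# Hypothetical schema-less documents: each is just a dict, and the
# fields present vary from one document to the next.
documents = [
    {"_id": "a", "type": "post", "author": "alice", "tags": ["db", "rest"]},
    {"_id": "b", "type": "comment", "author": "bob"},
    {"_id": "c", "type": "post", "author": "alice"},
]


def posts_by_author(doc):
    """Hypothetical view function: emit (author, id) for post documents."""
    if doc.get("type") == "post" and "author" in doc:
        yield doc["author"], doc["_id"]


def compute_view(docs, map_fn):
    """Compute a view on demand by applying map_fn to every document."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows)  # sorted by key, so lookups by key range are easy


view = compute_view(documents, posts_by_author)
print(view)
```

Documents that lack the fields a view cares about are simply skipped, which is what lets the "schema" live in the view definitions rather than in the database.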

January 24, 2008

A passionate defense of MapReduce

Mark Chu-Carroll has weighed in with a passionate defense of MapReduce. I only see one thing he got wrong, which was to overlook the great shared-nothing parallelism of today’s data warehouse appliances and specialty data warehouse DBMS. But that doesn’t detract from his overall point, which is that MapReduce is designed to help with parallel computing in general, not database querying in particular.

He also has the best version I know of an old observation, namely:

… [relational database] people have found the most beautiful, wonderful, perfect hammer in the whole world. It’s perfectly balanced - not too heavy, not too light, and swings just right to pound in a nail just right every time. The grip is custom-made, fitted to the shape of the owners hand, so that they can use it all day without getting any blisters. It’s also beautifully decorated - encrusted with gemstones and gold filigree - but only in places that won’t detract from how well it works as a hammer. It really is the greatest hammer ever. Relational database guys love their hammer. It’s just such a wonderful tool! And when they make something with it, it really comes out great. In fact, they like it so much that they think it’s the only tool they need. If you give them a screw, they’ll just pound it in like it’s a nail. And when you point out to them that dammit, it’s a screw, not a nail, they’ll say “I know that. But you can’t expect me to use a crappy little screwdriver when I have a magnificent hammer like this!”


January 18, 2008

A sane article from a strict relational advocate

Anybody who cites — with approval — both Fabian Pascal and Joe Celko can’t be all bad. “Why Programmers Don’t Like Relational Databases” is a bit of polemic, but on the whole it’s a good reminder of why relational-bashing often is overdone.

Personally, I think the applications for which traditional schema-heavy relational/SQL programming makes sense are less interesting than those for which it doesn’t, but the world is indeed chock full of less interesting tasks.


December 18, 2007

Amazon SimpleDB - when less is, supposedly, enough

I’ve posted several times about Amazon as an innovative, super-high-end user: doing transactional object caching with ObjectStore, building an in-house less-than-DBMS called Dynamo, or just generally adopting a very DBMS2-like approach to data management. Now Amazon is bringing the Dynamo idea to the public, via a SaaS offering called SimpleDB. (Hat tip to Tim Anderson.)

SimpleDB is obviously meant to be a data server for online applications. There are no joins, and queries don’t run over 5 seconds, so serious analytics are out of the question. Domains are limited to 10GB for now, so extreme media file serving also isn’t what’s intended; indeed, Amazon encourages one to use SimpleDB to store pointers to larger objects stored as files in Amazon S3.

On the other hand, if you think of SimpleDB as an OLTP DBMS, your head might explode. There’s no sense of transaction, no mechanisms to help with integrity, no way to do arithmetic, and indeed no assurance that writes will be immediately reflected in reads. Read more

December 2, 2007

Amazon Dynamo — when primary key access is enough

Amazon has a very decentralized technical operation. But even the individual pieces have interestingly huge scale. Thus, various different things they’re doing are of interest.

They recently presented a research paper on a high-performance transactional system called Dynamo. (Hat tip to Dare Obasanjo.) A key point is the following:

There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications.

Now, I don’t think too many organizations past Amazon are going to decide that they can’t afford the overhead of an RDBMS for such OLTP-like applications. But I do think it will become increasingly common to find other reasons to eschew traditional OLTP relational architectures. Maybe you’ll want the schema flexibility of XML. Or perhaps you’ll be happy with a fixed relational schema, but will want to optimize for analytic performance.
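For readers who want to see how little a "primary-key only interface" asks of a data store, here is a minimal in-memory sketch in Python. It is an illustration under loud assumptions, not Dynamo's design or API: the class name `PrimaryKeyStore` and its methods are hypothetical, and everything that makes Dynamo interesting at scale (partitioning, replication, versioning) is deliberately left out.

```python
class PrimaryKeyStore:
    """Hypothetical store offering primary-key access only.

    There are no joins, no secondary indexes, and no queries over
    values; a caller must already know the key it wants.
    """

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        """Store (or overwrite) the value held under a primary key."""
        self._items[key] = value

    def get(self, key, default=None):
        """Fetch the value for a primary key, or default if absent."""
        return self._items.get(key, default)


# A shopping cart is a natural fit: it is always looked up by customer.
carts = PrimaryKeyStore()
carts.put("customer-42", ["book", "lamp"])
print(carts.get("customer-42"))
print(carts.get("customer-99", default=[]))
```

Shopping carts, session state, and customer preferences all fit this shape, because the application always arrives with the key in hand.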

October 23, 2007

Vertica — just star and snowflake schemas?

One of the longest-running technotheological disputes I know of is the one pitting flat/normalized data warehouse architectures against cubes, stars, and snowflake schemas. Teradata, for example, is a flagwaver for the former camp; Microstrategy is firmly in the latter. (However, that doesn’t keep lots of retailers from running Microstrategy on Teradata boxes.) Attensity (a good Teradata partner) is in the former camp; text mining rival Clarabridge (sort of a Microstrategy spinoff) is in the latter. And so on.

Vertica is clearly in the star/snowflake camp as well. I asked them about this, and Vertica’s CTO Mike Stonebraker emailed a response. I’m reproducing it below, with light edits; the emphasis is also mine. Key points include:

Great question. This is something that we’ve thought a lot about and have done significant research on with large enterprise customers. … short answer is as follows:

Vertica supports star and snowflake schemas because that is the desired data structure for data warehousing. The overwhelming majority of the schemas we see are of this form, and we have highly optimized for this case.

Read more

September 24, 2007

Pervasive Summit PSQL v10

Pervasive Software has a long history – 25 years, in fact, as they’re emphasizing in some current marketing. Ownership and company name have changed a few times, as the company went from being an independent startup to being owned by Novell to being independent again. The original product, and still the cash cow, was a linked-list DBMS called Btrieve, eventually renamed Pervasive PSQL as it gained more and more relational functionality.

Pervasive Summit PSQL v10 has just been rolled out, and I wrote a nice little white paper to commemorate the event, describing some of the main advances over v9, primarily for the benefit of current Pervasive PSQL developers. In one major advance, Pervasive made the SQL functionality much stronger. In particular, you now can have a regular SQL data dictionary, so that the database can be used for other purposes – BI, additional apps, whatever. Apparently, that wasn’t possible before, although it had been possible in yet earlier releases. Pervasive also added view-based security permissions, which is obviously a Very Good Thing.

There also are some big performance boosts. Read more

June 15, 2007

Fast RDF in specialty relational databases

When Mike Stonebraker and I discussed RDF yesterday, he quickly turned to suggesting fast ways of implementing it over an RDBMS. Then, quite characteristically, he sent over a paper that allegedly covered them, but actually was about closely related schemes instead. :) Edit: The paper has a new, stable URL. Hat tip to Daniel Abadi.

All minor confusion aside, here’s the story. At its core, an RDF database is one huge three-column table storing subject-property-object triples. In the naive implementation, you then have to join this table to itself repeatedly. Materialized views are a good start, but they only take you so far. Read more
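The naive implementation described above can be sketched with Python's built-in `sqlite3` module. The table name `triples`, the sample data, and the helper `titles_by_author_name` are all hypothetical; the sketch is only meant to show why even a simple question turns into repeated self-joins on the one big table.

```python
import sqlite3

# One three-column table holds every subject-property-object triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("book1", "type", "Book"),
        ("book1", "author", "person1"),
        ("book1", "title", "Example Title"),
        ("person1", "name", "Example Author"),
        ("book2", "type", "Book"),
        ("book2", "author", "person1"),
        ("book2", "title", "Another Title"),
    ],
)


def titles_by_author_name(name):
    """Titles of books whose author has the given name.

    Each property involved in the question costs one more join of
    the triples table against itself: four copies for this query.
    """
    rows = conn.execute(
        """
        SELECT t_title.object
        FROM triples AS t_type
        JOIN triples AS t_author ON t_author.subject = t_type.subject
        JOIN triples AS t_name   ON t_name.subject = t_author.object
        JOIN triples AS t_title  ON t_title.subject = t_type.subject
        WHERE t_type.property = 'type' AND t_type.object = 'Book'
          AND t_author.property = 'author'
          AND t_name.property = 'name' AND t_name.object = ?
          AND t_title.property = 'title'
        ORDER BY t_title.object
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(titles_by_author_name("Example Author"))
```

With only seven rows the joins are free, but on a table of many millions of triples every additional property in a query means another pass over the same huge table.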

June 15, 2007

RDF “definitely has legs”

Thus spake Mike Stonebraker to me, on a call we’d scheduled to talk about several other things altogether. This was one day after I was told at the Text Analytics Summit that the US government is going nuts for RDF. And I continue to get confirmation of something I first noted last year — Oracle is pushing RDF heavily, especially in the life sciences market.

Evidently, the RDF data model is for real … unless, of course, you’re the kind of purist who cares to dispute whether RDF is a true “data model” at all.

December 9, 2005

More flame war stupidity

Robert Seiner (publisher of TDAN) and Fabian Pascal are now claiming that Computerworld approached Bob and asked him to do something about the false charge that I personally engaged in censorship. To the best of my knowledge, they’re both lying. It was just me, and me alone, who approached Bob, which is exactly what one would think, if for some odd reason one cared about the matter at all. I don’t have the faintest idea why they fabricated this story, or what they think it demonstrates — but they did.

Seiner also picked a title for an article of mine he published, then published one by Fabian attacking me for the title. Classy.

Bob also made two promises in the matter which he didn’t keep. Nor did he have the courtesy to inform me that he’d changed his mind, nor did he really address it when I called him on it.

I wondered why Seiner kept on publishing Pascal’s stuff, even for free, when most of Fabian’s other publishers have dropped him. Now I have a better idea. They’re soulmates.

A pity. Partway through our discussions, Bob sounded eminently reasonable. That’s why I jumped at his suggestion I write an article for him. Oh well; live and learn.

And for the record — no, I won’t respond to Pascal’s critiques point by point. He typically attacks straw men, rather than restricting his barbs to my actual opinions. In those areas where we do actually disagree, I haven’t hesitated to publish follow-on arguments, repeatedly and at length, here and elsewhere. I’ve given that relative nonentity much more attention than he deserves.

Also for the record — even though I don’t respond to every nasty shot Pascal and his associates take at me, I’m of course not conceding that his other libels and opinions are actually correct. I just think that by and large he’s a waste of bandwidth, because even his coherent ideas are quickly sidetracked by highly illogical fulminations. Even in articles where he’s otherwise making enough sense to respond to, he usually goes off on some extremist semantics-related kick that doesn’t mesh well with his own imperfect command of the English language.

(I really want to respond to his film contracts example from a three-year-old anti-XML diatribe. But the article gets bogged down with various “definitions” that are not easily reconciled to normal usage of the words, and it’s too much trouble to sort through them all. Maybe I’ll respond to the idea without linking to the article itself, when I get around to it.)

Exception to the above slam at Pascal — he recently posted a good interchange he had with Hugh Darwen, which I’m referencing in another post in this blog. His side was wrong, but both sides were well-presented.
