Theory and architecture
Analysis of design choices in databases and database management systems. Related subjects include:
- Any subcategory
- Database diversity
- Explicit support for specific data types
- (in Text Technologies) Text search
The disk rotation speed bottleneck
I’ve been referring to the disk (rotation) speed bottleneck for years, but I don’t really have a clean link for it. Let me fix that right now.
The first hard disks ever were introduced by IBM in 1956. They rotated 1,200 times per minute. Today’s state-of-the-art disk drives rotate 15,000 times per minute. That’s a 12.5-fold improvement since the first term of the Eisenhower Administration. (I understand that the reason for this slow improvement is aerodynamic — a disk that spins too fast literally flies off the spindle.)
Unfortunately, random seek time is bounded below, on average, by half of a disk's rotation time. Hence, even at 15,000 RPM, average random seek times can't get below 2 milliseconds:
15,000 RPM = 250 rotations/second, which implies 4 milliseconds/rotation, and half of that is 2 milliseconds.
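The rotation arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a disk model: it assumes average rotational latency is half of one rotation and ignores arm movement and data transfer time, so real random-access times are longer. The function names are hypothetical.

```python
# Minimal sketch of disk rotational-latency arithmetic.
# Assumption: average rotational latency = half of one rotation.
# Arm (seek) movement and transfer time are ignored, so real
# random-access times are longer than these figures.

MS_PER_MINUTE = 60_000.0


def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return MS_PER_MINUTE / rpm


def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector to come around: half a rotation."""
    return rotation_time_ms(rpm) / 2


def max_random_reads_per_sec(rpm: float) -> float:
    """Upper bound on random reads/second from rotational latency alone."""
    return 1000.0 / avg_rotational_latency_ms(rpm)


if __name__ == "__main__":
    for label, rpm in [("IBM, 1956", 1_200), ("state of the art", 15_000)]:
        print(f"{label}: {rpm} RPM, "
              f"{rotation_time_ms(rpm):.1f} ms/rotation, "
              f"{avg_rotational_latency_ms(rpm):.1f} ms average latency")
    print(f"improvement: {15_000 / 1_200}x")
```

Running it reports 50.0 ms/rotation (25.0 ms average latency) at 1,200 RPM, 4.0 ms/rotation (2.0 ms average latency) at 15,000 RPM, and the 12.5x ratio. By the same arithmetic, a 15,000 RPM disk can't exceed about 500 random reads per second even before arm movement is counted.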
From that, much about modern analytic DBMS design follows.
Two cornerstones of Oracle’s database hardware strategy
After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
- Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being “bumped off” if they don’t get it right.
- Juan believes the “bulk” of Oracle’s business will move over to Exadata-like technology over the next 5-10 years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise’s many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.
And by the way, Oracle doesn't make its storage-tier software available to run on anything other than Oracle-designed boxes. At the moment, that means Exadata Versions 1 and 2. Since Exadata is by far Oracle's best DBMS offering (at least in theory), that means Oracle's best database offering only runs on specific Oracle-sold hardware platforms. Read more
Three broad categories of data
People often try to draw a distinction between:
- Traditional data of the sort that’s stored in relational databases, aka “structured.”
- Everything else, aka “unstructured” or “semi-structured” or “complex.”
There are plenty of problems with these formulations, not the least of which is that the supposedly “unstructured” data is the kind that actually tends to have interesting internal structures. But of the many reasons why these distinctions don’t tend to work very well, I think the most important one is that:
Databases shouldn’t be divided into just two categories. Even as a rough-cut approximation, they should be divided into three, namely:
- Human/Tabular data, i.e., human-generated data that fits well into relational tables or arrays
- Human/Nontabular data, i.e., all other data generated by humans
- Machine-Generated data
Even that trichotomy is grossly oversimplified, for reasons such as:
- These categories overlap.
- Some kinds of data fall into fuzzy border zones.
- Not all data in each category has all the same properties.
But at least as a starting point, I think this basic categorization has some value. Read more
Categories: Database diversity, Investment research and trading, Log analysis, Telecommunications, Web analytics
Vertica slaughters Sybase in patent litigation
Back in August, 2008, I pooh-poohed Sybase’s patent lawsuit against Vertica. Filed in the notoriously patent-holder-friendly East Texas courts, the suit basically claimed patent rights over the whole idea of a columnar RDBMS. It was pretty clear that this suit was meant to be a model for claims against other columnar RDBMS vendors as well, should they ever achieve material marketplace success.
If a recent Vertica press release is to be believed, Sybase got clobbered. The meat is:
… Sybase has admitted that under the claim construction order issued by the Court on November 9, 2009, “Vertica does not infringe Claims 1-15 of U.S. Patent No. 5,794,229.” Sybase further acknowledged that because the Court ruled that all the remaining claims in the patent (claims 16-24) were invalid, “Sybase cannot prevail on those claims.”
For those counting along at home — the patent only has 24 claims in total.
I have no idea whether Sybase can still cobble together grounds for appeal, or claims under some other patent. But for now, this sounds like a total victory for Vertica.
Edit: I've now seen a PDF of a filing suggesting the grounds under which Sybase will appeal. Basically, it alleges that the judge erred in defining a "page" of data too narrowly. Note that even if Sybase prevails on appeal on that point, Vertica has a bunch of other defenses that haven't been litigated yet. It further seems that Sybase may have recently filed another patent case against Vertica, in a different venue, based on a different patent.
One annoying blog troll excepted, is anybody surprised at this outcome?
Categories: Columnar database management, Data warehousing, Sybase, Vertica Systems
Intersystems Cache’ highlights
I talked with Robert Nagle of Intersystems last week, and it went better than at least one other Intersystems briefing I’ve had. Intersystems’ main product is Cache’, an object-oriented DBMS introduced in 1997 (before that Intersystems was focused on the fourth-generation programming language M, renamed from MUMPS). Unlike most other OODBMS, Cache’ is used for a lot of stuff one would think an RDBMS would be used for, across all sorts of industries. That said, there’s a distinct health-care focus to Intersystems, in that:
- MUMPS, the original Intersystems technology, was focused on health care.
- The reasons Intersystems went object-oriented have a lot to do with the structure of health-care records.
- Intersystems’ biggest and most visible ISVs are in the health-care area.
- Intersystems is actually beginning to sell an electronic health records system called TrakCare around the world (but not in the US, where Intersystems has lots of large VARs that TrakCare would compete with).
Note: Intersystems Cache’ is sold mainly through VARs (Value-Added Resellers), aka ISVs/OEMs. I.e., it’s sold by people who write applications on top of it.
So far as I understand – and this is still pretty vague and apt to be partially erroneous – the Intersystems Cache’ technical story goes something like this: Read more
Categories: Data models and architecture, Emulation, transparency, portability, Health care, Intersystems and Cache', Mid-range, Object, OLTP, Sybase, Theory and architecture
There sure seem to be a lot of inaccuracies on ParAccel’s website
In what is actually an interesting post on database compression, ParAccel CTO Barry Zane threw in:
Anyone who has met with us knows ParAccel shies away from hype.
But like many things ParAccel says, that is not true.
Edit (October, 2010): Like other posts I’ve linked to from Barry Zane’s blog, that one seems to be gone, with the URL redirecting elsewhere on ParAccel’s website.
The latest whoppers came in the form of several customers ParAccel listed on its website who hadn't actually bought ParAccel's DBMS, nor even decided to do so. It is fairly common for a vendor to claim a customer win, then retract the claim due to lack of permission to disclose. But that's not what happened in these cases. Based on emails helpfully shared by a ParAccel competitor in some of those accounts, it seems clear that ParAccel actually posted fabricated claims of customer wins. Read more
Categories: Columnar database management, Data warehousing, Database compression, Market share and customer counts, ParAccel, Telecommunications
This and that
I have various subjects backed up that I don’t really want to write about at traditional blog-post length. Here are a few of them. Read more
The legit part of the NoSQL idea
I've written some snarky things about the "NoSQL" concept, or at least the moniker. (Carl Olofson's term "non-schematic databases" seems less bad.) Yet I actually favor the increasing use of SQL alternatives. Perhaps I should pull those thoughts together. Read more
Categories: Data models and architecture, Database diversity, Hadoop, NoSQL, Theory and architecture
NoSQL Q and A
Neal Leavitt is writing an article for IEEE on NoSQL. So he’s circulated a long list of questions, encouraging people to answer as many or few as they choose. Unfortunately, most of the questions are technically meaningless, in that they implicitly rely on the false assumption that there is such a thing as a single or at least reasonably well-defined NoSQL technology. (I imagine most of his questions are really about key-value stores.) Nonetheless, I took a crack at a number of them before getting bored. Anybody else want to pitch in too? Read more
Categories: Data models and architecture, Database diversity, NoSQL, Theory and architecture
New England Database Summit (January 28, 2010)
New England Database Day has now, in its third year, become a “Summit.” It’s a nice event, providing an opportunity for academics and business folks to mingle. The organizers are basically the local branch of the Mike Stonebraker research tree, with this year’s programming head being Daniel Abadi. It will be on Thursday, January 28, 2010, once again in the Stata Center at MIT. It would be reasonable to park in the venerable 4/5 Cambridge Center parking lot, especially if you’d like to eat at Legal Seafood afterwards.
So far there are two confirmed speakers — Raghu Ramakrishnan of Yahoo and me. My talk title will be something like “Database and analytic technology: The state of the union”, with all wordplay intended.
There’s more information at the official New England Database Summit website. There’s also a post with similar information on Daniel Abadi’s DBMS Musings blog.
Edit after the event:
Posts based on my January, 2010 New England Database Summit keynote address