April 13, 2008

ScaleDB presents The Revenge of the Pointer

The MySQL user conference is upon us, and hence so are MySQL-related product announcements, including storage engines. One such is Kickfire. ScaleDB — smaller and earlier-stage — is another.

In a nutshell, ScaleDB’s proposition is:

Like many software companies with non-US roots, ScaleDB seems to have started with a single custom project, using a Patricia trie indexing system. Then they decided Patricia tries might be really useful for relational OLTP as well. The ScaleDB team now features four developers, plus half-time or so “Chief Architect” involvement from Vern Watts. Watts seems to pretty much have been Mr. IMS for the past four decades, and thus surely knows a whole lot about pointer-based database management systems; presumably, he’s responsible for the generic DBMS design features that are being added to the innovative indexing scheme. On ScaleDB’s advisory board is PeopleSoft veteran Rick Berquist, about whom I’ve had fond thoughts ever since he talked me into focusing on consulting as the core of my business.*

*More precisely, Rick pretty much tricked me into doing a day of consulting for $15K, then revealed that’s what he’d done, expressing the thought that he’d very much gotten his money’s worth. But I digress …

ScaleDB has no customers to date, but hopes to be in beta by the end of this year. Angels and a small VC firm have provided bridge loans; otherwise, ScaleDB has no outside investment. ScaleDB’s business model thoughts include:

The company is based in Menlo Park, CA.

Probably I should explain what Patricia tries actually are, and how they can help relational DBMS. An ordinary trie* is a way of indexing data that looks a lot like – unsurprisingly – a tree. For example, suppose you need to index a lot of character strings, each consisting of lower-case Latin letters. From the root node you point to the 26 possibilities for starting letter. From those you point to the next possible letter, and so on. Combinatorial explosion is averted because you only have edges if there’s actually a string with that letter combination. Thus, when indexing a corpus of classic novels, there might be a path i-t-i-s-a-t-r-u-t-h-u-n-… and so on, but none that starts i-a-u-z-z-z.

*”Trie” is sometimes pronounced like “tree”, sometimes like “try.”

Patricia tries add a now-obvious compression technique. Namely, if there’s only one branch from a node, just collapse it. Thus, the example I gave above would become something more like i-t-i-s-a-truth-universally-acknowledged-…, or perhaps something even more compact.

While these ideas were evidently invented with text documents in mind, there’s no reason they can’t be applied to other kinds of strings – specifically, to those stored in relational databases. (And numbers can just be treated as strings of bits.) As I wrote last year in discussing solidDB, which uses a similar approach:

The canonical index structure in a disk-centric OLTP RDBMS is a tree of blocks. The record sought is in a block somewhere. There are index blocks whose entries are pointers to the correct block based on values in the index column. There are index blocks of pointers to other index blocks. And so on. One can traverse these trees in very few steps, but each step is costly, because each step involves examining the whole block.

SolidDB, by way of contrast, uses a core index structure called the trie. The key value on which the record search is based is divided into chunks of bits. Each chunk leads to a tree node with a small number of choices for the next chunk. There are more steps, but each step is much cheaper.

Benefits of this strategy include compression and in-memory performance. But a naive implementation would, as in other pointer-based systems, lead to unacceptable disk thrashing. ScaleDB’s answer is to layer the index, essentially creating a “trie of tries.” The company confidently claims that, in almost all cases, data can be found via a single disk read. Part of that story is the assertion that their indexing scheme achieves tremendous compression vs. conventional b-trees.

So far, that all sounds like a performance win, of unclear magnitude. (ScaleDB says it’s hoping for a 3X or better performance advantage versus traditional b-tree-based approaches.) But there’s another cool part as well. The ScaleDB trie doesn’t necessarily end with the first row it finds; it also reaches through to capture foreign-key relationships. E.g., if customer FOO123 places an order with OrderID BAR456, the BAR456 isn’t just found via the path B-A-R-4-5-6. It also can be found via FOO-1-2-3-BAR-456. Thus, referential integrity and updatable views are baked into the core database management architecture.

I look forward to seeing how this all works out, in Release 1 and beyond.

Edit: One way to think of this as the integration of the network and relational data models, ala IDMS/R, but with more compact linked lists. And I believe Predrag Dizdarevic when he tells me IDMS/R did wind up working pretty well, in a rare instance of a DBMS technology success post acquisition by CA.

Comments

4 Responses to “ScaleDB presents The Revenge of the Pointer”

  1. Relational purists should root for ScaleDB | DBMS2 -- DataBase Management System Services on December 9th, 2008 3:16 am

    [...] just put up a long post about a small development-stage company, ScaleDB. The punchline is that ScaleDB has a data access method — an extension of Patricia tries [...]

  2. The ever-blurring analyst/consultant line | Strategic Messaging on July 28th, 2010 11:55 pm

    [...] only — that also included a day of consulting and related prep and follow-up, a price point I stumbled into and later in various ways validated.* Big companies like Oracle and Microsoft were encouraged to [...]

  3. NewSQL thoughts | DBMS 2 : DataBase Management System Services on January 5th, 2013 1:04 pm

    [...] Akiban, Tokutek, CodeFutures (dbShards), Clustrix, Schooner (Membrain), VoltDB, ScaleBase, and ScaleDB, with GenieDB and NuoDB coming [...]

  4. Avrit on July 23rd, 2014 6:40 pm

    Avrit…

    ScaleDB | DBMS 2 : DataBase Management System Services…

Leave a Reply




Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:

Login

Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.