ScaleDB – DBMS 2 : DataBase Management System Services

Notes on indexes and index-like structures

Curt Monash — Thu, 16 Apr 2015 22:42:59 +0000

Indexes are central to database management.

My first-ever stock analyst report, in 1982, correctly predicted that index-based DBMS would supplant linked-list ones …
… and to this day, if one wants to retrieve a small fraction of a database, indexes are generally the most efficient way to go.
Recently, I’ve had numerous conversations in which indexing strategies played a central role.

Perhaps it’s time for a round-up post on indexing.

1. First, let’s review some basics. Classically:

An index is a DBMS data structure that you probe to discover where to find the data you really want.
Indexes make data retrieval much more selective and hence faster.
While indexes make queries cheaper, they make writes more expensive — because when you write data, you need to update your index as well.
Indexes also induce costs in database size and administrative efforts. (Manual index management is often the biggest hurdle for “zero-DBA” RDBMS installations.)
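
To make those trade-offs concrete, here is a minimal sketch in Python. The `TinyTable` class and its `city` column are made up for illustration and do not reflect any real DBMS's internals. The index is just a dictionary from column value to row positions: reads probe it instead of scanning, and every write pays for an extra index update.

```python
from collections import defaultdict


class TinyTable:
    """Toy heap table with one secondary index on the `city` column (hypothetical)."""

    def __init__(self):
        self.rows = []                        # the "heap": rows in arrival order
        self.city_index = defaultdict(list)   # city -> list of row positions

    def insert(self, row):
        # A write touches the heap AND the index: this is the extra write cost.
        position = len(self.rows)
        self.rows.append(row)
        self.city_index[row["city"]].append(position)

    def find_by_scan(self, city):
        # Without an index, every row is examined.
        return [r for r in self.rows if r["city"] == city]

    def find_by_index(self, city):
        # With an index, we probe for positions and fetch only those rows.
        return [self.rows[p] for p in self.city_index.get(city, [])]


table = TinyTable()
for name, city in [("ann", "Boston"), ("bob", "Austin"), ("cy", "Boston")]:
    table.insert({"name": name, "city": city})

print(table.find_by_index("Boston"))
```

Both lookup methods return the same rows; the difference is how many rows each one has to look at to get there.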

2. Further:

A DBMS or other system can index data it doesn’t control.
- This is common in the case of text indexing, and not just in public search engines like Google. Performance design might speak against recopying text documents. So might security.
- This capability overlaps with but isn’t exactly the same thing as an “external tables” feature in an RDBMS.
Indexes can be updated in batch mode, rather than real time.
- Most famously, this is why Google invented MapReduce.
- Indeed, in cases where you index external data, it’s almost mandatory.
Indexes written in real-time are often cleaned up in batch, or at least asynchronously with the writes.
- The most famous example is probably the rebalancing of B-trees.
- Append-only index writes call for later clean-up as well.

3. There are numerous short-request RDBMS indexing strategies, with various advantages and drawbacks. But better indexing, as a general rule, does not a major DBMS product make.

The latest example is my former clients at Tokutek, who just got sold to Percona in a presumably small deal — regrettably without having yet paid me all the money I’m owed. (By the way, the press release for that acquisition highlights TokuDB’s advantages in compression much more than it mentions straight performance.)
In a recent conversation with my clients at MemSQL, I basically heard from Nikita Shamgunov that:
- He felt that lockless indexes were essential to scale-out, and to that end …
- … he picked skip lists, not because they were the optimal lockless index, but because they were good enough and a lot easier to implement than the alternatives. (Edit: Actually, see Nikita’s comment below.)
Red-black trees are said to be better than B-trees. But they come up so rarely that I don’t really understand how they work.
solidDB did something cool with Patricia tries years ago. McObject and ScaleDB tried them too. Few people noticed or cared.

I’ll try to explain this paradox below.

4. The analytic RDBMS vendors who arose in the previous decade were generally index-averse. Netezza famously does not use indexes at all. Neither does Vertica, although the columns themselves play some of the role of indexes, especially given the flexibility in their sort orders. Others got by with much less indexing than was common in, for example, Oracle data warehouses.

Some of the reason was indexes’ drawbacks in terms of storage space and administrative overhead. Also, sequential scans can be much faster from spinning disk than more selective retrieval, so table scans often outperformed index-driven retrieval.

5. It is worth remembering that almost any data access method brings back more data than you really need, at least as an intermediate step. For starters, data is usually retrieved in whole pages, whether you need all their contents or not. But some indexing and index-alternative technologies go well beyond that.

To avoid doing true full table scans, Netezza relies on “zone maps”. These are a prominent example of what is now often called data skipping.
Bloom filters in essence hash data into a short string of bits. If there’s a hash collision, the filter reports a possible match that isn’t real (a false positive), so excess data is returned.
Geospatial queries often want to return data for regions that have no simple representation in the database. So instead they bring back data for a superset of the desired region, which the DBMS does know how to return.
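
As a rough illustration of data skipping, here is a small Python sketch of the zone-map idea. It assumes fixed-size zones and a single numeric column, and the function names are hypothetical; it is not a description of Netezza's implementation. Each zone keeps only its minimum and maximum, and a range query skips any zone whose min/max range cannot overlap the predicate. Zones that are read can still contain rows that don't match, which is the "more data than you really need" point above.

```python
def build_zone_map(values, zone_size):
    """Record (start, end, min, max) for each fixed-size run of rows."""
    zones = []
    for start in range(0, len(values), zone_size):
        chunk = values[start:start + zone_size]
        zones.append((start, start + len(chunk), min(chunk), max(chunk)))
    return zones


def range_query(values, zones, low, high):
    """Return matching values and how many rows were actually read."""
    hits, rows_read = [], 0
    for start, end, zone_min, zone_max in zones:
        if zone_max < low or zone_min > high:
            continue                      # whole zone skipped without reading it
        for v in values[start:end]:       # zone may still hold non-matching rows
            rows_read += 1
            if low <= v <= high:
                hits.append(v)
    return hits, rows_read


order_dates = [3, 5, 4, 9, 11, 10, 20, 22, 21]   # roughly sorted by arrival
zones = build_zone_map(order_dates, zone_size=3)
hits, rows_read = range_query(order_dates, zones, low=10, high=11)
print(hits, rows_read)
```

In this example only the middle zone is read: 3 rows are examined to return 2, and the other 6 rows are never touched. Zone maps work best when the data is at least roughly clustered on the filtered column, as it is here.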

6. Geospatial indexing is actually one of the examples that gave me the urge to write this post. There are two main geospatial indexing strategies I hear about. One is the R-tree, which basically divides things up into rectangles, rectangles within those rectangles, rectangles within those smaller rectangles, and so on. A query initially brings back the data within a set of rectangles whose union contains the desired region; that intermediate result is then checked row by row for whether it belongs in the final result set.
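
The two-phase "filter, then refine" behavior can be sketched as follows. This is a deliberate simplification: it uses a single level of bounding rectangles rather than a real, nested R-tree, and the point groups and function names are invented for the example. The query asks for points within a circle, which the index cannot represent directly, so it first collects every rectangle that overlaps the circle's bounding box and then checks the candidates one by one.

```python
import math


def bounding_box(points):
    """Smallest axis-aligned rectangle (x1, y1, x2, y2) containing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def circle_query(leaves, center, radius):
    cx, cy = center
    query_box = (cx - radius, cy - radius, cx + radius, cy + radius)
    # Phase 1 (filter): keep every rectangle that might contain a match.
    candidates = []
    for box, points in leaves:
        if boxes_overlap(box, query_box):
            candidates.extend(points)
    # Phase 2 (refine): check each candidate against the true region.
    matches = [p for p in candidates if math.dist(p, center) <= radius]
    return candidates, matches


groups = [
    [(1, 1), (2, 2), (3, 1)],
    [(8, 8), (9, 9)],
    [(4, 4), (5, 3)],
]
leaves = [(bounding_box(g), g) for g in groups]
candidates, matches = circle_query(leaves, center=(2, 2), radius=1.2)
print(len(candidates), matches)
```

Here the filter step returns three candidate points from one rectangle, and the refine step keeps only one of them. A real R-tree applies the same overlap test recursively at each level of nesting.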

The other main approach to geospatial indexing is the space-filling curve. The idea behind this form of geospatial indexing is roughly:

For computational purposes, a geographic region is of course a lattice of points rather than a true 2-dimensional continuum.
So you take a lattice — perhaps in the overall shape of a square — and arrange its points in a sequence, so that each point is adjacent in some way to its predecessor.
Then regions on a plane are covered by subsequences (or unions of same).

The idea gets its name because, if you trace a path through the sequence of points, what you get is an approximation to a true space-filling curve.
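
Here is a small sketch of that idea using the Hilbert curve, one well-known space-filling curve in which consecutive points are always lattice neighbors. The conversion routine follows the standard iterative algorithm; the `covering_runs` helper and the 4-by-4 grid are assumptions made for the example. The first function maps a lattice point to its position in the sequence; the second covers a rectangle with runs of consecutive positions, which is what lets an ordinary one-dimensional index answer a two-dimensional query.

```python
def hilbert_index(n, x, y):
    """Position of lattice point (x, y) along a Hilbert curve on an n-by-n grid.

    n must be a power of two, and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def covering_runs(n, x_range, y_range):
    """Cover an axis-aligned rectangle with runs of consecutive curve positions."""
    positions = sorted(
        hilbert_index(n, x, y)
        for x in range(x_range[0], x_range[1] + 1)
        for y in range(y_range[0], y_range[1] + 1)
    )
    runs = []
    for p in positions:
        if runs and p == runs[-1][1] + 1:
            runs[-1][1] = p          # extend the current run
        else:
            runs.append([p, p])      # start a new run
    return [tuple(r) for r in runs]


print(covering_runs(4, (0, 1), (0, 1)))   # one quadrant of the grid
print(covering_runs(4, (1, 2), (0, 1)))   # a rectangle straddling two quadrants
```

On a 4-by-4 grid the first rectangle is covered by the single run (0, 3), while the second needs two runs, (1, 2) and (13, 14). That is the "subsequences (or unions of same)" point: how many range scans a region costs depends on how it sits relative to the curve.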

7. And finally — mature DBMS use multiple indexing strategies. One of the best examples of a DBMS winning largely on the basis of its indexing approach is Sybase IQ, which popularized bitmap indexing. But when last I asked, some years ago, Sybase IQ actually used 9 different kinds of indexing. Oracle surely has yet more. This illustrates that different kinds of indexes are good in different use cases, which in turn suggests obvious reasons why clever indexing rarely gives a great competitive advantage.
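
For readers who haven't met bitmap indexing, here is a toy version. It uses Python integers as bitmaps, and the `region` and `status` columns are made up; it is an illustration of the general technique, not of how Sybase IQ implements it. Each distinct value gets one bitmap with a bit per row, so a multi-column predicate turns into bitwise arithmetic.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set if row i has that value."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(column):
        bitmaps[value] |= 1 << row_id
    return bitmaps


def row_ids(bitmap):
    """Decode a bitmap back into a sorted list of row numbers."""
    ids, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open' becomes a single bitwise AND.
matches = row_ids(region_idx["east"] & status_idx["open"])
print(matches)
```

This works well for low-cardinality columns in analytic queries and poorly for, say, unique keys, which is one small example of why a mature DBMS ends up carrying several kinds of index.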

Introduction to Deep Information Sciences and DeepDB

Curt Monash — Sun, 14 Apr 2013 04:33:17 +0000

I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB — albeit with different technical strategies — DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:

DeepDB’s indexes can help you with analytic queries; hence, DeepDB is marketed as supporting OLTP (OnLine Transaction Processing) and analytics in the same system.
DeepDB is marketed as “designed for big data and the cloud”, with reference to “Volume, Velocity, and Variety”. What I could discern in support of that is mainly:
- DeepDB has been tested at up to 3 terabytes at customer sites and up to 1 billion rows internally.
- Like most other NewSQL and NoSQL DBMS, DeepDB is append-only, and hence could be said to “stream” data to disk.
- DeepDB’s indexes could at some point in the future be made to work well with non-tabular data.*
- The Deep guys have plans and designs for scale-out — transparent sharding and so on.

*For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports “unstructured” data today.

Other NewSQL DBMS seem “designed for big data and the cloud” to at least the same extent DeepDB is. However, if we’re interpreting “big data” to include multi-structured data support — well, only half or so of the NewSQL products and companies I know of share Deep’s interest in branching out. In particular:

Akiban definitely does. (Note: Stay tuned for some next-steps company news about Akiban.)
Tokutek has planted a small stake there too.
Key-value-store-backed NuoDB and GenieDB probably lean that way. (And SanDisk evidently shut down Schooner’s RDBMS while keeping its key-value store.)
VoltDB, Clustrix, ScaleDB and MemSQL seem more strictly tabular, except insofar as text search is a requirement for everybody. (Edit: Oops; I forgot about Clustrix’s approach to JSON support.)

Edit: MySQL has some sort of an optional NoSQL interface, and hence so presumably do MySQL-compatible TokuDB, GenieDB, Clustrix, and MemSQL.

Also, some of those products do not today have the transparent scale-out that Deep plans to offer in the future.

Among the 10 people listed as part of Deep Information Sciences’ team, I noticed 2 who arguably had DBMS industry experience, in that they worked at virtualization vendor Virtual Iron, and stayed on for a while after Virtual Iron was bought by Oracle. One of them, Chief Scientist & Architect Tom Hazel, also was at Akiban for a few months, where he did actually work on a DBMS. Other Deep Information Sciences notes include:

Deep has 25 or so people in all.
Deep had a recent $10 million funding round.
Deep Information Sciences is the former Cloudtree, which as of February, 2011 was pursuing quite a different strategy. (Evidently there was a pivot.) Deep was founded in 2010.
There are 2 paying customers for DeepDB, even though it’s still in beta, and 8 trials. A similar number of trials and strategic partners are queued up.
DeepDB general availability is expected later this quarter.

Although our call was blessedly technical, we didn’t have a chance to go through the DeepDB architecture in great detail. That said, DeepDB seems to store data in all of 3 ways:

An in-memory row store.
An on-disk row store with a very different architecture.
Indexes, which can also serve as a column store.

Notes on that include:

DeepDB’s in-memory row store is designed to manage single rows as much as possible, rather than pages. Indeed, there are “aspects of tries”, although we didn’t drill down into what exactly that meant.
Indexes are streamed to disk at least once every 15 seconds by default, and perhaps with latency as low as 10 milliseconds.
Perhaps the most important point I didn’t grasp is “segments”. The data and indexes on disk are stored in segments, which can be of different sizes, and which may each carry some summary data/metadata/whatever. Somehow, this is central to DeepDB’s design.
In what is evidently a design focus, DeepDB tries to get the benefit of “in-memory data” that isn’t actually taking up RAM. B-trees can point at rows that aren’t actually in memory. Segments evicted from cache can leave some metadata or summary data behind.
DeepDB’s compression story seems to be a work in progress.
- There’s prefix compression already, at least in the indexes, which Deep just calls “compaction”.
- Other compression is working in the lab, but not scheduled for Version 1.0.
  - Block compression seems to be in play.
  - Delta compression was mentioned once.
  - Dictionary compression wasn’t mentioned at all.
- DeepDB apparently will keep compressed data in cache, then decompress it to operate on it.
- Different segments can be compressed/uncompressed differently.
DeepDB’s on-disk row store is append-only. Time-travel is being worked on. While I forgot to ask, it seems likely that DeepDB has MVCC (Multi-Version Concurrency Control).
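
Since prefix compression came up in the compression notes above, here is a generic sketch of the technique. The key format and function names are assumptions for the example, and nothing here is known about DeepDB's actual "compaction" format. In a sorted index, each key tends to share a long prefix with its predecessor, so you can store just the length of the shared prefix plus the differing suffix.

```python
import os


def compress(sorted_keys):
    """Encode each key as (chars shared with the previous key, remaining suffix)."""
    entries, prev = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decompress(entries):
    """Rebuild the original keys by replaying the shared prefixes in order."""
    keys, prev = [], ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["customer:1001", "customer:1002", "customer:1010", "order:17"]
entries = compress(keys)
print(entries)
```

The second key is stored as just `(12, "2")` rather than thirteen characters. The price is that decoding a key means replaying the entries before it, which is why real systems typically restart the encoding at block or segment boundaries.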

And finally: DeepDB in its current form is a “drop-in” InnoDB replacement, but not necessarily bug-compatible.

NewSQL thoughts

Curt Monash — Sat, 05 Jan 2013 18:04:08 +0000

I plan to write about several NewSQL vendors soon, but first here’s an overview post. Like “NoSQL”, the term “NewSQL” has an identifiable, recent coiner — Matt Aslett in 2011 — yet a somewhat fluid meaning. Wikipedia suggests that NewSQL comprises three things:

OLTP- (OnLine Transaction Processing)/short-request-oriented SQL DBMS that are newer than MySQL.
Innovative MySQL engines.
Transparent sharding systems that can be used with, for example, MySQL.

I think that’s a pretty good working definition, and will likely remain one unless or until:

SQL-oriented and NoSQL-oriented systems blur indistinguishably.
MySQL (or PostgreSQL) laps the field with innovative features.

To date, NewSQL adoption has been limited.

NewSQL vendors I’ve written about in the past include Akiban, Tokutek, CodeFutures (dbShards), Clustrix, Schooner (Membrain), VoltDB, ScaleBase, and ScaleDB, with GenieDB and NuoDB coming soon.
But I’m dubious whether, even taken together, all those vendors have as many customers or production references as any of 10gen, Couchbase, DataStax, or Cloudant.*

That said, the problem may lie more on the supply side than in demand. Developing a competitive SQL DBMS turns out to be harder than developing something in the NoSQL state of the art.

*Revenue might be a different matter.

The main reasons for NewSQL adoption tend to fall in the areas of performance, scaling, manageability and cost. But while they all support SQL, some NewSQL DBMS have differentiated programming models even so.

Akiban wants you to consider mixing access — to the same data in the same data structures — among SQL, JSON and, say, Hibernate.
Tokutek turns a performance argument into a functionality one. In particular, Tokutek claims that TokuDB does a much better job than alternatives of making it practical for you to update indexes at OLTP speeds. Hence, it claims to do a much better job than alternatives of making it practical for you to write and execute queries that only make sense when indexes (or other analytic performance boosts) are in place.
As a trade-off for blazing in-memory performance, VoltDB is hampered by an innovative and restrictive programming model.

Also, the MySQL add-ons and lookalikes vary in the (in)completeness of their MySQL emulation or support.

The most common performance/scaling NewSQL claims are simply “We scale, giving you the power of multiple servers, with sufficiently little downside in the way of tradeoffs.” That story is central to Clustrix, VoltDB, ScaleDB, NuoDB, and to anybody active in transparent sharding. Other performance/scaling claims include but are not limited to:

Optimized for RAM (VoltDB).
Optimized for flash (Schooner/Membrain).
Writes indexes quickly (TokuDB).
Fast joins (Akiban).

Management claims include (from multiple NewSQL vendors in each case):

Little added management pain, but you get scale-out!
Little added management pain, but you get active-active/multi-master wide-area replication!
Online schema change and other uninterrupted operation features.
Not as cumbersome as Oracle.

And that’s about as much as I’m ready to generalize about the NewSQL sector. Posts about particular products and companies are on the way.

Data(base) virtualization — a terminological mess

Curt Monash — Sat, 05 Jan 2013 17:49:49 +0000

Data/database virtualization seems to be a hot subject right now, and vendors of a broad variety of different technologies are all claiming to be in the space. A terminological mess has ensued, as Monash’s First and Third Laws of Commercial Semantics are borne out in spades.

If something is like “virtualization”, then it should resemble hypervisors such as VMware. To me:

The core feature of a hypervisor is that it allows many somethings to run and coexist where ordinarily only one something would come into play. Here the “many somethings” are virtual machines and what’s going on inside them, and the “one something” is the ordinary operating system/hardware computing stack.
A core feature of original VMware was that the “many somethings” could be quite different — for example, the operating environments of numerous different hardware systems you wanted to decommission, or of new systems that you didn’t want to buy quite yet.
Important features of hypervisors include:
- The ability to have multiple virtual machines run side by side at once, safely.
- Flexible and powerful workload management if the virtual machines do contend for resources.
- Easy management.
- The negative feature of having sufficiently low overhead.

Anything that claims to be “like virtualization” should be viewed in that light. I.e., it isn’t real virtualization unless it has the ex uno plures* feature.

*“Out of one, many”. It turns out that e unum pluribus just means the same as e pluribus unum, namely “Out of many, one”; word order isn’t as important in Latin as in English.

Most commonly, “data/database virtualization” is used to denote some kind of transparent data federation.

Forrester Research, in a recent Forrester Wave, conflates that with “Information as a Service”.
Informatica’s data virtualization marketing page gives one vendor’s view as to which capabilities could be involved.
Logical data warehouse would seem to be a related concept.

I think “virtualization” is a bad name for this, because there isn’t much ex uno plures going on. But at least it’s a name that’s in widespread use.

More solid is the sense of “database virtualization” used by Delphix. Their core idea is to take all your different database copies for production, test, development, archiving and so on, and to the extent possible turn them into one real database, plus a bunch of diffs. Cost savings are obvious if that works. The ex uno plures feature is present.
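
The "one real database, plus a bunch of diffs" idea can be illustrated with a few lines of Python. This is only an analogy built on the standard library's `ChainMap`, with made-up keys and values; it says nothing about how Delphix actually stores or shares data. Each virtual copy is an initially empty overlay on a shared base, so reads fall through to the base and writes land only in the copy's own overlay.

```python
from collections import ChainMap

# The single "real" database (hypothetical contents).
production = {"cust:1": "Ann", "cust:2": "Bob", "cust:3": "Cy"}

# Each virtual copy is an empty overlay of diffs on top of the shared base.
test_copy = ChainMap({}, production)
dev_copy = ChainMap({}, production)

test_copy["cust:2"] = "Bob (masked)"   # write lands only in test's overlay
dev_copy["cust:4"] = "Dee"             # dev gets a row that production lacks

print(test_copy["cust:1"], test_copy["cust:2"], production["cust:2"])
print(len(test_copy.maps[0]), len(dev_copy.maps[0]))   # size of each diff
```

Three logical databases exist, but only one full copy plus two one-entry diffs are stored, which is where the cost savings would come from.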

Recently, I’ve noticed that transparent sharding is being referred to as database virtualization, especially by ParElastic. Transparent sharding is a great feature, but I don’t think calling it “database virtualization” makes much sense.

I noted back in October that the essence of multitenancy is a special-case version of ex uno plures. If somebody offered that and wanted to call it “virtualization”, I might not argue too much.

Weirdest of all is ScaleDB’s use of the term. ScaleDB seems to be claiming that:

Any interesting database topology should be called “database virtualization”.
The highest and best form of database virtualization is a clustered, shared-everything DBMS approach such as Oracle RAC.

Neither logic nor language supports ScaleDB’s side.

Soundbites: the Facebook/MySQL/NoSQL/VoltDB/Stonebraker flap, continued

Curt Monash — Fri, 15 Jul 2011 08:27:18 +0000

As a follow-up to the latest Stonebraker kerfuffle, Derrick Harris asked me a bunch of smart questions. My responses and afterthoughts include:

Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular:
- They have the technical chops to rewrite their code as needed.
- Unlike packaged software vendors, they’re not answerable to anybody for keeping legacy code alive after a rewrite. That makes migration a lot easier.
- If they want to write different parts of their system on different technical underpinnings, nobody can stop them. For example …
- … Facebook innovated Cassandra, and is now heavily committed to HBase.
It makes little sense to talk of Facebook’s use of “MySQL.” Better to talk of Facebook’s use of “MySQL + memcached + non-transparent sharding.” That said:
- It’s hard to see why somebody today would use MySQL + memcached + non-transparent sharding for a new project. At least one of Couchbase or transparently-sharded MySQL is very likely a superior alternative. Other alternatives might be better yet.
- As noted above in the example of Facebook, the many major web businesses that are using MySQL + memcached + non-transparent sharding for existing projects can be presumed able to migrate away from that stack as the need arises.

Continuing with that discussion of DBMS alternatives:

If you just want to write to the memcached API anyway, why not go with Couchbase?
If you want to go relational, why not go with MySQL? There are many alternatives for scaling or accelerating MySQL — dbShards, Schooner, Akiban, Tokutek, ScaleBase, ScaleDB, Clustrix, and Xeround come to mind quickly, so there’s a great chance that one or more will fit your use case. (And if you don’t get the choice of MySQL flavor right the first time, porting to another one shouldn’t be all THAT awful.)
If you really, really want to go in-memory, and don’t mind writing Java stored procedures, and don’t need to do the kinds of joins it isn’t good at, but do need to do the kinds of joins it is, VoltDB could indeed be a good alternative.

And while we’re at it — going schema-free often makes a whole lot of sense. I need to write much more about the point, but for now let’s just say that I look favorably on the Big Four schema-free/NoSQL options of MongoDB, Couchbase, HBase, and Cassandra.

Notes on short-request scale-out MySQL

Curt Monash — Tue, 19 Apr 2011 09:52:28 +0000

A press person recently asked about:

… start-ups that are building technologies to enable MySQL and other SQL databases to get over some of the problems they have in scaling past a certain size. … I’d like to get a sense as to whether or not the problems are as severe and wide spread as these companies are telling me? If so, why wouldn’t a customer just move to a new database?

While that sounds as if he was asking about scale-out relational DBMS in general, MySQL or otherwise, short-request or analytic, it turned out that he was asking just about short-request scale-out MySQL. My thoughts and comments on that narrower subject include(d) but are not limited to:

The biggest web companies had to go to non-transparently sharded MySQL years ago. The NoSQL movement is, in no small part, an attempt to improve upon that. Ditto for scale-out short-request MySQL.
Some overlapping categories of companies or projects who need scale-out short-request database processing are:
- The aforementioned big companies who have other applications they haven’t hand-sharded yet.
- Other web companies whose applications are getting that big.
- Conventional enterprises whose web efforts happen to be very big.
- Sensor networks and other massive sources of machine-generated data.
- Certain specialized areas (e.g., financial trading).

Relatively few of these applications are totally impossible to do in Oracle. But the Oracle approach might be very expensive.
In particular, there’s a break point when companies — often SaaS vendors — outgrow Oracle Standard Edition.
Yes, the alternatives usually are one of MySQL or Oracle.
InnoDB isn’t an alternative to these newer technologies; it’s just a piece of the puzzle and indeed of default MySQL now. Several of them — e.g. dbShards — are meant to be used in conjunction with InnoDB.
Merging his list and mine, the high-performance/scale-out MySQL alternatives look like dbShards, Schooner, ScaleBase, ScaleDB, Tokutek, Akiban, Xeround, and Clustrix. The first two are to my knowledge more proven than the rest.
Proprietary hardware and the associated hardware/appliance pricing aren’t very appealing for these applications. That speaks against Oracle Exadata and Clustrix, and is the reason Schooner switched to a software-only strategy despite some initial appliance sales.
However, hardware band-aids such as solid-state drives or even RAM-based solid-state storage could make more sense:
- If, for performance, you’re scaling out your database so that it fits in RAM on each box, you don’t really have a disk-based architecture anyway, now do you?
- Even if you’re not doing that yet — if your problem is throughput rather than storage capacity, silicon-based storage could be a big help.
- In principle, devices of that kind can be moved from one application to another, after the first one is rearchitected not to need them. (In practice, however, I don’t know of anybody who is doing that. I also don’t believe that Kaminario et al. are marketing that kind of idea, more’s the pity.)
My notes on all this from April, 2010 are already badly outdated, but may be interesting anyway.

I’m collecting data points on NoSQL and HVSP adoption

Curt Monash — Wed, 18 Aug 2010 13:09:08 +0000

I was asked to do a magazine article on NoSQL, where by “NoSQL” is meant “whatever they talk about at NoSQL conferences.” By now the number of publications planning to run the article is up to 2, the deadline is next week and, crucially, it has been agreed that I may talk about HVSP in general, NoSQL and SQL alike.

It also is understood that, realistically, I can’t be expected to know and mention the very latest news for all the many products in the categories. Even so, I think this would be a fine time to check just where NoSQL and HVSP adoption stand. Here is most of what I know, or links to same; it would be great if you guys would contribute additional data in the comment thread.

In the NoSQL area:

Back in April, the VoltDB guys told me they thought Cassandra and HBase were the two NoSQL systems with the most momentum.
I know distressingly little about HBase adoption, but a source who may or may not wish to remain anonymous was kind enough to alert me that Twitter and StumbleUpon each have ~30 node deployments, for analytics and analytics/HVSP respectively.
I wrote in detail on Cassandra adoption last month. News since then includes:
- Facebook is rumored to have dropped Cassandra completely.
- Twitter clarified that it may not be quite as lovestruck by Cassandra as before, but they’re still very close friends.
- It’s not obvious that the Cassandra Summit unveiled a lot of new adoption stories.
Northscale’s Membase is still in its early days. Zynga is bought in, however, as is something called NHN Korea. (Edit: I subsequently saw NHN Korea on a prominent SEO expert’s list of the top half dozen or so search engines in the world. Who knew?)
Basho has listed a few Riak customers. If memory serves (I haven’t spoken with Basho for a while, and some of my notes are misplaced due to some computer sloppiness), Basho has a few dozen customers in total.
Mozilla has a 4 machine, 64 core Riak cluster in production.
Hypertable has a few users/project sponsors, Baidu being the biggest name among them.
I don’t really know how the MongoDB/10gen guys are doing. I think this is at least as much my fault as theirs. Anyhow, they seem to have links to a couple of folks who have written about MongoDB usage.
NimbusDB is still in stealth mode. I’d be surprised if they had users for a while yet, since in January they didn’t yet sound as if development was very far underway. (Actually, I forget whether NimbusDB is supposed to be SQL-based or not.)

Among the SQL or SQL-friendly guys:

Clustrix says it has a few production users, some big-name, but is not disclosing them yet.
dbShards has around 6 customers, including Facebook. (Facebook may outpace even Twitter and Zynga in using the most products mentioned in this post.)
As of May, VoltDB had one paying customer, plus 150 beta customers who weren’t in production yet.
Akiban says they’ll get me up to speed on Thursday.
ScaleDB seems to be pedaling along in perennial beta. Whether ScaleDB has any actual beta users is less clear. On the plus side, checking that out uncovered a pretty funny April Fool blog post.
Groovy Corporation seems to have disappeared, or morphed into something called uCirrus, or something like that.

ScaleDB presents The Revenge of the Pointer

Curt Monash — Sun, 13 Apr 2008 14:03:42 +0000

The MySQL user conference is upon us, and hence so are MySQL-related product announcements, including storage engines. One such is Kickfire. ScaleDB — smaller and earlier-stage — is another.

In a nutshell, ScaleDB’s proposition is:

Innovative approach to indexing relational DBMS, providing performance advantages.
Shared-everything scale-up that ScaleDB believes will leapfrog the MySQL engine competition already in Release 1. (In my opinion, this is the least plausible part of the ScaleDB story.)
State-of-the-art me-too facilities for locking, logging, replication/fail-over, etc., also already in Release 1.

Like many software companies with non-US roots, ScaleDB seems to have started with a single custom project, using a Patricia trie indexing system. Then they decided Patricia tries might be really useful for relational OLTP as well. The ScaleDB team now features four developers, plus half-time or so “Chief Architect” involvement from Vern Watts. Watts seems to pretty much have been Mr. IMS for the past four decades, and thus surely knows a whole lot about pointer-based database management systems; presumably, he’s responsible for the generic DBMS design features that are being added to the innovative indexing scheme. On ScaleDB’s advisory board is PeopleSoft veteran Rick Berquist, about whom I’ve had fond thoughts ever since he talked me into focusing on consulting as the core of my business.*

*More precisely, Rick pretty much tricked me into doing a day of consulting for $15K, then revealed that’s what he’d done, expressing the thought that he’d very much gotten his money’s worth. But I digress …

ScaleDB has no customers to date, but hopes to be in beta by the end of this year. Angels and a small VC firm have provided bridge loans; otherwise, ScaleDB has no outside investment. ScaleDB’s business model thoughts include:

$1,000/server/year license fee, or something in that range.
Early focus on Web 2.0 kinds of customers (e.g., social networking companies may enjoy the join performance ScaleDB plans to offer).
Early focus on MySQL OLTP (but, like proud parents everywhere, they think the technology is so wonderful that it could eventually be pretty much all things to all people).

The company is based in Menlo Park, CA.

Probably I should explain what Patricia tries actually are, and how they can help relational DBMS. An ordinary trie* is a way of indexing data that looks a lot like – unsurprisingly – a tree. For example, suppose you need to index a lot of character strings, each consisting of lower-case Latin letters. From the root node you point to the 26 possibilities for starting letter. From those you point to the next possible letter, and so on. Combinatorial explosion is averted because you only have edges if there’s actually a string with that letter combination. Thus, when indexing a corpus of classic novels, there might be a path i-t-i-s-a-t-r-u-t-h-u-n-… and so on, but none that starts i-a-u-z-z-z.

*“Trie” is sometimes pronounced like “tree”, sometimes like “try.”

Patricia tries add a now-obvious compression technique. Namely, if there’s only one branch from a node, just collapse it. Thus, the example I gave above would become something more like i-t-i-s-a-truth-universally-acknowledged-…, or perhaps something even more compact.
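
A minimal sketch of both structures, in Python, may help. It uses nested dictionaries as nodes and a `$` marker for "a string ends here"; those representation choices and the sample strings are assumptions for the example, and this is a character-level toy rather than the bit-level tries that solidDB or ScaleDB would use. `build_trie` makes one node per character; `compress` then collapses every chain of single-child nodes into one multi-character edge, which is the Patricia step.

```python
END = "$"   # marker meaning "a stored string ends here" (assumed absent from the data)


def build_trie(words):
    """Plain trie: one nested dict per character, edges only where strings exist."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def compress(node):
    """Patricia-style step: collapse chains of single-child nodes into one edge."""
    compact = {}
    for label, child in node.items():
        # Follow the chain while the child has exactly one outgoing edge
        # and no stored string ends at it.
        while label != END and len(child) == 1 and END not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        compact[label] = compress(child)
    return compact


trie = build_trie(["itisatruth", "itwas", "in"])
patricia = compress(trie)
print(patricia)
```

After compression, the ten-node path for "itisatruth" shrinks to three edges, `i`, `t` and `isatruth`, because the only real branching happens after "i" and after "it".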

While these ideas were evidently invented with text documents in mind, there’s no reason they can’t be applied to other kinds of strings – specifically, to those stored in relational databases. (And numbers can just be treated as strings of bits.) As I wrote last year in discussing solidDB, which uses a similar approach:

The canonical index structure in a disk-centric OLTP RDBMS is a tree of blocks. The record sought is in a block somewhere. There are index blocks whose entries are pointers to the correct block based on values in the index column. There are index blocks of pointers to other index blocks. And so on. One can traverse these trees in very few steps, but each step is costly, because each step involves examining the whole block.

SolidDB, by way of contrast, uses a core index structure called the trie. The key value on which the record search is based is divided into chunks of bits. Each chunk leads to a tree node with a small number of choices for the next chunk. There are more steps, but each step is much cheaper.

Benefits of this strategy include compression and in-memory performance. But a naive implementation would, as in other pointer-based systems, lead to unacceptable disk thrashing. ScaleDB’s answer is to layer the index, essentially creating a “trie of tries.” The company confidently claims that, in almost all cases, data can be found via a single disk read. Part of that story is the assertion that their indexing scheme achieves tremendous compression vs. conventional b-trees.

So far, that all sounds like a performance win, of unclear magnitude. (ScaleDB says it’s hoping for a 3X or better performance advantage versus traditional b-tree-based approaches.) But there’s another cool part as well. The ScaleDB trie doesn’t necessarily end with the first row it finds; it also reaches through to capture foreign-key relationships. E.g., if customer FOO123 places an order with OrderID BAR456, the BAR456 isn’t just found via the path B-A-R-4-5-6. It also can be found via FOO-1-2-3-BAR-456. Thus, referential integrity and updatable views are baked into the core database management architecture.

I look forward to seeing how this all works out, in Release 1 and beyond.

Edit: One way to think of this is as the integration of the network and relational data models, à la IDMS/R, but with more compact linked lists. And I believe Predrag Dizdarevic when he tells me IDMS/R did wind up working pretty well, in a rare instance of a DBMS technology success post-acquisition by CA.