VoltDB and H-Store – DBMS 2 : DataBase Management System Services

Optimism, pessimism, and fatalism — fault-tolerance, Part 2

Curt Monash — Sun, 08 Jun 2014 16:58:35 +0000

The pessimist thinks the glass is half-empty.
The optimist thinks the glass is half-full.
The engineer thinks the glass was poorly designed.

Most of what I wrote in Part 1 of this post was already true 15 years ago. But much gets added in the modern era, considering that:

Clusters will have node hiccups more often than single nodes will. (Duh.)
Networks are relatively slow even when uncongested, and furthermore congest unpredictably.
In many applications, it’s OK to sacrifice even basic-seeming database functionality.

And so there’s been innovation in numerous cluster-related subjects, two of which are:

Distributed query and update. When a database is distributed among many nodes, how does a request access multiple nodes at once?
Fault-tolerance in long-running jobs. When a job is expected to run on many nodes for a long time, how can it deal with failures or slowdowns, other than through the distressing alternatives:
- Start over from the beginning?
- Keep (a lot of) the whole cluster’s resources tied up, waiting for things to be set right?

Distributed database consistency

When a distributed database lives up to the same consistency standards as a single-node one, distributed query is straightforward. Performance may be an issue, however, which is why we have seen a lot of:

Analytic RDBMS innovation.
Short-request applications designed to avoid distributed joins.
Short-request clustered RDBMS that don’t allow fully-general distributed joins in the first place.

But in workloads with low-latency writes, living up to those standards is hard. The 1980s approach to distributed writing was two-phase commit (2PC), which may be summarized as:

A write is planned and parceled out to occur on all the different nodes where the data needs to be placed.
Each node decides it’s ready to commit the write.
Each node informs the others of its readiness.
Each node actually commits.
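The four steps above can be sketched in a few lines. This is a toy, single-process illustration only: the `Node` class, its `healthy` flag, and the `two_phase_commit` function are hypothetical names, and a central loop stands in for the nodes informing each other of their readiness.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical participant that holds one slice of the data."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def prepare(self, key, value):
        # Phase 1: stage the write and vote on whether it can be committed.
        if not self.healthy:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):
        # Phase 2: make the staged write permanent.
        self.committed[key] = self.staged.pop(key)

    def abort(self, key):
        # Discard the staged write, if there is one.
        self.staged.pop(key, None)


def two_phase_commit(nodes, key, value):
    # Parcel the write out to every node and collect the readiness votes.
    votes = [node.prepare(key, value) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit(key)
        return True
    # A single "no" vote aborts the write everywhere.
    for node in nodes:
        node.abort(key)
    return False
```

In a real cluster each `prepare` and `commit` call is a network round trip, so the write cannot finish until the slowest vote arrives.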

Unfortunately, if any of the various messages in the 2PC process is delayed, so is the write. This creates way too much likelihood of work being blocked. And so modern approaches to distributed data writing are more … well, if I may repurpose the famous Facebook slogan, they tend to be along the lines of “Move fast and break things”,* with varying tradeoffs among consistency, other accuracy, reliability, functionality, manageability, and performance.

*By the way, Facebook recently renounced that motto, in favor of “Move fast with stable infrastructure.” Hmm …

Back in 2010, I wrote about various approaches to consistency, with the punch line being:

A conventional relational DBMS will almost always feature RYW consistency. Some NoSQL systems feature tunable consistency, in which — depending on your settings — RYW consistency may or may not be assured.

The core ideas of RYW consistency, as implemented in various NoSQL systems, are:

Let N = the number of copies of each record distributed across nodes of a parallel system.

Let W = the number of nodes that must successfully acknowledge a write for it to be successfully committed. By definition, W <= N.

Let R = the number of nodes that must send back the same value of a unit of data for it to be accepted as read by the system. By definition, R <= N.

The greater N-R and N-W are, the more node or network failures you can typically tolerate without blocking work.

As long as R + W > N, you are assured of RYW consistency.

That last point, R + W > N, is the key one, and I suggest that you stop and convince yourself of it before reading further.
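One way to convince yourself is brute force. The sketch below (function names are my own, purely illustrative) enumerates every possible set of W nodes that acknowledged a write and every possible set of R nodes answering a read, and checks whether the two sets always share at least one node. If they always overlap, some node in every read has seen the latest write.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every possible read set shares a node with every write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def check_rule(max_n=6):
    """Compare the brute-force answer with the R + W > N rule for small N."""
    for n in range(1, max_n + 1):
        for r in range(1, n + 1):
            for w in range(1, n + 1):
                assert quorums_always_overlap(n, r, w) == (r + w > n)
    return True
```

The underlying argument is pigeonhole: if R + W > N, there are not enough nodes for a read set and a write set to avoid each other.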

Eventually :), Dan Abadi claimed that the key distinction is synchronous/asynchronous — is anything blocked while waiting for acknowledgements? From many people, that would simply be an argument for optimistic locking, in which all writes go through, and conflicts — of the sort that locks are designed to prevent — cause them to be rolled back after-the-fact. But Dan isn’t most people, so I’m not sure — especially since the first time I met Dan was to discuss VoltDB predecessor H-Store, which favors application designs that avoid distributed transactions in the first place.
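For readers who haven't seen optimistic locking in practice, here is a minimal sketch under stated assumptions. `VersionedStore`, `ConflictError` and `add_to_balance` are hypothetical names; the store is a plain in-memory dict, and a per-key version number is used to detect conflicts, which is one common way to do it rather than the only one.

```python
class ConflictError(Exception):
    """Raised when another writer changed the record first."""


class VersionedStore:
    """Hypothetical in-memory store that keeps a version number per key."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            # No lock was ever held; the conflict is only noticed here.
            raise ConflictError(key)
        self._data[key] = (current_version + 1, value)


def add_to_balance(store, key, amount, max_retries=3):
    """Read, modify, write; on conflict, re-read and try again."""
    for _ in range(max_retries):
        version, value = store.read(key)
        try:
            store.write(key, version, (value or 0) + amount)
            return True
        except ConflictError:
            continue
    return False
```

Nothing blocks while a writer is working; the price is that a losing writer has to redo its work.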

One idea that’s recently gained popularity is a kind of semi-synchronicity. Writes are acknowledged as soon as they arrive at a remote node (that’s the synchronous part). Each node then updates local permanent storage on its own, with no further confirmation. I first heard about this in the context of replication, and generally it seems designed for replication-oriented scenarios.

Single-job fault-tolerance

Finally, let’s consider fault-tolerance within a single long-running job, whether that’s a big query or some other kind of analytic task. In most systems, if there’s a failure partway through a job, they just say “Oops!” and start it over again. And in non-extreme cases, that strategy is often good enough.

Still, there are a lot of extreme workloads these days, so it’s nice to absorb a partial failure without entirely starting over.

Hadoop MapReduce, which stores intermediate results anyway, finds it easy to replay just the parts of the job that went awry.
Spark, which is more flexible in execution graph and data structures alike, has a similar capability.

Additionally, both Hadoop and Spark support speculative execution, in which several clones of a processing step are executed at once (presumably on different nodes), to hedge against the risk that any one copy of the process runs slowly or fails outright. According to my notes, speculative execution is a major part of NuoDB’s architecture as well.
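The general idea of speculative execution can be illustrated with Python's standard `concurrent.futures` module. This is a sketch of the concept only, not how Hadoop, Spark or NuoDB implement it; `run_speculatively` is a hypothetical helper, and threads stand in for nodes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_speculatively(clones):
    """Start every clone at once and return the first result that succeeds."""
    pool = ThreadPoolExecutor(max_workers=len(clones))
    futures = [pool.submit(clone) for clone in clones]
    try:
        # as_completed yields futures in the order they finish.
        for future in as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("every clone failed")
    finally:
        # Don't wait for stragglers; clones already running finish on their own.
        pool.shutdown(wait=False)
```

The obvious cost is that the cluster does the same work more than once, which is why real schedulers are selective about when they launch clones.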

Further topics

I’ve rambled on for two long posts, which seems like plenty — but this survey is in no way complete. Other subjects I could have covered include but are hardly limited to:

Occasionally-connected operation, which for example is a design point of CouchDB, SQL Anywhere (sort of), and most kinds of mobile business intelligence.
Avoiding planned downtime — i.e., operating despite self-inflicted wounds.
Data cleaning and master data management, both of which exist in large part to fix errors people have made in the past.

Related links

Uninterrupted DBMS operation (September, 2012)
The cardinal rules of DBMS development (March, 2013)
Bottleneck Whack-A-Mole (August, 2009)

Notes and comments, May 6, 2014

Curt Monash — Tue, 06 May 2014 13:46:54 +0000

After visiting California recently, I made a flurry of posts, several of which generated considerable discussion.

My claim that Spark will replace Hadoop MapReduce got much Twitter attention — including some high-profile endorsements — and also some responses here.
My MemSQL post led to a vigorous comparison of MemSQL vs. VoltDB.
My post on hardware and storage spawned a lively discussion of Hadoop hardware pricing; even Cloudera wound up disagreeing with what I reported Cloudera as having said. Sadly, there was less response to the part about the partial (!) end of Moore’s Law.
My Cloudera/SQL/Impala/Hive post apparently was well-balanced, in that it got attacked from multiple sides via Twitter & email. Apparently, I was too hard on Impala, I was too hard on Hive, and I was too hard on boxes full of cardboard file cards as well.
My post on the Intel/Cloudera deal garnered a comment reminding us Dell had pushed the Intel distro.
My CitusDB post picked up a few clarifying comments.

Here is a catch-all post to complete the set.

1. The recently-announced Cloudera/MongoDB relationship* is still at the Barney stage. That said, I’m optimistic that their stated intention to add substance to the relationship will eventually come to fruition. If nothing else, the two companies have high regard for each other, at least at the Mike Olson/Max Schireson level.

*That’s one of numerous deals with my fingerprints on it, but in this case only lightly. It was probably on track to happen even without my nudges.

2. Most of what I talked about when I visited MongoDB is confidential; the public stuff was mainly in my recent MongoDB technology post. But in one exception, I asked Max for an update as to MongoDB enterprise use cases. He reported a cluster of them in data combination, especially but not only in use cases which have both a high-volume part and dynamic-schema aspects. Specific examples Max cited included:

Tracking financial holdings from a variety of asset classes — especially if derivatives are involved, because they have a dynamic-schema aspect.
Product catalogs, including for use on web sites.
Customer information.
Patient information.

3. I didn’t ask everybody I saw in California about business trends, and much of what we did discuss was confidential. That said:

MapR was proud of its numbers.
So was DataStax.
ClearStory has a bunch of Very Big Enterprises as customers, mainly but not only in consumer sectors (e.g. retail, packaged goods).

4. Platfora is focusing a bit, starting with clickstream and security — i.e., event series stuff. And by the way, they report that the term “event series” is working well for them.

5. I gather from a variety of comments and conversations that Amazon Redshift has achieved considerable traction.

6. Something I can’t find evidence of having posted before: I think multiple businesses monitor online sales or similar business successes as a guide to network problems. eBay did this via a custom in-memory MOLAP (Multidimensional OnLine Analytic Processing) system years ago. Best evidence that this is hardly restricted to eBay: all the “me-too” responses I get from telling that story.

7. Citus Data tells me that as of PostgreSQL 9.4, Postgres will be able to return just the part of a JSON column needed for a query. This is as opposed to storing the whole thing as text and only retrieving it in its entirety.

8. In the comments to my “Spark on fire” post, Patrick McFadin pointed out that Mahout is transitioning from MapReduce to Spark. (All new work will be on Spark, although old MapReduce-based routines will continue to be supported.) It turns out that Derrick Harris wrote about that over a month ago, and I just missed the news.

9. Also in predictive analytics — there are rumblings that R could eventually be supplanted by Julia, although R’s massive libraries of algorithms still give it the advantage now.

10. Multiple vendors, fed up with the intermittent slowdowns from garbage collection, are moving some processing off the Java heap. Unfortunately, I neglected to ask any of them what the remaining differences then were between Java and C++ programming.

11. And to finish on a light note: BDAS — the project of which Spark is only a part — is pronounced “bad-ass”, something I first heard from Dave Patterson.

MemSQL update

Curt Monash — Fri, 02 May 2014 03:40:39 +0000

I stopped by MemSQL last week, and got a range of new or clarified information. For starters:

Even though MemSQL (the product) was originally designed for OLTP (OnLine Transaction Processing), MemSQL (the company) is now focused on analytic use cases …
… which was the point of introducing MemSQL’s flash-based columnar option.
One MemSQL customer has a 100 TB “data warehouse” installation on Amazon.
Another has “dozens” of terabytes of data spread across 500 machines, which aggregate 36 TB of RAM.
At customer Shutterstock, 1000s of non-MemSQL nodes are monitored by 4 MemSQL machines.
A couple of MemSQL’s top references are also Vertica flagship customers; one of course is Zynga.
MemSQL reports encountering Clustrix and VoltDB in a few competitive situations, but not NuoDB. MemSQL believes that VoltDB is still hampered by its traditional issues — Java, reliance on stored procedures, etc.

On the more technical side:

Some MemSQL users are running 7- or 8-way joins and other long-ish SQL statements.
But MemSQL doesn’t yet have fully peer-to-peer data redistribution.
- MemSQL “leaves” only talk to MemSQL “aggregator nodes,” not each other …
- … but note the plural on “aggregator nodes”, which should immunize MemSQL from the worst of “fat head” bottlenecks.
- Of course, you can sometimes get join locality by sharding multiple tables on the same key …
- … or by broadcast-replicating tables that are sufficiently small.
Better SQL coverage — e.g. SQL Windowing — is coming soon.
MemSQL believes it has an aggressive data skipping story.
MemSQL doesn’t yet have a true workload management story; they’re still at the stage “Our queries run so fast not many of them have to be active at once, and if things nevertheless get too busy we have some throttling capabilities.” But MemSQL at least sounds aware of the difference between that and true workload management, which puts them ahead of some other vendors I talk with.
MemSQL doesn’t have stored procedures. In particular, since MemSQL (the product) generates code on the fly, MemSQL (the company) doesn’t think the performance benefits of stored procedure pre-compilation are needed.
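To illustrate the join-locality point from the list above: if two tables are sharded on the same key with the same function, matching rows always sit on the same leaf, so each leaf can join its own rows without any redistribution. The sketch below is a toy model, not MemSQL code; `NUM_LEAVES`, `leaf_for`, `shard` and `colocated_join` are hypothetical names, and dicts stand in for leaves.

```python
import zlib
from collections import defaultdict

NUM_LEAVES = 4  # hypothetical cluster size


def leaf_for(key):
    # Every table uses the same function, so equal keys land on the same leaf.
    return zlib.crc32(str(key).encode()) % NUM_LEAVES


def shard(rows, key_column):
    """Distribute rows (dicts) across leaves by hashing the shard key."""
    leaves = defaultdict(list)
    for row in rows:
        leaves[leaf_for(row[key_column])].append(row)
    return leaves


def colocated_join(left_shards, right_shards, key_column):
    """Join each leaf's rows locally; no rows move between leaves."""
    result = []
    for leaf in range(NUM_LEAVES):
        by_key = defaultdict(list)
        for row in left_shards.get(leaf, []):
            by_key[row[key_column]].append(row)
        for row in right_shards.get(leaf, []):
            for match in by_key.get(row[key_column], []):
                result.append({**match, **row})
    return result
```

Broadcast-replicating a small table gets the same effect a different way: every leaf holds a full copy, so the lookup side of the join is always local.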

And finally, MemSQL’s column-store compression story — which I mangled in a previous post — goes like this:

There are numerous compression algorithm choices, both columnar (e.g. dictionary/tokenization, run-length encoding) and block (Lempel-Ziv, I presume in multiple variations).
Compression is block-by-block, something I hear more commonly these days than Vertica’s alternative of global compression choices.
The choice of compression scheme is automagic for each block, unless you give explicit hints.
Default block size for the columnar store is 10 million rows.
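To make the per-block idea concrete, here is a rough sketch of choosing an encoding for each block independently. It is not MemSQL's algorithm, which I haven't seen; the byte-cost model (4-byte run counts, 1-byte dictionary tokens) and all function names are assumptions for illustration, and block compression such as Lempel-Ziv is left out.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def dictionary_encode(values):
    """Replace each value with a small integer token into a sorted dictionary."""
    dictionary = sorted(set(values))
    token_of = {value: token for token, value in enumerate(dictionary)}
    return dictionary, [token_of[value] for value in values]


def size_in_bytes(value):
    return len(str(value))


def choose_encoding(block, hint=None):
    """Pick the cheapest scheme for one block, unless an explicit hint is given."""
    if hint is not None:
        return hint
    dictionary, tokens = dictionary_encode(block)
    costs = {
        "raw": sum(size_in_bytes(v) for v in block),
        # Assumed cost: each run stores its value plus a 4-byte count.
        "rle": sum(size_in_bytes(v) + 4 for v, _ in run_length_encode(block)),
        # Assumed cost: the dictionary itself plus one byte per token.
        "dictionary": sum(size_in_bytes(v) for v in dictionary) + len(tokens),
    }
    return min(costs, key=costs.get)
```

A long run of one value favors run-length encoding, a few values in no particular order favor a dictionary, and a block of short unique values is best left alone. That different blocks of the same column can come out differently is the whole point of choosing block by block.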

Comments on the 2013 Gartner Magic Quadrant for Operational Database Management Systems

Curt Monash — Fri, 08 Nov 2013 16:46:46 +0000

The 2013 Gartner Magic Quadrant for Operational Database Management Systems is out. “Operational” seems to be Gartner’s term for what I call short-request, in each case the point being that OLTP (OnLine Transaction Processing) is a dubious term when systems omit strict consistency, and when even strictly consistent systems may lack full transactional semantics. As is usually the case with Gartner Magic Quadrants:

I admire the raw research.
The opinions contained are generally reasonable (especially since Merv Adrian joined the Gartner team).
Some of the details are questionable.
There’s generally an excessive focus on Gartner’s perception of vendors’ business skills, and on vendors’ willingness to parrot all the buzzphrases Gartner wants to hear.
The trends Gartner highlights are similar to those I see, although our emphasis may be different, and they may leave some important ones out. (Big omission — support for lightweight analytics integrated into operational applications, one of the more genuine forms of real-time analytics.)

Anyhow:

The 2013 Gartner Magic Quadrant for Operational Database Management Systems puts Oracle in the lead, closely followed in some order by Microsoft, SAP, and IBM, with everybody else way behind. That’s reasonable, harkening back to the time when Oracle, IBM, Microsoft and to some extent Sybase were seemingly secure oligopolists, and most of the other vendors mentioned didn’t yet exist.
Gartner seems to view a proprietary appliance strategy as good for customers, without mentioning that it’s also a way to sell hardware at ridiculous prices.
Gartner evidently likes memory-centric positioning. SAP, Aerospike, VoltDB and McObject all get surprisingly high marks.
Gartner gives Intersystems pretty high marks, while Progress Software isn’t even mentioned. Despite Progress’ recent restructuring, I’d think the core Progress OpenEdge business — arguably Intersystems’ closest rival — deserves more respect than that. (But given how rarely I write about it myself, perhaps I shouldn’t criticize.)
Gartner has long been oddly positive on Actian, which is a floundering hodgepodge of half a dozen database also-rans. I like Mike Hoskins a lot too, but just how much has Actian’s supposedly “energized” “strong leadership” accomplished in the recent past, at Actian or elsewhere?
Gartner has brutally low “vision” rankings for NuoDB and Clustrix. I think scaling out SQL effectively is more impressive than that. Gartner also omits to mention Clustrix’s past as an appliance vendor.
Gartner refers to Oracle’s multi-tenancy support as if … well, as if it supported multi-tenancy.
I don’t understand Gartner’s rankings of or comments about NoSQL vendors. For example:
- Three “strengths” are mentioned for MongoDB, yet none reference MongoDB’s developer outreach, which may be second only to prime Microsoft’s.
- HBase is discussed as if the Hadoop vendors were still pushing it hard, or as if it were showing up in a lot of enterprise evaluations.
- Geo-distribution is mentioned as a strength for Riak, yet not for Cassandra.
Every Gartner Magic Quadrant (or Forrester Wave) features one or more outright brain cramps. In this one:
- Gartner writes “the Clustrix database offers no support for data types beyond traditional relational types,” when in fact Clustrix was one of the early indicators of a trend toward relational DBMS JSON support.
- Gartner suggests that EnterpriseDB’s Oracle compatibility is something new, when it was actually the company’s whole strategy 6-7 years ago.

Finally, since I’ve struggled with the definition of “DBMS”, I’ll finish by quoting with approval the start of Gartner’s:

We define a DBMS as a complete software system used to define, create, manage, update and query a database.

Related links

Comments on the most recent Gartner Magic Quadrant for Data Warehouse Database Management Systems
My definition of operational analytics

Introduction to Deep Information Sciences and DeepDB

Curt Monash — Sun, 14 Apr 2013 04:33:17 +0000

I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB — albeit with different technical strategies — DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:

DeepDB’s indexes can help you with analytic queries; hence, DeepDB is marketed as supporting OLTP (OnLine Transaction Processing) and analytics in the same system.
DeepDB is marketed as “designed for big data and the cloud”, with reference to “Volume, Velocity, and Variety”. What I could discern in support of that is mainly:
- DeepDB has been tested at up to 3 terabytes at customer sites and up to 1 billion rows internally.
- Like most other NewSQL and NoSQL DBMS, DeepDB is append-only, and hence could be said to “stream” data to disk.
- DeepDB’s indexes could at some point in the future be made to work well with non-tabular data.*
- The Deep guys have plans and designs for scale-out — transparent sharding and so on.

*For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports “unstructured” data today.

Other NewSQL DBMS seem “designed for big data and the cloud” to at least the same extent DeepDB is. However, if we’re interpreting “big data” to include multi-structured data support — well, only half or so of the NewSQL products and companies I know of share Deep’s interest in branching out. In particular:

Akiban definitely does. (Note: Stay tuned for some next-steps company news about Akiban.)
Tokutek has planted a small stake there too.
Key-value-store-backed NuoDB and GenieDB probably lean that way. (And SanDisk evidently shut down Schooner’s RDBMS while keeping its key-value store.)
VoltDB, Clustrix, ScaleDB and MemSQL seem more strictly tabular, except insofar as text search is a requirement for everybody. (Edit: Oops; I forgot about Clustrix’s approach to JSON support.)

Edit: MySQL has some sort of an optional NoSQL interface, and hence so presumably do MySQL-compatible TokuDB, GenieDB, Clustrix, and MemSQL.

Also, some of those products do not today have the transparent scale-out that Deep plans to offer in the future.

Among the 10 people listed as part of Deep Information Sciences’ team, I noticed 2 who arguably had DBMS industry experience, in that they worked at virtualization vendor Virtual Iron, and stayed on for a while after Virtual Iron was bought by Oracle. One of them, Chief Scientist & Architect Tom Hazel, also was at Akiban for a few months, where he did actually work on a DBMS. Other Deep Information Sciences notes include:

Deep has 25 or so people in all.
Deep had a recent $10 million funding round.
Deep Information Sciences is the former Cloudtree, which as of February, 2011 was pursuing quite a different strategy. (Evidently there was a pivot.) Deep was founded in 2010.
There are 2 paying customers for DeepDB, even though it’s still in beta, and 8 trials. A similar number of trials and strategic partners are queued up.
DeepDB general availability is expected later this quarter.

Although our call was blessedly technical, we didn’t have a chance to go through the DeepDB architecture in great detail. That said, DeepDB seems to store data in all of 3 ways:

An in-memory row store.
An on-disk row store with a very different architecture.
Indexes, which can also serve as a column store.

Notes on that include:

DeepDB’s in-memory row store is designed to manage single rows as much as possible, rather than pages. Indeed, there are “aspects of tries”, although we didn’t drill down into what exactly that meant.
Indexes are streamed to disk at least once every 15 seconds, by default, and perhaps with latency as low as 10 milliseconds.
Perhaps the most important point I didn’t grasp is “segments”. The data and indexes on disk are stored in segments, which can be of different sizes, and which may each carry some summary data/metadata/whatever. Somehow, this is central to DeepDB’s design.
In what is evidently a design focus, DeepDB tries to get the benefit of “in-memory data” that isn’t actually taking up RAM. B-trees can point at rows that aren’t actually in memory. Segments evicted from cache can leave some metadata or summary data behind.
DeepDB’s compression story seems to be a work in progress.
- There’s prefix compression already, at least in the indexes, which Deep just calls “compaction”.
- Other compression is working in the lab, but not scheduled for Version 1.0.
  - Block compression seems to be in play.
  - Delta compression was mentioned once.
  - Dictionary compression wasn’t mentioned at all.
- DeepDB apparently will keep compressed data in cache, then decompress it to operate on it.
- Different segments can be compressed/uncompressed differently.
DeepDB’s on-disk row store is append-only. Time-travel is being worked on. While I forgot to ask, it seems likely that DeepDB has MVCC (Multi-Version Concurrency Control).
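Since prefix compression came up in the list above, here is what the general technique looks like on sorted index keys. This is a generic sketch, not DeepDB's “compaction” code, which I haven't seen; the function names are my own.

```python
import os


def prefix_compress(sorted_keys):
    """Store each key as (length shared with the previous key, remaining suffix)."""
    entries, previous = [], ""
    for key in sorted_keys:
        # os.path.commonprefix compares character by character.
        shared = len(os.path.commonprefix([previous, key]))
        entries.append((shared, key[shared:]))
        previous = key
    return entries


def prefix_decompress(entries):
    """Rebuild the original keys by reusing the front of the previous key."""
    keys, previous = [], ""
    for shared, suffix in entries:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys
```

Because index keys are kept in sorted order, neighbors tend to share long prefixes, which is why this works better in indexes than on unsorted row data.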

And finally: DeepDB in its current form is a “drop-in” InnoDB replacement, but not necessarily bug-compatible.

NewSQL thoughts

Curt Monash — Sat, 05 Jan 2013 18:04:08 +0000

I plan to write about several NewSQL vendors soon, but first here’s an overview post. Like “NoSQL”, the term “NewSQL” has an identifiable, recent coiner — Matt Aslett in 2011 — yet a somewhat fluid meaning. Wikipedia suggests that NewSQL comprises three things:

OLTP- (OnLine Transaction Processing)/short-request-oriented SQL DBMS that are newer than MySQL.
Innovative MySQL engines.
Transparent sharding systems that can be used with, for example, MySQL.

I think that’s a pretty good working definition, and will likely remain one unless or until:

SQL-oriented and NoSQL-oriented systems blur indistinguishably.
MySQL (or PostgreSQL) laps the field with innovative features.

To date, NewSQL adoption has been limited.

NewSQL vendors I’ve written about in the past include Akiban, Tokutek, CodeFutures (dbShards), Clustrix, Schooner (Membrain), VoltDB, ScaleBase, and ScaleDB, with GenieDB and NuoDB coming soon.
But I’m dubious whether, even taken together, all those vendors have as many customers or production references as any of 10gen, Couchbase, DataStax, or Cloudant.*

That said, the problem may lie more on the supply side than in demand. Developing a competitive SQL DBMS turns out to be harder than developing something in the NoSQL state of the art.

*Revenue might be a different matter.

The main reasons for NewSQL adoption tend to fall in the areas of performance, scaling, manageability and cost. But while they all support SQL, some NewSQL DBMS have differentiated programming models even so.

Akiban wants you to consider mixing access — to the same data in the same data structures — among SQL, JSON and, say, Hibernate.
Tokutek turns a performance argument into a functionality one. In particular, Tokutek claims that TokuDB does a much better job than alternatives of making it practical for you to update indexes at OLTP speeds. Hence, it claims to do a much better job than alternatives of making it practical for you to write and execute queries that only make sense when indexes (or other analytic performance boosts) are in place.
As a trade-off for blazing in-memory performance, VoltDB is hampered by an innovative and restrictive programming model.

Also, the MySQL add-ons and lookalikes vary in the (in)completeness of their MySQL emulation or support.

The most common performance/scaling NewSQL claims are simply “We scale, giving you the power of multiple servers, with sufficiently little downside in the way of tradeoffs.” That story is central to Clustrix, VoltDB, ScaleDB, NuoDB, and to anybody active in transparent sharding. Other performance/scaling claims include but are not limited to:

Optimized for RAM (VoltDB).
Optimized for flash (Schooner/Membrain).
Writes indexes quickly (TokuDB).
Fast joins (Akiban).

Management claims include (from multiple NewSQL vendors in each case):

Little added management pain, but you get scale-out!
Little added management pain, but you get active-active/multi-master wide-area replication!
Online schema change and other uninterrupted operation features.
Not as cumbersome as Oracle.

And that’s about as much as I’m ready to generalize about the NewSQL sector. Posts about particular products and companies are on the way.

Many kinds of memory-centric data management

Curt Monash — Sun, 08 Apr 2012 01:33:31 +0000

I’m frequently asked to generalize in some way about in-memory or memory-centric data management. I can start:

The desire for human real-time interactive response naturally leads to keeping data in RAM.
Many databases will be ever cheaper to put into RAM over time, thanks to Moore’s Law. (Most) traditional databases will eventually wind up in RAM.
However, there will be exceptions, mainly on the machine-generated side. Where data creation and RAM data storage are getting cheaper at similar rates … well, the overall cost of RAM storage may not significantly decline.

Getting more specific than that is hard, however, because:

The possibilities for in-memory data storage are as numerous and varied as those for disk.
The individual technologies and products for in-memory storage are much less mature than those for disk.
Solid-state options such as flash just confuse things further.

Consider, for example, some of the in-memory data management ideas kicking around.

In many cases there is essentially an in-memory DBMS, trying for as much ACIDity as RAM reasonably allows, then (usually) also copying data synchronously to persistent storage. These can have many different architectures. For example:
- SAP HANA is an in-memory columnar DBMS, with text indexing/inverted-list antecedents, except when it uses one of a couple of approaches to in-memory row-based data management.
- solidDB, now an IBM product, is an RDBMS that relies on Patricia tries. It is actually a hybrid memory/disk product, but optimized for in-memory operation.
- eXtremeDB is an OODBMS, but relies on B-trees.
- H-Store and its commercialization VoltDB are row-based RDBMS that make drastic assumptions about the nature of your workload, but in return get to drop much of the overhead other DBMS need.
- Oracle TimesTen is a row-based RDBMS, oriented to OLTP (OnLine Transaction Processing), which stores its data persistently via another RDBMS. (MySQL was the default choice before Oracle bought the company.)
- Oracle’s answer to SAP HANA is to take TimesTen and do analytics on it, via the Exalytics appliance.
Some disk-based DBMS just happen to be architected in ways so that for good performance you’re going to want to keep all the data in RAM. Often, their in-memory architecture is a lot like their on-disk architecture, with memory mapping for I/O. This is done in very different kinds of DBMS.
- MongoDB is one visible example. In general, scale-out web databases (whether NoSQL or MySQL) often keep all their data in RAM, whether or not that plan is baked into the DBMS architecture.
- Various analytic DBMS vendors have at times been memory-oriented. At the moment, I think:
  - Exasol (columnar) isn’t quite as extreme about wanting to be in-memory as it used to be.
  - ParAccel (columnar) and its memory-mapped architecture can be happily used either in-memory or on disk.
  - Kognitio (row-based), which used to be portrayed as a disk-based system that’s smart about using RAM, is currently being marketed as an in-memory system.
My last technical briefing on Applix TM1 (now an IBM Cognos product) was in September, 2005. (The product itself dates back to 1984.) At the time TM1 had an interesting sparse MOLAP (Multi-Dimensional OnLine Analytic Processing) story, the point being that the system worked hard to isolate what was actually non-zero. Loading of raw data seemed to be batch, but you could update models with derived data, and there was a transaction log for confident persistence.
Alternatively, you can use a caching layer, typically on a separate set of servers from your DBMS, which has no responsibility for managing data persistence. For example:
- TimesTen and solidDB are used, respectively, as relational caches for Oracle and DB2.
- Peter Zencke told me years ago that SAP had a purpose-built caching layer that kept over 99% of requests from touching disk.
- The key-value store memcached is central to many of the world’s largest web sites, typically backed by a MySQL cluster.
- ScaleArc has a key-value cache that stores the entire TCP string sent by an RDBMS in response to a particular SQL query, rather than individual records.
Some systems manage data in memory in one kind of structure, then ensure persistence via a very different structure on disk. Examples include:
- Workday’s architecture — object-oriented in RAM, MySQL (really key-value) on disk. Edit: Workday thinks “key-value” is a slightly misleading way to put it. Stay tuned for more.
- Oracle Coherence (formerly Tangosol) — object-oriented in RAM, Oracle on disk. Edit: Actually, Coherence isn’t really a write-through ORM (Object-Relational Mapper). It functions more like memcached, albeit with a very different data model.
- Couchbase — memcached (key-value) in-memory, evolving from SQLite to CouchDB on disk.
Similarly, business intelligence suites can manage data in-memory that comes from some other kind of data store (usually an RDBMS, sometimes Hadoop or whatever). I haven’t had a lot of luck in getting details, with one exception — QlikView, which uses a simple tabular data structure.
Stream processors — i.e. CEP engines — are a whole other sort of in-memory engine, doing something that’s a lot like data management.
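The caching-layer idea in the list above, and specifically the approach of caching a whole response keyed by the query that produced it, can be sketched as follows. This is an illustration of the concept rather than of any vendor's product; `QueryCache` and the `run_query` callable are hypothetical.

```python
class QueryCache:
    """Hypothetical cache keyed by the exact SQL text of each query."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that hits the real DBMS
        self._responses = {}
        self.hits = 0
        self.misses = 0

    def execute(self, sql):
        if sql in self._responses:
            self.hits += 1
            return self._responses[sql]
        self.misses += 1
        response = self._run_query(sql)
        # Store the whole response as-is, not individual records.
        self._responses[sql] = response
        return response

    def invalidate(self):
        # The cache is not the system of record, so dropping it loses nothing.
        self._responses.clear()
```

The sketch ignores the hard part, which is knowing when a write to the underlying database has made a cached response stale.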

And that, kiddies, is why I hesitate to generalize in too much detail about “in-memory database management.”

Despite its length, this is still a very partial list of memory-centric data management approaches. I encourage you to add other examples into the comments that I might have left out.

Related link

I did a simpler overview of memory-centric alternatives in 2005.

Soundbites: the Facebook/MySQL/NoSQL/VoltDB/Stonebraker flap, continued

Curt Monash — Fri, 15 Jul 2011 08:27:18 +0000

As a follow-up to the latest Stonebraker kerfuffle, Derrick Harris asked me a bunch of smart follow-up questions. My responses and afterthoughts include:

Facebook et al. are in effect Software as a Service (SaaS) vendors, not enterprise technology users. In particular:
- They have the technical chops to rewrite their code as needed.
- Unlike packaged software vendors, they’re not answerable to anybody for keeping legacy code alive after a rewrite. That makes migration a lot easier.
- If they want to write different parts of their system on different technical underpinnings, nobody can stop them. For example …
- … Facebook created Cassandra, and is now heavily committed to HBase.
It makes little sense to talk of Facebook’s use of “MySQL.” Better to talk of Facebook’s use of “MySQL + memcached + non-transparent sharding.” That said:
- It’s hard to see why somebody today would use MySQL + memcached + non-transparent sharding for a new project. At least one of Couchbase or transparently-sharded MySQL is very likely a superior alternative. Other alternatives might be better yet.
- As noted above in the example of Facebook, the many major web businesses that are using MySQL + memcached + non-transparent sharding for existing projects can be presumed able to migrate away from that stack as the need arises.
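
To make "non-transparent sharding" concrete, here is a minimal sketch of what that stack asks of application code. It is an illustration under stated assumptions, not Facebook's or anybody else's actual implementation: plain Python dicts stand in for the MySQL shards and for memcached, and the names `shard_for`, `write_user`, `read_user`, and `NUM_SHARDS` are hypothetical.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count, for illustration only

# Plain dicts stand in for the MySQL shards and for memcached.
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}


def shard_for(user_id: str) -> int:
    """Application-level routing: the app, not the DBMS, picks the shard."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


def write_user(user_id: str, record: dict) -> None:
    """Write to the owning shard and drop any stale cached copy."""
    shards[shard_for(user_id)][user_id] = record
    cache.pop(user_id, None)


def read_user(user_id: str):
    """Check the cache first; on a miss, read the owning shard and cache it."""
    if user_id in cache:
        return cache[user_id]
    record = shards[shard_for(user_id)].get(user_id)
    if record is not None:
        cache[user_id] = record
    return record
```

The point of the sketch is that routing and cache invalidation both live in application code. With transparent sharding, `shard_for` would disappear into the DBMS layer, which is a large part of why the transparent alternatives look better for a new project.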

Continuing with that discussion of DBMS alternatives:

If you just want to write to the memcached API anyway, why not go with Couchbase?
If you want to go relational, why not go with MySQL? There are many alternatives for scaling or accelerating MySQL — dbShards, Schooner, Akiban, Tokutek, ScaleBase, ScaleDB, Clustrix, and Xeround come to mind quickly, so there’s a great chance that one or more will fit your use case. (And if you don’t get the choice of MySQL flavor right the first time, porting to another one shouldn’t be all THAT awful.)
If you really, really want to go in-memory, and don’t mind writing Java stored procedures, and don’t need to do the kinds of joins it isn’t good at, but do need to do the kinds of joins it is, VoltDB could indeed be a good alternative.

And while we’re at it — going schema-free often makes a whole lot of sense. I need to write much more about the point, but for now let’s just say that I look favorably on the Big Four schema-free/NoSQL options of MongoDB, Couchbase, HBase, and Cassandra.

An odd claim attributed to Mike Stonebraker

Curt Monash — Thu, 14 Jul 2011 11:10:34 +0000

This post has a sequel.

Last week, Mike Stonebraker insulted MySQL and Facebook’s use of it, by implication advocating VoltDB instead. Kerfuffle ensued. To the extent Mike was saying that non-transparently sharded MySQL isn’t an ideal way to do things, he’s surely right. That still leaves a lot of options for massive short-request databases, however, including transparently sharded RDBMS, scale-out in-memory DBMS (whether or not VoltDB*), and various NoSQL options. If nothing else, Couchbase would seem superior to memcached/non-transparent MySQL if you were starting a project today.

*The big problem with VoltDB, last I checked, was its reliance on Java stored procedures to get work done.

Pleasantries continued in The Register, which got an amazing-sounding quote from Mike. If The Reg is to be believed — something I wouldn’t necessarily take for granted — Mike claimed that he (i.e. VoltDB) knows how to solve the distributed join performance problem.

So, it’s Stonebraker against the web. And the difference of opinion is severe. In May, at a MongoDB developer conference in San Francisco, Mongo creator Dwight Merriman told his audience there was “no way” to do distributed joins in a way that really scales. “I’m not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational,” he said.

“You can do distributed transactions, but if you do them with no loss of generality and you do them across a thousand machines, it’s not going to be that fast.”

Stonebraker says precisely the opposite, and in typical fashion, he goes right for the jugular. “I reject what Merriman says out of hand,” he tells The Register. Merriman and his company, 10gen, declined to comment for this story. But Stonebraker says words don’t matter. As much as he likes to wield his opinions, he insists the debate will be decided elsewhere. “Let the bake-off begin,” he crows.

But when last I checked, VoltDB made nowhere near that claim. And well it shouldn’t have. In the fully general case, there’s no way to ensure super distributed join performance other than by throwing lots and lots of gear at the problem. But if you do that, many alternatives are fast. More specialized cases may be a different matter — but there are many fast alternatives for those too.

I imagine there will be use cases for which VoltDB sustains a lead as the truly fastest alternative, similarly-architected competitors perhaps excepted.* But what Mike supposedly said seems quite forward-leaning when compared to technical reality.

*The canonical VoltDB use case is e-commerce in virtual goods, the point of “virtual” being that physical inventory might necessitate costlier kinds of joins.

Traditional databases will eventually wind up in RAM

Curt Monash — Mon, 23 May 2011 16:05:24 +0000

In January, 2010, I posited that it might be helpful to view data as being divided into three categories:

Human/Tabular data, i.e., human-generated data that fits well into relational tables or arrays.
Human/Nontabular data, i.e., all other data generated by humans.
Machine-Generated data.

I won’t now stand by every nuance in that post, which may differ slightly from those in my more recent posts about machine-generated data and poly-structured databases. But one general idea is hard to dispute:

Traditional database data (records of human transactional activity, referred to as “Human/Tabular data” above) will not grow as fast as Moore’s Law makes computer chips cheaper.

And that point has a straightforward corollary, namely:

It will become ever more affordable to put traditional database data entirely into RAM.

Actually, there are numerous ways for OLTP, other short-request, and some analytic databases to wind up in RAM.

SAP has some good ideas for how it could happen, banging transactions into what is essentially an in-memory analytic database. (I dispute SAP’s claims of transformational database technology leadership, but that doesn’t mean the underlying ideas aren’t good.)
For those who can afford the associated technology disruption, memory-centric object-oriented DBMS could be appealing.
Web scalability best practices commonly include keeping data in RAM (e.g., that’s pretty much the point of the caching layer memcached).
SaaS (Software as a Service) companies — such as Workday — often bring a particular tenant’s database entirely into RAM.
QlikView highlights the benefits of doing business intelligence in RAM.
SAS HPA makes the argument that even “big data analytics” should sometimes be done in RAM.
I don’t have particularly favorable opinions at this time about marketing strategies or momentum at Oracle TimesTen, IBM solidDB, or VoltDB, but those examples at least serve to illustrate that memory-centric OLTP DBMS have existed for years.
Actually, SAP has at least two good ideas, if you count Sybase as part of SAP.

And here’s the kicker: Intel told me last year that CPUs are headed to 46-bit address spaces around mid-decade. Indeed, they hired me to help figure out if that was enough.* That multiplies out to 64 terabytes of RAM on a single server, chip costs permitting. So most of what we now think of as operational databases — and many of the analytic ones too — will fit in-memory, even if they run very large businesses.

*And did so without putting the discussion under any kind of NDA.
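
For anybody who wants to check the 64-terabyte figure: a 46-bit address can name 2^46 distinct bytes, and a terabyte in the binary sense is 2^40 bytes. A quick sketch of the arithmetic follows; the function name `addressable_bytes` is just an illustrative choice.

```python
TIB = 2 ** 40  # one terabyte in the binary sense (a tebibyte)


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses in an address space of this width."""
    return 2 ** address_bits


print(addressable_bytes(46) // TIB)  # 64
```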

Likely consequences of all this include:

Legacy apps will (eventually) be consolidated and virtualized in-memory. Their underlying databases will grow so slowly that eventually the cost of putting them in RAM will be too low to worry about.
Expensive storage systems will (continue to) be irrelevant to database processing. Databases that don’t fit in RAM will typically be big enough to require the attention of a lot of CPUs — and in those cases the DBMS software itself will handle all the storage tasks.
Major OLTP DBMS vendors, such as Oracle, will need alternate in-memory code lines, because disk-centric architectures are sub-optimal in-memory. Well, that’s what they have those big R&D budgets for.
SaaS vendors and web businesses may not rely on today’s major OLTP DBMS vendors. (I was going to say “won’t” rather than “may not” until I recalled the likely M&A endgame.) Traditional enterprises may blanch at migrating away from their legacy DBMS environments, but the trade-offs are different for technology companies using DBMS as subsystems.

Of course, the same trends that make data-storing chips cheaper will make data-generating chips cheaper too. So, just as there are huge amounts of machine-generated data that you’d never pay to store in RAM, the same will still be true 10 years from now; the data volumes involved will just be a lot bigger. And thus there will still be plenty of very large analytic databases using relatively cheap forms of storage, perhaps even disk.

But OLTP and other short-request processing are likely to wind up in-memory. And the same may be true for a considerable amount of analytics, especially but not only if the analytics have a low-latency requirement.