June 8, 2014

Optimism, pessimism and fatalism — fault-tolerance, Part 1

Writing data management or analysis software is hard. This post and its sequel are about some of the reasons why.

When systems work as intended, writing and reading data is easy. Much of what’s hard about data management is dealing with the possibility — really the inevitability — of failure. So it might be interesting to survey some of the many ways that considerations of failure come into play. Some have been major parts of IT for decades; others, if not new, are at least newly popular in this cluster-oriented, RAM-crazy era. In this post I’ll focus on topics that apply to single-node systems; in the sequel I’ll emphasize topics that are clustering-specific.

Major areas of failure-aware design — and these overlap greatly — include:

Long-standing basics

In a single-server, disk-based configuration, techniques for database fault-tolerance start:

Valuable though they are, none of these techniques protects against data corruption that occurs via software errors or security breaches, because in those cases the system will likely do a great job of making incorrect changes consistently across all copies of the data. Hence there also needs to be a backup/restore capability, key aspects of which start:

For many users, it is essential that backups be online/continuous, rather than requiring the database to periodically be taken down.

None of this is restricted to what in a relational database would be single-row or at least single-table changes. Transaction semantics cover the case that several changes must either all change or all fail together. Typically all the changes are made; the system observes that they’ve all been made; only then is a commit issued which makes the changes stick. One complexity of this approach is that you need a way to quickly undo changes that don’t get committed.

Locks, latches and timestamps

If a database has more than one user, two worrisome possibilities arise:

So it is common to lock the portion of the database that is being changed. A major area of competition in the early 1990s — and a big contributor to Sybase’s decline — was the granularity of locks (finer-grained is better, with row-level locking being the relational DBMS gold standard).

In-memory locks are generally called latches; I don’t know what the other differences between locks and latches are.

Increasingly many DBMS are designed with an alternative approach, MVCC (Multi-Version Concurrency Control); indeed, I’m now startled when I encounter one that makes a different choice. The essence of MVCC is that, to each portion (e.g. row) of data, the system appends very granular metadata — a flag for validation/invalidation and/or a timestamp. Except for the flags, the data is never changed until a cleanup operation; rather, new data is written, with later timestamps and validating flags. The payoff is that, while it may still be necessary to lock data against the possibility of two essentially simultaneous writes, reads can proceed without locks, at the cost of a minuscule degree of freshness. For if an update is in progress that affects the data needed for a read, the read just targets versions of the data slightly earlier in time.

MVCC has various performance implications — writes can be faster because they are append-mainly, reads may be slower, and of course the cleanup is needed. These tradeoffs seem to work well even for the query-intensive workloads of analytic RDBMS, perhaps because MVCC fits well with their highly sequential approach to I/O.

“In-memory” DBMS

A DBMS that truly ran only in RAM would lose data each time the power turned off. Hence a memory-centric DBMS will usually also have some strategy for persisting data.

Related links


5 Responses to “Optimism, pessimism and fatalism — fault-tolerance, Part 1”

  1. Optimism, pessimism, and fatalism — fault-tolerance, Part 2 | DBMS 2 : DataBase Management System Services on June 8th, 2014 12:58 pm

    […] of what I wrote in Part 1 of this post was already true 15 years ago. But much gets added in the modern era, considering […]

  2. Bryan Pendleton on June 8th, 2014 3:27 pm

    > In-memory locks are generally called latches; I don’t know what the other differences between locks and latches are.

    When I worked on this (long ago), one difference was that locks (especially write locks) were held until transaction commit, while latches were held only for brief periods (microseconds).

    Another difference was that the lock manager performed deadlock detection; the latch manager did not (so you had to ensure that your latching algorithms were deadlock free).

  3. Ofir Manor on June 8th, 2014 10:09 pm

    I fully agree with Brian on latches (brief, serialized access to a shared in-memory resource) vs. lock (transactional).

    Also, regarding MVCC – what it adds is that readers never block writers and writers never block readers; big win for concurrency.

  4. clive boulton on June 11th, 2014 6:43 pm

    SWAG: “In-memory” DBMS from HANA and Oracle are Memcached based and persisted to both row based tables and column based tables for backup, batch and restart?

    BTW. Brad Fitz was honored by UW CSE for Memchached amongst other early career accomplishments. https://news.cs.washington.edu/2014/03/06/uw-cses-brad-fitzpatrick-wins-diamond-award-for-early-career-achievement/

  5. http://intensitytalks.com/groups/electricals-things-that-can-be-automated-in-and-about-the-property/ on February 11th, 2016 4:17 pm


    Optimism, pessimism and fatalism – fault-tolerance, Part 1 | DBMS 2 : DataBase Management System Services

Leave a Reply

Feed: DBMS (database management system), DW (data warehousing), BI (business intelligence), and analytics technology Subscribe to the Monash Research feed via RSS or email:


Search our blogs and white papers

Monash Research blogs

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.