I’m hearing more and more stories about uninterrupted DBMS operation. There are no iron-clad assurances of zero downtime; if nothing else, you could crash your whole system yourself via some kind of application bug. Even so, it’s a worthy ideal, and near-zero downtime is a practical goal.
Uninterrupted database operations can have a lot of different aspects. The two most basic are probably:
- High availability/fail-over. If a system goes down, another one in the same data center is operational almost immediately.
- Disaster recovery. Same story, but not in the same data center, and hence not quite as immediate.
These apply to single-server and scale-out systems alike. However, scale-out and the replication commonly associated with it raise additional issues in continuous database operation:
- Eventual consistency. Scale-out and replication create multiple potential new points of failure, server and network alike. Eventual consistency ensures that a single such failure doesn’t take any part of the database down.
- The use of replicas to avoid planned downtime. If you do rolling maintenance, then you can keep a set of servers with the full database up at all times.
Finally, if you really care about uninterrupted operation, you might also want to examine:
- Administrative tools and utilities. The better your tools, the better your chances of keeping your system up. That applies to anything from administrative dashboards to parallel backup functionality.
- Fencing of in-database analytic processes. If you’re going to do in-database analytics, fenced/out-of-process ones are a lot safer than the alternative.
- Online schema changes. Changing a schema in a relational DBMS doesn’t necessarily entail taking the database offline.
Let’s discuss some of those points below.
The core idea of database high availability is straightforward — data gets sent at around the same time to two or more servers, which are meant to hold identical copies of the database. If one server goes down, another takes over. This has been available in productized form for over three decades — think of Tandem Computers. These days, the common form is master-slave — an update goes to one server, which applies it, and also communicates it to slaves (there’s a small code sketch of this flow after the list below). The main distinctions are:
- How is the replication done?
- Are the slaves available to be read, for maximum performance?
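To make that flow concrete, here’s a minimal sketch of master-slave replication, with the slaves optionally open for reads. All the names are mine and purely illustrative; real products differ greatly in how the updates actually travel (see the replication techniques below).

```python
# Minimal sketch of master-slave replication (all names hypothetical).
# The master applies each update locally, then forwards it to every slave;
# slaves apply the same updates in order, so reads can optionally be
# load-balanced across them.

import itertools


class Slave:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):          # replay an update from the master
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Master:
    def __init__(self, slaves, read_from_slaves=False):
        self.data = {}
        self.slaves = slaves
        # Round-robin over the slaves if they are open for reads.
        self._reader = itertools.cycle(slaves) if read_from_slaves else None

    def write(self, key, value):
        self.data[key] = value            # apply locally first
        for slave in self.slaves:         # then propagate to every replica
            slave.apply(key, value)

    def read(self, key):
        if self._reader:                  # spread read load across slaves
            return next(self._reader).read(key)
        return self.data.get(key)


if __name__ == "__main__":
    slaves = [Slave(), Slave()]
    master = Master(slaves, read_from_slaves=True)
    master.write("account:42", 100)
    assert all(s.read("account:42") == 100 for s in slaves)
```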
Exactly the same is true of database disaster recovery. The big differences are:
- The slave databases need to be geographically remote from the master, so that they aren’t taken down by the same disaster.
- Consequently, the replication technique must be sensitive to the latency and unreliability of a wide-area network.
In these scenarios, three main kinds of replication technique come into play (the two log-based kinds are sketched in code after the list):
- Trigger-based/logical. An update event on the master triggers a message to the slave to apply the same update. This is commonly slow.
- Synchronous log-based. The update log is streamed from the master to the slave, which applies the updates accordingly. This is the state of the art for local high availability. Those who claim that log-based replication can’t be synchronous haven’t paid attention to product lines such as Schooner or dbShards.
- Asynchronous log-based. Same thing, but with a different tradeoff in reliability vs. latency; the master doesn’t wait for an acknowledgement from the slave. This is the state of the art for remote disaster recovery.
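To contrast the two log-based techniques, here’s a hedged toy sketch; the class and method names are hypothetical, and real log shipping is of course far more involved. The synchronous master blocks on the slave’s acknowledgement, while the asynchronous one returns immediately and ships in the background:

```python
# Hypothetical sketch contrasting synchronous and asynchronous log-based
# replication. In the synchronous case the master's write does not return
# until the slave acknowledges the log record; in the asynchronous case
# the record is queued and shipped later by a background thread.

import queue
import threading


class LogSlave:
    def __init__(self):
        self.log = []

    def apply(self, record):
        self.log.append(record)           # replay the shipped log record
        return "ack"


class SyncMaster:
    def __init__(self, slave):
        self.log, self.slave = [], slave

    def write(self, record):
        self.log.append(record)
        # Block until the slave confirms: higher latency, no lost updates.
        assert self.slave.apply(record) == "ack"


class AsyncMaster:
    def __init__(self, slave):
        self.log, self.slave = [], slave
        self.outbox = queue.Queue()
        threading.Thread(target=self._ship, daemon=True).start()

    def write(self, record):
        self.log.append(record)
        self.outbox.put(record)           # return immediately; ship later

    def _ship(self):
        while True:                       # background shipper tolerates WAN
            self.slave.apply(self.outbox.get())  # latency; may lose the tail


if __name__ == "__main__":
    sync = SyncMaster(LogSlave())
    sync.write(("set", "k", 1))           # returns only after the ack
    asyn = AsyncMaster(LogSlave())
    asyn.write(("set", "k", 1))           # returns before the record ships
```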
An alternative to master-slave replication is active-active, aka multi-master. Active-active replication is much like master/slave, except that two (or more) databases can each accept updates and then try to update each other. Ultimately, in such a scheme, there needs to be programmer and/or administrator intervention to resolve conflicts that may arise between dueling updates. Active-active is good for uptime insofar as there’s always a copy of the system in the process of taking updates. But it’s bad for uptime insofar as it adds complexity and consequent risk of failure.
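Here’s a toy illustration of the dueling-updates problem, with a last-writer-wins rule standing in for whatever conflict-resolution logic a programmer or administrator would actually supply (all names are hypothetical):

```python
# Toy active-active sketch: two nodes each accept writes and exchange them.
# When both sides updated the same key before syncing, the conflict is
# surfaced to a resolver -- here a simple last-writer-wins rule stands in
# for real programmer/administrator intervention.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}            # key -> (value, timestamp)

    def write(self, key, value, ts):
        self.data[key] = (value, ts)

    def merge_from(self, other, resolve):
        for key, incoming in other.data.items():
            local = self.data.get(key)
            if local is None or local == incoming:
                self.data[key] = incoming
            else:                 # dueling updates: both sides changed it
                self.data[key] = resolve(key, local, incoming)


def last_writer_wins(key, local, incoming):
    print(f"conflict on {key!r}: {local} vs {incoming}")
    return max(local, incoming, key=lambda pair: pair[1])


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    a.write("x", "from-a", ts=1)
    b.write("x", "from-b", ts=2)
    a.merge_from(b, last_writer_wins)   # resolves to ("from-b", 2)
    b.merge_from(a, last_writer_wins)   # now both nodes agree
```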
When it is impossible or overly expensive to manage data on a single server, data stores can take a scale-out (more or less aka shared-nothing) approach. Examples may be found at least in the worlds of OLTP SQL, analytic SQL, NoSQL, and Hadoop. In most cases, the data in scale-out stores has a replication factor; that is, data written to one server is immediately copied to one or more other servers as well. On a primary data management cluster, the replication factor is usually 2 or 3. But when the replication used for scale-out is integrated with that used for wide-area networking, disaster recovery, and/or extreme high availability, the total replication factor can be 4-6, or even more.
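As a sketch of what a replication factor means in practice, here’s one hypothetical placement scheme: each key hashes to a position in the cluster, and copies go to the next few nodes in ring order. Real systems vary considerably; this is just one plausible layout:

```python
# Sketch of scale-out replica placement under a replication factor
# (details hypothetical). Each key hashes to a "primary" node, and copies
# also go to the next replication_factor - 1 nodes in ring order, so
# losing one server still leaves full copies of its data elsewhere.

import hashlib


def replica_nodes(key, nodes, replication_factor=3):
    """Return the nodes that should hold copies of this key."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node0", "node1", "node2", "node3", "node4"]
print(replica_nodes("customer:1234", nodes))  # three distinct nodes
```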
Replication factors — or more precisely the replicas themselves — can both hurt and help database reliability. As noted in my recent post on immediate consistency, the classic two-phase commit protocol turns what would be one write on a single server into about 3 network hops and 2 writes, creating all sorts of possibilities for blocking and outages. Eventual consistency — for example of the RYW type discussed in the same post and its comment thread — averts most of that risk, but then you have to deal with an increased chance of data inaccuracy.
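Here’s a rough sketch of where those hops and writes come from in classic two-phase commit. It’s illustrative rather than any particular product’s protocol; the point is the prepare and commit rounds (network hops) and the prepare and commit log records (writes), plus the blocking hazard if any participant goes silent:

```python
# Rough sketch of why two-phase commit multiplies work: a single logical
# write becomes prepare and commit rounds (network hops) plus prepare and
# commit log records (disk writes) on each participant. If any participant
# fails to vote, the whole transaction blocks or aborts. All names are
# illustrative.

class Participant:
    def __init__(self):
        self.log, self.data = [], {}

    def prepare(self, txn):               # hop 1: coordinator -> participant
        self.log.append(("prepare", txn)) # write 1: durable prepare record
        return "yes"                      # hop 2: vote back to coordinator

    def commit(self, txn):                # hop 3: coordinator -> participant
        self.log.append(("commit", txn))  # write 2: durable commit record
        self.data.update(txn)


def two_phase_commit(participants, txn):
    votes = [p.prepare(txn) for p in participants]   # phase 1
    if all(v == "yes" for v in votes):
        for p in participants:
            p.commit(txn)                            # phase 2
        return "committed"
    return "aborted"   # a crash/timeout here blocks or rolls back everyone


print(two_phase_commit([Participant(), Participant()], {"k": "v"}))
```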
In other ways, however, replicas are pure uptime goodness. Coolest, in my opinion, is the potential for rolling maintenance and upgrades, in which you always keep up a subset of servers that hold the complete database. In the case of Metamarkets, for example:
One thing Metamarkets does that’s pretty much a best practice these days is roll out new code, mid-day if they like, without ever taking their system down. Why is this possible? Because the data is replicated across nodes, so you can do a rolling deployment of a node at a time without making any data unavailable. Notes on that include:
- Performance could be affected, as the read load is generally balanced across all the data replicas.
- Data locking is not an issue — Metamarkets doesn’t have any read locks, as Druid is an MVCC (Multi-Version Concurrency Control) system.
At least one dbShards customer has a similar story, and I’m sure I’ve heard the same idea from other folks as well.
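A minimal sketch of the rolling-upgrade idea, under the assumption (as in the Metamarkets story) that every shard is replicated, so draining one node never makes data unavailable; the cluster layout and names are invented for illustration:

```python
# Hypothetical rolling-upgrade loop: upgrade one node at a time, first
# checking that the remaining nodes still hold a replica of every shard,
# so the full database stays readable throughout.

class ClusterNode:
    def __init__(self, name, shards):
        self.name, self.shards, self.in_rotation = name, set(shards), True
        self.version = "1.0"

    def drain(self): self.in_rotation = False    # stop taking queries

    def rejoin(self): self.in_rotation = True    # resume taking queries


def covered_without(cluster, node, all_shards):
    live = set().union(*(n.shards for n in cluster if n is not node))
    return all_shards <= live            # every shard still has a replica


def rolling_upgrade(cluster, new_version):
    all_shards = set().union(*(n.shards for n in cluster))
    for node in cluster:
        assert covered_without(cluster, node, all_shards), "would lose data"
        node.drain()                     # reads rebalance to other replicas
        node.version = new_version       # deploy new code, mid-day if you like
        node.rejoin()


# Replication factor 2: each shard lives on two nodes.
cluster = [ClusterNode("n0", {0, 1}), ClusterNode("n1", {1, 2}),
           ClusterNode("n2", {2, 0})]
rolling_upgrade(cluster, "2.0")
print([n.version for n in cluster])      # ['2.0', '2.0', '2.0']
```

As the sketch suggests, read performance dips while a node is drained, but availability doesn’t, which matches the Metamarkets notes above.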
And finally, I’ll quote myself on another subject as well, namely online schema evolution (a toy MVCC sketch follows the quoted notes):
Ever more RDBMS vendors, for example Clustrix, allow schema changes without taking the database down. So does Vertica. Notes on that include:
- It relies on the fact that Vertica uses MVCC (Multi-Version Concurrency Control) rather than doing updates in place. MVCC is pretty much a best practice now, at least for disk storage.
- In particular, a Vertica query locates all the data it needs when it starts running. Database changes made while the query is running don’t affect it. That’s true of ordinary updates and of schema changes alike.
- Schema changes really boil down to dropping and adding columns. Or you can make very simple column changes, such as widening one.
- If you widen a column, that has no effect on the data already in place. (Columnar storage, variable length records — what would there be to change?)
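As a toy illustration of why MVCC makes these schema changes cheap (emphatically not a description of Vertica’s actual internals), consider a versioned catalog: a running query pins the version it saw at start, while an alteration only creates a new version:

```python
# Toy illustration of MVCC-style online schema change. Each query pins the
# catalog version it sees at start, so a concurrent column add/drop/widen
# only creates a new catalog version and never disturbs running queries.

class Catalog:
    def __init__(self, columns):
        self.versions = [dict(columns)]   # version 0

    def current(self):
        return len(self.versions) - 1

    def alter(self, **changes):           # e.g. widen or add a column
        new = dict(self.versions[-1])
        new.update(changes)
        self.versions.append(new)         # old versions stay untouched

    def snapshot(self, version):
        return self.versions[version]


catalog = Catalog({"name": "varchar(20)", "amount": "int"})
query_version = catalog.current()         # a query pins version 0 at start
catalog.alter(name="varchar(200)")        # widen while the query is running
print(catalog.snapshot(query_version))    # running query still sees varchar(20)
print(catalog.snapshot(catalog.current()))  # new queries see varchar(200)
```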
Bottom line: Uninterrupted database operation is a nearly attainable goal.
That’s a lot, but it’s hardly complete. What other points should I have included, had space permitted?