After I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still replicated via two-phase commit, and everybody knows two-phase commit is fundamentally slow.” I replied that no, it wasn’t exactly two-phase commit, but fumbled the explanation of why — so I decided to escalate straight to dbShards honcho Cory Isaacson. Cory’s clarification, lightly edited as per his permission, was:
We are far faster than a 2PC, because we found that if we make the transaction log reliable (our own), then we can go much faster than the actual database engine. We call this approach “out of band” replication, because we replicate the transaction log, and then asynch writes to the secondary database (usually only milliseconds behind). This way a single shard with two servers can handle 1000s of writes/second; again, we can go much faster than the typical database. This link shows a diagram of how it works.
Typically our overhead is around 10%, meaning that we perform with full reliable replication at 90% the speed of an unprotected database. In some cases we are actually faster than a standalone database, because we are very efficient at autoincrement and other internals.
The key paragraph in Cory’s link is probably:
As each SQL statement is executed in the application, the dbShards driver asynchronously sends the SQL to the primary replication agent in parallel to executing the SQL against the primary database. The primary replication agent asynchronously sends the SQL to the secondary replication agent. When the application issues a commit statement, the driver performs a synchronous communication with the primary replication agent to ensure that the transaction details have been received by both the primary and secondary agent. Once the driver receives an ACK then the transaction is committed to the database.
As I read that, it is indeed a kind of two-phase commit protocol. However, the remote phase isn’t being handled by the remote DBMS, but rather by dbShards, and it doesn’t involve checking whether the second copy of the transaction will succeed. So Cory’s claims of performance very different from those of regular 2PC are not ridiculous.