January 25, 2011

dbShards update

I talked yesterday with Cory Isaacson of CodeFutures, and hence can follow up on my previous post about dbShards. dbShards basics include:

One dbShards customer writes 1/2 billion rows on a busy day, and serves 3,000-4,000 pages per second, naturally with multiple queries per page. This is on a 32-node cluster, with uninspiring hardware, in the cloud. The database has 16 shards, aggregating 128 virtual shards. I forgot to ask how big the database actually is. Overall, dbShards is up to a dozen or so signed customers, half of whom are in production or soon will be.

dbShards’ replication scheme works like this: 

In essence, two-phase database commit is replaced by two-phase log synchronization. Once it’s known that the transaction logs on two machines agree, the underlying DBMS is responsible for getting each of the databases updated.
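To make the idea concrete, here is a minimal Python sketch of two-phase log synchronization as described in the paragraph above. It is an illustration under assumptions, not dbShards' actual protocol: the `LogNode` class, its method names, and the `synchronized_write` helper are all hypothetical, and failure handling is reduced to a single availability flag.

```python
from dataclasses import dataclass, field


@dataclass
class LogNode:
    """Hypothetical stand-in for one machine: a transaction log plus a database."""
    name: str
    available: bool = True
    log: list = field(default_factory=list)       # entries: [txn_id, statement, confirmed]
    database: list = field(default_factory=list)  # (txn_id, statement) pairs already applied

    def append(self, txn_id, statement):
        # Phase 1: record the transaction in the log; the database is not touched yet.
        if not self.available:
            return False
        self.log.append([txn_id, statement, False])
        return True

    def confirm(self, txn_id):
        # Phase 2: mark the entry as agreed on by both logs.
        for entry in self.log:
            if entry[0] == txn_id:
                entry[2] = True

    def discard(self, txn_id):
        # Drop an entry that the other node never acknowledged.
        self.log = [entry for entry in self.log if entry[0] != txn_id]

    def apply_confirmed(self):
        # The local DBMS catches up from its own log, independently of the other node.
        applied = {txn_id for txn_id, _ in self.database}
        for txn_id, statement, confirmed in self.log:
            if confirmed and txn_id not in applied:
                self.database.append((txn_id, statement))


def synchronized_write(primary, secondary, txn_id, statement):
    """Return True once both logs hold the entry and have confirmed it."""
    nodes = (primary, secondary)
    acks = [node.append(txn_id, statement) for node in nodes]
    if all(acks):
        for node in nodes:
            node.confirm(txn_id)
        return True
    # One log is missing the entry, so remove it wherever it was written.
    for node, ok in zip(nodes, acks):
        if ok:
            node.discard(txn_id)
    return False
```

The point of the sketch is the ordering: the write counts as successful when the two logs agree, and each database is then brought up to date from its own log rather than through a coordinated database-level commit.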

Edit: A subsequent post clarifies dbShards’ shard replication.

Cory added that dbShards now uses the same mechanism to do distributed transactions. As an example of why somebody in the dbShards target market would care, he cited a game company that shards according to player_ID. If two players interact with each other, a distributed transaction could ensue.
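The game-company example can be sketched in a few lines of Python. This is a rough illustration only: it assumes a plain modulus on player_ID and an even mapping of the 128 virtual shards onto the 16 shards from the customer example above, neither of which is confirmed as how dbShards or that customer actually does it. The function names are hypothetical.

```python
VIRTUAL_SHARDS = 128   # figures taken from the customer example earlier in the post
PHYSICAL_SHARDS = 16


def physical_shard(player_id: int) -> int:
    """Map a player to a shard via a virtual shard (assumed modulus scheme)."""
    virtual = player_id % VIRTUAL_SHARDS
    # Assumption: 8 consecutive virtual shards live on each physical shard.
    return virtual * PHYSICAL_SHARDS // VIRTUAL_SHARDS


def needs_distributed_transaction(*player_ids: int) -> bool:
    """True when the players involved do not all live on the same shard."""
    return len({physical_shard(p) for p in player_ids}) > 1
```

Under these assumptions, an interaction between players 1 and 2 stays on one shard, while an interaction between players 1 and 9 spans two shards and would need a distributed transaction.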

Cory stresses that dbShards’ sharding is configurable, as opposed to being an all-or-nothing black box. That is, you can configure which tables are sharded; the rest are replicated across all the servers (which is of course highly beneficial to JOIN performance); and the whole thing works automagically. There also are true parallel queries when needed, along with the distributed transactions.
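As a rough picture of what "configurable" means here, the Python sketch below routes a statement differently depending on whether its table is sharded or replicated. The table names, the `SHARDED_TABLES` mapping, and the `target_shards` function are all hypothetical, and the modulus routing is an assumption rather than dbShards' documented behavior.

```python
SHARD_COUNT = 16

# Hypothetical configuration: table name -> shard key column.
# Any table not listed here is treated as replicated to every shard.
SHARDED_TABLES = {"players": "player_id", "scores": "player_id"}


def target_shards(table: str, row: dict, is_write: bool, local_shard: int = 0) -> list:
    """Return the shards a single-row statement has to touch."""
    shard_key = SHARDED_TABLES.get(table)
    if shard_key is not None:
        # Sharded table: exactly one shard owns the row.
        return [row[shard_key] % SHARD_COUNT]
    if is_write:
        # Replicated table: a write has to reach every copy.
        return list(range(SHARD_COUNT))
    # Replicated table: a read can use the copy on whichever shard is already in play.
    return [local_shard]
```

The last branch is why replication helps JOIN performance: a query that joins a sharded table to a replicated one can be answered entirely on the shard that owns the sharded rows.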

As for how dbShards shards, there are three methods.

Comments

9 Responses to “dbShards update”

  1. ScaleBase, another MPP OLTP quasi-DBMS | DBMS 2 : DataBase Management System Services on January 25th, 2011 5:35 pm

    [...] ScaleBase raised his hand on Twitter. It turns out ScaleBase has a story rather similar to that of CodeFutures/dbShards. That [...]

  2. Igor on January 25th, 2011 9:17 pm

    1/2 billion rows/day written on 32 server/16 shard cluster comes to about (avg) 362 rows/sec/shard or 181 rows/sec/server – not very impressive. Am I missing something?

  3. Curt Monash on January 25th, 2011 9:21 pm

    Well, I’d imagine peak loads could easily be 5-10X higher, which would get us to the performance range previously claimed for dbShards.

  4. Cory Isaacson on January 26th, 2011 12:38 am

    Thanks for the post.

    One clarification. dbShards does not use a proxy, there is no middle tier. Our intelligent driver is plug-compatible with vendor DBMS drivers, and once a sharding decision is made the connection is direct to the database — nothing is between the application and the database for typical transactions. The driver also participates in failover control, manageability and stats.

  5. Cory Isaacson on January 26th, 2011 12:40 am

    One other quick comment re: dbShards write performance. At peak load the performance is about the same as the native DBMS can handle, generally 2000 writes/second with MySQL as an example.

  6. The Continuent Tungsten MySQL replication story | DBMS 2 : DataBase Management System Services on February 6th, 2011 12:37 am

    [...] to other MySQL scale-out vendors too. I’ve already posted accordingly about CodeFutures (the dbShards guys) and ScaleBase. Now it’s late-responding Continuent’s [...]

  7. Clarification on dbShards’ shard replication | DBMS 2 : DataBase Management System Services on February 9th, 2011 4:36 am

    [...] I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still [...]

  8. Transparent sharding | DBMS 2 : DataBase Management System Services on March 1st, 2011 3:20 am

    [...] Sharding, commonly used to describe scaled-out MySQL in Internet Request Processing use cases. This one has the advantage of being concise, but is beginning to mean two different things, in that it is used both when the data is REALLY in separate databases on different machines (i.e., the application has to explicitly reference the shard it wants to talk to) and also when the database is transparently distributed (e.g. via dbShards). [...]

  9. Sharding 101 « IT Primer on August 13th, 2011 3:19 am

    [...] dbShards [...]
