January 25, 2011

dbShards update

I talked yesterday with Cory Isaacson of CodeFutures, and hence can follow up on my previous post about dbShards. dbShards basics include:

dbShards gives you, in effect, an MPP DBMS based on MySQL or PostgreSQL, meant for OLTP (OnLine Transaction Processing). dbShards always did distributed queries, and now does distributed transactions as well.
dbShards works by sharding the database and automagically sending work to the correct shard.
For safety, dbShards of course replicates each shard. Contrary to what I said in the previous post, the replication method is not log-shipping.
At this time, dbShards only works in a single data center.
dbShards can handle any SQL that would work through, say, a JDBC driver, and is not particularly sensitive to data type. However, dbShards’ stored procedure support is iffy — if a procedure touches data in more than one shard, it simply fails.

One dbShards customer writes 1/2 billion rows on a busy day, and serves 3-4,000 pages per second, naturally with multiple queries per page. This is on a 32-node cluster, with uninspiring hardware, in the cloud. The database has 16 shards, aggregating 128 virtual shards. I forgot to ask how big the database actually is. Overall, dbShards is up to a dozen or so signed customers, half of whom are in production or soon will be.

dbShards’ replication scheme works like this:

A write initially goes to two places at once — to the DBMS and a dbShards agent, both running on the same server.
The dbShards agent streams to the dbShards agent on the replica server, and receipt of the streamed write is acknowledged.
At that point the commits start. (Cory seemed to say that the commit on the primary server happens first, but I’m not sure why.)

In essence, two-phase database commit is replaced by two-phase log synchronization. Once it’s known that the transaction logs on two machines agree, the underlying DBMS is responsible for getting each of the databases updated.

Edit: A subsequent post clarifies dbShards’ shard replication.

Cory added that dbShards now uses the same mechanism to do distributed transactions. As an example of why somebody in the dbShards target market would care, he cited a game company that shards according to player_ID. If two players interact with each other, a distributed transaction could ensue.

Cory stresses that dbShards’ sharding is configurable, as opposed to being an all-or-nothing black box. That is, you can configure which tables are sharded; the rest are replicated across all the servers (which is of course highly beneficial to JOIN performance); and the whole thing works automagically. There also are true parallel queries when needed, along with the distributed transactions.

As for how dbShards shards, there are three methods.

Most common is a hash table on the shard key.
The other somewhat common one is what Cory calls “session-based” sharding, but which I might prefer to call “value-list sharding.” If you deliberately put users from the same geography into the same shard, that would be list-based. Ditto if you shard by tenant in a multi-tenancy use case.
dbShards also can be used for time-based sharding. While Cory didn’t say so, I imagine that could be extended to other value-range scenarios as well.

Categories: Clustering, dbShards and CodeFutures, MySQL, OLTP, Parallelization, Transparent sharding

Subscribe to our complete feed!

Comments

9 Responses to “dbShards update”

ScaleBase, another MPP OLTP quasi-DBMS | DBMS 2 : DataBase Management System Services on January 25th, 2011 5:35 pm

[…] ScaleBase raised his hand on Twitter. It turns out ScaleBase has a story rather similar to that of CodeFutures/dbShards. That […]
Igor on January 25th, 2011 9:17 pm

1/2 billion rows/day written on 32 server/16 shard cluster comes to about (avg) 362 rows/sec/shard or 181 rows/sec/server – not very impressive. Am I missing something?
Curt Monash on January 25th, 2011 9:21 pm

Well, I’d imagine peak loads could easily be 5-10X higher, which would get us to the performance range previously claimed for dbShards.
Cory Isaacson on January 26th, 2011 12:38 am

Thanks for the post.

One clarification. dbShards does not use a proxy, there is no middle tier. Our intelligent driver is plug-compatible with vendor DBMS drivers, and once a sharding decision is made the connection is direct to the database — nothing is between the application and the database for typical transactions. The driver also participates in failover control, manageability and stats.
Cory Isaacson on January 26th, 2011 12:40 am

One other quick comment re: dbShards write performance. At peak load the performance is about the same as the native DBMS can handle, generally 2000 writes/second with MySQL as an example.
The Continuent Tungsten MySQL replication story | DBMS 2 : DataBase Management System Services on February 6th, 2011 12:37 am

[…] to other MySQL scale-out vendors too. I’ve already posted accordingly about CodeFutures (the dbShards guys) and ScaleBase. Now it’s late-responding Continuent’s […]
Clarification on dbShards’ shard replication | DBMS 2 : DataBase Management System Services on February 9th, 2011 4:36 am

[…] I posted recently about dbShards, a Very Smart Commenter emailed me with the challenge “but each individual shard is still […]
Transparent sharding | DBMS 2 : DataBase Management System Services on March 1st, 2011 3:20 am

[…] Sharding, commonly used to describe scaled-out MySQL in Internet Request Processing use cases. This one has the advantage of being concise, but is beginning to mean two different things, in that it is used both when the data is REALLY in separate databases on different machines (i.e., the application has to explicitly reference the shard it wants to talk to) and also when the database is transparently distributed (e.g. via dbShards). […]
Sharding 101 « IT Primer on August 13th, 2011 3:19 am

[…] dbShards […]

Leave a Reply

Search our blogs and white papers

Monash Research blogs

DBMS 2 covers database management, analytics, and related technologies.
Text Technologies covers text mining, search, and social software.
Strategic Messaging analyzes marketing and messaging strategy.
The Monash Report examines technology and public policy issues.
Software Memories recounts the history of the software industry.

User consulting

Building a short list? Refining your strategic plan? We can help.

Vendor advisory

We tell vendors what's happening -- and, more important, what they should do about it.

Monash Research highlights

Learn about white papers, webcasts, and blog highlights, by RSS or email.

Links
- Monash Research
- White Papers
Admin
- Log in

dbShards update

Comments

Search our blogs and white papers

Monash Research blogs

User consulting

Vendor advisory

Monash Research highlights

Recent posts

Categories

Date archives

Admin