Todd Hoff put up a provocative post on High Scalability called "MySQL and Memcached: End of an Era?" The post itself focuses on observations like:
- Facebook invented and is adopting Cassandra.
- Twitter is adopting Cassandra.
- Digg is adopting Cassandra.
- LinkedIn invented and is adopting Voldemort.
- Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.
But in addition, he provides a lot of useful links, which DBMS-oriented folks like me might otherwise have overlooked. Following those trails gets one to, among other things:
- A September 2009 post outlining Digg's reasons for moving to Cassandra. The core idea is that joining two tables is expensive, so it's cheaper to store the results prejoined on disk. Details are provided.
- A February 2010 post outlining Twitter's reasons for moving to Cassandra. They boil down to "sufficiently scalable, sufficiently simple, sufficiently robust, robustly open source."
- A Flickr slide presentation saying “normalization is for wimps”. They seemed to be staying with MySQL, but lusting after XPath.
- A nice Cassandra technical overview by Evan Weaver of Twitter.
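To make the "store it prejoined" idea concrete, here is a minimal sketch contrasting a read-time join with a write-time fan-out into per-reader buckets. It does not reproduce Digg's actual schema. The "which of my friends dugg this story?" query, the `follows` and `diggs` structures, and the function names are all hypothetical, and plain Python dicts stand in for the storage engine.

```python
from collections import defaultdict

# Hypothetical normalized data: who follows whom, and a log of (user, story) diggs.
follows = {"alice": {"bob", "carol"}, "bob": {"carol"}}
diggs = []

# Hypothetical prejoined layout: one bucket per (reader, story) holding the answer.
prejoined = defaultdict(set)

def record_digg(digger, story):
    """Write path: append to the log AND fan the digg out to every follower's bucket."""
    diggs.append((digger, story))
    for reader, friends in follows.items():
        if digger in friends:
            prejoined[(reader, story)].add(digger)

def friends_who_dugg_join(user, story):
    """Read path, normalized: scan the digg log and filter by the follow list (a join)."""
    friends = follows.get(user, set())
    return sorted(u for (u, s) in diggs if s == story and u in friends)

def friends_who_dugg_prejoined(user, story):
    """Read path, prejoined: a single key lookup, no join at query time."""
    return sorted(prejoined.get((user, story), set()))

for digger, story in [("bob", "story1"), ("carol", "story1"), ("carol", "story2")]:
    record_digg(digger, story)
```

Both read paths return the same answer, e.g. `["bob", "carol"]` for `("alice", "story1")`. The prejoined version pays for it with extra work and extra storage on every write, which is the tradeoff the Digg post argues is worth making.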
I also recall seeing something that said "We have 13X as many queries as updates, so of course we should optimize for reads," but I can't find that now. The classical OLTP answer to that would probably be "Yeah, but by the time you're two-phase-committing and integrity-checking all the parts of that update, it turns out updates are still what you should optimize for." Well, what if the update is so simple that that's no longer a valid argument?
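The arithmetic behind that argument can be sketched as a weighted sum: total work is reads × (cost per read) + writes × (cost per write). The per-operation costs below are made-up relative units, not measurements from any real system. With 13 reads per write, writes account for half the total work once each write costs 13 times as much as a read, and for most of it beyond that. Conversely, if updates are cheap, reads dominate.

```python
def workload_cost(reads, writes, read_cost, write_cost):
    """Total work for a mix of operations, in arbitrary (hypothetical) cost units."""
    return reads * read_cost + writes * write_cost

def write_share(reads, writes, read_cost, write_cost):
    """Fraction of total work spent on writes."""
    total = workload_cost(reads, writes, read_cost, write_cost)
    return writes * write_cost / total

# 13 queries per update, as in the quote; the write costs are illustrative guesses.
for write_cost in (1.0, 13.0, 50.0):
    print(write_cost, round(write_share(13, 1, 1.0, write_cost), 3))
```

With these assumed numbers, equally cheap writes come to about 7% of the work (1/14), while writes 50 times as expensive as reads come to about 79% (50/63).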
There certainly seem to be some non-obvious technical choices being made here, with options being conflated that perhaps shouldn’t be. In particular, I wonder whether things are being written to cheap disk in a really fast way when it might be better to keep them in more expensive RAM or, perhaps better yet, solid-state memory. Perhaps then the functionality/performance tradeoff wouldn’t be so painful.
On the other hand, the designers of the world's most scalable websites (e-commerce sites perhaps excepted) seem pretty unanimous in thinking it's best to bake some database/integrity management into the applications, rather than offload it all to an RDBMS. Why? Because the transactions are so simple that hand-coding all that isn't prohibitive. And, of course, because of their extreme performance and scalability needs.
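As an illustration of what "baking integrity management into the application" can mean, here is a minimal sketch in which the application itself performs a foreign-key-style check and a uniqueness check before writing to a store that enforces neither. The `KVStore` class and the `add_comment` function are hypothetical stand-ins, not any real product's API. Note the weakness flagged in the comments: a check-then-write sequence like this is not atomic, which is exactly the kind of thing an RDBMS would otherwise handle for you.

```python
class KVStore:
    """Hypothetical stand-in for a non-relational store: get/put by key, no constraints."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def add_comment(store, comment_id, user_id, text):
    """Insert a comment, enforcing in application code what a DBMS might enforce."""
    # Foreign-key-style check: the referenced user must exist.
    if store.get(("user", user_id)) is None:
        raise ValueError(f"unknown user {user_id!r}")
    # Primary-key-style check: don't overwrite an existing comment.
    if store.get(("comment", comment_id)) is not None:
        raise ValueError(f"duplicate comment id {comment_id!r}")
    # Caveat: the checks and this write are not atomic; under concurrent writers
    # another process could slip in between them. Simple workloads may tolerate that.
    store.put(("comment", comment_id), {"user": user_id, "text": text})
```

For simple transactions like this, the hand-coded checks amount to a few lines each. The cost shows up in the cases the sketch glosses over, such as concurrency and multi-record updates.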
I’m not sure on what basis one could argue that they’re wrong.