March 2, 2010

Cassandra and the NoSQL scalable OLTP argument

Todd Hoff put up a provocative post on High Scalability called MySQL and Memcached: End of an Era? The post itself focuses on observations like:

Facebook invented and is adopting Cassandra.
Twitter is adopting Cassandra.
Digg is adopting Cassandra.
LinkedIn invented and is adopting Voldemort.
Gee, it seems as if the super-scalable website biz has moved beyond MySQL/Memcached.

But in addition, he provides a lot of useful links, which DBMS-oriented folks such as myself might have previously overlooked. Following those trails gets one to, among other things:

A September, 2009 post outlining Digg’s reasons for moving to Cassandra. The core idea is that joining two tables is expensive; it’s cheaper to store the results prejoined on disk. Details are provided.
A February, 2010 post outlining Twitter’s reasons for moving to Cassandra. They boil down to “sufficiently scalable, sufficiently simple, sufficiently robust, robustly open source.”
A Flickr slide presentation saying “normalization is for wimps”. They seemed to be staying with MySQL, but lusting after XPath.
A nice Cassandra technical overview by Evan Weaver of Twitter.
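
To make the "store the results prejoined" idea concrete, here is a minimal Python sketch. It is not Digg's actual schema or code. The table contents and the names `follows`, `votes`, `friends_who_voted_join`, `build_prejoined` and `friends_who_voted_lookup` are all made up for illustration, and plain in-memory dictionaries stand in for whatever is really written to disk. The first function does the join on every read; the second pays for the join once, at write time, so that a read becomes a single key lookup.

```python
from collections import defaultdict

# Hypothetical normalized "tables"; names and rows are illustrative only.
follows = [("alice", "bob"), ("alice", "carol"), ("dave", "bob")]    # (user, friend)
votes = [("bob", "story1"), ("carol", "story1"), ("bob", "story2")]  # (voter, item)


def friends_who_voted_join(user, item):
    """Read-time join: scan both tables on every query."""
    friends = {f for u, f in follows if u == user}
    return sorted(v for v, i in votes if i == item and v in friends)


def build_prejoined(follows, votes):
    """Write-time fan-out: store the join result keyed by (user, item)."""
    followers = defaultdict(set)
    for u, f in follows:
        followers[f].add(u)
    prejoined = defaultdict(set)
    for voter, item in votes:
        # Each vote is copied once per follower of the voter.
        for u in followers[voter]:
            prejoined[(u, item)].add(voter)
    return prejoined


prejoined = build_prejoined(follows, votes)


def friends_who_voted_lookup(user, item):
    """Read path after prejoining: one key lookup, no join."""
    return sorted(prejoined.get((user, item), ()))
```

The tradeoff is visible in `build_prejoined`: each write is amplified by the number of followers, and the stored data is redundant, in exchange for reads that never touch a second table.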

I also recall seeing something that said “We have 13X as many queries as updates, so of course we should optimize for reads,” but I can’t find that now. The classical OLTP answer to that would probably be “Yeah, but by the time you’re two-phase-committing and integrity-checking all the parts of that update, it turns out updates are still what you should optimize for.” Well, what if the update is so simple that that’s no longer a valid argument?
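
The arithmetic behind that exchange can be sketched in a few lines of Python. The unit costs below are invented purely for illustration, not measurements from any of these sites, and `total_cost` and `write_share` are hypothetical helper names. The only point is that at 13 reads per write, writes dominate total cost only when a single write costs more than 13 times a single read.

```python
def total_cost(reads_per_write, read_cost, write_cost):
    """Cost of one write plus the reads that accompany it (arbitrary units)."""
    return reads_per_write * read_cost + write_cost


def write_share(reads_per_write, read_cost, write_cost):
    """Fraction of total cost attributable to the write."""
    return write_cost / total_cost(reads_per_write, read_cost, write_cost)


# Assumed, made-up unit costs: a "heavy" write with two-phase commit and
# integrity checks versus a "light" single-row write.
heavy = write_share(13, read_cost=1.0, write_cost=50.0)  # 50 / 63
light = write_share(13, read_cost=1.0, write_cost=2.0)   # 2 / 15

print(f"heavy write share: {heavy:.2f}")
print(f"light write share: {light:.2f}")
```

Under these assumed numbers the heavy write accounts for most of the cost and the light one for very little, which is the whole disagreement in miniature.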

There certainly seem to be some non-obvious technical choices being made here, with options being conflated that perhaps shouldn’t be. In particular, I wonder whether things are being written to cheap disk in a really fast way when it might be better to keep them in more expensive RAM or, perhaps better yet, solid-state memory. Perhaps then the functionality/performance tradeoff wouldn’t be so painful.

On the other hand, the designers of the world’s most scalable websites — e-commerce sites perhaps excepted — seem pretty unanimous in thinking it’s best to bake some database/integrity management into the applications, rather than offload it all to an RDBMS. Why? Because the transactions are so simple that hand-coding all that isn’t prohibitive. And of course because of their extreme performance and scalability needs.

I’m not sure on what basis one could argue that they’re wrong.

Categories: Cassandra, Data models and architecture, NoSQL, OLTP, Open source, Parallelization, Specific users, Theory and architecture


Comments

16 Responses to “Cassandra and the NoSQL scalable OLTP argument”

Jonathan Ellis on March 2nd, 2010 6:33 pm

Wow, I think yours is the first post I’ve seen from the “relational camp,” if there is such a thing, that doesn’t conclude that “if only those idiots had known about $favorite_vendor, they wouldn’t have bothered with this new-fangled nosql business.”

So, thanks for that. 🙂
Curt Monash on March 3rd, 2010 4:07 am

Jonathan,

Take a look in http://www.dbms2.com/category/database-theory-practice/database-diversity/ and you’ll see I’m no relational purist. 😉

Everybody else,

Jonathan’s blog looks like it has a lot of good crunch on Cassandra. I was going to edit in a link to it, only to show up here and see that he already beat me to it. 🙂
Colin on March 4th, 2010 2:44 pm

I don’t see MongoDB on your list of Products/Companies. I would be interested in hearing about MongoDB in relation to other offerings, in addition to Cassandra.
Heinz Roggenkemper on March 4th, 2010 4:59 pm

The ’13 SELECT’s per I/U/D’ is on slide 22 in the Flickr presentation that you mention.
Matt on March 5th, 2010 8:45 am

The Flickr presentation is 5.5 years old.
ac on March 5th, 2010 11:52 am

> Because the transactions are so simple that hand-coding all that isn’t prohibitive. And of course because of their extreme performance and scalability needs.

Wrong. The true reason is because there is no money involved in those “transactions”. Nobody cares unless you really screw up. You don’t need the database to save your ass.
Curt Monash on March 5th, 2010 2:36 pm

@ac,

It’s both. Data integrity in these big web apps is commonly less than the minimum acceptable for financial transactions. But losing data is still frowned upon.
NoSQL Is Not SQL And That’s A Problem | CloudAve on March 5th, 2010 11:00 pm

[…] the NoSQL movement. While some are announcing an end of era for MySQL and memcached others are questioning the arguments behind Cassandra’s OLTP claims and scalability and universal applicability of NoSQL. It is great to see innovative data […]
Richard Gowan on March 11th, 2010 12:56 am

With the proviso that I’ve not read all the arguments yet… It looks to me (so far) that the transaction volumes are not as difficult as are made out.

Is it perhaps more true that particular companies and businesses do not have big up-front analysis. And possibly for good reason.

Their business model seems to be more around attracting users, continually modifying the system to keep them interested, then finding revenue streams later.

Then once in production… they are unlikely to radically change the model/schema – or system architecture.
Some NoSQL links | DBMS2 -- DataBase Management System Services on March 12th, 2010 7:51 pm

[…] Callaghan hit back against the NoSQL movement, and in particular against the MySQL/memcached is passe‘ meme. On the other hand, he also bemoaned many failings of MySQL. On the third hand, he […]
Memcached-based company NorthScale launches | DBMS2 -- DataBase Management System Services on March 16th, 2010 1:53 pm

[…] based around memcached, has just launched, two weeks after the Todd Hoff’s post arguing the MySQL/memcached combo is passe’. NorthScale wouldn’t necessarily argue with Todd, arguing that what you really should use […]
Confluence: Andromeda on July 15th, 2011 4:39 pm

The Brisk Hadoop Ring…

The Brisk Hadoop Ring Jake Luciani, DataStax Brisk…
What Cloud Computing Can Learn from NoSQL — GigaOM Research on October 18th, 2013 5:54 pm

[…] it might be upon us), but others note that advances in flash memory and solid-state storage could mitigate perceived performance hits with relational databases. Even the most ardent MySQL supporters acknowledge the product has shortcomings, some of which have […]
Today in Cloud — Gigaom Research on October 23rd, 2013 6:37 am

[…] world. Today alone, I’ve read posts bemoaning the analysis of log files at web scale, comparing NoSQL against OLTP systems, and questioning Dell’s stance of mandating expensive hard drives in new servers. The […]
