July 21, 2008

Project Cassandra — Facebook’s open sourced quasi-DBMS

Edit: I posted much fresher information about Cassandra in July, 2010.

Facebook has open-sourced Project Cassandra, an imitation of Google’s BigTable.  The public information about Facebook’s Cassandra seems to be limited to a few links on the Cassandra project’s Google Code page.  All the discussion I’ve seen seems to be based solely on some slides from a SIGMOD presentation. In particular, Dare Obasanjo offers an excellent overview of Cassandra.  To wit:

The entire system is a giant table with lots of rows. Each row is identified by a unique key. Each row has a column family, which can be thought of as the schema for the row. A column family can contain thousands of columns which are a tuple of {name, value, timestamp} and/or super columns which are a tuple of {name, column+} where column+ means one or more columns. …

… Cassandra has several optimizations to make writes cheaper. When a write operation occurs, it doesn’t immediately cause a write to the disk. Instead the record is updated in memory and the write operation is added to the commit log. … Additionally, the writes are sorted so that the disk is written to sequentially thus significantly improving seek time on the hard drive and reducing the impact of random writes to the system.

Cassandra is described as “always writable” which means that a write operation always returns success even if it fails internally to the system. This is similar to the model exposed by Amazon’s Dynamo which has an eventual consistency model.  From what I’ve read, it isn’t clear how write operations that occur during an internal failure are reconciled and exposed to users of the system.
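
To make the quoted description concrete, here is a minimal Python sketch of the data model and write path as described above. It is a toy under stated assumptions, not Cassandra's actual code: the names `Column`, `SuperColumn`, `Node`, `write`, and `flush` are hypothetical, a Python list stands in for the commit log, and a sorted in-memory batch stands in for the sequential disk write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column as described in the quote: {name, value, timestamp}."""
    name: str
    value: bytes
    timestamp: int


@dataclass(frozen=True)
class SuperColumn:
    """A super column: {name, column+}, i.e. a name plus one or more columns."""
    name: str
    columns: tuple  # tuple of Column; shown for the data model only


class Node:
    """Toy single-node store (hypothetical): commit log, in-memory table, sorted flush."""

    def __init__(self):
        self.commit_log = []  # append-only list standing in for the commit log
        self.memtable = {}    # (row_key, column_family, column_name) -> Column
        self.flushed = []     # each entry is one sorted batch "written to disk"

    def write(self, row_key, column_family, column):
        # Record the operation in the commit log, then update the record in memory.
        self.commit_log.append((row_key, column_family, column))
        self.memtable[(row_key, column_family, column.name)] = column

    def flush(self):
        # Sort the buffered writes by key so they can be written out in one
        # sequential pass instead of as scattered random writes.
        batch = sorted(self.memtable.items(), key=lambda item: item[0])
        self.flushed.append(batch)
        self.memtable = {}
        return batch


node = Node()
node.write("user:2", "profile", Column("name", b"Bob", 1))
node.write("user:1", "profile", Column("name", b"Alice", 2))
node.write("user:1", "profile", Column("email", b"alice@example.com", 3))
batch = node.flush()
for key, column in batch:
    print(key, column.value)
```

The writes arrive in arbitrary key order but leave `flush` sorted by row key, column family, and column name, which is the point of the optimization the quote describes. When the real system flushes, and what it does with the commit log afterwards, is not covered by the material quoted here, so the sketch leaves the log untouched.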

OStatic has a shorter post about Cassandra, the meat of which is:

Relational database purists may feel queasy at some of the tradeoffs that this design involves – such as the loss of atomicity and the fact that consistency between cluster members is statistical rather than deterministic. But it’s hard to argue with success: Facebook has used Cassandra to scale out a tremendous amount of data without apparent major issues.
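
For readers wondering what "eventually consistent" could look like in practice, here is a small Python sketch of one common reconciliation rule, last-write-wins by timestamp. This is an assumption for illustration only: the slides and posts quoted here do not say how Cassandra reconciles divergent replicas, and the function name `reconcile` and the dictionary layout are hypothetical. The only link to the source is that each column carries a timestamp.

```python
def reconcile(replica_a, replica_b):
    """Merge two replicas' views of one row, keeping the newest cell per column.

    Each replica maps column name -> (timestamp, value). Comparing the whole
    tuple means a timestamp tie is broken by value, so the result does not
    depend on the order in which the replicas are merged.
    """
    merged = dict(replica_a)
    for name, cell in replica_b.items():
        if name not in merged or cell > merged[name]:
            merged[name] = cell
    return merged


# Two cluster members that have seen different writes for the same row.
a = {"status": (100, "at work"), "city": (90, "Boston")}
b = {"status": (120, "on vacation")}

merged = reconcile(a, b)
print(merged)
```

Under this rule both replicas converge on the same row once they have exchanged state, regardless of which one initiates the merge. In the meantime, a reader may see either version, which is one way to read "statistical rather than deterministic."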

To a first approximation, it’s pretty obvious what’s going on: the usual tradeoff of achieving web megascalability at the expense of a traditional RDBMS’ flexibility and guaranteed data integrity.  But beyond that, I’m confused.  For example, Slide 17 offers some performance benchmarks, and the queries are text search.  Huh?  Unless I’m missing something, that doesn’t seem like a natural fit for this data model.  And Slide 14 looks to me as if the “for any” in the fourth bullet point is a typo for “there exists a.”

Maybe things would be clearer if one read either the Google Groups linked on the project page, or the actual code.  But I’ve done neither …

Comments

9 Responses to “Project Cassandra — Facebook’s open sourced quasi-DBMS”

  1. Log Buffer #107: A Carnival of the Vanities for DBAs on July 25th, 2008 12:05 pm

    […] in 1987 by Jim Grey. And finally, before we dive into the specific server news, here is a post on Facebook’s project to build a distributed database similar to Google’s […]

  2. Bookmarks about Google on October 23rd, 2008 12:15 pm

    […] http://www.bos89.nl/1324 – bookmarked by 1 members originally found by mmrvka on 2008-10-11 Project Cassandra — Facebook’s open sourced quasi-DBMS http://www.dbms2.com/2008/07/21/project-cassandra-facebook-open-sourced-quasi-dbms/ – bookmarked […]

  3. NoSQL: Die schlanke Zukunft dicker Datenbanken | silicon.de on February 15th, 2010 1:06 pm

    […] Open source projects have already formed around this topic: Cassandra from Facebook, Apache HBase, CouchDB, Hadoop, Memcached, Tokyo Cabinet, MongoDB, and LinkedIn hosts […]

  4. More on NoSQL and HVSP (or OLRP) | DBMS 2 : DataBase Management System Services on August 26th, 2010 5:14 am

    […] however, it’s true that Cassandra inventor Facebook has stopped working on Cassandra, and Facebook’s core Cassandra developers have shifted over […]

  5. Kommen NoSQL-Datenbanken 2011 im Mittelstand an? - Seite 1 von 1 - Technologien | IT-Business | ZDNet.de on March 14th, 2011 9:06 am

    […] […]

  6. DataStax pivots back to its original strategy | DBMS 2 : DataBase Management System Services on September 22nd, 2011 6:23 pm

    […] Cassandra was originally developed and revealed at Facebook, to much early NoSQL fanfare. Facebook later backed away from Cassandra use. […]

  7. DataStax/Cassandra update | DBMS 2 : DataBase Management System Services on December 8th, 2013 1:06 pm

    […] revolve around geo-distribution. Netflix, probably the flagship Cassandra user — since Cassandra inventor Facebook adopted HBase instead — actually hasn’t been using the geo-distribution feature. […]

