March 21, 2012

DataStax Enterprise 2.0

Edit: Multiple errors in the post below have been corrected in a follow-on post about DataStax Enterprise and Cassandra.

My client DataStax is announcing DataStax Enterprise 2.0. The big point of the release is that there’s a bunch of stuff integrated together, including at least:

DataStax stresses that all this runs on the same cluster, with the same administrative tools and so on. For example, on a single cluster:

No matter what is going on at a node, I gather that data is stored in the same Cassandra file format, which DataStax calls CFS (Cassandra File System). Edit: Not true. See the follow-on post. DataStax stresses that a node can have a choice of at least two “personalities”, namely:

New in DataStax Enterprise 2.0, there’s elasticity between these “personalities”; you can fire up a different kind of processing on a node, while leaving the data untouched. DataStax wasn’t able to say what typical replication factors are for the data — e.g., is it 3 on Cassandra nodes plus 3 more on Hadoop nodes, or might the total be less than 6? I’m guessing it’s really 3 on Cassandra nodes, so as to get failure-tolerant read-your-writes (RYW) consistency, but Hadoop nodes might not necessarily bring the total up to 6.
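To show the arithmetic behind that guess, here is a minimal Python sketch. It assumes reads and writes are both done at quorum (a majority of replicas), which is my assumption rather than anything DataStax stated, and the function names are hypothetical, not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_your_writes(n: int, w: int, r: int) -> bool:
    """A read is guaranteed to overlap the latest write when R + W > N."""
    return r + w > n


def tolerated_failures(n: int, w: int, r: int) -> int:
    """Replicas that can be down while both reads and writes still succeed."""
    return n - max(w, r)


# Compare a few replication factors under quorum reads and writes.
for rf in (1, 2, 3):
    q = quorum(rf)
    print(rf, q, read_your_writes(rf, q, q), tolerated_failures(rf, q, q))
```

Under these assumptions, 3 is the smallest replication factor at which quorum reads and writes give RYW consistency and still survive the loss of one replica.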

Other NoSQL vendors portray Cassandra as likely to win when a cluster needs to be spread around multiple data centers, but not a major contender otherwise. DataStax disputes this, but does cite a need for “continuous availability” as a key driver of adoption.

As you’ve probably gathered by now, I like the core DataStax story — and indeed had some influence on it — but roll my eyes somewhat at the work-in-progress as to how it is phrased and told. The other regrettable fuzziness in DataStax messaging is around customer count. DataStax cites >140 “customers”, but that includes every last outfit that bought a single day of training. On the plus side, DataStax cites a firm figure of 45 employees, and has lots of production use cases it can talk about and extrapolate from.

In particular, DataStax cites customers in areas that include:

Indeed, Netflix should probably be regarded as the single flagship Cassandra user, even ahead of Twitter (not a DataStax customer). Netflix recently wrote:

We now have over 55 Cassandra clusters in the cloud and are moving our source of truth from our Datacenter to these Cassandra clusters.

which compares pretty favorably to an earlier estimate of

7 clusters in production by end of 2011

Comments

5 Responses to “DataStax Enterprise 2.0”

  1. Jeremy on March 21st, 2012 4:35 am

    Thanks for the information!

  2. Stu Hood on March 21st, 2012 5:22 am

    > DataStax wasn’t able to say what typical replication factors are for the data — e.g., is it 3 on Cassandra nodes plus 3 more on Hadoop nodes, or might the total be less than 6?
    CFS is an HDFS implementation, so usually HDFS would not be in use at all. Technically you could have exactly 3 replicas and gain all the benefits of both systems, but you might want to have 1 analytics-only replica for isolation purposes.

  3. Curt Monash on March 21st, 2012 7:41 am

    Hi Stu,

    So was I wrong to say that Cassandra runs over CFS?

  4. Jeremy Hanna on March 21st, 2012 3:27 pm

    Nice overview, though CFS is an HDFS-API-equivalent distributed filesystem built on top of Cassandra and is only available in DSE. Cassandra itself runs on XFS or whatever on each node.

    Re: Twitter vs. Netflix as a Cassandra user. Surely Netflix does a lot with Cassandra, but I think Twitter is an interesting case as they 1) run on real metal and 2) run the largest number of nodes of anyone I’m aware of: over 1000, according to https://dev.twitter.com/blog/cassie-scala-client-for-cassandra. Twitter also has a fraction of the staff maintaining those clusters. So both are interesting.

  5. Stu Hood on March 21st, 2012 3:53 pm

    > So was I wrong to say that Cassandra runs over CFS?
    Right, it is the other way around. CFS is an HDFS replacement that is hosted on Cassandra. Where HDFS has a Namenode and Datanodes to store the blocks and inodes of a distributed filesystem, CFS uses only a Cassandra cluster to do the same thing (albeit without transactional move/rename semantics).
