March 27, 2012

DataStax Enterprise and Cassandra revisited

My last post about DataStax Enterprise and Cassandra didn’t go so well. As a follow-up, I chatted for two hours with Rick Branson and Billy Bosworth of DataStax. Hopefully I can do better this time around.

For starters, let me say there are three kinds of data management nodes in DataStax Enterprise:

Cassandra, Solr, Lucene, and Hadoop are all Apache projects.

If we look at this from the standpoint of DML (Data Manipulation Language) and data access APIs:

In addition, it is sometimes recommended that you use “in-entity caching”, where an entire data structure (e.g. in JSON) winds up in a single Cassandra column.
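To make “in-entity caching” concrete, here is a minimal sketch. It does not use a Cassandra client. It models one column family as a plain Python dict of row key → {column name → value}, and the helper names (`put_entity`, `get_entity`, `update_field`) are hypothetical. The whole nested entity is serialized to JSON and stored as the value of a single column, so reading it back takes one column fetch.

```python
import json
from typing import Any, Dict

# Hypothetical in-memory stand-in for one Cassandra column family:
# row key -> {column name -> column value}. Not a real Cassandra API.
ColumnFamily = Dict[str, Dict[str, str]]

def put_entity(cf: ColumnFamily, row_key: str, column: str, entity: Any) -> None:
    """Serialize an entire nested structure into ONE column value."""
    cf.setdefault(row_key, {})[column] = json.dumps(entity, sort_keys=True)

def get_entity(cf: ColumnFamily, row_key: str, column: str) -> Any:
    """One column read plus one deserialization returns the whole entity."""
    return json.loads(cf[row_key][column])

def update_field(cf: ColumnFamily, row_key: str, column: str, field: str, value: Any) -> None:
    """Changing a single field means read-modify-write of the whole blob."""
    entity = get_entity(cf, row_key, column)
    entity[field] = value
    put_entity(cf, row_key, column, entity)

if __name__ == "__main__":
    users: ColumnFamily = {}
    put_entity(users, "user:42", "profile",
               {"name": "Ada", "tags": ["admin", "ops"], "address": {"city": "London"}})
    print(get_entity(users, "user:42", "profile")["address"]["city"])  # London
    update_field(users, "user:42", "profile", "name", "Ada L.")
```

The tradeoff the sketch is meant to expose: reads of the whole entity are cheap, but the store sees the blob as opaque. Updating one field therefore rewrites the entire column value, and the database cannot query the fields inside it.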

The two main ways to get direct SQL* access to data in DataStax Enterprise are:

*or very SQL-like, depending on how you view things

Before going further, let’s recall some Cassandra basics:

The story for Solr/Lucene indexing, beyond text search and so on, goes like this:

Notes on Hadoop-on-Cassandra include:

DataStax emphasizes the point that DSE (DataStax Enterprise) lets you do multiple things on “the same cluster”, thus gaining operational simplicity. The essence of this claim is:

Vanilla Cassandra and Hadoop-on-Cassandra nodes can be combined in a single logical data center because they manage the same data structures. The two big gotchas in that are:

So in particular:

By way of contrast, Solr-on-Cassandra nodes have additional data structures, specifically indexes, which is probably why they don’t have the same degree of interoperability with other kinds of nodes at this time. Solandra, not to be confused with Solyndra, is a different kind of Solr/Cassandra combination, without this problem. But in not using the Lucene indexes it has other issues, such as performance, and is no longer part of the DataStax offering.
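As a rough illustration of what “additional data structures” means here, the toy model below keeps base rows, which every node type would hold, alongside an inverted index (term → row keys), the kind of extra structure a search-enabled node must maintain on every write. The class and method names are hypothetical, and this is a generic inverted index, not Solr’s or Lucene’s actual index format.

```python
from collections import defaultdict
from typing import Dict, List, Set

class IndexedStore:
    """Hypothetical toy: base rows plus an inverted index kept in sync on writes."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}                      # row key -> text
        self.index: Dict[str, Set[str]] = defaultdict(set)  # term -> row keys

    def put(self, key: str, text: str) -> None:
        # Remove index entries for the old version of the row, if any.
        old = self.rows.get(key)
        if old is not None:
            for term in old.lower().split():
                self.index[term].discard(key)
        # Store the new row and index its terms.
        self.rows[key] = text
        for term in text.lower().split():
            self.index[term].add(key)

    def search(self, term: str) -> List[str]:
        # .get() avoids creating empty entries in the defaultdict.
        return sorted(self.index.get(term.lower(), set()))

if __name__ == "__main__":
    store = IndexedStore()
    store.put("doc1", "Cassandra and Solr")
    store.put("doc2", "Hadoop on Cassandra")
    print(store.search("cassandra"))  # ['doc1', 'doc2']
```

In this toy, a node that holds only `rows` cannot answer `search` without rebuilding `index` first. That is one plausible reading of why such nodes are less interchangeable with plain Cassandra nodes, though the post itself only says “probably”.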

On the business side, DataStax declines to update the figure of more than 50 subscription customers it gave over a year ago. It merely cites a figure of 140-ish total customers, which apparently includes every outfit that has bought at least one day of training.


6 Responses to “DataStax Enterprise and Cassandra revisited”

  1. DataStax Enterprise 2.0 : DBMS 2 : DataBase Management System Services on March 27th, 2012 3:47 pm

    […] Edit: Multiple errors in the post below have been corrected in a follow-on post about DataStax Enterprise and Cassandra. […]

  2. Michelle Agul on March 27th, 2012 5:25 pm

    Curt: Can you further clarify your third bullet under ‘Notes on Hadoop-on-Cassandra…’??
    Not sure I follow your Cassandra/CFS analogy to HBase/HDFS.

    Lastly, would you consider DataStax DSE to be a Hadoop distribution, since it uses MapReduce (but not HDFS), similar to MapR replacing HDFS with NFS??

  3. Curt Monash on March 27th, 2012 5:35 pm


    HBase is implemented as a layer on HDFS.

    But CFS is implemented as a layer on Cassandra.

  4. Curt Monash on March 27th, 2012 5:36 pm

    I try to stay out of the definitional jockeying as to which pieces of Hadoop are required before you can claim that something is a Hadoop distribution.

  5. Joe on March 28th, 2012 10:00 am

    FWIW, Gartner considers DataStax Enterprise a Hadoop distribution.

    I would, too, given that it’s the same API. Just one commenter’s opinion…

  6. “Enterprise-ready Hadoop” | DBMS 2 : DataBase Management System Services on June 19th, 2012 8:42 pm

    […] Amazon) cloud, or in some cases on a cluster shared with another data management systems. (E.g. DataStax/Cassandra, Hadapt/PostgreSQL, or IBM Netezza.) Anyhow, requiring a dedicated cluster isn’t a […]
