March 17, 2014

Notes and comments, March 17, 2014

I have ever more business-advice posts up on Strategic Messaging. Recent subjects include pricing and stealth-mode marketing. Other stuff I’ve been up to includes:

The Spark buzz keeps increasing; almost everybody I talk with expects Spark to win big, probably across several use cases.

Disclosure: I’ll soon be in a substantial client relationship with Databricks, hoping to improve their stealth-mode marketing. 😀

The “real-time analytics” gold rush I called out last year continues. A large fraction of the vendors I talk with have some variant of “real-time analytics” as a central message.

Basho had a major change in leadership. A Twitter exchange ensued. 🙂 Joab Jackson offered a more sober — figuratively and literally — take.

Hadapt laid off its sales and marketing folks, and perhaps some engineers as well. In a nutshell, Hadapt’s approach to SQL-on-Hadoop wasn’t selling vs. the many alternatives, and Hadapt is doubling down on poly-structured data*/schema-on-need.

*While Hadapt doesn’t to my knowledge use the term “poly-structured data”, some other vendors do. And so I may start using it more myself, at least when the poly-structured/multi-structured distinction actually seems significant.

WibiData is partnering with DataStax. WibiData is of course pleased to get access to Cassandra’s user base, which gave me the opportunity to ask why they thought Cassandra had beaten HBase in those accounts. The answer was performance and availability, while Cassandra’s traditional lead in geo-distribution wasn’t mentioned at all.

Disclosure: My fingerprints are all over that deal.

In other news, WibiData has had some executive departures as well, but seems to be staying the course on its strategy. I continue to think that WibiData has a really interesting vision about how to do large-data-volume interactive computing, and anybody in that space would do well to talk with them or at least look into the open source projects WibiData sponsors.

I encountered another apparently-popular machine-learning term: bandit model. It seems to be glorified A/B testing. I think the point is that it tries to optimize for just how much you invest in testing unproven (for good or bad) alternatives.
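To make the "how much do you invest in testing" idea concrete, here is a minimal sketch of epsilon-greedy, one common bandit strategy. It is an illustration only, not a description of what any vendor mentioned here does; the function name, the conversion rates, and the `epsilon` value are all hypothetical.

```python
import random


def run_epsilon_greedy(true_rates, epsilon=0.1, rounds=10_000, seed=0):
    """Simulate an epsilon-greedy bandit over a few alternatives.

    true_rates: made-up success probabilities, hidden from the policy.
    epsilon:    share of rounds spent testing a randomly chosen alternative.
    Returns (pulls, successes), one entry per alternative.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    pulls = [0] * n
    successes = [0] * n

    for _ in range(rounds):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: try a random alternative. This also runs until
            # every alternative has been tried at least once.
            arm = rng.randrange(n)
        else:
            # Exploit: pick the alternative with the best observed rate.
            arm = max(range(n), key=lambda i: successes[i] / pulls[i])

        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            successes[arm] += 1

    return pulls, successes


if __name__ == "__main__":
    # Hypothetical conversion rates for three page variants.
    pulls, successes = run_epsilon_greedy([0.1, 0.5, 0.9], epsilon=0.1)
    for i, (p, s) in enumerate(zip(pulls, successes)):
        print(f"variant {i}: shown {p} times, {s} successes")
```

Setting `epsilon=1.0` makes every round a random choice, which is essentially an evenly split A/B test; smaller values shift traffic toward whatever currently looks best while still spending a fixed fraction on the unproven alternatives.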

I had an awkward set of interactions with GoodData, including my longest conversations with them since 2009. GoodData is in the early days of trying to offer an all-things-to-all-people analytic stack via SaaS (Software as a Service). I gather that Hadoop, Vertica, PostgreSQL (a cheaper Vertica alternative), Spark, Shark (as a faster version of Hive) and Cassandra (under the covers) are all in the mix, but please don’t hold me to those details.

I continue to think that computing is moving to a combination of appliances, clusters, and clouds. That said, I recently bought a new gaming-class computer, and spent many hours gaming on it just yesterday.* I.e., there’s room for general-purpose workstations as well. But otherwise, I’m not hearing anything that contradicts my core point.

*The last beta weekend for The Elder Scrolls Online; I loved Morrowind.


12 Responses to “Notes and comments, March 17, 2014”

  1. Vlad Rodionov on March 17th, 2014 2:03 pm

    On Cassandra’s advantage in performance, availability and multi-DC replication… It would be nice to see any benchmark numbers confirming DataStax’s statements. The only area where Cassandra has a clear leg up right now is availability (it is much easier to implement this when you are “eventually consistent”). HBase is pretty good right now in both performance and replication.

    Hey, Cassandra, how about atomic counters?

  2. Curt Monash on March 17th, 2014 2:51 pm


    Until recently, the consensus for Cassandra having a large speed advantage over HBase was so strong that I don’t think there’s much point questioning it.

    That said, I was referring to a set of fairly established Cassandra users, to the extent that any Cassandra or HBase user can be said to be “established” given the newness of the technologies. So you’re entitled to ask whether the consensus remains accurate at this point.

  3. MattK on March 17th, 2014 2:56 pm

    > “PostgreSQL (a cheaper Vertica alternative)”

    Can you explain that one?

  4. Curt Monash on March 17th, 2014 3:02 pm

    Cheap GoodData plans have data marts in PostgreSQL. More expensive plans have Vertica in that role.

    Separately, there seems to be another instance of Vertica that is used in any case.

    It’s a confusing architecture.

  5. John on March 17th, 2014 6:43 pm


    Is NoSQL playing any role in real-time analytics, or is it primarily still a use case for NewSQL? Any examples of use cases will be appreciated.


  6. Curt Monash on March 17th, 2014 6:54 pm


    1. There’s a big role in aggregates like counters and so on. Think game leaderboards, or network monitoring.

    2. Look also at WibiData. 🙂

  7. Vlad Rodionov on March 18th, 2014 4:41 pm

    >> The consensus for Cassandra having a large speed advantage over HBase was so strong that I don’t think there’s much point questioning it.

    People do not like *consensus*, people like real benchmark results. All benchmark results I have seen so far were mixed (in some tests Cassandra was better, in others HBase) and the difference was quite marginal in most cases. I would like to note that NONE of these benchmarks used a proper HBase configuration (no table pre-splitting, for example). With proper table pre-splitting, HBase 0.94 on standard YCSB easily makes 8K inserts/updates per second per node (AWS, m1.xlarge). These numbers are way above those I have seen in all comparison benchmarks for HBase and way above those for Cassandra. With some upcoming features in new HBase (check the upcoming HBaseCon 2014 presentation “HBase: Extreme makeover”), HBase is going to have a clear advantage in read-intensive workloads pretty soon as well. It has already :), take that Cassandra.
    Cassandra’s marketing PR is better for sure, I agree.

  8. Vlad Rodionov on March 18th, 2014 4:43 pm

  9. Patrick McFadin on March 18th, 2014 5:41 pm


    WibiData (Kiji in open source) benefits from a lot more than just user base with Cassandra. The management nightmare of HBase can be pretty daunting when looking at an implementation. Users want to be bored by a database and Cassandra just works. HBase has a lot of moving parts that need to work together or you are offline. That and programmer friendly features like collections and defined schema make it easy to work with.

    You pointed out the availability and that is what management wants now. Eventual consistency isn’t evil and I’m glad to see the Apache HBase team sees this too:


  10. Vlad Rodionov on March 18th, 2014 8:20 pm

    >> The management nightmare of HBase can be pretty daunting when looking at an implementation.

    Here starts the holy flame war.

    >> Cassandra just works.

    Sometimes works, sometimes does not. Should we search “OOME Cassandra”, “Cassandra very slow” or “GC long pause Cassandra”?


  11. Curt Monash on March 18th, 2014 8:23 pm

    Will the hits be about recent-release DataStax Cassandra distros, or about other stuff?

    It seems fair to ask, since — for example — nobody seems to be disputing that at some point in time Cassandra had a large advantage over HBase in performance. 🙂

  12. Real-time analytics for everybody, uniquely from us!! | DBMS 2 : DataBase Management System Services on March 18th, 2014 11:54 pm

    […] my latest post, I noted that The “real-time analytics” gold rush I called out last year […]
