March 5, 2015

Cask and CDAP

For starters:


So far as I can tell:

I didn’t push the competition point hard (the call was generally a bit rushed due to a hard stop on my side), but:

To reiterate part of that last bullet — like much else we’re hearing about these days, CDAP is focused on operational apps, perhaps with a streaming aspect.

To some extent CDAP can be viewed as restoring the programmer/DBA distinction to the non-SQL and streaming worlds. That is:

Further notes on CDAP data access include:

Examples of things that Cask supposedly makes easy include:

Tidbits about how Cask perceives other technologies, or how CDAP plays with them, include:

Cask has ~40 people, multiple millions of dollars in trailing revenue, and — naturally — high expectations for future growth. I neglected, however, to ask how that revenue was split between subscription, professional services and miscellaneous. Cask expects to finish 2015 with a healthy two-digit number of customers.

Cask’s customers seem concentrated in usual-suspect internet-related sectors, although Cask gave it a bit of an enterprise-y spin by specifically citing SaaS (Software as a Service) and telecom. When I asked who else seems to be a user or interested based on mailing list activity, Cask mentioned a lot of financial services and some health care as well.

Related link


5 Responses to “Cask and CDAP”

  1. David Gruzman on March 5th, 2015 3:53 pm

    After reading some of the source code, I got the feeling that it is something like POSIX: giving standard access to various capabilities that have different implementations. Is that right?
    Also interesting: is it possible to wrap the existing zoo of MR jobs, Spark jobs, Hive scripts, etc. into CDAP in an automatic or semi-automatic manner?

  2. Henry on March 5th, 2015 8:23 pm

    Very good insights on Cask and CDAP, but a small comment: Apache Flink is not simply an alternative to Spark; it is more like an alternative to MapReduce, doing distributed data processing outside the MapReduce paradigm the right way. Both Flink and Spark try to solve similar problems, but they tackle them in very different ways. A simple Google search about Flink should give more insight into the differences.
    Both projects started at almost the same time. It just happened that Spark went to the ASF first and most of its initial contributors reside in the US, so the project got more exposure and usage.

  3. Curt Monash on March 6th, 2015 7:06 am


    Your description of Flink is also a description of Spark. So Flink is indeed an alternative to Spark. 🙂

    As for why Spark is winning — that has a lot to do with Cloudera’s embrace, and also with the happy good fortune of stumbling into the streaming use case, which Mike Franklin of course was in an excellent position to recognize when it arose.

  4. Henry on March 6th, 2015 6:01 pm

    Hi Curt, thx for the response.

    I did not mean that the analogy is wrong. What I was saying is that Flink was never created to be an “alternative” to Spark, because both ideas came up at a similar time in different places. I believe any new concept or solution for processing large data on distributed systems without the MapReduce limitations should be embraced, not dismissed as “unneeded”.
    It is actually similar to what happened when Spark first came out: lots of people said it was not needed because we already had Hadoop, and look at it now =)
    Why is Spark winning? I would not say it is winning so much as popular, and as we all know from high school, a popularity contest is never a good indicator of success later in life ^_*
    I am not even sure what winning means here. Having more than one alternative for solving problems is always good for consumers and developers.

    Enough about Flink from me; the post was about Cask and CDAP, and I would love to see more great stuff coming from them ^_^

  5. A new logical data layer? | DBMS 2 : DataBase Management System Services on March 23rd, 2015 1:38 am

    […] of those plans are fully baked yet. That said, there’s an aspect of logical data layer to CDAP, and to Kiji as well. And of course it’s central to BI (Business Intelligence) and ETL […]
