August 21, 2016

Introduction to data Artisans and Flink

data Artisans and Flink basics start:

Like many open source projects, Flink seems to have been partly inspired by a Google paper.

To this point, data Artisans and Flink have less maturity and traction than Databricks and Spark. For example: 

Per Kostas, about half of Flink committers are at data Artisans; others are at Cloudera, Hortonworks, Confluent, Intel, at least one production user, and some universities. Kostas provided about 5 examples of production Flink users, plus a couple of very big names that were sort-of-users (one was using a forked version of Flink, while another is becoming a user “soon”).

The technical story at data Artisans/Flink revolves around the assertion “We have the right architecture for streaming.” If I understood data Artisans co-founder Stephan Ewen correctly on a later call, the two key principles in support of that seem to be:

In particular:

The upshot, Flink partisans believe, is to match the high throughput of Spark Streaming while also matching the low latency of Storm.

The Flink folks naturally have a rich set of opinions about streaming. Besides the points already noted, these include:

We discussed joins quite a bit, but this was before I realized that Flink didn’t have much SQL support. Let’s just say they sounded rather primitive even when I assumed they were done via SQL.

Our discussion of windowing was more upbeat. Flink supports windows based either on timestamps or data arrival time, and these can be combined as needed. Stephan thinks this flexibility is important.
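To make the timestamp-versus-arrival-time distinction concrete, here is a minimal sketch in plain Python. It is not Flink's API; the function `tumbling_windows`, the field names `event_time` and `arrival_time`, and the sample events are all hypothetical, chosen only to show how the same records can land in different windows depending on which clock drives the grouping.

```python
from collections import defaultdict


def tumbling_windows(events, size, key):
    """Group events into fixed-size, non-overlapping windows.

    `key` selects which time value (in seconds) drives the windowing.
    Returns a dict mapping each window's start time to its values.
    """
    windows = defaultdict(list)
    for event in events:
        # Start of the window this event falls into.
        start = (key(event) // size) * size
        windows[start].append(event["value"])
    return dict(sorted(windows.items()))


# Hypothetical events: event_time is the timestamp carried in the record,
# arrival_time is when the system actually received it.
events = [
    {"value": "a", "event_time": 1, "arrival_time": 2},
    {"value": "b", "event_time": 12, "arrival_time": 13},
    {"value": "c", "event_time": 8, "arrival_time": 14},  # arrived late
]

by_event_time = tumbling_windows(events, 10, lambda e: e["event_time"])
by_arrival_time = tumbling_windows(events, 10, lambda e: e["arrival_time"])

print(by_event_time)
print(by_arrival_time)
```

The late record "c" joins "a" in the first window when grouping by timestamp, but joins "b" in the second window when grouping by arrival time. A real streaming engine also has to decide how long to wait for late records before closing a timestamp-based window, which this sketch ignores.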

As for Flink use cases, they’re about what you’d expect:

But Flink doesn’t have all the capabilities one would want for the kinds of investigative analytics commonly done on Spark.

Related links

Comments

3 Responses to “Introduction to data Artisans and Flink”

  1. Kostas Tzoumas on August 22nd, 2016 5:33 am

    Thank you Curt for the article!

    One clarification regarding Flink production users: several companies have talked publicly about their use of Flink in production, including Alibaba, King, Zalando, Bouygues Telecom, ResearchGate, and Otto group.

    The Flink community maintains a larger directory that collects such use cases here: http://flink.apache.org/poweredby.html

  2. David Gruzman on August 23rd, 2016 1:37 am

    In my view, it would be very interesting to compare, as deeply as possible, the implementation of the shuffle in Spark and Flink.
    I suggest focusing on this side of the comparison because once a job becomes less trivial and the need for a shuffle appears, that process will dominate both performance and stability…
    It has to be said that in the streaming case the shuffle gets one more requirement, low latency, and it is interesting how that was solved.

  3. Rapid analytics | DBMS 2 : DataBase Management System Services – Cloud Data Architect on October 22nd, 2016 1:32 am

    […] General streaming. Some of my posts on that subject are linked at the bottom of my August post on Flink. […]
