Dataflow is a unified programming model and a managed service for
developing and executing a wide range of data processing patterns
using batch and stream processing techniques. By default, applications
developed using Dataflow run on top of the Google Cloud Platform.
Using the open source Dataflow SDK, Data Artisans created an
integration with Dataflow that makes it possible to run Dataflow applications
on top of Apache Flink. This gives users a way to run Dataflow
applications independently of the Google infrastructure, either in a
standalone Flink cluster or using Flink’s support for Apache YARN.
In this talk, we will take a look at Dataflow API concepts for batch
computing. We will learn how support for Dataflow on top of Apache
Flink was implemented and what its current features and limitations are.
In a demo, we will see how easy it is to develop and execute Dataflow
programs that run on top of Apache Flink.
About the speaker
Max is an Apache Flink committer and works as a software developer at
Data Artisans. He studied at the Free University of Berlin and in Istanbul.
He previously worked as a research assistant in parallel
and distributed computing at Zuse Institute Berlin.