Session starts - 14:45

Juggling with Bits and Bytes – How Apache Flink operates on binary data

Many popular open source data processing systems are implemented in JVM-based languages. Storing and operating on large amounts of data in memory is a common challenge for these data analysis systems. The most straightforward approach to processing data in a JVM is to store it as objects on the heap and work with these objects directly. However, this approach has notable drawbacks: memory consumption is difficult to control, and garbage collection overhead can become significant.

This talk presents Apache Flink’s approach to this challenge. We discuss Flink’s active memory management, its custom data serialization framework, and its techniques for operating efficiently on binary data. The talk concludes with benchmark results that compare the straightforward object-on-heap approach with Flink’s approach.
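The idea of operating on serialized bytes rather than on heap objects can be illustrated with a minimal Java sketch. It is not Flink's implementation or API: the class `BinaryRecordBuffer`, its record layout (`[int key][int length][UTF-8 bytes]`) and its methods are hypothetical, and a plain `java.nio.ByteBuffer` stands in for Flink's own memory segments. Records are serialized into one pre-allocated buffer, and sorting reorders only an index of offsets, reading each key directly from the bytes instead of materializing record objects.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Flink's API): (int key, String value) records are
 * serialized into a single fixed-size buffer and sorted by comparing keys
 * read straight from the binary data.
 */
final class BinaryRecordBuffer {
    // Each record starts with two ints: the key and the value's byte length.
    private static final int HEADER_BYTES = 2 * Integer.BYTES;

    private final ByteBuffer segment;                          // allocated once, fixed capacity
    private final List<Integer> offsets = new ArrayList<>();  // start offset of each record

    BinaryRecordBuffer(int capacityBytes) {
        // Heap-backed here; ByteBuffer.allocateDirect would place the bytes off-heap.
        this.segment = ByteBuffer.allocate(capacityBytes);
    }

    /** Serializes a record; returns false instead of growing when the buffer is full. */
    boolean add(int key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int size = HEADER_BYTES + bytes.length;
        if (segment.remaining() < size) {
            return false; // a real system would spill to disk or process this batch
        }
        offsets.add(segment.position());
        segment.putInt(key).putInt(bytes.length).put(bytes);
        return true;
    }

    int size() {
        return offsets.size();
    }

    /** Reorders only the offset index; keys are read with absolute getInt calls. */
    void sortByKey() {
        offsets.sort((a, b) -> Integer.compare(segment.getInt(a), segment.getInt(b)));
    }

    int keyAt(int i) {
        return segment.getInt(offsets.get(i));
    }

    /** Deserializes only the value of the i-th record in the current order. */
    String valueAt(int i) {
        int offset = offsets.get(i);
        int length = segment.getInt(offset + Integer.BYTES);
        byte[] bytes = new byte[length];
        ByteBuffer view = segment.duplicate(); // independent position, shared bytes
        view.position(offset + HEADER_BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BinaryRecordBuffer buffer = new BinaryRecordBuffer(1024);
        buffer.add(3, "Flink");
        buffer.add(1, "Apache");
        buffer.add(2, "binary");
        buffer.sortByKey();
        // prints "1 Apache", "2 binary", "3 Flink", one per line
        for (int i = 0; i < buffer.size(); i++) {
            System.out.println(buffer.keyAt(i) + " " + buffer.valueAt(i));
        }
    }
}
```

Because the buffer has a fixed capacity, `add` reports when it is full rather than allocating more memory; keeping the footprint bounded in this way is one route to predictable memory consumption, and the records themselves never become individual objects for the garbage collector to track.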

About the speaker

Fabian Hueske

Fabian Hueske is a PMC member of Apache Flink. He started working on the project in 2009 as part of his PhD studies at TU Berlin. Fabian did internships at IBM Research, SAP Research, and Microsoft Research and is a co-founder of data Artisans, a Berlin-based start-up devoted to fostering Apache Flink. He frequently gives talks on Apache Flink at conferences and meetups. Fabian is interested in distributed data processing and query optimization.
