Stateful stream processing is a common use case in big data analytics. State is not just a byproduct of the computation; it often serves as an output or directly affects the computation itself. This state therefore needs to be persisted consistently and restored automatically after a failure, preferably with exactly-once semantics.
This talk describes three open source approaches to this problem, namely the mechanisms provided by Trident, Spark Streaming and Flink. The architecture and performance of the three systems are assessed in terms of common stateful streaming use cases.
About the speaker
Márton Balassi is a PMC member at Apache Flink and a researcher at the Hungarian Academy of Sciences. He has worked for data Artisans in Berlin. His main expertise and interest are real-time distributed data processing frameworks. His current work includes research and development on mapping the models and guarantees of different streaming systems.
Márton has recently been a speaker at ApacheCon, Hadoop Summit and numerous Big Data-related meetups.