In the Hadoop ecosystem, new computing engines such as Tez, Spark, and Flink have emerged to replace the MapReduce engine. These candidates for the next-generation computing engine allow computations to be represented as DAGs instead of two-level MapReduce programs, and they also exploit advanced techniques such as in-memory computing.
In this talk, we compare the performance of the major computing engines on two prototypical benchmark programs, sorting and hash join, and analyze the results. For both programs, Flink is the fastest, thanks to its own memory management and its pipelined data transfer between mappers and reducers. Flink also utilizes hardware resources more efficiently than the other engines.
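For readers unfamiliar with the second benchmark, the following is a minimal single-process sketch of a hash join, written in plain Python. It only illustrates the build and probe phases; it is not the benchmark code used in the talk, and the function name `hash_join` and the sample data are made up for illustration.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows):
    """Join two lists of (key, value) pairs on their keys."""
    # Build phase: index the (usually smaller) input by key.
    table = defaultdict(list)
    for key, value in build_rows:
        table[key].append(value)

    # Probe phase: look up each row of the other input in the index.
    joined = []
    for key, value in probe_rows:
        for build_value in table.get(key, []):
            joined.append((key, build_value, value))
    return joined


# Hypothetical sample data: users and their orders, keyed by user id.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
result = hash_join(users, orders)
```

In a distributed engine, both inputs would first be partitioned by key across nodes so that matching rows meet on the same node, which is why data transfer between tasks matters for this workload.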
In the last part of the talk, we present an experimental computing engine which uses a push model, instead of a traditional pull model, for data transmission between nodes. Experimental results on the new computing engine may shed light on how current computing engines can further improve their performance.
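The difference between the two transmission models can be sketched in a few lines of Python. This is a simplified, single-process illustration under assumed names (`run_pull`, `run_push`, `partition`, `NUM_REDUCERS`); it does not reflect the implementation of the experimental engine or of any existing engine.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed number of receiving tasks


def partition(key):
    # Assumes integer keys; routes each key to one reducer.
    return key % NUM_REDUCERS


def run_pull(records):
    """Pull model: the sender stores all output first, then receivers fetch it."""
    mapper_store = defaultdict(list)
    for key, value in records:
        mapper_store[partition(key)].append((key, value))
    # Only after the mapper has finished does each reducer request its partition.
    return {r: list(mapper_store[r]) for r in range(NUM_REDUCERS)}


def run_push(records):
    """Push model: the sender forwards each record as soon as it is produced."""
    inboxes = {r: [] for r in range(NUM_REDUCERS)}
    for key, value in records:
        # No intermediate store and no fetch request from the reducer.
        inboxes[partition(key)].append((key, value))
    return inboxes


records = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
pulled = run_pull(records)
pushed = run_push(records)
```

Both functions deliver the same records to the same reducers; the models differ in when the data moves and in which side initiates the transfer.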
About the speaker
Dongwon Kim is a postdoctoral researcher at Pohang University of Science and Technology (POSTECH).
He received a PhD in Computer Science and Engineering from POSTECH. His doctoral thesis is about designing a fault-tolerant MapReduce engine that uses the push model for high performance. He is currently participating in a research project to develop a new computing engine for Hadoop that supports DAG processing.