In this talk, we will describe how we used Flink at Telefonica Research to automatically identify trackers on the web. We analyzed several months of user traffic logs to mine web tracker behavior and gain insight into their network structure.
We will discuss how we modeled the data as a graph of web accesses and how we used Gelly, Flink’s graph processing API, to exploit the graph structure and build a classifier that automatically discovers trackers. The talk will focus on the data analysis pipeline implementation and Flink’s features and APIs that enable this kind of exploratory data analysis.
About the speaker
Vasia Kalavri is a PhD student at KTH, Stockholm, doing research on distributed data processing, systems optimization and large-scale graph analysis. She is also a PMC member of Apache Flink, mainly working on Flink’s graph processing API, Gelly.