Proteomics is a branch of the life sciences concerned with studying the type and abundance of proteins found in cells. Just as the Human Genome Project mapped the genes that make up human DNA, proteomics researchers are mapping the proteins created from the information in those genes. Mass spectrometry experiments typically produce large volumes of data that require complex pre-processing steps before they are useful.
In this talk we look at how a parallel algorithm was developed, and how a process initially created for Hadoop MapReduce has been run successfully on Apache Flink. We compare the streaming process with the batch-based Hadoop job and discuss the implications this could have for large-scale proteomics laboratories.
About the speaker
Christopher is currently studying part-time for a PhD in Data Science at the University of Dundee, applying Big Data analytics to the data produced from experimentation on the human proteome. For his day job, Christopher Hillman is Principal Data Scientist in the International Advanced Analytics team at Teradata, based in London and working across the international region. He has more than 20 years' experience working with data, information and analytics.