Efficient analysis of genomic data


Technological advances in DNA sequencing are making it easier to decode the genomes of numerous organisms. The challenge that this mass of variable-quality data presents for biologists is how to analyse it efficiently and consistently.

Portrait / project description (ongoing research project)

The project has two areas of focus. The first is developing tools capable of organising genomic data and deriving comparable biological elements from it, such as genes that are similar across different species. By drawing on different types of genomic data, these tools will make it possible to analyse more species, which is important for gaining a better understanding of the processes involved in the evolution of species. The second is developing new machine learning algorithms capable of identifying which of the tens of thousands of genes present in the genomes show the most interesting characteristics. Studying these genes in depth with the help of modelling methods will make it possible to understand their interactions and evolution.


Identifying the genes that are key to an organism’s development enables scientists to determine which genes relate to functions that are essential to the organism’s survival. In medicine, for example, it is vital to know whether a gene identified in a model organism such as a mouse has the same function in human beings. Answering questions of this kind requires complex computing methods and high-quality data. Such analyses are therefore restricted to a small number of organisms that have been studied in great depth, and the enormous quantity of poorer-quality data that is currently being generated goes unused.


This project aims to develop new computational approaches capable of processing genomic data of variable quality in order to compare the genomes of different organisms. Modelling the interactions between genes with the help of machine learning methods will make it possible to understand, for example, the evolution of groups of genes involved in metabolic processes.


The project fits squarely within the theme of Big Data, since it addresses the size, heterogeneity and quality of genomic data in biology. Its implications also extend beyond this single discipline, since approaches for managing and comparing data are essential in other fields, such as language analysis. Moreover, machine learning is a key component of computational sciences.

Original title

Efficient and accurate comparative genomics to make sense of high volume low quality data in biology

Project leaders

  • Professor Nicolas Salamin, Département d'Ecologie et d'Evolution, Faculté de Biologie et de Médecine, Université de Lausanne
  • Dr. Marc Robinson-Rechavi, Département d'Ecologie et d'Evolution, Faculté de Biologie et de Médecine, Université de Lausanne
  • Professor Bastien Chopard, Centre Universitaire d'Informatique, Université de Genève
  • Professor Christophe Dessimoz, Département d'Ecologie et d'Evolution, Faculté de Biologie et de Médecine, Université de Lausanne



Further information on this content


Professor Nicolas Salamin
Département d'Ecologie et d'Evolution
Faculté de Biologie et de Médecine
Université de Lausanne
Biophore
1015 Lausanne