Efficient deployment of Big Data on large-scale computing systems


Applications that analyse Big Data run on very large systems, such as public clouds and private data centres. These systems should be able to guarantee predictable performance and utilisation costs. This project will develop techniques that optimise the deployment of Big Data applications on hybridised (i.e. adapted) systems, making such guarantees possible.

Portrait / project description (ongoing research project)

The deployment of Big Data applications (BDAs) on large-scale systems such as public clouds, private clusters or even “crowd” computing resources should come with guarantees of predictable performance and usage cost. Currently this is not possible, because we lack the modelling and analytics technology to identify the key characteristics of BDAs and their impact on performance. Little is known, moreover, about the role of system operation and infrastructure in overall performance. This project will provide a deeper understanding of this issue through a novel combination of Big Data analytics with modelling and prediction methods, as well as schemes that orchestrate execution by adapting (i.e. extending or changing) the system. The project will focus on optimising exemplary applications in high-energy physics from CERN and in bioinformatics from Vital-IT.


The analysis of Big Data typically produces and processes enormous amounts of data and involves complex, long-running computations. Detailed modelling of performance with respect to the utilisation of the underlying system parts (e.g. storage, memory, network) reveals opportunities for optimisation for both system providers and users. Providers can predict bottlenecks, employ system hybridisations that may alleviate them, and offer cost guarantees within service-level agreements. Users can make informed decisions when selecting the resources their applications need.
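The bottleneck reasoning above can be illustrated with a toy model (a sketch for intuition only, not the project's actual methodology): each workload places a hypothetical demand on storage, network and CPU, and the predicted runtime is governed by the most saturated resource. The demand and capacity figures below are invented for illustration.

```python
# Toy bottleneck model (illustration only, not the project's method):
# each resource has a capacity; the predicted runtime of a workload is
# dominated by the resource with the largest demand/capacity ratio.

def predict(demands, capacities):
    """Return (predicted_runtime_seconds, bottleneck_resource)."""
    ratios = {r: demands[r] / capacities[r] for r in demands}
    bottleneck = max(ratios, key=ratios.get)
    return ratios[bottleneck], bottleneck

# Hypothetical workload: bytes read from disk, bytes sent over the
# network, and CPU-seconds of computation.
demands = {"storage": 4e12, "network": 1e12, "cpu": 2e4}

# Hypothetical system: 500 MB/s storage, 1 GB/s network, 32 cores.
capacities = {"storage": 5e8, "network": 1e9, "cpu": 32}

runtime, bottleneck = predict(demands, capacities)
print(f"{bottleneck} bound, ~{runtime:.0f} s")  # → storage bound, ~8000 s
```

In this sketch the workload is storage-bound; a provider could "hybridise" the system by adding faster storage, after which the network becomes the governing resource and the predicted runtime drops accordingly.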


The project will enable optimised deployment of exemplary Big Data Applications (BDAs) in large-scale computing environments that conduct significant data analytics, such as high-energy physics applications from CERN (Europe’s particle-physics research centre) and bioinformatics applications from Vital-IT (part of the Swiss Institute of Bioinformatics). It will also investigate the limits of performance optimisation based on generic guidelines for profiling, prediction and deployment, given the specific characteristics of the BDAs and the system.


Data-driven science has a substantial socioeconomic impact. Speeding it up while reducing its cost will lead to better scientific outcomes, such as medical treatments, and to shorter times to market. The project contributes to realising the socioeconomic potential of a wide range of research by making data-driven science more efficient and cost-effective. Moreover, the results will enable the creation of a new service ecosystem. This development will benefit businesses in sectors that handle Big Data applications, such as retail, finance, healthcare and energy.

Original title

Deployment Optimization for Big Data Applications on Hybrid Large-Scale Computing Infrastructures

Project leader

Professor Vasiliki Kantere, Data-Intensive Applications and Systems Lab, Centre Universitaire d'Informatique, Université de Genève



Further information on this content


Professor Vasiliki Kantere
Data-Intensive Applications and Systems Lab
Centre Universitaire d'Informatique
Université de Genève
Bâtiment Battelle A
Route de Drize 7
1227 Carouge
