Machine learning to predict the properties of chemical compounds


The number of theoretically possible chemical compounds is too great for their properties to be calculated systematically in advance. This project seeks to combine machine learning with the modern approximation methods used in quantum chemistry so that sensible predictions can nonetheless be made.

Project description (completed research project)

Machine learning is a mathematical technique that enables computers to acquire knowledge independently from given data sets. It is already being used successfully to predict the properties of chemical compounds. However, because the number of chemical compounds is extremely large, such predictions are not yet particularly accurate: the available data sets are either sufficiently accurate but too small, or large enough but too inaccurate. In this project, we therefore try to develop an improved prediction model by cleverly combining a small amount of highly accurate data with a large amount of less accurate data.


It takes a great deal of time and money to synthesise and test new materials in the chemical industry or new medicines in the pharmaceutical sector. This outlay could be reduced substantially if it were possible to control the complexity of chemical compounds. This project aims to use improved mathematical processes to enable the targeted development of chemical compounds with the desired properties.


The goal of this project is to develop a highly effective process capable of predicting the properties of chemical compounds. In this context, effective means that the properties of any chemical compound can be predicted with great accuracy after an extremely short calculation time.


This project provides experimental chemists with a new tool that can guide their efforts to identify, design, synthesise and characterise novel and interesting compounds by means of immediate predictions. In addition, the success of a model like this implies an improved quantitative understanding of the relationship between chemical structures and their properties.


The achievements of this project fall into three parts.

The first part is the construction of training and test data. All data sets have been published and made available for scientific purposes.

The second part concerns quantum machine learning models that use quantum reference data at multiple fidelities, that is, at varying levels of computational cost and accuracy. We were able to greatly improve the so-called learning curve, which describes how the prediction error falls as the amount of training data grows. Our quantum machine learning model therefore reaches a given accuracy with less training data than before.

The third part is the mathematical foundation and development of numerical methods for big data problems.

These findings help to further improve machine learning models.
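
The multi-fidelity idea from the second part can be illustrated with a small toy sketch. It is not the project's actual model: the two one-dimensional functions, the sample sizes, the kernel widths and all function names below are assumptions chosen purely for illustration. A cheap "low-fidelity" function is learned from many samples, and only the smooth difference to the expensive "high-fidelity" function is learned from a few samples. The sum of both models is then compared with a model trained on the few high-fidelity samples alone.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1D point sets."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def krr_fit(x, y, sigma, lam=1e-8):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)


def krr_predict(x_train, alpha, x_new, sigma):
    """Evaluate a fitted kernel ridge regression model at new points."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


# Hypothetical toy stand-ins for two levels of quantum chemistry data.
def low_fidelity(x):
    return np.sin(20.0 * x)             # cheap, captures the oscillation


def high_fidelity(x):
    return np.sin(20.0 * x) + 0.5 * x   # expensive, adds a smooth correction


def run_demo(n_low=60, n_high=5, sigma_low=0.1, sigma_delta=0.5):
    x_low = np.linspace(0.0, 1.0, n_low)     # many cheap samples
    x_high = np.linspace(0.0, 1.0, n_high)   # few expensive samples
    x_test = np.linspace(0.0, 1.0, 200)
    y_test = high_fidelity(x_test)

    # Baseline: learn the high-fidelity function from the few samples only.
    a_direct = krr_fit(x_high, high_fidelity(x_high), sigma_low)
    y_direct = krr_predict(x_high, a_direct, x_test, sigma_low)

    # Two-level model: low-fidelity model plus a learned correction.
    a_low = krr_fit(x_low, low_fidelity(x_low), sigma_low)
    delta = high_fidelity(x_high) - low_fidelity(x_high)
    a_delta = krr_fit(x_high, delta, sigma_delta)
    y_mf = (krr_predict(x_low, a_low, x_test, sigma_low)
            + krr_predict(x_high, a_delta, x_test, sigma_delta))

    def rmse(y):
        return float(np.sqrt(np.mean((y - y_test) ** 2)))

    return rmse(y_direct), rmse(y_mf)


direct_rmse, mf_rmse = run_demo()
print(f"few high-fidelity samples only: RMSE = {direct_rmse:.3f}")
print(f"two-level combination:          RMSE = {mf_rmse:.3f}")
```

The correction is much smoother than the target itself, so a handful of expensive samples is enough to learn it. This is the reason the combined model in this toy setting needs far fewer high-fidelity data than the direct model.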

Original title

Big Data for Computational Chemistry: Unified machine learning and sparse grid combination technique for quantum based molecular design

Project leaders

  • Prof. Helmut Harbrecht, Fachbereich Mathematik, Departement Mathematik und Informatik, Universität Basel
  • Prof. Otto Anatole von Lilienfeld, Institut der Physikalischen Chemie, Departement Chemie, Universität Basel



Further information on this content


Prof. Helmut Harbrecht
Fachbereich Mathematik
Departement Mathematik und Informatik
Universität Basel
Spiegelgasse 1
4051 Basel
