Evidence-based policy: uncovering causality from data
Big Data improves both the forecasting of economic developments and the analysis of their impact. Whereas dramatic advances in forecasting have been made in the past, the use of such data for measuring impacts was still in its infancy at the beginning of the project. The aim of this project was therefore to refine and extend impact measurement methods and apply them to a selected sample of research questions.
Portrait / project description (completed research project)
In the first part of the project, we combined causal analysis methods from micro-econometrics with the statistical methods of machine learning. We first examined the properties of the resulting new statistical procedures using simulation methods. We then investigated the practical feasibility of the methods, extended them where needed, and applied them in specific empirical studies.
In recent years, micro-econometric research has made great advances in the development of methodological tools for answering causal questions. These methods, used for example to assess economic policy measures, have been successfully employed. Unfortunately, these tools are largely unsuitable for analysing large and complex data sets and do not exploit the latest advances in machine learning. Can the methods be enhanced in such a way as to significantly advance the use of Big Data and machine learning for impact measurement?
The goal of the present project is to combine the microeconometric methods of causal analysis (impact measurement) and the statistical forecasting models of machine learning to be able to use large-volume data sets to substantially improve the impact analysis of decisions taken by economic policymakers and private sector actors.
The outcome of the project leads to more reliable statements of the impact of individual measures and decisions in numerous economic contexts. While this facilitates more efficient (since evidence-based) economic policymaking for the public sector, companies in the private sector will also benefit from improved decision-making tools.
In the first part of this project, we used simulations to evaluate existing methods of causal machine learning, then extended these methods and developed new ones. The main goal of most of these extensions and new developments, which build on double machine learning and causal forests, was to obtain one consistent set of methods that makes it possible to estimate relevant causal parameters at different aggregation levels in a coherent way and to perform optimal policy analysis. The latter involves allocating the ‘policy’ or treatment within a population so as to maximise an objective function, such as the profits of a firm or a well-defined welfare measure of a policymaker. The corresponding computer code in R or Python was created and made publicly available (free of charge) via PyPI and CRAN.
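To make the idea of double machine learning more concrete, the following is a minimal sketch of a cross-fitted "partialling-out" estimator on simulated data. It is an illustration under stated assumptions, not the code published by the project: it assumes a partially linear model y = theta * d + g(x) + u with a continuous treatment d, uses a simple k-nearest-neighbour regression as a stand-in for the machine learning step, and all function names (`knn_predict`, `dml_plr`, `simulate`) are hypothetical.

```python
import numpy as np


def knn_predict(x_train, y_train, x_test, k=20):
    """Predict by averaging the outcomes of the k nearest training points."""
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def dml_plr(y, d, x, n_folds=2, k=20, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + u."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for f in range(n_folds):
        test = folds == f
        train = ~test
        # Nuisance functions E[y|x] and E[d|x] are fitted on the other folds
        # only, so each prediction is out-of-sample (cross-fitting).
        y_res[test] = y[test] - knn_predict(x[train], y[train], x[test], k)
        d_res[test] = d[test] - knn_predict(x[train], d[train], x[test], k)
    # Regress the outcome residual on the treatment residual.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Standard error based on the estimated score function.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se


def simulate(n=2000, theta=1.0, seed=0):
    """Simulated data (assumption): covariates affect treatment and outcome."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=(n, 2))
    d = np.sin(x[:, 0]) + 0.5 * x[:, 1] + rng.normal(size=n)
    y = theta * d + np.cos(x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=n)
    return y, d, x


y, d, x = simulate(theta=1.0)
theta_hat, se = dml_plr(y, d, x)
print(f"estimated effect: {theta_hat:.3f} (standard error {se:.3f})")
```

In this sketch the true effect is set to 1.0 in the simulation, so the estimate can be compared against a known value, which mirrors the simulation-based evaluation described above. In practice, the nearest-neighbour step would typically be replaced by a more flexible learner such as a random forest.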
The new methods were applied to several economic questions:
- What are the (heterogeneous) effects of participating in active labour market programmes for specific types of unemployed individuals? In this case, similar analyses were conducted with administrative data from Flanders and Germany.
- What are the effects of environmental regulations on offer prices of used cars? This analysis was based on data from a German online portal.
- Do referees in soccer games favour teams from their own Swiss language region? This analysis was based on Swiss soccer data from the two top leagues.
- Additional empirical analyses applied the methodology to questions about the effects of practising music on child development, the effects of being sporty on success on online dating platforms, and the effects of news sentiment about earnings announcements on stock market indicators, as well as to questions concerning the so-called ‘resource curse’ in developing countries.
In all these applications, which come from very different fields, the new methods demonstrated substantial added value compared with the existing empirical toolkit. They thus substantially improve the value of empirical studies for decision-making.
Causal analysis with Big Data