Making sense of metadata by means of automatic integration

Integrating metadata remains an expensive and tedious task because it has proved very difficult to automate. This project aims to develop new techniques for the efficient, automatic integration of metadata drawn from sources such as the Web or social networks.

Portrait / project description (ongoing research project)

This project is divided into two parts. The first consists in developing and testing new data-extraction techniques that automatically characterise the available data, uncover the relationships between pieces of data and model their value distributions. The second uses this information to facilitate the analysis and integration of the available data. This will require new techniques capable of creating data patterns (schemas) on demand and providing abstraction layers. The ultimate goal is to provide processes that allow data sets to be combined easily while preserving their specific features and history.
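To make the first part more concrete, the following minimal Python sketch shows what characterising a column, modelling its value distribution and spotting a possible relationship between two columns could look like. It is an illustration only, not the project's method: the function names `profile_column` and `is_contained_in` and the toy data are assumptions introduced here.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: dominant type, missing ratio, distinct count, top values."""
    present = [v for v in values if v is not None]
    type_counts = Counter(type(v).__name__ for v in present)
    distribution = Counter(present)  # empirical value distribution
    return {
        "dominant_type": type_counts.most_common(1)[0][0] if present else None,
        "missing_ratio": 1 - len(present) / len(values) if values else 0.0,
        "distinct": len(distribution),
        "top_values": distribution.most_common(3),
    }


def is_contained_in(child_values, parent_values):
    """True if every non-missing child value also occurs in the parent column.

    Such containment is only a hint that the two columns may be related.
    """
    parent = set(parent_values)
    return all(v in parent for v in child_values if v is not None)


# Hypothetical toy data, not taken from the project.
cities = ["Fribourg", "Bern", "Fribourg", None]
known_cities = ["Bern", "Fribourg", "Zurich"]

profile = profile_column(cities)
linked = is_contained_in(cities, known_cities)
```

In this sketch, `profile` describes the column without any prior schema, and `linked` flags `known_cities` as a candidate reference column for `cities`. A real system would need far more robust statistics and would have to cope with noisy or inconsistent values.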

Background

One of the cornerstones of Big Data is the combination of several sources of information in order to model a specific phenomenon. Most current methods rely on analysing data patterns (schemas), and in particular the metadata that unambiguously defines the structure of the information to be combined. In practice, however, these patterns often turn out to be incomplete, for example for data originating from social networks or the Web. Because such data cannot currently be combined automatically, experts have no choice but to prepare and integrate it manually. The resulting loss of time is one of the major problems of Big Data.

Aim

The aim of this project is to devise new techniques for the automatic or semi-automatic integration of data. Because the data structure is often not defined in advance, the central challenge for our research is to understand it retrospectively, by reconstructing the data patterns from the available data.
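As a rough illustration of understanding structure retrospectively, the sketch below derives a simple field description from loosely structured records. It assumes the data arrives as Python dictionaries; the function name `infer_schema` and the sample records are hypothetical, and the approach is a deliberately naive stand-in rather than the technique developed in the project.

```python
from collections import defaultdict


def infer_schema(records):
    """Reconstruct a field -> {types, required} description from observed records."""
    types = defaultdict(set)   # field name -> type names seen for that field
    seen = defaultdict(int)    # field name -> number of records containing it
    for record in records:
        for field, value in record.items():
            types[field].add(type(value).__name__)
            seen[field] += 1
    return {
        field: {
            "types": sorted(types[field]),
            # A field counts as required only if every record carries it.
            "required": seen[field] == len(records),
        }
        for field in types
    }


# Hypothetical records resembling loosely structured Web data.
posts = [
    {"user": "alice", "likes": 3},
    {"user": "bob", "likes": "12", "lang": "fr"},
]

schema = infer_schema(posts)
```

Here `schema` records that `likes` appears both as an integer and as a string, and that `lang` is optional. These are the kinds of irregularity that an integration step would then have to reconcile.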

Relevance/application

This project is particularly important because of the gap between the ever-increasing volume of available data and the limited time analysts have to process it. The results will help to substantially speed up the process of turning raw data into models and visualisations. Numerous fields that require the combination of heterogeneous data sets (e.g. smart cities, personalised healthcare and e-science) stand to benefit from the new methods, which should lead to more powerful analyses and models.

Original title

Tighten-it-All: Big Data Integration for Loosely-Structured Data

Project leader

Professor Philippe Cudré-Mauroux, Département d'Informatique, Université de Fribourg


Further information on this content

 Contact

Professor Philippe Cudré-Mauroux, Département d'Informatique, Université de Fribourg, Boulevard de Pérolles 90, 1700 Fribourg, phil@exascale.info
