Privacy-preserving stream analytics for non-computer scientists


Society produces data continuously and at unprecedented speed. As a result, it is increasingly unrealistic to educate a sufficient number of skilled computer scientists to collect and analyse these data. Instead, we need new ways to analyse data while they are being produced.

Portrait / project description (ongoing research project)

In this project we develop a petabyte-scale, privacy-preserving processing system for commodity (i.e. standard) hardware. First, we provide a user-friendly programming language based on traditional querying but with extensions for statistical operations and capacity for real-time operations. Second, the language permits users to specify the desired level of privacy. Third, the system compiler translates the statistical functions and privacy specifications into executable computations. Finally, the runtime environment selects the best approach for optimising execution using existing systems (e.g. Apache Flink, Spark Streaming or Storm).
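The idea of pairing a statistical query with a user-chosen privacy level can be illustrated with a small sketch. It is not the project's language, compiler or runtime: it assumes differential privacy with Laplace noise as the privacy mechanism (the description above does not name one), and the function names, the `epsilon` parameter and the event format are hypothetical. Events are counted per time window as they arrive, and only noisy per-window counts are emitted; raw events are not stored.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_window_counts(events, window_seconds, epsilon, rng=None):
    """Yield (window_start, noisy_counts) for each tumbling window.

    events: iterable of (timestamp_seconds, key) pairs in timestamp order.
    epsilon: hypothetical privacy level; smaller values add more noise.
    Assumes each individual contributes at most one event per window,
    so that the sensitivity of every count is 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon

    def add_noise(counts):
        return {key: n + laplace_noise(scale, rng) for key, n in counts.items()}

    current_window = None
    counts = defaultdict(int)
    for timestamp, key in events:
        window = int(timestamp // window_seconds)
        if current_window is not None and window != current_window:
            # The previous window is complete: release noisy counts only.
            yield current_window * window_seconds, add_noise(counts)
            counts = defaultdict(int)
        current_window = window
        counts[key] += 1
    if current_window is not None:
        yield current_window * window_seconds, add_noise(counts)
```

For example, `private_window_counts(viewing_events, 60, epsilon=0.5)` would yield one dictionary of noisy per-channel counts for each minute of a hypothetical `viewing_events` stream. A production system would also need to handle out-of-order events and track the privacy budget spent across windows, which this sketch leaves out.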


Production of Big Data will soon outpace the availability of both storage and computer science experts who know how to handle such data. Moreover, society is increasingly concerned about data protection. Addressing these issues requires so-called stream-processing systems that continuously analyse incoming data (rather than store it) and allow non-computer scientists to specify its analysis in a privacy-preserving manner. This project could vastly simplify the development of new, societally acceptable applications of real-time data analytics.


We will develop a petabyte-scale analytics system (i.e. one that processes millions of gigabytes) that enables non-computer scientists to analyse high-performance data streams. Our solution will support advanced statistical operations in real time and ensure the privacy of the data. To evaluate the robustness and functionality of our system, we will replicate the processing pipeline for the Australian Square Kilometre Array Pathfinder radio telescope, which generates up to 2.5 gigabytes of raw data per second. To evaluate privacy preservation, we will analyse the TV viewing habits of around 3 million individuals.
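To relate the stated data rate to the petabyte scale, the following back-of-the-envelope calculation may help. It assumes that the peak rate of 2.5 gigabytes per second is sustained around the clock and that decimal units apply (1 petabyte = 1,000,000 gigabytes); both are simplifying assumptions, not figures from the project.

```python
RATE_GB_PER_S = 2.5              # peak raw data rate stated above
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_PB = 1_000_000            # decimal units: "millions of gigabytes"

# Volume produced in one day at the sustained peak rate.
gb_per_day = RATE_GB_PER_S * SECONDS_PER_DAY

# Time needed to accumulate one petabyte at that rate.
days_per_petabyte = GB_PER_PB / gb_per_day

print(f"{gb_per_day:,.0f} GB per day")
print(f"{days_per_petabyte:.2f} days per petabyte")
```

Under these assumptions the stream amounts to 216,000 gigabytes per day, so a petabyte accumulates in under five days.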


The ubiquity of electronic devices and sensors is leading society to a data deluge. The results of this project will allow non-computer scientists to efficiently analyse and explore these ever-increasing data sources while adhering to data protection laws.

Original title

Privacy Preserving, Peta-scale Stream Analytics for Domain-Experts

Project leaders

  • Prof. Michael Böhlen, Institut für Informatik, Universität Zürich
  • Prof. Abraham Bernstein, Institut für Informatik, Universität Zürich



Further information on this content


Professor Michael Böhlen
Institut für Informatik
Universität Zürich
BIN 2.E.13
Binzmühlestrasse 14
8050 Zürich
