Colloquium: Preserving Data Privacy in the Big Data Era

Computer Science Colloquium

                
                          3:30 p.m.
                   Davis Marksbury Theater

                        Dr. Yeye He
                 University of Wisconsin-Madison


            Preserving Data Privacy in the Big Data Era

Abstract:
In this era of big data, the tension between doing useful data analysis
and preserving data privacy has grown significantly, and the problem of
data privacy has become ever more important. Unfortunately, existing
anonymization techniques cannot handle or do not consider a number of
important data processing tasks. To address this problem, my
dissertation analyzes challenges and proposes anonymization techniques
in the context of three fundamental data models: relational data,
set-valued data, and streaming event data.

In this talk, I will focus on a new privacy problem motivated by
hospital applications of the streaming model called Complex Event
Processing. Despite the popularity of this event processing model, its
privacy implications have so far been overlooked. I will describe the
fundamental structure of the problem and discuss its theoretical
properties. I will also present real-time privacy-aware event processing
techniques that serve as a promising step towards a full privacy
solution in a streaming environment.

                        Refreshments Served
                              3:00 p.m.
                        Marksbury Atrium

Bio:
Yeye He is a PhD candidate advised by Professor Jeffrey Naughton in the
Department of Computer Science at the University of Wisconsin-Madison.
His thesis work is in the area of preserving data privacy, which is
motivated by diverse real-world applications including streaming event
processing, market basket analysis and machine learning using medical
records.

Yeye has completed several industrial internships at Microsoft Research
and Google. In addition to his dissertation work, he has worked on a
wide range of projects: SEISA, a set expansion system using
semi-structured Web data; Keyword++, a framework to improve keyword
search over entity databases; and EntityCrawl, a deep-web crawling
system optimized for entity-oriented content. Before starting his PhD
work, he worked on performance tuning for data warehousing benchmarks
and participated in the development of the TPC-DS benchmark as a Member
of Technical Staff at Oracle Corporation.