14 September 2011
The Open University, Israel
Tamir Tassa

Privacy Preserving Data Publishing (PPDP) is an evolving research field that aims to develop anonymization
techniques that enable publishing data so that privacy is preserved while data distortion is minimized. Until recently,
most of the research on PPDP considered partition-based anonymization models. The approach in such models is to
partition the database records into groups and then homogeneously generalize the quasi-identifiers in all records within
a group, as a countermeasure against linking attacks. In this talk we describe alternative anonymization models that
are not based on partitioning and homogeneous generalization. Such models extend the set of acceptable anonymizations of
a given table, and hence achieve similar privacy goals with much less information loss. We shall briefly
review the basic models of homogeneous anonymization (e.g., k-anonymity and l-diversity) and then define non-homogeneous
anonymization, discuss its privacy, describe algorithms, and demonstrate the advantage of such anonymizations in reducing
the information loss. We shall then discuss the usefulness of those models for data mining purposes. In particular, we
will show that the reduced information loss that characterizes such anonymizations also translates into enhanced accuracy
when using the anonymized tables to learn classification models.
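
As a rough illustration of the partition-based approach described above, the following minimal Python sketch partitions
records into groups of at least k and homogeneously generalizes the quasi-identifiers within each group to (min, max)
ranges. The function names, the naive sort-and-cut partitioning, and the sample (age, zip code) records are all
assumptions made for illustration; they are not the algorithms presented in the talk, and the non-homogeneous models are
not shown here because the abstract does not define them.

```python
from collections import Counter


def homogeneous_generalize(records, k):
    """Naive partition-based anonymization (illustrative only).

    Sorts the records, cuts them into consecutive groups of at least k,
    and replaces every quasi-identifier in a group by the group's
    (min, max) range. Rows are returned in sorted order.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    ordered = sorted(records)
    # Consecutive groups of size k; the last group absorbs any remainder.
    starts = list(range(0, len(ordered) - len(ordered) % k, k))
    groups = [ordered[s:s + k] for s in starts]
    groups[-1].extend(ordered[starts[-1] + k:])

    anonymized = []
    for group in groups:
        columns = list(zip(*group))  # one tuple of values per attribute
        generalized = tuple((min(c), max(c)) for c in columns)
        # Every record in the group receives the same generalized tuple.
        anonymized.extend([generalized] * len(group))
    return anonymized


def is_k_anonymous(table, k):
    """True if every generalized row appears at least k times."""
    return all(count >= k for count in Counter(table).values())


# Hypothetical quasi-identifiers: (age, zip code).
records = [(25, 10001), (27, 10002), (31, 10003), (45, 20001),
           (47, 20002), (52, 20003), (58, 20004)]
table = homogeneous_generalize(records, k=3)
print(table[0], is_k_anonymous(table, 3))
```

Because every record in a group receives identical generalized values, a linking attack on the quasi-identifiers cannot
narrow a target down to fewer than k rows; the price is that wide ranges discard information, which is the loss that the
non-homogeneous models aim to reduce.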

Based on joint work with Aris Gionis, Arnon Mazza, Mark Last and Sasha Zhmudyak.

Institution department:
Department of Mathematics and Computer Science