Data privacy is considered a significant hindrance to the development and industrial application of database publishing and data mining algorithms. Among the many privacy-preserving methodologies, data perturbation is a popular technique for balancing data utility against information privacy and security. It is known that the attacker's background or reference information about the original data can play a significant role in breaching data privacy. In this paper, we study the situation in which data privacy may be compromised by the leakage of a few original data records. Specifically, we consider the case in which the data owner publishes a perturbed database and the attacker knows one or a few records of the original data exactly. We find that the remaining original data may be breached by combining the attacker's reference information with the perturbed data. We consider a potential privacy vulnerability arising from reference information in privacy-preserving database publishing and data mining, based on the eigenspace of the perturbed data under some constraints. We then show that a general data perturbation model is vulnerable to this type of reference privacy breach.
Mathematics Subject Classification:
The research work of J. Zhang was supported in part by NSF under grants CCF-0527967 and CCF-0727600, in part by NIH under grant 1R01HL086644-01, in part by the Alzheimer's Association under grant NIGR-06-25460, and in part by KSEF under grant KSEF-148-502-06-186.