Laboratory for High Performance Scientific Computing and Computer Simulation

Department of Computer Science

University of Kentucky

Lexington, KY 40506-0046, USA

Random rotation is a common perturbation approach for privacy-preserving data classification: the data matrix is multiplied by a random rotation matrix before publication in order to protect data privacy. A distinct advantage of this approach is that it preserves the geometric properties of the data matrix, so several categories of classifiers that rely on those properties achieve similar accuracy on the transformed data as on the original data. In this paper, we generalize this idea to the setting in which the data matrix is vertically partitioned into several submatrices held by different owners. Each data holder chooses a rotation matrix randomly and independently to perturb its own data. All holders then send their transformed data to a third party, which assembles them into a complete data set for data mining or other analysis purposes. We show that under this scheme the geometric properties of the data set are also preserved, so many classifiers and clustering techniques are as accurate on the transformed data as on the original data. Experiments on two real data sets show that this generalization is effective for vertically partitioned data sets.
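The scheme described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation, and it rests on several assumptions: records are rows and attributes are columns, each owner right-multiplies its block by its own matrix, and the random matrix is drawn by Gram-Schmidt orthonormalization of Gaussian vectors, which yields a random orthogonal matrix (a rotation, possibly combined with a reflection; both preserve distances). The names `random_orthogonal`, `perturb`, `join`, and the block widths are hypothetical.

```python
import math
import random


def random_orthogonal(d, rng):
    """Return a d x d orthogonal matrix (assumed stand-in for a random rotation),
    built by Gram-Schmidt orthonormalization of Gaussian random vectors."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            # Remove the component of v along each accepted basis vector.
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip the (unlikely) degenerate draw
            basis.append([x / norm for x in v])
    return basis


def perturb(block, q):
    """Multiply every record (row) of one owner's block by that owner's matrix q."""
    d = len(q)
    return [[sum(row[k] * q[k][j] for k in range(d)) for j in range(d)]
            for row in block]


def distance(a, b):
    """Euclidean distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)
n_records = 6
widths = [2, 3, 4]  # hypothetical: three owners holding 2, 3, and 4 attributes

# Vertically partitioned data: one block of columns per owner, same records.
blocks = [[[rng.uniform(-1.0, 1.0) for _ in range(w)] for _ in range(n_records)]
          for w in widths]

# Each owner perturbs its block with an independently chosen matrix.
perturbed = [perturb(block, random_orthogonal(w, rng))
             for block, w in zip(blocks, widths)]


def join(parts):
    """Concatenate the owners' columns record by record (the third party's view)."""
    return [sum((part[i] for part in parts), []) for i in range(n_records)]


original = join(blocks)
published = join(perturbed)

# Largest change in any pairwise distance caused by the perturbation.
max_gap = max(
    abs(distance(original[i], original[j]) - distance(published[i], published[j]))
    for i in range(n_records) for j in range(i + 1, n_records)
)
print(max_gap)
```

In this sketch the pairwise distances survive because the joined transformation is block-diagonal with orthogonal blocks, and is therefore itself orthogonal. Any deviation reported by `max_gap` comes from floating-point rounding only.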

**Mathematics Subject Classification**:

Download the PDF file lin1.pdf.

Technical Report CMIDA-HiPSCCS 008-08, Department of Computer Science, University of Kentucky, KY, 2008.

The research work of J. Zhang was supported in part by NSF under grants CCF-0527967 and CCF-0727600, in part by NIH under grant 1R01HL086644-01, in part by the Alzheimer's Association under grant NIGR-06-25460, and in part by KSEF under grant KSEF-148-502-06-186.