We have developed privacy-preserving methods that combine the compressive mechanism and the Laplace mechanism.
Our method can be applied when we can assume that the partition into the top elements and the rest (e.g., significant and non-significant SNP groups) rarely varies between neighboring datasets. (2023/06)
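As background, the Laplace mechanism is one of the two building blocks named above. The following is a minimal, self-contained sketch of the standard Laplace mechanism only; the function names and parameters are illustrative and are not taken from our code:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-transform sampling."""
    u = rng.random() - 0.5                      # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Release each statistic with Laplace noise of scale sensitivity/epsilon,
    the standard way to obtain epsilon-differential privacy for a query
    with the given L1 sensitivity."""
    rng = rng or random.Random()
    b = sensitivity / epsilon
    return [v + laplace_noise(b, rng) for v in values]
```

For instance, releasing statistics with sensitivity 1 at epsilon = 1 perturbs each value with noise of standard deviation sqrt(2).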
This repository contains the Python code for our experiments, in which we evaluated our mechanisms in terms of accuracy, rank error, and run time.
The finding from this study is that, when the data can be divided into two groups and the partitioning can be assumed not to vary between neighboring datasets, the output accuracy may increase by adding a different amount of noise to each group. In this study, by adding larger noise to the significant group (while satisfying the same privacy guarantee), the accuracy of the outputs could be improved.
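The two-group idea above can be sketched as follows. This is an illustration under stated assumptions, not our exact mechanism: the group indices, the noise-scale ratio, and the function names are all hypothetical, and how the per-group scales must be set to preserve the overall privacy guarantee is exactly the question the study addresses:

```python
import math
import random

def laplace(scale, rng):
    """One sample from Laplace(0, scale) via inverse-transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def two_group_release(stats, significant_idx, epsilon, sensitivity, ratio, rng):
    """Add Laplace noise with a larger scale to the significant group and a
    smaller scale to the rest. This assumes the partition itself is stable
    between neighboring datasets; ratio > 1 is an illustrative choice."""
    base = sensitivity / epsilon
    out = []
    for i, v in enumerate(stats):
        scale = base * ratio if i in significant_idx else base / ratio
        out.append(v + laplace(scale, rng))
    return out
```

With ratio > 1, the significant group absorbs more noise while the remaining (typically far more numerous) statistics are released more accurately.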
In the future, using this study as a starting point, we intend to develop a highly accurate method that satisfies differential privacy.
Further directions include a close examination of the conditions under which the assumptions about data partitioning are valid, and the establishment of new methods for quantitatively evaluating the quality of the outputs.
・When we can assume that the partitioning does not vary between neighboring datasets (for example, when the set of significant SNPs does not change even if a single individual in the analysis changes), our method can satisfy differential privacy.
・A more rigorous evaluation of the reconstruction errors in the compressive mechanism is needed. (Pure differential privacy might not be completely satisfied; relaxed notions of DP may need to be introduced. We should also consider how to set the threshold between the significant and non-significant groups.)
・A close examination of the distribution of the random noise (and of the sensitivity in the compressive mechanism) is needed. (More noise than expected seems to be added to elements in the significant group.)
---> We intend to develop methods that truly satisfy differential privacy.
(・We should consider other information compression techniques and noise distributions.)
・(2023/05) More noise than necessary has probably been added to the significant data; coupled with the reconstruction error in compressed sensing (as briefly mentioned in the Conclusion), about half of those statistics become too large, especially for small privacy budgets.
(・It is extremely important, both for accuracy and for privacy assurance, not to add more noise than necessary, i.e., to add only the minimal amount of noise.)
・While this study was a proof of concept, we intend to develop more reliable methods that truly satisfy differential privacy by investigating better information compression techniques and by considering noise distributions that vary smoothly at the group boundary.
・The proof that our method satisfies differential privacy has shortcomings that remain to be addressed.
(・Is it possible to develop privacy-preserving methods whose accuracy remains stable regardless of the privacy level (like the results in this study)? (They need not rely on differential privacy.))
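As background for the compressive-mechanism points above, the following is a minimal, self-contained sketch of a Haar wavelet transform with top-k coefficient truncation, so that the reconstruction error discussed above can be measured directly. This illustrates the general compressive technique only and is not our implementation; in the actual mechanism, noise is added in the transformed domain (see the paper for details):

```python
import math

def haar_forward(x):
    """Full orthonormal Haar decomposition; len(x) must be a power of two.
    Returns [overall average coefficient] + detail coefficients, coarse to fine."""
    coeffs = []
    while len(x) > 1:
        avg = [(x[2*i] + x[2*i+1]) / math.sqrt(2) for i in range(len(x)//2)]
        det = [(x[2*i] - x[2*i+1]) / math.sqrt(2) for i in range(len(x)//2)]
        coeffs = det + coeffs
        x = avg
    return list(x) + coeffs

def haar_inverse(c):
    """Invert haar_forward by rebuilding each level from coarse to fine."""
    x = [c[0]]
    pos = 1
    while pos < len(c):
        det = c[pos:pos + len(x)]
        pos += len(x)
        x = [v for a, d in zip(x, det)
             for v in ((a + d) / math.sqrt(2), (a - d) / math.sqrt(2))]
    return x

def compress_reconstruct(x, k):
    """Keep only the k largest-magnitude Haar coefficients and reconstruct.
    The discarded coefficients are the source of the reconstruction error."""
    c = haar_forward(x)
    keep = set(sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:k])
    truncated = [c[i] if i in keep else 0.0 for i in range(len(c))]
    return haar_inverse(truncated)
```

Keeping all coefficients reconstructs the input exactly (up to floating-point error); the reconstruction error grows as k shrinks, which is the trade-off the bullet points above discuss.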
For details of our mechanisms, please see our paper entitled "Privacy-Preserving Statistical Analysis of Genomic Data using Compressive Mechanism with Haar Wavelet Transform" (https://doi.org/10.1089/cmb.2022.0246), published in the Journal of Computational Biology. The Supplemental Material provides the proofs of our theorems and other supplemental information. This study was also presented at the Privacy and Security Workshop at RECOMB'22.
Akito Yamamoto
Division of Medical Data Informatics, Human Genome Center,
The Institute of Medical Science, The University of Tokyo