An Efficient Solution for Privacy-preserving Naïve Bayes Classification in Fully Distributed Data Model

Authors

  • Vu Duy Hien
  • Luong The Dung
  • Hoang Duc Tho

DOI:

https://doi.org/10.54654/isj.v1i15.840

Keywords:

Privacy-preserving data mining and machine learning, secure multi-party computation, Naïve Bayes classification, homomorphic encryption, data privacy

Abstract

Privacy preservation has recently become one of the most important problems in data mining and machine learning. In this paper, we propose a novel privacy-preserving Naïve Bayes classifier for the fully distributed data scenario, in which each record is held by a single, distinct owner. The proposed solution is built on a secure multi-party computation protocol, so it protects each data owner's privacy while preserving the accuracy of the classification model. Furthermore, our experimental results show that the new solution is efficient enough for practical applications.
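To make the fully distributed setting concrete, the following is a minimal sketch of how Naïve Bayes counts could be aggregated when every owner holds exactly one record. It is not the protocol proposed in the paper: it swaps in a textbook additive-secret-sharing secure sum, assumes semi-honest parties with pairwise private channels, and simulates all parties inside one process. The names `MODULUS`, `share`, `secure_sum`, `train_counts`, and `classify` are hypothetical and chosen only for illustration.

```python
import secrets

# Hypothetical modulus (an assumption); it only needs to exceed any possible count.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate a secure sum in which party i holds private_values[i]."""
    n = len(private_values)
    # Row i holds the shares party i sends out; column j is what party j receives.
    sent = [share(v, n) for v in private_values]
    partial = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published; their total equals the true sum.
    return sum(partial) % MODULUS


def train_counts(records, attr_values, classes):
    """Aggregate Naive Bayes counts, with one (features, label) record per owner."""
    class_counts = {
        c: secure_sum([int(label == c) for _, label in records]) for c in classes
    }
    cond_counts = {}
    for a, values in enumerate(attr_values):
        for v in values:
            for c in classes:
                # Each owner contributes a 0/1 indicator for (attribute value, class).
                cond_counts[(a, v, c)] = secure_sum(
                    [int(x[a] == v and label == c) for x, label in records]
                )
    return class_counts, cond_counts


def classify(x, class_counts, cond_counts, attr_values):
    """Pick the class with the highest Laplace-smoothed Naive Bayes score."""
    total = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total
        for a, v in enumerate(x):
            score *= (cond_counts[(a, v, c)] + 1) / (n_c + len(attr_values[a]))
        if score > best_score:
            best, best_score = c, score
    return best
```

In this sketch the aggregated counts are identical to those computed on pooled data, which illustrates why a secure-sum style of aggregation does not cost classification accuracy. A real deployment would run `share` on each owner's machine and exchange the shares over the network.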

Published

2022-06-08

How to Cite

Hien, V. D., Dung, L. T., & Tho, H. D. (2022). An Efficient Solution for Privacy-preserving Naïve Bayes Classification in Fully Distributed Data Model. Journal of Science and Technology on Information Security, 1(15), 56-61. https://doi.org/10.54654/isj.v1i15.840

Section

Papers