An Efficient Solution for Privacy-preserving Naïve Bayes Classification in Fully Distributed Data Model
DOI: https://doi.org/10.54654/isj.v1i15.840
Keywords: privacy-preserving data mining and machine learning, secure multi-party computation, Naïve Bayes classification, homomorphic encryption, data privacy
Abstract: Privacy preservation has become one of the most important problems in data mining and machine learning. In this paper, we propose a novel privacy-preserving Naïve Bayes classifier for the fully distributed data scenario, in which each record is held by a single, distinct owner. The proposed solution is based on a secure multi-party computation protocol, so it protects each data owner's privacy while preserving the accuracy of the classification model. Furthermore, our experimental results show that the new solution is efficient enough for practical applications.
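To make the fully distributed setting concrete, the sketch below shows why a secure aggregation primitive is enough to train Naïve Bayes on categorical data: the model needs only class counts and (attribute, value, class) counts, and each owner contributes a 0/1 indicator to each count. The abstract does not specify the protocol, so this example swaps in a simple secure sum based on additive secret sharing, simulated in one process and assuming honest-but-curious parties. The schema (`DOMAINS`, `CLASSES`), the sample records, and all function names are hypothetical and chosen only for illustration.

```python
import secrets
from itertools import product

MODULUS = 2**61 - 1  # assumed public modulus, far larger than any count

def share(value, n_parties):
    """Split an integer into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(private_values):
    """Simulated secure sum: only the total is reconstructed, never an input."""
    n = len(private_values)
    # Row i holds the shares owner i hands out; column j is what owner j receives.
    distributed = [share(v, n) for v in private_values]
    partial_sums = [sum(row[j] for row in distributed) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS

# Hypothetical public schema: attribute domains and class labels.
DOMAINS = {"income": ["low", "high"], "owns_home": ["no", "yes"]}
CLASSES = ["good", "bad"]

def local_indicators(record):
    """0/1 indicators an owner derives from its single private record."""
    features, label = record
    vec = {("class", c): int(label == c) for c in CLASSES}
    for attr, values in DOMAINS.items():
        for v, c in product(values, CLASSES):
            vec[(attr, v, c)] = int(features[attr] == v and label == c)
    return vec

def aggregate_counts(records):
    """Run one secure sum per count; each record belongs to a different owner."""
    local = [local_indicators(r) for r in records]
    return {key: secure_sum([vec[key] for vec in local]) for key in local[0]}

def classify(counts, features):
    """Standard Naive Bayes decision rule computed from the aggregated counts."""
    total = sum(counts[("class", c)] for c in CLASSES)
    scores = {}
    for c in CLASSES:
        score = counts[("class", c)] / total
        for attr, v in features.items():
            # Laplace smoothing avoids zero probabilities for unseen combinations.
            score *= (counts[(attr, v, c)] + 1) / (counts[("class", c)] + len(DOMAINS[attr]))
        scores[c] = score
    return max(scores, key=scores.get)

# Five hypothetical owners, one record each.
RECORDS = [
    ({"income": "high", "owns_home": "yes"}, "good"),
    ({"income": "high", "owns_home": "no"}, "good"),
    ({"income": "low", "owns_home": "no"}, "bad"),
    ({"income": "low", "owns_home": "yes"}, "bad"),
    ({"income": "high", "owns_home": "yes"}, "good"),
]

COUNTS = aggregate_counts(RECORDS)
PREDICTION = classify(COUNTS, {"income": "high", "owns_home": "yes"})
```

Because the aggregated counts equal the counts that would be obtained from pooled plaintext data, a classifier built this way matches the non-private model exactly, which is the sense in which such a design can preserve accuracy. A real deployment would run the sharing step over a network and would need to address collusion and dropped parties, which this sketch ignores.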