A novel method for detecting URLs phishing using hybrid machine learning algorithm

Nguyen Manh Thang; Le Quang Anh; Hua Song Toan; Nguyen Quoc Trung

doi:10.54654/isj.v2i19.978

Authors

Nguyen Manh Thang, Academy of Cryptography Techniques
Le Quang Anh
Hua Song Toan
Nguyen Quoc Trung

Keywords:

URL, phishing, SVM, Naive Bayes, machine learning

Abstract

Phishing is a type of cyberattack that exploits people's trust by disguising malicious intent as communication from a reputable source. The goal is to steal sensitive data from victims (banking information, social identification, credentials, etc.) for various purposes (selling it for monetary gain, committing identity theft, or using it as leverage for an escalated attack). In 2022, the number of reported phishing attacks reached 255 million, an increase of 61% compared to 2021. Existing methods of phishing URL detection have limitations. This article proposes a method to increase the accuracy of malicious URL detection by combining two machine learning classifiers, Linear Support Vector Classification and multinomial Naive Bayes, through a voting mechanism.
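
As a rough illustration of how a Linear Support Vector Classifier and a multinomial Naive Bayes classifier can be combined by voting, the following minimal sketch uses scikit-learn. It is not the authors' implementation. The toy URLs, the labels, the token pattern, the bag-of-tokens features, and the choice of hard (majority) voting are all assumptions made for this example; the abstract does not specify the features or the exact voting scheme.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up URLs for illustration only (assumption: not the paper's dataset).
urls = [
    "https://www.wikipedia.org/wiki/Phishing",
    "https://docs.python.org/3/library/urllib.parse.html",
    "https://github.com/scikit-learn/scikit-learn",
    "https://www.example.com/about",
    "http://secure-login.example-bank.verify-account.test/update/password",
    "http://paypa1-confirm.test/signin/verify?account=locked",
    "http://free-gift.test/login/verify/account/confirm",
    "http://192.0.2.10/bank/login/verify.php",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = benign, 1 = phishing

model = make_pipeline(
    # Split each URL into alphanumeric tokens and count them
    # (assumption: the paper may use different features).
    CountVectorizer(token_pattern=r"[A-Za-z0-9]+"),
    # Hard voting takes the majority of predicted labels. LinearSVC does not
    # provide predict_proba, so soft voting is not available for this pair.
    VotingClassifier(
        estimators=[("svc", LinearSVC()), ("nb", MultinomialNB())],
        voting="hard",
    ),
)

model.fit(urls, labels)
predictions = model.predict(["http://verify-account.test/login/confirm"])
print(predictions)
```

With only two voters, hard voting can tie; scikit-learn documents that ties are resolved by ascending class order, which in this sketch would favour the benign label. A production setup would likely weight the voters or add a third classifier to avoid this.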

Published

2023-10-11

How to Cite

Thang, N. M., Anh, L. Q., Toan, H. S., & Trung, N. Q. (2023). A novel method for detecting URLs phishing using hybrid machine learning algorithm. Journal of Science and Technology on Information Security, 2(19), 15-28. https://doi.org/10.54654/isj.v2i19.978

Issue

No 2. CS (19) 2023

Section

Papers

License

Proposed Policy for Journals That Offer Open Access

Authors who publish with this journal agree to the following terms:

1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
