A novel method for detecting URLs phishing using hybrid machine learning algorithm


  • Nguyen Manh Thang, Academy of Cryptography Techniques
  • Le Quang Anh
  • Hua Song Toan
  • Nguyen Quoc Trung




Keywords: URL, phishing, SVM, Naive Bayes, machine learning

Abstract

Phishing is a type of cyberattack that exploits people's trust by disguising malicious intent as communication from a reputable source. The goal is to steal sensitive data from victims (banking information, social identification, credentials, etc.) for various purposes (selling it for monetary gain, committing identity theft, or using it as leverage for escalation attacks). In 2022, the number of reported phishing attacks reached 255 million, an increase of 61% compared to 2021. Existing methods of phishing URL detection have limitations. This article proposes a method to increase the accuracy of detecting malicious URLs by combining two machine learning classifiers, Linear Support Vector Classification and multinomial Naive Bayes, with voting mechanisms.
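The combination named in the abstract can be illustrated with a minimal sketch using scikit-learn's `VotingClassifier`. This is not the authors' implementation: the toy URLs and labels are invented for illustration, and the character n-gram TF-IDF features, the default hyperparameters, and the `build_model` helper are all assumptions. Hard (majority) voting is used because `LinearSVC` does not provide `predict_proba`, which soft voting would require.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy URLs, invented for illustration only (1 = phishing, 0 = benign).
urls = [
    "http://secure-login.example-bank.com.verify-account.test/update",
    "http://paypa1-support.test/confirm/password",
    "http://free-gift.test/login.php?user=reset",
    "https://www.python.org/downloads/",
    "https://docs.python.org/3/library/urllib.parse.html",
    "https://en.wikipedia.org/wiki/Phishing",
]
labels = [1, 1, 1, 0, 0, 0]


def build_model():
    """Hypothetical helper: URL text -> TF-IDF features -> hard-voting ensemble."""
    return make_pipeline(
        # Assumed feature choice: character 1- to 3-grams of the raw URL string.
        # TF-IDF values are non-negative, as MultinomialNB requires.
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
        VotingClassifier(
            estimators=[("svc", LinearSVC()), ("nb", MultinomialNB())],
            voting="hard",  # each classifier casts one vote per sample
        ),
    )


model = build_model()
model.fit(urls, labels)

# Classify two unseen (also hypothetical) URLs.
predictions = model.predict(
    ["http://account-verify.test/login", "https://www.python.org/about/"]
)
print(predictions)
```

With only two voters, a disagreement produces a tie, which scikit-learn resolves in favour of the class that comes first in ascending label order (here `0`, benign). A practical ensemble would therefore use an odd number of classifiers or weight the votes.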









How to Cite

Thang, N. M., Anh, L. Q., Toan, H. S., & Trung, N. Q. (2023). A novel method for detecting URLs phishing using hybrid machine learning algorithm. Journal of Science and Technology on Information Security, 2(19), 15-28. https://doi.org/10.54654/isj.v2i19.978