National Security and Strategic Planning

ISSN: 2307-1400

Article ID: 89512

DOI: 10.37468/2307-1400-2024-2-13-24

Information Security

Using machine learning algorithms to recognize phishing resources

Kotikov

Nikita M.

kotikov@mirea.ru

https://orcid.org/0000-0001-8788-4256

Maksimova

Elena A.

maksimova@mirea.ru

Doctor of Technical Sciences

Rusakov

Alexey Mikhailovich

rusakov_a@mirea.ru

MIREA - Russian Technological University, Russian Federation

MIREA - Russian Technological University, Moscow, Russian Federation

Moscow Technical University of Communications and Informatics, Russian Federation

MIREA - Russian Technological University, Moscow, Russian Federation

30.06.2024

2024, No. 2, pp. 13-24

16.03.2024

23.06.2024

https://futurepubl.ru/en/nauka/article/89512/view

This work examines popular machine learning methods used to protect information systems and their users from phishing. The article discusses the techniques attackers currently use to carry out social engineering attacks, protective measures for corporate users, and a classification of machine-learning-based methods for detecting illegitimate Internet resources. As existing machine learning algorithms capable of identifying dangerous resources, the article presents Bayes' theorem, the classifier principle, the k-nearest neighbors algorithm, and logistic regression, along with statistics on how often common indicators of phishing and malicious resources are detected. Based on the results of the study, the article substantiates the need for an integrated approach to infrastructure protection that takes multi-vector analysis into account, an approach that is in demand both in theory and in practice.

classification, phishing, information security, machine learning
