
Machine learning for detecting fake accounts and genetic algorithm-based feature selection

Published online by Cambridge University Press: 14 March 2024

Amine Sallah*
Affiliation:
Department of Computer Science, Faculty of Sciences and Techniques, Moulay Ismail University, Meknes, Morocco
El Arbi Abdellaoui Alaoui
Affiliation:
Department of Sciences, Ecole Normale Supérieure, Moulay Ismail University, Meknes, Morocco
Stéphane C.K. Tekouabou
Affiliation:
Research Laboratory in Computer Science and Educational Technologies (LITE), University of Yaoundé 1, Yaoundé, Cameroon; Department of Computer Science and Educational Technologies (DITE), Higher Teacher Training College (HTTC), University of Yaoundé 1, Yaoundé, Cameroon
Said Agoujil
Affiliation:
Department of Sciences, École Nationale de Commerce et de Gestion, Moulay Ismail University, Meknes, Morocco
*Corresponding author: Amine Sallah; Email: aminefste@gmail.com

Abstract

People rely extensively on online social networks (OSNs), and this reliance has attracted the attention of cyber attackers pursuing various nefarious ends. This global trend has not spared African online communities, where the proliferation of OSNs has created both new opportunities and new challenges. In Africa, as in many other regions, a burgeoning black-market industry has emerged that specializes in creating and selling fake accounts for malicious and deceptive purposes. This paper builds a set of machine-learning models, combined with feature selection algorithms, to predict fake accounts, improve performance, and reduce costs. The input data consist of features that describe the profiles under investigation. We provide a thorough comparison of various algorithms. Compared with machine learning without feature selection and with Boruta-based feature selection, machine learning using the proposed genetic algorithm (GA)-based feature selection offers a clear runtime advantage. The final prediction model achieves AUC values between 90% and 99.6%. The model built on the features chosen by the GA provides reasonable prediction quality with a small number of input variables, fewer than 31% of the entire feature space, and therefore allows fake users to be accurately separated from real ones. These results show that the GA substantially reduces the number of input variables while maintaining high predictive accuracy, confirming the effectiveness of our approach.
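
The abstract does not spell out the GA configuration (Table 2 lists the hyperparameters actually used). Purely as an illustration, here is a generic wrapper-style GA for feature selection: each individual is a binary mask over the features, evolved with tournament selection, single-point crossover, bit-flip mutation and elitism. The function `ga_feature_selection`, its default parameters and the `toy_fitness` scorer are hypothetical; in a real setup the fitness would more plausibly be a cross-validated classifier score (for example AUC) on the selected columns, possibly penalised by subset size.

```python
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,
    generations: int = 30,
    crossover_rate: float = 0.8,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Evolve binary feature masks; return the fittest mask found (hypothetical helper)."""
    rng = random.Random(seed)
    population: List[Mask] = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    best = max(population, key=fitness)

    def tournament(k: int = 3) -> Mask:
        # Tournament selection: sample k individuals, keep the fittest.
        return max(rng.sample(population, k), key=fitness)

    for _ in range(generations):
        next_gen: List[Mask] = [best[:]]  # elitism: the best mask always survives
        while len(next_gen) < pop_size:
            parent_a, parent_b = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            else:
                child = parent_a[:]
            # Bit-flip mutation: each gene flips independently with mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best


# Hypothetical fitness standing in for a cross-validated classifier score:
# features 0, 2 and 5 are "informative"; each selected feature costs 0.1.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Sequence[int]) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best_mask = ga_feature_selection(n_features=10, fitness=toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

The size penalty in the fitness is what pushes the search toward small subsets, which is the behaviour the abstract reports (fewer than 31% of the features retained).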

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Proposed detection system.

Figure 2. Count plot of the number of instances per class in each partition of the Facebook and Instagram datasets.

Table 1. Features that characterize each user profile on Instagram

Figure 3. Heat map of the correlations between variables in the Instagram dataset.

Figure 4. Screenshot of the Facebook dataset.

Figure 5. Heat map of the correlations between variables in the Facebook dataset.

Figure 6. Standard procedure of a genetic algorithm.

Table 2. Hyperparameters used for implementing feature selection methods and classifiers

Table 3. Overall results of our experiment using the full feature set (no feature selection)

Table 4. Feature ranking by the Boruta algorithm on the Instagram dataset

Table 5. Feature ranking by the Boruta algorithm on the Facebook dataset

Table 6. Overall results of our experiment using Boruta feature selection
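
Boruta, whose rankings appear in Tables 4 and 5, judges each real feature against shuffled "shadow" copies of the features. A minimal sketch of that core comparison follows, in plain Python. It swaps in absolute Pearson correlation as the importance measure where Boruta proper uses random-forest importance, and it only counts "hits" over repeated rounds; the full algorithm additionally applies a statistical test to these counts to confirm or reject each feature. The helpers `abs_corr` and `shadow_hits` and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence


def abs_corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    """|Pearson r|: a cheap stand-in for Boruta's random-forest importance."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return abs(cov / (sx * sy))


def shadow_hits(columns: List[List[float]], y: Sequence[int],
                n_rounds: int = 20, seed: int = 0) -> List[int]:
    """Count how often each feature beats the best shuffled 'shadow' feature."""
    rng = random.Random(seed)
    real_scores = [abs_corr(col, y) for col in columns]
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        shadow_scores = []
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)  # shuffling destroys any link to the label
            shadow_scores.append(abs_corr(shadow, y))
        threshold = max(shadow_scores)  # best importance reached by pure noise
        for j, score in enumerate(real_scores):
            if score > threshold:
                hits[j] += 1
    return hits


# Hypothetical toy data: one informative feature, one pure-noise feature.
data_rng = random.Random(1)
y = [i % 2 for i in range(200)]
informative = [label + data_rng.gauss(0, 0.3) for label in y]
noise = [data_rng.gauss(0, 1) for _ in y]
print(shadow_hits([informative, noise], y))
```

A feature that keeps beating the strongest shadow across rounds is a candidate for confirmation; one that rarely does is a candidate for rejection.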

Table 7. Features selected by the genetic algorithm on the Instagram dataset

Table 8. Features selected by the genetic algorithm on the Facebook dataset

Table 9. Overall results of our experiment using genetic algorithm feature selection
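
The abstract reports AUC values between 90% and 99.6%. As a reminder of what that metric measures, here is a minimal sketch that computes ROC AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, counting ties as one half. It assumes label 1 marks a fake account and 0 a real one (the source does not state its label coding), and `roc_auc` is a hypothetical helper, not the authors' implementation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (Mann-Whitney formulation); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]  # assumed: 1 = fake
    neg = [s for s, lab in zip(scores, labels) if lab == 0]  # assumed: 0 = real
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

The pairwise loop is quadratic, which is fine for a sketch or a small evaluation set; for large test sets a rank-based computation, or a library routine such as scikit-learn's `roc_auc_score`, is the better choice.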
