LDA-Driven TextCNN with CNN2D Enhancement for Accurate Dark Web Service Classification
DOI: https://doi.org/10.64751/
Abstract
The Dark Web is a part of the internet that preserves user anonymity. Because its users are difficult to track, it has increasingly become a focal point for illegal activity and a repository of information on cyberattacks. This paper uses a deep learning algorithm to predict which Dark Web services are primarily used for attacks, so that precautionary measures can be taken. To improve prediction accuracy, we use topic modelling weights as training features, assigning higher weights to frequently co-occurring words. Traditional techniques such as TF-IDF, Document Matrix, and Latent Semantic Analysis often fail to exclude irrelevant data, which hampers machine learning accuracy. Our approach consists of collecting a dataset, preprocessing it to clean the data, and applying Latent Dirichlet Allocation (LDA) to extract 90 topic weights. The LDA weights are then fed into the proposed TextCNN algorithm for classification. We compare this algorithm against KNN, Random Forest, and other classifiers; TextCNN achieves a prediction accuracy of 95%. A hybrid model that takes features from TextCNN and is trained with a CNN2D algorithm raises accuracy further to 96%, demonstrating the effectiveness of the proposed methods for Dark Web classification.
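The feature-extraction step described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the toy documents, labels, and the helper name `lda_topic_weights` are made up for the example, and the classifier shown is a KNN baseline (one of the comparison models named above) rather than the TextCNN itself. Only the choice of 90 topics comes from the abstract.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Toy, made-up documents and labels for illustration only
# (NOT the dataset used in the paper).
docs = [
    "stolen credit card dumps for sale with fast shipping",
    "forum thread discussing exploit kits and malware loaders",
    "marketplace listing for counterfeit documents and passports",
    "ransomware builder with encryption and payment panel",
    "hosting service offering anonymous servers and domains",
    "botnet rental for denial of service attacks",
]
labels = ["fraud", "attack", "fraud", "attack", "hosting", "attack"]


def lda_topic_weights(texts, n_topics=90, seed=0):
    """Return an (n_docs, n_topics) matrix of per-document topic weights.

    Hypothetical helper: builds a bag-of-words count matrix, fits LDA on it,
    and returns the normalized document-topic distribution.
    """
    counts = CountVectorizer(stop_words="english").fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(counts)


# 90 topic weights per document, as stated in the abstract.
weights = lda_topic_weights(docs, n_topics=90)

# Baseline classifier trained on the topic weights (assumption: the paper's
# TextCNN would take this same matrix as its input instead).
knn = KNeighborsClassifier(n_neighbors=1).fit(weights, labels)
predictions = knn.predict(weights)
```

Each row of `weights` is a probability distribution over topics, so the rows sum to 1. In the pipeline the abstract describes, a matrix of this shape would be what the TextCNN consumes in place of raw TF-IDF features.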
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.







