Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset

dc.contributor.author: สุระสิทธิ์ ทรงม้า
dc.contributor.author: ธีระ สาธุพันธ์
dc.contributor.author: ธนากร ปามุทา
dc.contributor.author: Surasit Songma
dc.contributor.author: Theera Sathuphan
dc.contributor.author: Thanakorn Pamutha
dc.date.accessioned: 2025-01-27T04:18:59Z
dc.date.available: 2025-01-27T04:18:59Z
dc.date.issued: 2023-11-24
dc.description.abstract: This article examines intrusion detection systems in depth using the CSE-CIC-IDS-2018 dataset. The investigation is divided into three stages. First, data cleaning, exploratory data analysis, and data normalization (min-max and Z-score) prepare the data for use with various classifiers. Second, to improve processing speed and reduce model complexity, a combination of principal component analysis (PCA) and random forest (RF) is used to remove non-significant features, and the reduced feature set is compared against the full dataset. Finally, machine learning methods (XGBoost, CART, DT, KNN, MLP, RF, LR, and Bayes) are applied to the selected features and preprocessing procedures, with the XGBoost, DT, and RF models outperforming the others in terms of both ROC values and CPU runtime. The evaluation concludes by identifying an optimal configuration, which includes PCA and RF feature selection.
dc.identifier.citation: Songma, S.; Sathuphan, T.; Pamutha, T. Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset. Computers 2023, 12, 245. https://doi.org/10.3390/computers12120245
dc.identifier.uri: https://repository.dusit.ac.th//handle/123456789/2390
dc.language.iso: en
dc.publisher: Computers
dc.subject: intrusion detection system
dc.subject: machine learning techniques
dc.subject: exploratory data analysis
dc.subject: cyber security
dc.title: Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset
dc.type: Article
mods.location.url: https://www.mdpi.com/2574372
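
The abstract names two normalization procedures, min-max and Z-score, as part of the first stage. The sketch below illustrates what each does to a single numeric feature column. It is not the authors' code: the function names and the sample values are hypothetical, and the handling of constant columns (mapping them to zeros) is an assumption made here to avoid division by zero.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1].

    Assumption: a constant column is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_scale(values):
    """Shift to zero mean and scale to unit population standard deviation.

    Assumption: a constant column is mapped to all zeros.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature values, not taken from CSE-CIC-IDS-2018.
flow_duration = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

scaled_min_max = min_max_scale(flow_duration)  # smallest value -> 0, largest -> 1
scaled_z = z_score_scale(flow_duration)        # mean 0, population std 1
```

Min-max scaling bounds every feature to the same interval but is sensitive to outliers, whereas Z-score scaling keeps outliers visible as large absolute values. Which of the two suits a given classifier is the kind of question the article's comparison addresses.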