Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset
dc.contributor.author | สุระสิทธิ์ ทรงม้า | |
dc.contributor.author | ธีระ สาธุพันธ์ | |
dc.contributor.author | ธนากร ปามุทา | |
dc.contributor.author | Surasit Songma | |
dc.contributor.author | Theera Sathuphan | |
dc.contributor.author | Thanakorn Pamutha | |
dc.date.accessioned | 2025-01-27T04:18:59Z | |
dc.date.available | 2025-01-27T04:18:59Z | |
dc.date.issued | 2023-11-24 | |
dc.description.abstract | This article examines intrusion detection systems in depth using the CSE-CIC-IDS-2018 dataset. The investigation proceeds in three phases. First, data cleaning, exploratory data analysis, and data normalization (min-max and Z-score) prepare the data for use with various classifiers. Second, to improve processing speed and reduce model complexity, principal component analysis (PCA) and random forest (RF) are combined to remove non-significant features, and the reduced feature set is compared against the full dataset. Finally, machine learning methods (XGBoost, CART, DT, KNN, MLP, RF, LR, and Bayes) are applied to the selected features and preprocessing procedures; the XGBoost, DT, and RF models outperform the others in both ROC values and CPU runtime. The evaluation concludes by identifying an optimal configuration that includes PCA and RF feature selection. | |
dc.identifier.citation | Songma, S.; Sathuphan, T.; Pamutha, T. Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset. Computers 2023, 12, 245. https://doi.org/10.3390/computers12120245 | |
dc.identifier.uri | https://repository.dusit.ac.th//handle/123456789/2390 | |
dc.language.iso | en | |
dc.publisher | Computers | |
dc.subject | intrusion detection system | |
dc.subject | machine learning techniques | |
dc.subject | exploratory data analysis | |
dc.subject | cyber security | |
dc.title | Optimizing Intrusion Detection Systems in Three Phases on the CSE-CIC-IDS-2018 Dataset | |
dc.type | Article | |
mods.location.url | https://www.mdpi.com/2574372 |