Skip to content

SrivathsanP23/Classification-of-Microsoft-Cybersecurity-Incidents

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 

Repository files navigation

🧿🔥 Classification of Microsoft's CyberSecurity Threats 🔥🧿

#### 🚀 Classification done using RandomForestClassifier

Dataset Link

🧑‍💻 - Data

🎯 Project Overview:

🧑‍💻 This project, Classification of Microsoft Cybersecurity Incidents , aims to classify cybersecurity incidents using machine learning. A Random Forest model was trained to predict incident severity based on features, achieving 94% accuracy during validation and 88% on the test dataset. The model is deployed for real-time predictions using an interactive Jupyter Notebook.

Steps That I Followed :

  1. Data Preprocessing :

➡️Loading the Dataset: Loaded the dataset from the source and explored its structure.

➡️Data Cleaning: Handled missing values by using appropriate imputation techniques and removed irrelevant columns to enhance model performance.

➡️Feature Encoding: Applied OneHotEncoding to categorical features to convert them into a suitable format for model training.

➡️Feature Scaling: Scaled numerical features using StandardScaler through a ColumnTransformer to ensure that all features contributed equally to the model.

  1. Exploratory Data Analysis (EDA):

Statistical Testing:

➡️F-test: Conducted F-tests for categorical columns to assess the relationship between categorical features and the target variable. This helped in understanding which categorical features had significant effects on the severity of cybersecurity incidents.

➡️Z-test: Performed Z-tests for numerical columns to determine if the means of the different groups (based on the target variable) were statistically significantly different. This analysis guided feature selection and helped identify impactful numerical features.

  1. Model Training:

➡️Random Forest Classifier: Trained a Random Forest classifier on the preprocessed data to leverage its ensemble learning capabilities 🚀.

  1. Model Evaluation:

➡️Validation: Evaluated the model on a validation set, achieving 94% accuracy on the training data and 88% accuracy on the test dataset. Metrics such as precision, recall, and F1-score were also calculated to assess the model's effectiveness.