Bank Indessa has not done well in the last 3 quarters. Its NPAs (Non-Performing Assets) have reached an all-time high, and it is losing the confidence of its investors. As a result, its stock has fallen by 20% in the previous quarter alone.
After careful analysis, the bank found that most of its NPAs came from loan defaulters. Using the messy data it has collected over the years, the bank has decided to apply machine learning to identify these defaulters and devise a plan to reduce them.
The bank uses a pool of investors to sanction its loans. For example, if a customer applies for a loan of $20,000, the investors perform due diligence on the loan application along with the bank. Keep this in mind when exploring the data. In this challenge, you will help the bank by predicting the probability that a member will default.
Download the dataset from the following link:
- Import data and keep the relevant columns
- Data transformation
- Impute missing data
- Create new features using existing data
- Visualize the preprocessed data
- Split the data into training and validation sets
- Build the model
- Predict and evaluate (accuracy, ROC-AUC)
- Analyze the results using a confusion matrix
- Predict on the test set and save the results
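The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the challenge's reference solution. The real dataset is not loaded here: `make_fake_loans` is a hypothetical helper that generates synthetic rows, and every column name (`member_id`, `loan_amnt`, `funded_amnt_inv`, `annual_inc`, `term`, `loan_status`) is a hypothetical stand-in for whatever the downloaded files contain. The engineered `inv_funded_ratio` feature is one assumed way to use the investor-funding detail mentioned earlier, and logistic regression is swapped in as a simple baseline model. Visualization is left out to keep the sketch dependency-light.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_fake_loans(n, start_id=0):
    """Hypothetical synthetic stand-in for the real CSV; all column names are assumptions."""
    loan_amnt = rng.uniform(1_000, 35_000, n)
    annual_inc = rng.uniform(20_000, 150_000, n)
    inv_share = rng.uniform(0.5, 1.0, n)  # fraction of the loan the investor pool funded
    term = rng.choice(["36 months", "60 months"], n)
    # Synthetic rule: default is likelier for large loans relative to income,
    # long terms, and loans the investors did not fully fund.
    logit = (-2.5 + 4.0 * loan_amnt / annual_inc
             + 1.0 * (term == "60 months") - 2.0 * (inv_share - 0.75))
    df = pd.DataFrame({
        "member_id": np.arange(start_id, start_id + n),
        "loan_amnt": loan_amnt,
        "funded_amnt_inv": loan_amnt * inv_share,
        "annual_inc": annual_inc,
        "term": term,
        "loan_status": (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int),
    })
    df.loc[rng.random(n) < 0.1, "annual_inc"] = np.nan  # mimic messy, missing data
    return df


def prepare(df):
    """Keep relevant columns, transform them, and engineer features (NaNs left for the imputer)."""
    X = pd.DataFrame(index=df.index)
    X["term"] = pd.to_numeric(df["term"].str.extract(r"(\d+)", expand=False))  # "36 months" -> 36
    X["loan_amnt"] = df["loan_amnt"]
    X["annual_inc"] = df["annual_inc"]
    X["inv_funded_ratio"] = df["funded_amnt_inv"] / df["loan_amnt"]
    X["loan_to_income"] = df["loan_amnt"] / df["annual_inc"]  # NaN where income is missing
    return X


# Import (synthetic here), transform, and engineer features
train_df = make_fake_loans(2000)
X, y = prepare(train_df), train_df["loan_status"]

# Stratified split keeps the default rate the same in both parts
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Imputation sits inside the pipeline, so medians are learned from training rows only
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

# Probabilities feed ROC-AUC; hard labels (0.5 threshold) feed accuracy
val_proba = model.predict_proba(X_val)[:, 1]
val_pred = model.predict(X_val)
acc = accuracy_score(y_val, val_pred)
auc = roc_auc_score(y_val, val_proba)

# Rows = actual (0 = repaid, 1 = defaulted), columns = predicted
cm = confusion_matrix(y_val, val_pred)
print(f"accuracy={acc:.3f}  roc_auc={auc:.3f}\n{cm}")

# Score an unlabeled test set and save one default probability per member
test_df = make_fake_loans(500, start_id=10_000).drop(columns="loan_status")
submission = pd.DataFrame({
    "member_id": test_df["member_id"],
    "loan_status": model.predict_proba(prepare(test_df))[:, 1],
})
out_path = os.path.join(tempfile.gettempdir(), "submission.csv")
submission.to_csv(out_path, index=False)
```

Because defaulters are usually the minority class, accuracy alone can look good even for a model that rarely flags anyone; ROC-AUC and the confusion matrix show how well the defaulters themselves are separated. To use real data, `make_fake_loans` would be replaced by reading the downloaded files and `prepare` adapted to their actual columns.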