- This repository contains a data science project that investigates the key factors influencing US home prices over the last 20 years.
- The project uses the S&P Case-Shiller Home Price Index as a proxy for home prices and explores a range of economic indicators to build a predictive model.
- The chosen model is a Random Forest Regressor, which achieves an R2 score of 99.87%.
- The objective is to identify and analyze the factors that significantly affect home prices in the United States.
- Using publicly available data, we aim to build a robust predictive model that explains the variation in the S&P Case-Shiller Home Price Index over the past two decades.
The following features were selected for the predictive model:
- Median market value of all homes.
- Reflects overall trends in home values, providing insights into market conditions and price movements.
- Reflects the average cost of newly constructed homes.
- Influences perceptions of the affordability of new housing.
- Provides insights into the typical price of newly sold houses.
- Helps understand the distribution of new home prices.
- Reflects the level of investment in residential construction.
- Influences housing supply, potentially affecting home prices.
- Influences economic conditions and affects consumer confidence in buying houses.
- Indicates inflation in housing costs, potentially impacting home prices.
- Population growth influences housing demand, potentially affecting home prices.
- Property taxes influence the overall cost of living.
- Authorized housing units can influence home prices and increase demand.
- Reflects the balance between housing supply and demand.
- Reflects the 30-year fixed mortgage interest rate, which determines how affordable a home loan is for buyers.
- Provides insights into consumer sentiment about the economy and the housing market.
- Influences job security; more job security improves purchasing ability.
- Provides insights into vacant unit availability, market conditions, and supply and demand balance.
- Changes in the Federal Reserve's federal funds rate may influence mortgage rates.
- Utilized machine learning models to impute missing values in the dataset.
- Conducted thorough EDA to understand the relationships between features and the target variable.
- Engineered relevant features to improve model performance.
- Chose the Random Forest Regressor based on its outstanding R2 score of 99.87%.
- Utilized R2 score, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) for model evaluation.
- After training the Random Forest Regressor, a feature importance analysis was performed to reveal which predictor variables contribute most to predicting the target variable.
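The imputation, training, and feature-importance steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the data is synthetic, the column names (`mortgage_rate`, `unemployment_rate`, `housing_permits`) are hypothetical stand-ins for the FRED indicators, and `KNNImputer` is only one possible choice of machine-learning imputer, since the imputation model used in the project is not specified here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
n = 240  # assumed: roughly 20 years of monthly observations

# Synthetic stand-ins for economic indicators (hypothetical column names).
X = pd.DataFrame({
    "mortgage_rate": rng.normal(5.0, 1.0, n),
    "unemployment_rate": rng.normal(6.0, 1.5, n),
    "housing_permits": rng.normal(1300.0, 200.0, n),
})
# Synthetic target standing in for the home price index.
y = 200 - 10 * X["mortgage_rate"] + 0.05 * X["housing_permits"] + rng.normal(0, 1, n)

# Blank out about 5% of the feature values to simulate gaps in the raw data.
missing_mask = rng.random(X.shape) < 0.05
X_missing = X.mask(missing_mask)

# Fill each gap from the 5 most similar rows (assumed imputer choice).
imputer = KNNImputer(n_neighbors=5)
X_imputed = pd.DataFrame(imputer.fit_transform(X_missing), columns=X.columns)

# Fit the Random Forest Regressor on the imputed features.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_imputed, y)

# Impurity-based importances; they sum to 1 across all features.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Because the importances are normalized, each value can be read as a feature's share of the total impurity reduction across the forest, which makes it straightforward to rank the indicators.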
- Programming Language: Python
- Code Notebook: Google Colab
- Data Collection: FRED (Federal Reserve Economic Data)
- Data Cleaning: Machine Learning Imputation
- Exploratory Data Analysis: Pandas, Matplotlib, Seaborn
- Model Building: Scikit-Learn (Random Forest Regressor)
- Model Evaluation: R2 Score, MAE, RMSE
- Model Fit (R2 Score): 99.87%
- MAE (Mean Absolute Error): 1.39
- RMSE (Root Mean Squared Error): 2.37
- The Random Forest Regressor outperformed the other models evaluated, both in minimizing errors and in capturing the variance in the target variable.
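For reference, the three metrics reported above can be computed directly from their definitions. The sketch below uses a handful of made-up values rather than the project's predictions, and the helper name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, mae, rmse) computed from their textbook definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average absolute error, in the same units as the target.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error; penalizes large misses more.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R2: 1 minus the ratio of residual variation to total variation.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return r2, mae, rmse


# Illustrative values only (not the project's data).
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [102.0, 148.0, 203.0, 249.0]

r2, mae, rmse = regression_metrics(y_true, y_pred)
print(f"R2: {r2:.4f}, MAE: {mae:.2f}, RMSE: {rmse:.2f}")
# prints: R2: 0.9986, MAE: 2.00, RMSE: 2.12
```

In practice, scikit-learn's `r2_score`, `mean_absolute_error`, and `mean_squared_error` from `sklearn.metrics` compute the same quantities (taking the square root of the MSE to get the RMSE). MAE and RMSE are expressed in index points, so they should be read relative to the scale of the home price index.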