This capstone project involves performing linear regression analysis on the Boston housing dataset (boston.csv). The goal is to build a predictive model for housing prices using independent variables such as the average number of rooms (rm) and the proportion of the population with a low status (lstat). The project includes data cleaning, exploratory data analysis (EDA), model building, and evaluation.
- Clone the Repository
  - Clone this repository to your local machine:
    git clone https://github.com/vswapna3202/L2T14.git
- Navigate to the Project's Folder
  - Navigate to the folder containing the project:
    cd L2T14 or cd <your-task-folder>
- Install Dependencies
  - Ensure you have Python installed on your system. You can download it from the official Python website.
  - Install Jupyter Notebook to run the provided notebook:
    pip install notebook
  - Install necessary libraries such as pandas, numpy, matplotlib, and scikit-learn:
    pip install pandas numpy matplotlib scikit-learn
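A quick way to confirm the installation worked is to import each library and print its version. This is only a sanity-check sketch and not part of the original notebook; note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib

# Module names as they are imported in Python; scikit-learn imports as "sklearn".
required = ("pandas", "numpy", "matplotlib", "sklearn")

versions = {}
for name in required:
    module = importlib.import_module(name)  # raises ImportError if a package is missing
    versions[name] = module.__version__

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails, rerunning the corresponding `pip install` command in the same environment that Jupyter uses is usually the first thing to check.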
- Reading and Preparing the Data
  - Read the boston.csv file into the Jupyter notebook Capstone Linear Regression.ipynb.
  - Clean and prepare the dataset if necessary.
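The reading and cleaning step might look roughly like the sketch below. The helper `load_and_clean` is a hypothetical name, and the specific cleaning choices (lower-casing column names, dropping duplicate rows and rows with missing values) are assumptions rather than what the notebook necessarily does. A tiny made-up CSV stands in for boston.csv so the example runs without the file.

```python
import io

import pandas as pd


def load_and_clean(source):
    """Read a CSV and apply a basic cleaning pass.

    `source` can be a path such as "boston.csv" or any file-like object.
    """
    df = pd.read_csv(source)
    # Assumption: normalise headers so columns can be addressed as rm, lstat, medv.
    df.columns = df.columns.str.strip().str.lower()
    # Assumption: duplicates and incomplete rows are simply dropped.
    df = df.drop_duplicates().dropna().reset_index(drop=True)
    return df


# In the notebook this would be: df = load_and_clean("boston.csv")
# The rows below are made up for illustration; they are not real dataset values.
sample_csv = io.StringIO(
    "RM,LSTAT,MEDV\n"
    "6.5,5.0,24.0\n"
    "6.5,5.0,24.0\n"  # exact duplicate of the previous row
    "5.9,,18.5\n"     # missing lstat value
    "7.1,3.2,33.4\n"
)
df = load_and_clean(sample_csv)

print(df.shape)
print(df.dtypes)
print(df.isna().sum())
```

Whether dropping rows is appropriate depends on how much data is missing; imputing values is a common alternative when many rows would otherwise be lost.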
- Exploratory Data Analysis (EDA)
  - Perform EDA to gain insights into the dataset.
  - Visualize the distributions of the dependent variable and independent variables.
  - Identify any patterns or trends in the data.
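One possible shape for the EDA step is summary statistics, a correlation table, and a histogram per variable. The sketch below assumes the columns are named rm, lstat, and medv; `plot_distributions` is a hypothetical helper, and the data frame is filled with synthetic numbers purely so the code runs without boston.csv.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_distributions(df, columns, bins=20):
    """Draw one histogram per column and return the figure."""
    fig, axes = plt.subplots(
        1, len(columns), figsize=(4 * len(columns), 3.5), squeeze=False
    )
    for ax, col in zip(axes[0], columns):
        ax.hist(df[col], bins=bins, edgecolor="black")
        ax.set_title(f"Distribution of {col}")
        ax.set_xlabel(col)
        ax.set_ylabel("count")
    fig.tight_layout()
    return fig


# Synthetic stand-in data (NOT real boston.csv values), used only for illustration.
rng = np.random.default_rng(0)
rm = rng.normal(6.3, 0.7, 150)
lstat = rng.uniform(2, 35, 150)
demo = pd.DataFrame(
    {"rm": rm, "lstat": lstat, "medv": 5 * rm - 0.6 * lstat + rng.normal(0, 2, 150)}
)

print(demo.describe())  # central tendency and spread of each column
print(demo.corr())      # pairwise linear correlations

fig = plot_distributions(demo, ["rm", "lstat", "medv"])
```

In a Jupyter notebook the figure is displayed when the cell finishes; in a plain script, `fig.savefig(...)` or `plt.show()` would be needed.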
- Linear Regression Model
  - Use the following independent variables: rm (average number of rooms) and lstat (proportion of the population with low status).
  - Split the dataset into the independent variables and the single dependent variable (medv).
  - Generate plots to explore the relationships between the independent variables and the dependent variable.
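Separating predictors from the target and plotting each predictor against it could be sketched as follows. The helper names `split_features_target` and `plot_relationships` are hypothetical, medv is taken as the dependent variable as described in the dataset notes, and the data is synthetic so the example runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def split_features_target(df, features=("rm", "lstat"), target="medv"):
    """Return (X, y): a DataFrame of predictors and a Series holding the target."""
    X = df[list(features)]
    y = df[target]
    return X, y


def plot_relationships(X, y):
    """Scatter each predictor against the target and return the figure."""
    fig, axes = plt.subplots(
        1, X.shape[1], figsize=(5 * X.shape[1], 4), squeeze=False
    )
    for ax, col in zip(axes[0], X.columns):
        ax.scatter(X[col], y, alpha=0.6)
        ax.set_xlabel(col)
        ax.set_ylabel(y.name)
        ax.set_title(f"{y.name} vs {col}")
    fig.tight_layout()
    return fig


# Synthetic stand-in data (NOT real boston.csv values), used only for illustration.
rng = np.random.default_rng(1)
rm = rng.normal(6.3, 0.7, 150)
lstat = rng.uniform(2, 35, 150)
demo = pd.DataFrame(
    {"rm": rm, "lstat": lstat, "medv": 5 * rm - 0.6 * lstat + rng.normal(0, 2, 150)}
)

X, y = split_features_target(demo)
fig = plot_relationships(X, y)
print(X.shape, y.shape)
```

Keeping `X` as a DataFrame rather than a NumPy array preserves the column names, which makes it easier to label the fitted coefficients later.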
- Model Building and Evaluation
  - Split the data into training and test sets using a split ratio of 75:25.
  - Build a multiple linear regression model using the training set with all the independent variables.
  - Print out the intercept and coefficients of the trained model.
  - Generate predictions for the test set.
  - Evaluate the model's performance by computing the mean squared error (MSE) or root mean squared error (RMSE) on the test set.
  - Generate an error plot to visualize the differences between the predicted and actual values in the test set.
  - Print the coefficients and interpret them within the context of the median value prediction.
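The model-building steps above can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions, not the notebook's actual code: the data is synthetic (generated from a known linear relationship plus noise), the `random_state` value is arbitrary, and a residual scatter is used as one possible form of error plot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT real boston.csv values), used only for illustration.
rng = np.random.default_rng(42)
n = 200
rm = rng.normal(6.3, 0.7, n)
lstat = rng.uniform(2, 35, n)
data = pd.DataFrame(
    {"rm": rm, "lstat": lstat, "medv": 5 * rm - 0.6 * lstat + rng.normal(0, 2, n)}
)
X = data[["rm", "lstat"]]
y = data["medv"]

# 75:25 train/test split; random_state is an arbitrary choice for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a multiple linear regression on the training set only.
model = LinearRegression()
model.fit(X_train, y_train)
print("Intercept:", model.intercept_)
for name, coef in zip(X.columns, model.coef_):
    print(f"Coefficient for {name}: {coef:.3f}")

# Predict on the held-out test set and compute MSE and RMSE.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print(f"MSE: {mse:.3f}  RMSE: {rmse:.3f}")

# Error plot: residuals (actual minus predicted) against predicted values.
residuals = y_test - y_pred
fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(y_pred, residuals, alpha=0.6)
ax.axhline(0, color="red", linestyle="--")
ax.set_xlabel("Predicted medv")
ax.set_ylabel("Actual - predicted")
ax.set_title("Test-set errors")
```

When interpreting the output, each coefficient estimates the change in predicted medv for a one-unit increase in that variable while the other is held fixed, and the RMSE is expressed in the same units as medv, which usually makes it easier to read than the MSE.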
- Summary
  - Summarize the findings from the analysis, including insights from the exploratory data analysis, model performance, and any notable observations within the notebook.
- Start Jupyter Notebook:
  jupyter notebook
- Open the Jupyter notebook Capstone Linear Regression.ipynb.
boston.csv: Contains data on various attributes of houses in Boston, including features like average number of rooms (rm), proportion of population with low status (lstat), and median value of owner-occupied homes (medv). Source