This data analytics project offers strategic insights and actionable recommendations for real estate investments in King County, USA. Through a comprehensive analysis of housing data, we have identified key factors that influence property prices, including location, property size, condition, and timing. Leveraging these insights, we have developed targeted strategies for our client, William Rodriguez, who seeks to optimize investment by purchasing two properties for himself and his wife: one in an urban area and one in the countryside.
For the city property, William’s priority is a centrally located, ready-to-move-in home. For the countryside property, he is targeting a non-renovated home with the goal of purchasing at the optimal time.
(Note: William Rodriguez is a fictional client; any resemblance to actual persons is purely coincidental.)
Our key findings suggest that purchasing smaller, well-maintained homes in centrally located city areas, and non-renovated properties in desirable suburban areas during the winter, can lead to significant cost savings.
We recommend setting aside a total budget of $665,000 for the two properties, with additional funds allocated for renovating the countryside home. These recommendations have been validated through analysis of average house prices across various zip codes, demonstrating their effectiveness across different regions.
- Introduction
- Project Structure
- Installation
- Usage
- Results and Insights
- Final Recommendations
- License
- Acknowledgments
The King County Housing Data Analysis project is designed to assist in making informed real estate investment decisions. Through detailed analysis of housing data, the project identifies key factors that influence property prices and availability, providing actionable recommendations tailored to specific investment strategies.
This project is particularly useful for real estate investors, data analysts, and professionals in the housing market who seek to leverage data-driven insights for better decision-making.
The project is organized into the following Jupyter notebooks:
- 01_data_collection.ipynb: Collects and compiles the necessary dataset for the project.
- 02_data_preparation.ipynb: Prepares the data by cleaning and transforming it into a suitable format for analysis.
- 03_exploratory_data_analysis.ipynb: Conducts exploratory data analysis to uncover trends and patterns in the data.
- 04_hypothesis_analysis_and_insights.ipynb: Tests hypotheses and answers key business questions related to real estate investment.
- 05_final_recommendations_and_strategy.ipynb: Summarizes the findings and provides final recommendations for the client.
To run this project locally, follow these steps:
- Clone the repository:
    git clone https://github.com/johannesgooth/da-king-county-housing.git
    cd da-king-county-housing
- Create and activate a virtual environment (optional but recommended):
    python -m venv myenv
    source myenv/bin/activate  # On Windows: myenv\Scripts\activate
- Install the required dependencies:
    pip install -r requirements.txt
- Launch Jupyter Notebook:
    jupyter notebook
- Data Collection: Start with 01_data_collection.ipynb to gather the necessary data from the specified sources.
- Data Cleaning and Preprocessing: Use 02_data_cleaning_and_preprocessing.ipynb to clean and prepare the data.
- Exploratory Data Analysis: Run 03_exploratory_data_analysis.ipynb to explore the data and uncover insights.
- Hypothesis Testing: Test key hypotheses using 04_hypothesis_testing_and_insights.ipynb.
- Final Recommendations: Review the final strategic recommendations in 05_final_Recommendations_and_strategy.ipynb.
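The notebooks are meant to be run in numeric order. As a minimal sketch, assuming the notebooks sit in the repository root and keep their two-digit prefixes, the helper below sorts them by prefix and builds a `jupyter nbconvert --execute` command for each one. The function names `ordered_notebooks` and `nbconvert_command` are hypothetical and not part of the project.

```python
import re
from pathlib import Path


def ordered_notebooks(names):
    """Return notebook file names sorted by their leading numeric prefix."""
    numbered = []
    for name in names:
        match = re.match(r"(\d+)_.*\.ipynb$", name)
        if match:  # skip files without a numeric prefix or .ipynb suffix
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def nbconvert_command(notebook):
    """Build the command that executes one notebook in place."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook]


if __name__ == "__main__":
    # Assumption: run from the directory that contains the notebooks.
    names = [path.name for path in Path(".").glob("*.ipynb")]
    for notebook in ordered_notebooks(names):
        print(" ".join(nbconvert_command(notebook)))
```

The script only prints the commands, so they can be reviewed before being run by hand or passed to `subprocess.run`.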
A Streamlit dashboard for visualizing and interacting with the analysis can be run using the provided app.py file:
    streamlit run app.py
The analysis provides key insights into the King County housing market, including:
- The impact of location on property prices.
- The correlation between property size and value.
- The effect of property condition and renovation on market prices.
- Optimal timing for purchasing properties in different areas.
These insights are used to formulate strategic recommendations that can guide real estate investments.
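To make these insight areas more concrete, the sketch below computes three simple summaries on a handful of made-up records: median price per zip code (location), the Pearson correlation between living area and price (size), and mean price per sale month (timing). The field names (`zipcode`, `sqft_living`, `price`, `date`) and all values are illustrative assumptions, not the project's data or results. The notebooks may well use pandas for this; the standard library is used here only so the example runs without extra dependencies.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, median

# Made-up records for illustration; field names and values are assumptions.
SALES = [
    {"zipcode": "98101", "sqft_living": 900, "price": 450_000, "date": date(2014, 12, 3)},
    {"zipcode": "98101", "sqft_living": 1200, "price": 550_000, "date": date(2015, 6, 15)},
    {"zipcode": "98014", "sqft_living": 1500, "price": 300_000, "date": date(2015, 1, 20)},
    {"zipcode": "98014", "sqft_living": 2100, "price": 380_000, "date": date(2014, 7, 8)},
]


def price_summary(records, key, stat):
    """Group prices by key(row) and reduce each group with stat (e.g. median)."""
    groups = defaultdict(list)
    for row in records:
        groups[key(row)].append(row["price"])
    return {group: stat(prices) for group, prices in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Location: median price per zip code.
by_zip = price_summary(SALES, lambda row: row["zipcode"], median)
# Timing: mean price per calendar month of sale.
by_month = price_summary(SALES, lambda row: row["date"].month, mean)
# Size: correlation between living area and price.
size_vs_price = pearson(
    [row["sqft_living"] for row in SALES],
    [row["price"] for row in SALES],
)
```

With only four invented rows the numbers carry no meaning; the point is the shape of the computation behind each insight.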
Based on the analysis, we recommend setting aside a budget of $665,000 for the purchase of two properties, one in the city and one in the countryside, with additional funds for renovation costs. These recommendations were validated against average house prices across various zip codes and held up in each area examined.
Comprehensive testing has been implemented to ensure the reliability and accuracy of data processing and model predictions. The tests/ directory contains unit and integration tests that validate the functionality of various components within the project. These tests help maintain code quality and facilitate future enhancements.
To run the tests, navigate to the project directory and execute:
    pytest tests/
Ensure that you have pytest installed. It can be added to your requirements.txt or installed separately:
    pip install pytest
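As an illustration of the kind of unit test the tests/ directory might contain, the sketch below checks a small cleaning helper. Both `drop_invalid_prices` and the test function are hypothetical and written for this example; the project's actual functions and test names are not shown here.

```python
def drop_invalid_prices(records):
    """Keep only records whose price is a positive number.

    Hypothetical helper written for this example, not taken from the project.
    """
    cleaned = []
    for row in records:
        price = row.get("price")
        is_number = isinstance(price, (int, float)) and not isinstance(price, bool)
        if is_number and price > 0:
            cleaned.append(row)
    return cleaned


def test_drop_invalid_prices_removes_missing_and_non_positive():
    records = [
        {"id": 1, "price": 350_000},
        {"id": 2, "price": None},   # missing price
        {"id": 3, "price": -10},    # negative price
        {"id": 4, "price": 0},      # zero price
    ]
    kept_ids = [row["id"] for row in drop_invalid_prices(records)]
    assert kept_ids == [1]
```

Saved as a file whose name starts with `test_` inside tests/, a function like this would be picked up by pytest's default discovery rules.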
This project is licensed under the MIT License.
Special thanks to NeueFische GmbH for providing the dataset hosted on their AWS server, and to the contributors and the open-source community for their invaluable tools and resources, which made this project possible.