This dataset comes from Kaggle and is titled US Health Insurance Dataset. It contains health insurance premium charges along with other variables related to lifestyle and health, such as BMI, number of children, and region of care. We chose this topic because of how unique healthcare prices are in the United States. We chose this dataset because the techniques used for exploring and modeling the data can be applied to other health insurance datasets, beyond the scope of this project.
This is the final project of the UC Berkeley Stat 159/259 class taught in Spring 2023.
data
: contains the Kaggle dataset
figures
: contains the figures produced by eda.ipynb
instools
: functions created for the EDA and modeling sections; also contains the tests for these functions
eda.ipynb
: Jupyter Notebook that performs exploratory data analysis on the dataset. Plots show the relationships among all seven variables provided.
main.ipynb
: Main Jupyter Notebook that further explains the data analysis process and its significance. Also contains information on workflow and contributions.
modeling.ipynb
: Jupyter Notebook that expands on the analysis of the dataset. Prediction and inference are used to examine a variety of relationships among the variables.
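As a rough illustration of the kind of grouped summary an exploratory analysis of this data might start from, the sketch below averages premium charges within each level of a categorical variable. It is not taken from the notebooks or from instools: the helper names `load_rows` and `mean_charges_by` are hypothetical, and the column names `charges` and `smoker` are assumptions about the CSV header that should be checked against the actual file in data.

```python
import csv
from collections import defaultdict
from statistics import mean


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def mean_charges_by(rows, group_col, value_col="charges"):
    """Return {group: mean of value_col} for an iterable of dict-like rows.

    `value_col` defaults to "charges", which is an assumed column name.
    """
    groups = defaultdict(list)
    for row in rows:
        # csv.DictReader yields strings, so convert the value to float.
        groups[row[group_col]].append(float(row[value_col]))
    return {group: mean(values) for group, values in groups.items()}
```

Assuming the CSV sits at a path such as `data/insurance.csv` (a placeholder, not a confirmed filename), a call like `mean_charges_by(load_rows("data/insurance.csv"), "smoker")` would return the average charge for each smoker category. The helpers use only the standard library so the sketch runs without extra dependencies; the notebooks themselves may well use other tools.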
Our project employs the BSD 3-Clause License.