
UCB-stat-159-s23/project-group16



Predicting Health Insurance Charges: An Exploratory Analysis of Demographic and Lifestyle Factors

Group 16: Claire Mai, Avery Klauke, Gilberto Perezalonso, and Prasaan Guruprasad

Binder link: Binder

Data Description and Significance

This dataset comes from Kaggle and is titled US Health Insurance Dataset. It contains health insurance premium charges along with other variables related to lifestyle and health, such as BMI, number of children, and region of care. We chose this topic because healthcare pricing in the United States is so distinctive. We chose this dataset because the techniques we use to explore and model it can be applied to other health insurance datasets, beyond the scope of this project.
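As a minimal sketch of reading this data, the records could be loaded with Python's standard-library `csv` module. This sketch assumes the Kaggle CSV has the seven columns `age`, `sex`, `bmi`, `children`, `smoker`, `region`, and `charges`. The file name under `data` is not stated here, and the function name `load_insurance_rows` is hypothetical. It is not part of the project's `instools` package.

```python
import csv

# Assumed column layout of the Kaggle "US Health Insurance Dataset" CSV.
INT_COLUMNS = ("age", "children")
FLOAT_COLUMNS = ("bmi", "charges")


def load_insurance_rows(lines):
    """Read records from a CSV file object (or any iterable of CSV lines).

    Hypothetical helper: numeric columns are converted, while the
    categorical columns (sex, smoker, region) are left as strings.
    """
    rows = []
    for record in csv.DictReader(lines):
        for col in INT_COLUMNS:
            record[col] = int(record[col])
        for col in FLOAT_COLUMNS:
            record[col] = float(record[col])
        rows.append(record)
    return rows
```

With the real file, open it with `newline=""` (as the `csv` documentation recommends) and pass the file object, e.g. `with open(path, newline="") as f: rows = load_insurance_rows(f)`. Here `path` stands for wherever the dataset actually lives in `data`.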

This is the final project of the UC Berkeley Stat 159/259 class taught in Spring 2023.

Repository Structure

data: contains the Kaggle dataset

figures: contains the figures produced by eda.ipynb

instools: functions created for the EDA and modeling sections; also contains the tests for these functions

eda.ipynb: Jupyter Notebook that performs exploratory data analysis on the dataset. Its plots show the relationships among all seven variables provided.

main.ipynb: Main Jupyter Notebook that further explains the data analysis process and its significance. It also contains information on workflow and contributions.

modeling.ipynb: Jupyter Notebook that expands on the analysis of the dataset, using both prediction and inference to examine a variety of relationships.
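The modeling notebook is not described in detail here. The following is therefore only an illustration of the kind of prediction step involved: an ordinary least-squares fit of `charges` on a single numeric predictor (for example `age`), written with the standard library. The choice of model and predictor is an assumption. The notebook may use other models, multiple predictors, or a library such as scikit-learn or statsmodels. The names `fit_simple_ols` and `predict` are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the squared residuals of ys on xs.

    Hypothetical helper for a one-predictor model, e.g. xs = ages, ys = charges.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sum of squares of x and cross-products of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted response for a single predictor value x."""
    return intercept + slope * x
```

Inference questions, such as how uncertain the fitted slope is, build on the residuals of this same fit. A package like statsmodels reports standard errors and confidence intervals for OLS coefficients directly.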

License

Our project employs the BSD 3-Clause License.