This repository contains analyses that predict crime rates across U.S. states using a range of machine learning and data science techniques, including, but not limited to:
- Principal Component Analysis
- Decision Trees
- Random Forest
- Linear and Logistic Regression
- Variable Selection (LASSO, Ridge, Step, etc.)
- Outlier Detection

The repository is organized as follows:

Folder | Description |
---|---|
Code | All of the R code used for data wrangling and predictive modeling on the U.S. Crime dataset. |
Data | The main dataset used throughout the analyses. |
Viz | The main visualizations generated by the R code in the Code folder. |

Note: The data can be found in the Data folder as well as at the following location: http://www.statsci.org/data/general/uscrime.txt

A description of the dataset can be found at the following location: http://www.statsci.org/data/general/uscrime.html
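
As a starting point, the dataset can be read into R directly from the URL above or from a local copy. The snippet below is a minimal sketch, not the code used in this repository: the helper name `load_uscrime` is hypothetical, and it assumes the file is whitespace-delimited with a header row.

```r
# Hypothetical helper (not part of this repository): reads the U.S. Crime
# data from a URL or a local path.
# Assumption: the file is whitespace-delimited and its first line is a header.
load_uscrime <- function(path = "http://www.statsci.org/data/general/uscrime.txt") {
  read.table(path, header = TRUE, stringsAsFactors = FALSE)
}

# Example usage (requires network access when the default URL is used):
# crime <- load_uscrime()
# str(crime)
```

Passing a local file path instead of the default URL avoids the network dependency, which may be preferable for reproducible runs.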