INFO7374 Data Analysis Using Python - Final Project

Homicide Data Analysis (Final)

This project follows pattern 1: I downloaded the CSV files and performed five analyses on the homicide dataset. I also combined three additional datasets (weapon registrations, population, and weather/temperature) with the main homicide dataset.

Data Used:

The main dataset is the homicide dataset in CSV format; its files are Database and database1.
The Weapons Registration dataset, obtained from the National Instant Criminal Background Check System (NICS), is used for the state-wise part of Analysis 1.
For Analysis 4, a weather dataset is combined with the homicide dataset to see how the homicide rate is affected by changes in the weather. The weather and temperature data were downloaded from https://www.ncdc.noaa.gov/
For Analysis 3, state-wise population data is combined with the homicide data. The population data was fetched from https://catalog.data.gov/dataset?res_format=CSV
Relative paths are used for storing and reading the data files.
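A minimal sketch of reading a data file through a relative path with pandas. The folder name data and the helper name load_dataset are assumptions for illustration, not the project's actual layout.

```python
from pathlib import Path

import pandas as pd

# Assumption: the CSV files sit in a "data" folder relative to the directory
# the notebook/script is run from. The folder name is illustrative.
DATA_DIR = Path("data")


def load_dataset(filename, data_dir=DATA_DIR):
    """Read a CSV file addressed by a path relative to the working directory."""
    return pd.read_csv(Path(data_dir) / filename)


# Example (file name assumed): homicides = load_dataset("database.csv")
```

Because the path is relative, the project folder can be cloned or moved without editing the code, as long as it is run from the project root.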

Purpose for doing analysis on homicide:

Homicide crimes in the USA have increased in recent years. I wanted to find out which factors contribute to those crimes.

Analysis Performed:

Analysis 1:

Preferred weapon for homicide, weapon of choice by gender, and a state-wise breakdown of why that weapon is used.

Steps:

The data files containing the homicide and Weapons Registration data are read into a dataframe.
Filter the data to keep only entries where the crime is solved, so that we know which weapon was used. Also filter out entries where the gender is UNKNOWN.
For each weapon, count the number of incidents and plot the graph.
A seaborn bar graph is used for the plot.
Split the data by gender, count the number of incidents grouped by weapon for males and females separately, and plot the graph.
The seaborn figure is split in two to plot the male and female distributions.
Read the second dataset into a dataframe.
From the second dataset, get the average number of handguns and long guns issued by each state, because those are the two weapons most used for crime.
Plot the count of gun licenses issued by each state. The data covers 20 years.
The graph is plotted with the seaborn library, with the figure split into 2 subplots to show the count for each gun type.
Plotly is also used to plot the stacked male/female distribution graph.
Get the count of each weapon used, grouped by state.
From the main dataset, plot a heatmap of the weapons used in each state.
Save the figures in the specified folder using plt.savefig(path).
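The filtering and counting steps above might look like this in pandas. The column names (State, Crime Solved, Perpetrator Sex, Weapon), the "Yes" value for solved cases, and the NICS columns (state, handgun, long_gun) are assumptions about the CSV layouts; the unknown-gender label is a parameter because the exact spelling ("UNKNOWN", "Unknown") depends on the file. The plotting calls are left out.

```python
import pandas as pd


def weapon_counts(homicides, unknown="Unknown"):
    """Counts for the weapon bar graphs and the state x weapon heatmap."""
    # Keep solved cases only and drop rows whose perpetrator gender is unknown.
    solved = homicides[(homicides["Crime Solved"] == "Yes")
                       & (homicides["Perpetrator Sex"] != unknown)]
    # Incidents per weapon (overall bar graph).
    overall = solved["Weapon"].value_counts()
    # Incidents per weapon with one column per gender (male/female subplots).
    by_gender = (solved.groupby(["Weapon", "Perpetrator Sex"]).size()
                 .unstack(fill_value=0))
    # State x weapon matrix (heatmap input, e.g. for seaborn.heatmap).
    by_state = pd.crosstab(solved["State"], solved["Weapon"])
    return overall, by_gender, by_state


def avg_registrations(nics):
    """Average handgun and long-gun figures per state across all rows (assumed columns)."""
    return nics.groupby("state")[["handgun", "long_gun"]].mean()
```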

Output:

Output Files

Plots:

Conclusion:

The claim here is that higher rates of gun ownership lead to higher rates of violent crime. This runs against the National Rifle Association and other gun-rights proponents, who have steadfastly pushed the idea that a society with more guns leads to less crime, and that “the only way to stop a bad guy with a gun is a good guy with a gun.”
On this view, gun ownership is more often a catalyst than a deterrent to crime.
As the graphs show, California and Texas issue the most licenses, and the murder counts in those states are also higher.
As more registrations are issued, crime increases. From this I conclude that guns issued for safety and self-defense are being used to commit crimes.
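One way to put a number on the relationship described above is a correlation across states. Raw license counts and murder counts both grow with a state's population (which is why Analysis 3 normalizes per 10,000 people), and a correlation measures association only. The sketch assumes two pandas Series indexed by full state name (average registrations per state and homicide count per state); the function name is hypothetical.

```python
import pandas as pd


def registration_homicide_corr(avg_registrations, homicide_counts):
    """Pearson correlation across states between two state-indexed Series."""
    # Keep only states present in both Series (inner join on the index).
    joined = pd.concat({"registrations": avg_registrations,
                        "homicides": homicide_counts}, axis=1, join="inner")
    return joined["registrations"].corr(joined["homicides"])
```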

Analysis 2:

Factors affecting whether murder cases are solved.

Steps:

The data files containing the homicide and Weapons Registration data are read into a dataframe.
pd.read_csv(filepath) is used to read the files and create the dataframe.
Get the unique values in the perpetrator race column.
Count the number of solved and unsolved cases for each known race.
Count the number of solved and unsolved cases where the race is UNKNOWN.
Plot the graph of perpetrator race vs. number of cases.
Follow the same procedure for victim sex and relationship.
Plot the graphs for both.
The graphs are plotted with the seaborn library.
A random forest algorithm is used to check whether the factors found in the graphs match the algorithm's feature ranking.
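A sketch of the solved/unsolved counts and the feature ranking, assuming a Crime Solved column with "Yes"/"No" values and the column names shown; the feature list is illustrative. It uses scikit-learn's RandomForestClassifier, one-hot encodes the categorical columns with pd.get_dummies, and then sums the importances of each feature's dummy columns so the ranking can be compared with the graphs.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Illustrative feature list (assumed column names).
FEATURES = ["Perpetrator Race", "Perpetrator Sex", "Victim Sex", "Relationship"]


def solved_table(homicides, column):
    """Rows: categories of `column` (including unknown); columns: solved No/Yes counts."""
    return pd.crosstab(homicides[column], homicides["Crime Solved"])


def feature_ranking(homicides, features=FEATURES, seed=0):
    """Random forest importances, aggregated back to the original columns."""
    # One-hot encode categorical predictors; dummy names look like "Relationship_Wife".
    X = pd.get_dummies(homicides[features])
    y = homicides["Crime Solved"]
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    # Sum dummy-column importances per original feature (prefix before the first "_").
    grouped = importances.groupby(lambda col: col.split("_", 1)[0]).sum()
    return grouped.sort_values(ascending=False)
```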

Output:

Output Files

Plot

Algorithm Result:

Conclusion:

Based on the result in the graph, the most valuable features (above 15%) are: Perpetrator Sex, Perpetrator Age, Perpetrator Race, and Relationship.
I have also plotted the graph for victim race, but that factor does not make a significant difference. Even when the victim race is known, several cases have gone unsolved.
When the perpetrator race is known, most of the cases are solved.
Relationship shows the same scenario: most cases go unsolved if the relationship between victim and perpetrator is unknown.
From this analysis I conclude that perpetrator race and relationship are the two most important factors in solving a case.
Even when the relationship is stranger, cases are still solved, but when it is unknown the solve rate decreases.
The results from the random forest match those from the graphs.

Analysis 3:

Safest and most unsafe states in the United States based on the homicide rate.

Conclusion based on the number of crimes per 10,000 people in each state.

Steps:

First, population data for each state was downloaded and cleaned, and the average population for each state was calculated.
Read the data from both datasets: 1) the homicide dataset and 2) the population dataset.
Load the data into pandas dataframes.
Calculate the number of cases grouped by state.
Then calculate the average population per state.
Sort the dataframe to get the top and bottom values.
Combine both datasets into a single frame and plot the graph.
Basemap is used to generate a heatmap on a map of the USA.
To use Basemap, first install it with conda install -c conda-forge basemap=1.0.8.dev0
Seaborn is used to plot the bar graph.
Figures are saved using plt.savefig(path).
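The rate behind this ranking is cases / average population × 10,000. A pandas sketch, assuming the homicide data has a State column and the population data has State and Population columns with one row per state per year; the function name is hypothetical. Cases are counted over the whole period covered by the homicide data.

```python
import pandas as pd


def homicide_rate_per_10k(homicides, population):
    """Per-state homicide rate per 10,000 people, highest first."""
    cases = homicides.groupby("State").size().rename("Cases")
    avg_pop = population.groupby("State")["Population"].mean().rename("AvgPopulation")
    # Keep states present in both datasets.
    rates = pd.concat([cases, avg_pop], axis=1, join="inner")
    rates["Per10k"] = rates["Cases"] / rates["AvgPopulation"] * 10_000
    return rates.sort_values("Per10k", ascending=False)
```

rates.head(2) and rates.tail(2) then give the most and least unsafe states.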

Output

Output Files

Plot

Conclusion:

The two most unsafe places, where crime rates are high, are D.C. and Louisiana.
North Dakota and New Hampshire are the two safest states according to the crime rate.
The FBI's crime report for 2012 found nearly 68% of all homicides in America involved a firearm, and Louisiana fiercely protects the right to bear arms. The state passed an amendment in November making gun ownership a "fundamental right" like free speech and making it extremely difficult to pass laws that step on that right. Louisiana also passed a law recently that lets its citizens apply for concealed carry handgun permits that last their entire lifetimes. Louisianans who want to walk around and openly carry their guns don't need a permit at all under the state's open carry law.
Source for the statistics above: http://www.businessinsider.com/why-is-the-murder-rate-high-in-louisiana-2013-9
Hence, because of very low regulation, the homicide rate is high in Louisiana.

Analysis 4:

Crime rate with changes in the weather.

How the weather affects crime in the United States.

Weather data is combined with the homicide data to see the pattern, and a state abbreviation dataset is used to match the state names.

Steps:

First, the weather data was collected from the link mentioned above.
In that data, the temperature for each state is given month by month.
The integer month column is converted to month names.
The state names in the temperature data are abbreviated.
A CSV file of US states and their abbreviations was downloaded and merged with the temperature data, so that the state names are in full format.
Calculate the average temperature for each state. The temperature is split by month because we want to see the pattern as the weather changes.
Then calculate the number of incidents for each state, month by month.
To do this, count the incidents grouped by month and state.
Merge both datasets on state and month. State names are matched using the abbreviation file.
Then split the data into a separate dataframe for each state.
Each state has 12 months of data in its dataframe.
Plot a line chart for each state to see how the number of crimes changes with the weather.
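A sketch of the month conversion, abbreviation merge, and per-state split. The column names (StateCode, Month, AvgTemp, State, Abbreviation) and the function name are assumptions about the three CSV files; calendar.month_name from the standard library does the integer-to-name conversion.

```python
import calendar

import pandas as pd


def monthly_crime_vs_temperature(homicides, temps, abbrevs):
    """Return {state: dataframe of monthly incidents and average temperature}.

    Assumed layouts:
      homicides: "State" (full name), "Month" (name, e.g. "January")
      temps:     "StateCode" (e.g. "TX"), "Month" (1-12), "AvgTemp"
      abbrevs:   "State" (full name), "Abbreviation"
    """
    temps = temps.copy()
    # Integer month -> month name, matching the homicide data.
    temps["Month"] = temps["Month"].map(lambda m: calendar.month_name[int(m)])
    # Attach full state names through the abbreviation lookup table.
    temps = temps.merge(abbrevs, left_on="StateCode", right_on="Abbreviation")
    avg_temp = temps.groupby(["State", "Month"], as_index=False)["AvgTemp"].mean()
    # Incidents counted per state and month.
    counts = homicides.groupby(["State", "Month"]).size().reset_index(name="Incidents")
    merged = counts.merge(avg_temp, on=["State", "Month"])
    # Calendar order for the line charts (month_name[0] is an empty string).
    order = {name: i for i, name in enumerate(calendar.month_name) if name}
    merged["MonthNum"] = merged["Month"].map(order)
    return {state: g.sort_values("MonthNum") for state, g in merged.groupby("State")}
```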

Output

Output Files

Plots:

Conclusion:

As the plots show, California, Texas, and New York all show an increase in crime as the temperature increases.
Louisiana, which has the highest crime rate in the US (according to the previous analysis), surprisingly doesn't show any pattern.
Hot temperatures increase irritability, which in turn increases aggressive behavior, including violent crime.
"More people out - more crime, less people out - less crime."

Analysis 5:

Homicide distribution according to relationship in the USA

Age distribution of each relationship category

Steps:

First, read the data from the CSV file into a dataframe.
Read all the relationships between victim and perpetrator.
Group the relationships into categories such as male partner, parents, and children.
Create a new column in the dataframe and put the relationship category values into that column.
Count the number of incidents for each category.
Get the total number of incidents from the dataframe.
Calculate the percentage for each category.
Plot a pie chart of the percentages.
Then get the age groups involved in each category.
Plot a scatter plot to show the density of age groups.
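A sketch of the categorize-and-percentage steps. The mapping below is hypothetical: the raw Relationship labels and the exact category names used in the project may differ, and unmapped labels (such as Unknown) are grouped under Other.

```python
import pandas as pd

# Hypothetical mapping from raw "Relationship" values to broader categories.
CATEGORIES = {
    "Husband": "Male partner", "Boyfriend": "Male partner",
    "Wife": "Female partner", "Girlfriend": "Female partner",
    "Father": "Parent", "Mother": "Parent",
    "Son": "Child", "Daughter": "Child",
    "Brother": "Sibling", "Sister": "Sibling",
    "Acquaintance": "Acquaintance", "Neighbor": "Neighbor", "Stranger": "Stranger",
}


def relationship_shares(homicides):
    """Add a Category column and return (dataframe, percentage per category)."""
    df = homicides.copy()
    # New column holding the category; unmapped labels fall into "Other".
    df["Category"] = df["Relationship"].map(CATEGORIES).fillna("Other")
    counts = df["Category"].value_counts()
    percentages = counts / len(df) * 100  # input for the pie chart
    return df, percentages
```

The Category column can then be plotted against perpetrator age in the scatter plot.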

Output:

Output Files

Plots:

Conclusion:

As seen in the pie chart, almost half of the relationships are strangers.
The second most troubled relationship group is acquaintances.
Most of these homicides are likely committed by serial killers or in gang wars, hence the relationship category is stranger.
The graph also shows that homicides in the partner or parent categories mostly have a motive or personal agenda. These crimes are fewer compared to other categories.
The scatter plot shows the age groups involved in each category.
Each category has its own pattern: the neighbor category shows no significant grouping, whereas in the sibling category most cases are grouped around ages 20 to 40.

Repository: priyalchaudhari/Final_Priyal_chaudhari_homicide_analysis

Folders and files

Latest commit

History

Repository files navigation

INFO7374 Data Analysis Using Python - Final Project

Homicide Data Analysis (Final)

I am using the pattern 1 of the project where I have downloaded the CSV files and performed 5 analysis on Homiside dataset. Also combined weapon registration, Population, Weather(Temaparature) these 3 data sets with the main himicide data set

Data Used:

Purpose for doing analysis on homicide:

Analysis Performed:

Analysis 1 :

Preferred weapon for homicide and genderwise weapon of choice. Why that weapon is used statewise breakdown

Steps:

Output:

Plots:

conclusion:

Analysis 2:

Factors Affecting solving murder crimes.

Steps:

output:

Plot

Algorithm Result:

Conclusion:

Analysis 3:

most safe and Unsafe state in united states based on rate of homicide.

conclusion based on (Number of crimes per 10,000 people for each state)

Steps:

Output

Plot

Conclusion:

Analysis 4 :

Crime rate with change of wheather.

How whather is affecting crime in United States

Combined Wheather data with Homicide data to see the pattern and also Used state abbrivation data to match the statenames

Steps:

Output

Plots:

conclusion:

Analysis 5:

Homicide distribution according to relationship in USA

Age distribution of each relationship category

Steps:

Output:

plots:

conclusion:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages