title	author	output
Most Harmful Categories of Severe Weather Events in the US	Andrea Schioppa	html_document

Synopsis

In this report we investigate which severe weather events are most harmful to the population and which have the greatest economic consequences. Our analysis is based on data obtained from the US National Oceanic and Atmospheric Administration. From these data we find that the 5 categories of events with the most human fatalities are: tornado, excessive heat, heat, flash flood and lightning. The 5 categories with the most injuries are: tornado, thunderstorm wind, flood, excessive heat and lightning. The 5 categories most harmful to property are: flood, hurricane/typhoon, tornado, coastal flood and flash flood. The 5 categories most detrimental to agriculture are: drought, flood, hurricane/typhoon, ice and hail.

Data Processing

We download the data:

file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(file_url, destfile = "StormData.bz2", method="curl")

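and use bunzip2 (on the shell) to decompress the file; the decompressed file is assumed to be saved as storm_data.csv, the name used when the data are read below. We then load the libraries: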

library(dplyr)
library(lubridate)
library(xtable)
library(ggplot2)
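
As an aside on the decompression step: base R's read.csv can usually read a bzip2-compressed file directly, because file() detects the compression when it opens the file for reading. The following minimal sketch illustrates this with a hypothetical two-row toy table and a temporary file rather than the real data set. It is a possible alternative to the bunzip2 step, not what was done in this report.

```r
## Toy table standing in for the real data (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

## Write the toy table to a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

## Read the compressed file back without decompressing it first
back <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
unlink(tmp)
print(back)
```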

For reproducibility we report the system information returned by Sys.info() and the R version:

info_string <- Sys.info()
cat(paste(paste("sysname:", info_string["sysname"]), 
           paste("release:",info_string["release"]), 
            paste("machine:", info_string["machine"]), sep="\n"))

## sysname: Darwin
## release: 14.4.0
## machine: x86_64

R.version

##                _                           
## platform       x86_64-apple-darwin13.4.0   
## arch           x86_64                      
## os             darwin13.4.0                
## system         x86_64, darwin13.4.0        
## status                                     
## major          3                           
## minor          2.0                         
## year           2015                        
## month          04                          
## day            16                          
## svn rev        68180                       
## language       R                           
## version.string R version 3.2.0 (2015-04-16)
## nickname       Full of Ingredients

We then read the data into R. Note that we keep in cleaning_data only the columns we are interested in, and use lubridate to parse the dates.

storm_data <- read.csv("storm_data.csv", header = TRUE, stringsAsFactors=FALSE)
names(storm_data)<-tolower(names(storm_data))
storm_data$evtype <- tolower(storm_data$evtype)
cleaning_data <- select(storm_data, evtype, bgn_date, end_date, fatalities, injuries, propdmg,
                        propdmgexp, cropdmg, cropdmgexp, state)
cleaning_data$bgn_date <- mdy(sub(pattern = " 0:00:00", replacement = "", cleaning_data$bgn_date))
cleaning_data$end_date <- mdy(sub(pattern = " 0:00:00", replacement = "", cleaning_data$end_date))
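
To illustrate the date handling, here is a small sketch in base R. The two input strings are made-up examples in the month/day/year format that the code above assumes. as.Date() with an explicit format is used in place of lubridate's mdy() only to keep the example free of package dependencies.

```r
## Hypothetical raw date strings with the trailing midnight time stamp
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

## Strip the time stamp, then parse month/day/year
stripped <- sub(pattern = " 0:00:00", replacement = "", x = raw_dates, fixed = TRUE)
parsed <- as.Date(stripped, format = "%m/%d/%Y")
print(parsed)
```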

The data frame cleaning_data holds the data to be cleaned. We need to clean the column evtype, each of whose entries should take a value in one of the "main categories" of severe weather events in Section 7 of the Storm Data Documentation. However, we have chosen to keep some additional categories. The details of the cleaning are discussed in the Section R Code. From now on we assume that cleaning_data has been cleaned.

To compute the values of property and crop damage, we need to convert the exponent codes in the columns propdmgexp and cropdmgexp to numeric multipliers. The only meaningful codes are H/h (hundreds), K/k (thousands), M/m (millions) and B (billions); every other code in the lookup vector is mapped to 0.

prop_vec <- c("-"=0, "?"=0, "+"=0, "0"=0, "1"=0, "2"=0, "3"=0, "4"=0, "5"=0, "6"=0, "7"=0,
              "8"=0, "B"=10^9, "h"=10^2, "H"=10^2, "K"=10^3, "M"=10^6, "m"=10^6, "k"=10^3)
cleaning_data$propdmgexp <- prop_vec[cleaning_data$propdmgexp]
cleaning_data$cropdmgexp <- prop_vec[cleaning_data$cropdmgexp]
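
The lookup above relies on indexing a named vector by character codes. The sketch below shows the mechanism on a shortened, hypothetical lookup vector and made-up damage figures. Note that a code which is absent from the vector (here the empty string) yields NA rather than an error, which is presumably why the summaries later in the report use na.rm=TRUE.

```r
## Shortened lookup vector (illustrative subset of the codes)
exp_lookup <- c("H" = 1e2, "h" = 1e2, "K" = 1e3, "k" = 1e3,
                "M" = 1e6, "m" = 1e6, "B" = 1e9, "0" = 0, "?" = 0)

## Made-up damage mantissas and exponent codes; the last code is empty
mantissa <- c(25, 2.5, 1, 10, 5)
codes <- c("K", "m", "B", "?", "")

## Character indexing returns the multiplier, or NA for unknown codes
multiplier <- unname(exp_lookup[codes])
damage <- mantissa * multiplier
print(damage)
```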

We then create new columns property and crop which contain the damage caused by each event.

cleaning_data <- mutate(cleaning_data, property = propdmg * propdmgexp, crop = cropdmg * cropdmgexp)

Finally, for each category we summarise the columns injuries, fatalities, property and crop by their sum, mean and standard deviation across events. We then join the tables of means and standard deviations.

storm_totals <- group_by(cleaning_data, evtype) %>%
    summarise(fatalities = sum(fatalities, na.rm=TRUE), injuries = sum(injuries, na.rm=TRUE),
              property = sum(property, na.rm=TRUE), crop = sum(crop, na.rm=TRUE))
storm_means <- group_by(cleaning_data, evtype) %>%
    summarise(fatalities = mean(fatalities, na.rm=TRUE), injuries = mean(injuries, na.rm=TRUE),
              property = mean(property, na.rm=TRUE), crop = mean(crop, na.rm=TRUE))
storm_sds <- group_by(cleaning_data, evtype) %>%
    summarise(sdfatalities = sd(fatalities, na.rm=TRUE), sdinjuries = sd(injuries, na.rm=TRUE),
              sdproperty = sd(property, na.rm=TRUE), sdcrop = sd(crop, na.rm=TRUE))
## Join on the category name, the only column shared by the two tables
storm_means <- inner_join(storm_means, storm_sds, by = "evtype")
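
The same grouped summaries can be sketched in base R on a toy table. The data below are made up, and aggregate() and merge() stand in for the dplyr calls above only so that the example runs without extra packages. The last line also shows that the standard deviation of a single non-missing value is NA, which is one way an NA can appear in a table of standard deviations.

```r
## Toy data: five events in two categories (values are made up)
toy <- data.frame(evtype = c("flood", "flood", "tornado", "tornado", "tornado"),
                  fatalities = c(0, 2, 1, 0, 5),
                  stringsAsFactors = FALSE)

## Sum, mean and standard deviation per category
totals <- aggregate(fatalities ~ evtype, data = toy, FUN = sum)
means <- aggregate(fatalities ~ evtype, data = toy, FUN = mean)
sds <- aggregate(fatalities ~ evtype, data = toy, FUN = sd)
names(sds)[2] <- "sdfatalities"

## Join means and standard deviations on the category name
means_sds <- merge(means, sds, by = "evtype")
print(totals)
print(means_sds)

## With a single non-missing value the standard deviation is undefined
single_sd <- sd(c(0, NA), na.rm = TRUE)
print(single_sd)
```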

Results

Danger for humans

We select and sort the top 10 causes of total fatalities, and use total injuries to break ties.

topt_fatalities <- arrange(storm_totals, desc(fatalities), desc(injuries))[1:10,1:3]
topt_fatalities$fatalities <- as.integer(topt_fatalities$fatalities)
topt_fatalities$injuries <- as.integer(topt_fatalities$injuries)
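
As a small illustration of sorting with a tie-breaker, the following base R sketch ranks a made-up table by fatalities and uses injuries to order rows with equal fatalities. order() plays the role that arrange() with two desc() keys plays above.

```r
## Toy totals; categories "a" and "c" tie on fatalities
toy <- data.frame(evtype = c("a", "b", "c", "d"),
                  fatalities = c(5, 9, 5, 1),
                  injuries = c(10, 0, 30, 2),
                  stringsAsFactors = FALSE)

## Descending by fatalities, then descending by injuries within ties
ranked <- toy[order(-toy$fatalities, -toy$injuries), ]
print(head(ranked, 3))
```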

We then show the results in a table.

options(xtable.comment=FALSE)
topt_fat_table <- xtable(topt_fatalities)
print(topt_fat_table, include.rownames=FALSE, type='html')

evtype	fatalities	injuries
tornado	5661	91407
excessive heat	2018	6680
heat	1116	2531
flash flood	1035	1802
lightning	817	5232
thunderstorm wind	730	9542
rip current	572	529
flood	518	6881
high wind	325	1585
cold/wind chill	237	41

We select and sort the top 10 causes of total injuries, and use total fatalities to break ties.

topt_injuries <- arrange(storm_totals, desc(injuries), desc(fatalities))[1:10,c(1,3,2)]
topt_injuries$fatalities <- as.integer(topt_injuries$fatalities)
topt_injuries$injuries <- as.integer(topt_injuries$injuries)

We then show the results in a table.

options(xtable.comment=FALSE)
topt_inj_table <- xtable(topt_injuries)
print(topt_inj_table, include.rownames=FALSE, type='html')

evtype	injuries	fatalities
tornado	91407	5661
thunderstorm wind	9542	730
flood	6881	518
excessive heat	6680	2018
lightning	5232	817
heat	2531	1116
ice	2159	101
flash flood	1802	1035
wildfire	1608	90
high wind	1585	325

An alternative approach is to find the events which are most harmful on average. Some events may be extremely dangerous but also rare, so they are penalized when ranking by total fatalities or injuries. To account for variability, we also report standard deviations.

We find the top 10 events with the highest mean fatalities.
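
A short worked example makes the difference between the two rankings concrete. It uses figures reported in this document: the total fatalities for tornadoes, the mean fatalities per tsunami, and the number of events in each category from the frequency table. Because the tsunami mean is rounded to two decimals, the implied tsunami total is only approximate.

```r
## Figures taken from the tables in this report
tornado_total <- 5661    # total fatalities
tornado_count <- 60711   # number of events
tsunami_mean <- 1.65     # mean fatalities per event (rounded)
tsunami_count <- 20      # number of events

## A frequent event has a large total but a small mean
tornado_mean <- tornado_total / tornado_count

## A rare event has a large mean but a small total
tsunami_total <- tsunami_mean * tsunami_count

print(round(tornado_mean, 2))
print(round(tsunami_total))
```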

topm_fatalities <- arrange(storm_means, desc(fatalities), desc(injuries))[1:10,c(1,2,3,6,7)]

topm_fat_table <- xtable(topm_fatalities)
print(topm_fat_table, include.rownames=FALSE, type='html')

evtype	fatalities	injuries	sdfatalities	sdinjuries
tsunami	1.65	6.45	7.15	28.85
high seas	1.23	1.00	1.09	1.68
heat	1.17	2.65	18.99	19.62
excessive heat	1.06	3.49	4.71	24.59
rip current	0.74	0.68	0.63	2.29
avalanche	0.58	0.44	0.78	1.03
high water	0.50	0.00	0.84	0.00
hurricane/typhoon	0.44	4.44	1.59	48.88
coastal storm	0.36	0.18	0.50	0.40
marine strong wind	0.29	0.46	0.54	1.37

We find the top 10 events with the highest mean injuries.

topm_injuries <- arrange(storm_means, desc(injuries), desc(fatalities))[1:10,c(1,3,2,7,6)]

topm_inj_table <- xtable(topm_injuries)
print(topm_inj_table, include.rownames=FALSE, type='html')

evtype	injuries	fatalities	sdinjuries	sdfatalities
tsunami	6.45	1.65	28.85	7.15
glaze	4.80	0.16	12.02	0.77
hurricane/typhoon	4.44	0.44	48.88	1.59
excessive heat	3.49	1.06	24.59	4.71
heat	2.65	1.17	19.62	18.99
tornado	1.51	0.09	17.17	1.42
freezing fog	1.25	0.11	5.77	0.60
ice	1.02	0.05	34.23	0.33
dust storm	1.00	0.05	4.24	0.53
high seas	1.00	1.23	1.68	1.09

The next plot shows the top 10 causes of fatalities.

gfat <- barplot(topt_fatalities$fatalities, main = "Top 10 events by total fatalities",
     xlab = "event types", ylab = "total fatalities")
text(x = gfat, y = rep(2500,10), labels=topt_fatalities$evtype, srt=90)

Danger for property

We select the top 10 causes of damage to property, and use damage to crop to break ties.

topt_property <- arrange(storm_totals, desc(property), desc(crop))[1:10,c(1,4,5)]
topt_property$property <- format(topt_property$property, scientific=TRUE, digits=3)
topt_property$crop <- format(topt_property$crop, scientific=TRUE, digits=3)
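
One consequence of formatting is worth spelling out: format() returns character strings, so the formatted columns are suitable for display but have to be converted back with as.numeric() before any numeric use such as plotting. The sketch below uses made-up amounts.

```r
## Made-up damage amounts in USD
amounts <- c(150735172482, 85356410010, 5000)

## Scientific notation with 3 significant digits; the result is character
shown <- format(amounts, scientific = TRUE, digits = 3)
print(shown)
print(class(shown))

## Converting back recovers the values up to the rounding applied
recovered <- as.numeric(shown)
print(recovered)
```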

We then show the results in a table (the units are USD).

topt_prp_table <- xtable(topt_property)
print(topt_prp_table, include.rownames=FALSE, type='html')

evtype	property	crop
flood	1.51e+11	1.09e+10
hurricane/typhoon	8.53e+10	5.51e+09
tornado	5.86e+10	4.17e+08
coastal flood	4.33e+10	5.00e+03
flash flood	1.69e+10	1.53e+09
hail	1.60e+10	3.05e+09
thunderstorm wind	1.10e+10	1.27e+09
wildfire	8.50e+09	4.03e+08
tropical storm	7.71e+09	6.95e+08
winter storm	6.69e+09	2.74e+07

We select the top 10 causes of damage to crop, and use damage to property to break ties.

topt_crop <- arrange(storm_totals, desc(crop), desc(property))[1:10,c(1,5,4)]
topt_crop$property <- format(topt_crop$property, scientific=TRUE, digits=3)
topt_crop$crop <- format(topt_crop$crop, scientific=TRUE, digits=3)

We then show the results in a table (the units are USD).

topt_crp_table <- xtable(topt_crop)
print(topt_crp_table, include.rownames=FALSE, type='html')

evtype	crop	property
drought	1.40e+10	1.05e+09
flood	1.09e+10	1.51e+11
hurricane/typhoon	5.51e+09	8.53e+10
ice	5.02e+09	3.96e+09
hail	3.05e+09	1.60e+10
frost/freeze	2.00e+09	1.93e+07
flash flood	1.53e+09	1.69e+10
extreme cold	1.31e+09	1.24e+08
thunderstorm wind	1.27e+09	1.10e+10
heavy rain	7.96e+08	3.25e+09

As with danger for humans, we also find the events which have the greatest economic consequences on average; for these we report the standard deviations to highlight the variability.

topm_property <- arrange(storm_means, desc(property), desc(crop))[1:10,c(1,4,5,8,9)]
topm_property$property <- format(topm_property$property, scientific=TRUE, digits=3)
topm_property$sdproperty <- format(topm_property$sdproperty, scientific=TRUE, digits=3)
topm_property$crop <- format(topm_property$crop, scientific=TRUE, digits=3)
topm_property$sdcrop <- format(topm_property$sdcrop, scientific=TRUE, digits=3)

We find the top 10 events with the highest mean property damage:

topm_prp_table <- xtable(topm_property)
print(topm_prp_table, include.rownames=FALSE, type='html')

evtype	property	crop	sdproperty	sdcrop
hurricane/typhoon	3.82e+08	4.75e+07	1.59e+09	1.63e+08
coastal flood	2.48e+08	7.14e+02	2.51e+09	1.89e+03
storm tide	3.18e+07	6.25e+03	3.33e+08	6.48e+04
tropical storm	1.35e+07	1.62e+06	2.17e+08	1.20e+07
flood	7.85e+06	7.50e+05	8.31e+08	4.22e+07
tsunami	7.58e+06	1.05e+03	1.90e+07	4.59e+03
wildfire	3.69e+06	2.19e+05	4.70e+07	3.57e+06
ice	2.90e+06	5.20e+06	1.93e+07	1.61e+08
astronomical high tide	1.18e+06	0.00e+00	1.76e+06	NA
tornado	1.13e+06	4.35e+04	1.91e+07	1.04e+06

We find the top 10 events with the highest mean crop damage:

topm_crop <- arrange(storm_means, desc(crop), desc(property))[1:10,c(1,5,4,9,8)]
topm_crop$property <- format(topm_crop$property, scientific=TRUE, digits=3)
topm_crop$sdproperty <- format(topm_crop$sdproperty, scientific=TRUE, digits=3)
topm_crop$crop <- format(topm_crop$crop, scientific=TRUE, digits=3)
topm_crop$sdcrop <- format(topm_crop$sdcrop, scientific=TRUE, digits=3)

topm_crp_table <- xtable(topm_crop)
print(topm_crp_table, include.rownames=FALSE, type='html')

evtype	crop	property	sdcrop	sdproperty
hurricane/typhoon	4.75e+07	3.82e+08	1.63e+08	1.59e+09
extreme cold	9.65e+06	8.24e+05	5.58e+07	4.63e+06
drought	9.19e+06	7.43e+05	5.32e+07	1.75e+07
ice	5.20e+06	2.90e+06	1.61e+08	1.93e+07
frost/freeze	1.87e+06	1.87e+04	1.56e+07	2.82e+05
tropical storm	1.62e+06	1.35e+07	1.20e+07	2.17e+08
flood	7.50e+05	7.85e+06	4.22e+07	8.31e+08
excessive heat	6.88e+05	1.08e+04	1.83e+07	1.66e+05
heat	6.40e+05	1.93e+04	1.59e+07	2.10e+05
wildfire	2.19e+05	3.69e+06	3.57e+06	4.70e+07

The following plot illustrates the top 10 events for damage to property. On the y-axis we use a log10 scale.

gfat2 <- barplot(as.numeric(topt_property$property), log = "y", main = "Top 10 events by property damage",
      xlab="event types", ylab = "damage (logarithmic scale)")
text(x=gfat2, y=rep(5e10,10), labels=topt_property$evtype, srt=90)

Total Frequencies

We also compute the total frequencies and percentages of the events which occur in one of the top 10 categories discussed in the previous sections. The percentages are relative to the events in these categories only.

top_names <- rbind(topm_crop$evtype, topm_fatalities$evtype, topm_property$evtype,
	            topm_injuries$evtype, topt_crop$evtype, topt_fatalities$evtype, topt_property$evtype,
	            topt_injuries$evtype)
top_names <- unique(as.vector(top_names))
top_freq <- filter(cleaning_data, evtype %in% top_names)
top_freq <- mutate(top_freq, freq=1)
total_freq <- group_by(top_freq, evtype) %>% summarise(freq=sum(freq))
total_freq <- mutate(total_freq, percentage = freq/sum(freq)*100)
total_freq <- arrange(total_freq, evtype)
total_freq$freq <- as.integer(total_freq$freq)
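
The counting step can also be sketched in base R with table() and prop.table(). The event labels below are made up, and this is only an illustration of the computation, not the code used for the table that follows.

```r
## Made-up event labels
evtype <- c("hail", "hail", "tornado", "flood", "hail")

## Counts per category and their share of all listed events
freq <- table(evtype)
pct <- 100 * prop.table(freq)

## Collect the result in a data frame sorted by category name
toy_freq <- data.frame(evtype = names(freq),
                       freq = as.integer(freq),
                       percentage = as.numeric(pct),
                       stringsAsFactors = FALSE)
print(toy_freq)
```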

tot_freq_table <- xtable(total_freq)
print(tot_freq_table, include.rownames=FALSE, type='html')

evtype	freq	percentage
astronomical high tide	103	0.01
avalanche	388	0.05
coastal flood	261	0.03
coastal storm	11	0.00
cold/wind chill	1806	0.21
drought	2537	0.30
dust storm	438	0.05
excessive heat	1912	0.22
extreme cold	874	0.10
flash flood	55677	6.52
flood	30445	3.56
freezing fog	587	0.07
frost/freeze	1539	0.18
glaze	45	0.01
hail	288928	33.83
heat	954	0.11
heavy rain	12164	1.42
high seas	13	0.00
high water	6	0.00
high wind	22341	2.62
hurricane/typhoon	300	0.04
ice	2111	0.25
lightning	15778	1.85
marine strong wind	48	0.01
rip current	774	0.09
storm tide	154	0.02
thunderstorm wind	336724	39.43
tornado	60711	7.11
tropical storm	697	0.08
tsunami	20	0.00
wildfire	4239	0.50
winter storm	11436	1.34

R Code

Here is the code for cleaning cleaning_data$evtype.

In the first chunk we attempt to fit most entries in the categories described in Section 7 of the Storm Data Documentation.

## Distinguish tides between "astronomical" and "storm"
indx_astro <- grep("astronomical", cleaning_data$evtype)
indx_tide <- grep("tide", cleaning_data$evtype)
cleaning_data$evtype[setdiff(indx_tide, indx_astro)] <- "storm tide"
## Fix "avalanche" misspellings
indx_aval <- grep("aval", cleaning_data$evtype)
cleaning_data$evtype[indx_aval] <- "avalanche"
## Blizzards 
indx_blizzard <- grep("blizzard", cleaning_data$evtype)
indx_snow <- grep("snow", cleaning_data$evtype)
indx_wind <- grep("wind", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_blizzard, indx_snow)] <- "heavy snow"
cleaning_data$evtype[intersect(indx_blizzard, indx_wind)] <- "heavy wind"
indx_blizzard <- grep("blizzard", cleaning_data$evtype)
cleaning_data$evtype[indx_blizzard] <- "blizzard"
## Floods
indx_flood <- grep("flood", cleaning_data$evtype)
indx_coastal <- grep("coastal|beach|tidal|cstl", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_flood,indx_coastal)] <- "coastal flood"
## Find wind chill events
indx_wind <- grep("wind|cold", cleaning_data$evtype)
indx_chill <- grep("chill", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_wind, indx_chill)] <- "cold/wind chill"
## Debris flow
indx_flow <- grep("debris|flow", cleaning_data$evtype)
cleaning_data$evtype[indx_flow] <- "debris flow"
## Fog events
indx_dense_fog <- grep("dense fog", cleaning_data$evtype)
indx_other_fog <- grep("fog", cleaning_data$evtype)
cleaning_data$evtype[setdiff(indx_other_fog, indx_dense_fog)] <- "freezing fog"
## Decided to keep "dense fog" and "dense smoke" in new categories
cleaning_data$evtype[indx_dense_fog] <- "dense fog"
indx_smoke <- grep("smoke", cleaning_data$evtype)
cleaning_data$evtype[indx_smoke] <- "dense smoke"
## Drought
indx_drought <- grep("drought", cleaning_data$evtype)
indx_heat <- grep("excessive", cleaning_data$evtype)
cleaning_data$evtype[setdiff(indx_drought, indx_heat)] <- "drought"
## Dust devil
indx_dev <- grep("dev", cleaning_data$evtype)
cleaning_data$evtype[indx_dev] <- "dust devil"
## Dust storm
indx_dstorm <- grep("dust", cleaning_data$evtype)
cleaning_data$evtype[setdiff(indx_dstorm,indx_dev)] <- "dust storm"
## Classify "excessive" / "extreme" events
indx_ex <- grep("exc|ext", cleaning_data$evtype)
indx_heat <- grep("heat", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_ex, indx_heat)] <- "excessive heat"
indx_cold <- grep("cold|chill", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_ex, indx_cold)] <- "extreme cold"
## Lakeshore and flash floods
indx_flood <- grep("flood", cleaning_data$evtype)
indx_lake <- grep("lake", cleaning_data$evtype)
indx_flash <- grep("flash", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_flood, indx_lake)] <- "lakeshore flood"
cleaning_data$evtype[intersect(indx_flood, indx_flash)] <- "flash flood"
indx_lake <- union(indx_lake, indx_flash)
indx_flood <- setdiff(indx_flood, indx_lake)
cleaning_data$evtype[indx_flood] <- "flood"
## Classify "frost/freeze"
indx_freeze <- setdiff(grep("freez", cleaning_data$evtype),
                         grep("rain|fog|sleet|pre", cleaning_data$evtype))
indx_frost <- grep("frost", cleaning_data$evtype)
cleaning_data$evtype[union(indx_freeze, indx_frost)] <- "frost/freeze"
## Funnel
indx_funnel <- setdiff(grep("funnel", cleaning_data$evtype),
                          grep("thund|wat", cleaning_data$evtype))
cleaning_data$evtype[indx_funnel] <- "funnel cloud"
## Hail
indx_hail <- setdiff(grep("hail", cleaning_data$evtype),
                      grep("mar|thun|tor|tstm", cleaning_data$evtype))
cleaning_data$evtype[indx_hail] <- "hail"
## Heat
indx_heat <- setdiff(grep("heat", cleaning_data$evtype), grep("ex", cleaning_data$evtype))
cleaning_data$evtype[indx_heat] <- "heat"
## Heavy rain
indx_rain <- setdiff(grep("rain", cleaning_data$evtype),
                     grep("sleet|low|tstm|snow|uns|light", cleaning_data$evtype))
cleaning_data$evtype[indx_rain] <- "heavy rain"
## Heavy snow
indx_snow <- setdiff(grep("snow", cleaning_data$evtype),
                     grep("sleet|late|lake|mode|seas|lack|ligh|ear", cleaning_data$evtype))
cleaning_data$evtype[indx_snow] <- "heavy snow"
## High surf
indx_surf <- grep("surf",cleaning_data$evtype)
cleaning_data$evtype[indx_surf] <- "high surf"
## High wind
indx_wind <- setdiff(grep("wind|wnd", cleaning_data$evtype),
                     grep("rain|tun|tstm|thun|chi|mir|mic|hail|low|thud|mar|strong",
                          cleaning_data$evtype))
cleaning_data$evtype[indx_wind] <- "high wind"
## Hurricanes
indx_hurri <- grep("hurricane|typh", cleaning_data$evtype)
cleaning_data$evtype[indx_hurri] <- "hurricane/typhoon"
## Ice / Ice storms / Lake-effect snow
indx_icestm <- intersect(grep("ice", cleaning_data$evtype), grep("storm", cleaning_data$evtype))
cleaning_data$evtype[indx_icestm] <- "ice storm"
indx_lakesnow <- intersect(grep("lake", cleaning_data$evtype), grep("snow", cleaning_data$evtype))
cleaning_data$evtype[indx_lakesnow] <- "lake-effect snow"
## Lightning
indx_lig <- grep("ligh|lign", cleaning_data$evtype)
indx_good_lig <- grep("snow|northern|rain", cleaning_data$evtype)
cleaning_data$evtype[setdiff(indx_lig, indx_good_lig)] <- "lightning"
## Marine 
indx_mar_tst <- grep("marine (tst|thun)", cleaning_data$evtype)
cleaning_data$evtype[indx_mar_tst] <- "marine thunderstorm wind"
## Rip current
cleaning_data$evtype[grepl("rip", cleaning_data$evtype)] <- "rip current"
## Sleet
cleaning_data$evtype[grepl("sleet", cleaning_data$evtype)] <- "sleet"
## Strong wind
indx_strong <- setdiff(grep("strong", cleaning_data$evtype), grep("marine", cleaning_data$evtype))
cleaning_data$evtype[indx_strong] <- "strong wind"
## Thunderstorm wind
indx_tswind <- setdiff(grep("down|tst|thun|tun|gust", cleaning_data$evtype),
                          grep("summary|hail", cleaning_data$evtype))
cleaning_data$evtype[indx_tswind] <- "thunderstorm wind"
## Tornado
cleaning_data$evtype[grepl("tornado", cleaning_data$evtype)] <- "tornado"
## Tropical storms (tropical depressions are left unchanged)
indx_trop <- setdiff(grep("trop", cleaning_data$evtype), grep("depressio", cleaning_data$evtype))
cleaning_data$evtype[indx_trop] <- "tropical storm"
## Volcanic ash
cleaning_data$evtype[grepl("volc", cleaning_data$evtype)] <- "volcanic ash"
## Waterspout
cleaning_data$evtype[grepl("waterspout", cleaning_data$evtype)] <- "waterspout"
## Fires
indx_fire <- grep("fire", cleaning_data$evtype)
indx_wild <- grep("wild|brush|grass|red|forest", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_fire, indx_wild)] <- "wildfire"
## Winter storms and weather
indx_win <- grep("wint", cleaning_data$evtype)
indx_storm <- grep("storm", cleaning_data$evtype)
cleaning_data$evtype[intersect(indx_win, indx_storm)] <- "winter storm"
cleaning_data$evtype[setdiff(indx_win, indx_storm)] <- "winter weather"
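
The chunk above repeatedly combines grep() index vectors with setdiff(), intersect() and union(). The following sketch shows the pattern on a made-up vector of labels, mirroring the first rule of the chunk: every label containing "tide" is relabelled except those containing "astronomical". Since each rule overwrites labels in place, later rules see the relabelled values, so the order of the rules matters.

```r
## Made-up event labels
evtype <- c("astronomical high tide", "high tide", "storm tide",
            "blizzard/heavy snow", "blizzard")

## Positions matching each pattern
indx_astro <- grep("astronomical", evtype)
indx_tide <- grep("tide", evtype)

## Relabel tides that are not astronomical
evtype[setdiff(indx_tide, indx_astro)] <- "storm tide"
print(evtype)
```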

In the second chunk we remove entries which were summaries of other events, since they duplicate those events.

indx_sum <- grep("summary", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_sum,]
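
A caveat about removing rows by negative index: if grep() finds no match it returns an empty vector, and indexing with the negation of an empty vector selects no rows at all, so the whole data frame would be dropped. The removals in this report presumably all matched at least one row, so the issue would not arise here. The sketch below, on a made-up data frame, shows the pitfall and a logical-index alternative based on grepl() that behaves the same way when there are matches.

```r
## Made-up data frame with no "summary" entries
df <- data.frame(evtype = c("flood", "hail"), stringsAsFactors = FALSE)

## grep() finds nothing, so the negative index is empty
indx <- grep("summary", df$evtype)
dropped_all <- df[-indx, , drop = FALSE]
print(nrow(dropped_all))

## A logical index keeps every row that does not match
kept <- df[!grepl("summary", df$evtype), , drop = FALSE]
print(nrow(kept))
```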

In the third chunk we refine the previous cleaning, mostly by hand. Some changes involved correcting spelling errors, or looking in the original data set (under storm_data$remarks) to guess how to attribute the events. This was not possible for all events: in some cases an event did not seem to fit in a single category, and in others a meaningful description was missing. Events removed include:

"black ice";
"dam" incidents;
some precipitation events which could not be mapped to exactly one of "hail", "rain" or "snow";
"wet" and "unseasonal" events;
specific accidents/mishaps (either for boats or people dying of hypo/hyperthermia);
some events whose label was like "other" and which lacked a description.

## Uncategorized events are removed
indx_uncat <- grep("\\?", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_uncat,]
## Abnormal warmth is excessive heat
indx_abwarm <- grep("abnormal warmth", cleaning_data$evtype)
cleaning_data$evtype[indx_abwarm] <- "excessive heat"
## Bursts are microbursts
cleaning_data$evtype[grep("burst", cleaning_data$evtype)] <- "microburst"
## Unseasonal events are unclear hence removed
indx_unsea <- grep("unsea", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_unsea,]
## Most "dry" are drought
indx_drought <- setdiff(grep("dry", cleaning_data$evtype), grep("mild", cleaning_data$evtype))
cleaning_data$evtype[indx_drought] <- "drought"
## Black ice and below normal precipitation are unclear
indx_bel <- grep("below|black", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_bel, ]
## Correct typo
indx_coastalstorm <- grep("coastalstorm", cleaning_data$evtype)
cleaning_data$evtype[indx_coastalstorm] <- "coastal storm"
## "apache county" was a "strong wind"
cleaning_data$evtype[grep("apac", cleaning_data$evtype)] <- "strong wind"
## Not clear where to fit beach/coastal erosion
indx_eros <- grep("eros", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_eros, ]
## "precipitations" which can be attributed to rain
indx_prec <- setdiff(grep("prec", cleaning_data$evtype), grep("snow|mix|nor", cleaning_data$evtype))
indx_remprec <- intersect(grep("prec", cleaning_data$evtype), grep("snow|mix|nor", cleaning_data$evtype))
cleaning_data$evtype[indx_prec] <- "heavy rain"
cleaning_data <- cleaning_data[-indx_remprec, ]
## Remove "wet" because not clear if it should be fit in "heat" or "rain"
indx_wet <- grep("wet", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_wet, ]
## More extreme cold
indx_cold <- setdiff(grep("cold", cleaning_data$evtype), grep("chill", cleaning_data$evtype))
cleaning_data$evtype[indx_cold] <- "extreme cold"
## Events which do not fit or describe accidents
indx_rem <- grep("early|drow|hyp", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem, ]
## "excessive" entries were "excessive heat" (there was just one outlier)
indx_exc <- grep("excessive$", cleaning_data$evtype)
cleaning_data$evtype[indx_exc] <- "excessive heat"
## Hail, rain, glaze
indx_hail <- grep("hails", cleaning_data$evtype)
indx_sho <- grep("show", cleaning_data$evtype)
cleaning_data$evtype[indx_hail] <- "hail"
cleaning_data$evtype[indx_sho] <- "heavy rain"
indx_hail <- grep("hail st", cleaning_data$evtype)
cleaning_data$evtype[indx_hail] <- "hail"
indx_glz <- grep("glaze", cleaning_data$evtype)
cleaning_data$evtype[indx_glz] <- "glaze"
indx_hail <- intersect(grep("thunder",cleaning_data$evtype), grep("hail", cleaning_data$evtype))
cleaning_data$evtype[indx_hail] <- "hail"
## Record heat
indx_rechigh <- intersect(setdiff(grep("record",cleaning_data$evtype),
                           grep("snow|low|cool", cleaning_data$evtype)),
			    grep("warm",cleaning_data$evtype))
cleaning_data$evtype[indx_rechigh] <- "excessive heat"
## "urban floods" are floods
indx_flood <- grep("urban", cleaning_data$evtype)
cleaning_data$evtype[indx_flood] <- "flood"
## "landslides" mapped to "debris flow"
indx_debris <- grep("land", cleaning_data$evtype)
cleaning_data$evtype[indx_debris] <- "debris flow"
## remove "dam" events
indx_dam <- grep("dam", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_dam,]
## map to drought
indx_drmn <- grep("driest month", cleaning_data$evtype)
cleaning_data$evtype[indx_drmn] <- "drought"
## typo
indx_floo <- grep("flash floooding", cleaning_data$evtype)
cleaning_data$evtype[indx_floo] <- "flash flood"
## not clear where to fit "cool spells"
indx_cool <- grep("cool spell", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_cool, ]
## more rain
indx_hvyrain <- grep("heavy rain/lightning", cleaning_data$evtype)
cleaning_data$evtype[indx_hvyrain] <- "heavy rain"
## Classify these as heat (not clear if they could go to excessive heat)
cleaning_data$evtype[grep("(high temperature record|hot pattern|hot spell| hot weather)",
                       cleaning_data$evtype)] <- "heat"
## seas
cleaning_data$evtype[grep("(high|heavy) seas",cleaning_data$evtype)] <- "high seas"
## Ice is a new category
cleaning_data$evtype[grep("ice|icy", cleaning_data$evtype)] <- "ice"
## remove unclear events
indx_rem <- grep("(^high$|lack)", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem,]
## merge heavy and high swells
cleaning_data$evtype[grep("heavy swells", cleaning_data$evtype)] <- "high swells"
## "wall clouds" would probably fit in tornadoes
cleaning_data$evtype[grep("wall", cleaning_data$evtype)] <- "tornado"
## not clear where to fit late/light snow
indx_rem <- grep("(late|light) snow", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem, ]
## typos
cleaning_data$evtype[grep("lightning", cleaning_data$evtype)] <- "lightning"
cleaning_data$evtype[grep("low temp", cleaning_data$evtype)] <- "extreme cold"
## Remove some marine incidents
indx_rem <- grep("marine (acc|mish)",cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem,]
## This was a "thunderstorm wind" according to the original description
cleaning_data$evtype[grep("metro", cleaning_data$evtype)] <- "thunderstorm wind"
## remove moderate/mild and northern lights
indx_rem <- grep("^micr|north|mod|mild", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem,]
## these were cold
cleaning_data$evtype[grep("monthly temperature", cleaning_data$evtype)] <- "extreme cold"
## mud as debris
cleaning_data$evtype[grep("mud", cleaning_data$evtype)] <- "debris flow"
cleaning_data$evtype[grep("near record snow", cleaning_data$evtype)] <- "heavy snow"
## Unclassified events
indx_rem <- grep("(^no|no sever|other)",cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem,]
## Remove these events which do not fall in a single category or a main category
string_rem <- "(light freezing rain|rapidly rising water|record cool|record low rainfall"
string_rem <- paste(string_rem, "|red flag|severe turb|southeast)", sep="")
indx_rem <- grep(string_rem ,cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem,]
## more record temperatures
cleaning_data$evtype[grep("warm", cleaning_data$evtype)] <- "heat"
cleaning_data$evtype[grep("record high", cleaning_data$evtype)] <- "excessive heat"
cleaning_data$evtype[grep("record low$", cleaning_data$evtype)] <- "extreme cold"
cleaning_data$evtype[grep("record temperature", cleaning_data$evtype)] <- "excessive heat"
## floyd is a hurricane
cleaning_data$evtype[grep("floyd", cleaning_data$evtype)] <- "hurricane/typhoon"
## classify as debris
cleaning_data$evtype[grep("rock", cleaning_data$evtype)] <- "debris flow"
## put in high seas
cleaning_data$evtype[grep("rough seas", cleaning_data$evtype)] <- "high seas"
## seasonal snowfall is counted as heavy snow
cleaning_data$evtype[grep("seasonal snowfall", cleaning_data$evtype)] <- "heavy snow"
## These were floods (the name involves a "small")
cleaning_data$evtype[grep("^sm+", cleaning_data$evtype)] <- "flood"
## remove "late" events and wake wind
indx_rem <- grep("^late|^wake", cleaning_data$evtype)
cleaning_data <- cleaning_data[-indx_rem,]
## more corrections and fixing typos
cleaning_data$evtype[grep("rogue", cleaning_data$evtype)] <- "high waves"
cleaning_data$evtype[grep("storm surge", cleaning_data$evtype)] <- "coastal flood"
cleaning_data$evtype[grep("record temperature", cleaning_data$evtype)] <- "excessive heat" 
cleaning_data$evtype[grep("thud|tstm",cleaning_data$evtype)] <- "thunderstorm wind"
cleaning_data$evtype[grep("torndao",cleaning_data$evtype)] <- "tornado"
cleaning_data$evtype[grep("^wa",cleaning_data$evtype)] <- "waterspout"
