if (!file.exists("met_all.gz"))
+download.file(
+ url = "https://raw.githubusercontent.com/USCbiostats/data-science-data/master/02_met/met_all.gz",
+ destfile = "met_all.gz",
+ method = "libcurl",
+ timeout = 60
+
+ )<- data.table::fread("met_all.gz") met
Lab 4
+Learning Goals
+-
+
- Read in and prepare the meteorological dataset +
- Create several graphs with different
geoms()
inggplot2
+ - Create a facet graph +
- Conduct some customizations to the graphs +
- Create a more detailed map using
leaflet()
+
Lab Description
+We will again work with the meteorological data presented in lecture.
+The objective of the lab is to examine the association between weekly average dew point and wind speed in four regions of the US and by elevation.
+Per Wikipedia: “The dew point of a given body of air is the temperature to which it must be cooled to become saturated with water vapor. This temperature depends on the pressure and water content of the air.”
+Again, feel free to supplement your knowledge of this dataset by checking out the data dictionary.
+Steps
+1. Read in the data
+First download and then read in with data.table::fread()
2. Prepare the data
+-
+
- Remove temperatures less than -17C +
- Make sure there are no missing data in the key variables coded as 9999, 999, etc +
- Generate a date variable using the functions
as.Date()
(hint: You will need the following to create a datepaste(year, month, day, sep = "-")
).
+ - Using the
data.table::week
function, keep the observations of the first week of the month.
+ - Compute the mean by station of the variables
temp
,rh
,wind.sp
,vis.dist
,dew.point
,lat
,lon
, andelev
.
+ - Create a region variable for NW, SW, NE, SE based on lon = -98.00 and lat = 39.71 degrees +
- Create a categorical variable for elevation as in the lecture slides +
<- met[temp > -17][elev == 9999.0, elev := NA]
+ met # Removes temp less than -17C and changes incorrect elev values to NA
+:= week(as.Date(paste(year, month, day, sep = "-")))]
+ met[, week # Generates date variable for week
+<- met[week == min(week, na.rm = TRUE)]
+ met # Keeps only the data from the first week of the month
+<- met[,.(temp=mean(temp,na.rm=TRUE), rh=mean(rh,na.rm=TRUE), wind.sp=mean(wind.sp,na.rm=TRUE),
+ met_avg vis.dist=mean(vis.dist,na.rm=TRUE), dew.point = mean(dew.point, na.rm=TRUE), lat=mean(lat), lon=mean(lon),
+ elev=mean(elev,na.rm=TRUE)), by="USAFID"]
+ # Computes means by station in table met_avg
+$elev_cat <- ifelse(met_avg$elev> 252, "high", "low")
+ met_avg# Creates categorical variable for elevation
+$region <- ifelse(met_avg$lon > -98 & met_avg$lat >39.71, "north east",
+ met_avgifelse(met_avg$lon > -98 & met_avg$lat < 39.71, "south east",
+ ifelse(met_avg$lon < -98 & met_avg$lat >39.71, "north west", "south west")))
+ # Creates a region variable by categorizing data based on latitude and longitude
+table(met_avg$region)
+north east north west south east south west
+ 484 146 649 296
+3. Use geom_violin
to examine the wind speed and dew point by region
+You saw how to use geom_boxplot
in class. Try using geom_violin
instead (take a look at the help). (hint: You will need to set the x
aesthetic to 1)
-
+
- Use facets +
- Make sure to deal with
NA
s
+ - Describe what you observe in the graph +
|>
+ met_avg filter(!(region %in% NA)) |>
+ ggplot()+
+geom_violin(mapping = aes(y=wind.sp, x=1)) +
+ facet_wrap(~region, nrow=2)
Warning: Removed 13 rows containing non-finite outside the scale range
+(`stat_ydensity()`).
+# Violin geom for wind speed
+|>
+ met_avg filter(!(region %in% NA)) |>
+ ggplot()+
+geom_violin(mapping = aes(y=dew.point, x=1)) +
+ facet_wrap(~region, nrow=2)
# Violin geom for dew point
+|>
+ met_avg filter(!(region %in% NA)) |>
+ ggplot()+
+geom_boxplot(mapping = aes(y=dew.point, fill=region)) +
+ facet_wrap(~region, nrow=2)
# Boxplot for dew point for comparison
The violin geom is a better visual of the distribution of data. In the wind speed graph, we can see a more normal tendency of wind speed in the south west and slightly higher average wind speeds in the north and south west, with the highest density of wind speed around 3 m/s in the west, compared to 1.5 m/s in the east. Interestingly, the north east has a very high outlier for wind speed near 11 m/s, where none of the other regions have such an outlier.
+The shapes of dew points are much less regular. The east has a very small spread of dew point with high average densities around 16C in the north and 21C in the south. The west has very large spreads of dew point, with some very small values compared to the east, likely because of the many diferent climates in the west. While the average is around 13C in both the north and south west, the violin geoms of the west take up a lot more space and the averages are lowered by very small dew points, some of which are negative.
+4. Use geom_jitter
with stat_smooth
to examine the association between dew point and wind speed by region
+-
+
- Color points by region +
- Make sure to deal with
NA
s
+ - Fit a linear regression line by region +
- Describe what you observe in the graph +
|>
+ met_avg filter(!(region %in% NA)) |>
+ggplot(mapping = aes(x=dew.point, y=wind.sp, color=region))+
+ geom_jitter() +
+ stat_smooth(method=lm)
`geom_smooth()` using formula = 'y ~ x'
+Warning: Removed 13 rows containing non-finite outside the scale range
+(`stat_smooth()`).
+Warning: Removed 13 rows containing missing values or values outside the scale range
+(`geom_point()`).
+There is relatively good clustering of wind speeds and dew points. The southwest appears to have little to no correlation between wind speed and dew point with an almost horizontal linear regression line, meaning the wind speed is around 3 m/s in the south west, regardless of dew point. The north and south east both have slightly positive correlations between dew point and wind speed, with clustering at the higher end of the dew point. This means that as dew point increases in the east, wind speed tends to slightly increase as well. Although, there is a very large outlier in the north east with an extremely high wind speed and relatively low dew point. Finally, the north west also has a positive correlation between dew point and wind speed, with a much larger range of dew points than the eastern regions.
+5. Use geom_bar
to create barplots of the weather stations by elevation category colored by region
+-
+
- Bars by elevation category using
position="dodge"
+ - Change colors from the default. Color by region using
scale_fill_brewer
see this
+ - Create nice labels on the axes and add a title +
- Describe what you observe in the graph +
- Make sure to deal with
NA
values
+
|>
+ met_avg filter(!(region %in% NA)) |>
+ggplot()+
+ geom_bar(mapping=aes(x=elev_cat,fill=region), position = "dodge")+
+ scale_fill_brewer(palette = "PiYG")+
+ labs(title="Number of weather stations by elevation category and region", x="Elevation Category", y= "Count")+
+ theme_bw()
There are similar numbers of weather stations at low and high elevations in the north east. The number of weather stations in the north west is very low compared to the rest of the country, likely because of the low population density in this region. There are many more stations at a high altitude in the north west. There are more than three times as many weather stations at low altitude than at high altitude in the south west and the south west has the most weather stations of the four regions. Finally, the south west has more than double the amount of weather stations at high altitude than low altitude.
+6. Use stat_summary
to examine mean dew point and wind speed by region with standard deviation error bars
+-
+
- Make sure to remove
NA
s
+ - Use
fun.data="mean_sdl"
instat_summary
+ - Add another layer of
stats_summary
but change the geom to"errorbar"
(see the help).
+ - Describe the graph and what you observe +
|>
+ met_avg filter(!(region %in% NA)) |>
+ggplot(mapping=aes(x=region, y=dew.point)) +
+ stat_summary(fun.data="mean_sdl", geom="errorbar") +
+ stat_summary(fun.data="mean_sdl")
|>
+ met_avg filter(!(region %in% NA)) |>
+ggplot(mapping=aes(x=region, y=wind.sp)) +
+ stat_summary(fun.data="mean_sdl", geom="errorbar") +
+ stat_summary(fun.data="mean_sdl")
Warning: Removed 13 rows containing non-finite outside the scale range
+(`stat_summary()`).
+Removed 13 rows containing non-finite outside the scale range
+(`stat_summary()`).
+-
+
- Dew point is highly variable and on average lower in the west compared to the east. Dew point is relatively consistent in the east and slightly higher in the south east than the north east. +
- Wind speed is variable across the country and, on average, slightly higher in the west than the east. +
7. Make a map showing the spatial trend in relative humidity in the US
+-
+
- Make sure to remove
NA
s
+ - Use leaflet() +
- Make a color palette with custom colors +
- Use
addMarkers
to include the top 10 places in relative humidity (hint: this will be usefulrank(-rh) <= 10
)
+ - Add a legend +
<-met_avg[!is.na(rh)]
+ met_avg2
+# Top five
+<- met_avg2[rank(-rh) <= 10]
+ top5
+= colorNumeric(c('blue','purple','red'), domain=met_avg2$rh)
+ rh_pal leaflet(met_avg2) |>
+addProviderTiles('OpenStreetMap') |>
+ addCircles(lat=~lat, lng=~lon, color=~rh_pal(rh), label=~paste0(round(rh,2), ' rh'), opacity=1,fillOpacity=1, radius=500) |>
+ addMarkers(lat=~lat, lng=~lon, label=~paste0(round(rh,2), ' rh'), data = top5) |>
+ addLegend('bottomleft',pal=rh_pal, values=met_avg2$rh, title="Relative Humidity", opacity=1)
## ANSWERS TO QUESTION #8
+# Make a box plot of wind speed vs temp by region using ggpattern
+ ggplot(data = met_avg2) +
+ geom_boxplot_pattern(mapping=aes(x=temp, y=wind.sp,fill=region), pattern = "placeholder", pattern_type = "bear") +
+ labs(title="Wind speed verses temperature by region", x="Temperature (C)", y= "Wind Speed (m/s)") +
+ theme_minimal()
Warning: Removed 13 rows containing non-finite outside the scale range
+(`stat_boxplot()`).
+# Histogram of elevation by region using ggpattern
+ ggplot(data = met_avg2) +
+ geom_histogram_pattern(mapping=aes(x=elev,fill=region, pattern = "stripe"), position = "identity") +
+ labs(title="Histogram of elevation by region", x="Elevation (m)") +
+ theme_minimal()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+-
+
Describe the trend in RH across the US
+The trend of relative humidity is an increase from west to east, disregarding the coastline in west. Generally, the relative humiditiy is highest in the midwest, south, north east, and the western coastline. The relative humidity is the lowest in the sun belt and moderate in the Rocky Mountain range. 6 of the top 10 relative humidities are near the coastline in the west and Florida.
+
8. Use a ggplot extension
+-
+
Pick an extension (except cowplot) from here and make a plot of your choice using the met data (or met_avg)
+Might want to try examples that come with the extension first (e.g. ggtech, gganimate, ggforce)
+RStudio keeps giving me errors here, so answers are above
+