-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
59a7e98
commit a0b1458
Showing
8 changed files
with
9,723 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
--- | ||
title: "Assignment 01 - Exploratory Data Analysis" | ||
author: "Ellya Gholmieh" | ||
format: html | ||
editor: visual | ||
embed-resources: true | ||
theme: hpstr | ||
--- | ||
|
||
## Exploratory Analysis | ||
|
||
```{r} | ||
#read in the data | ||
library(data.table) | ||
two <- data.table::fread(file.path('C:/Users/ellya/OneDrive/Desktop/PM566labs/2002_2.5PM_data.csv'), | ||
header = TRUE, sep = ',') | ||
ttwo <- data.table::fread(file.path('C:/Users/ellya/OneDrive/Desktop/PM566labs/2022_2.5PM_data.csv'), | ||
header = TRUE, sep = ',') | ||
#Check the dimenstions, headers, and footers | ||
dim(two) | ||
dim(ttwo) | ||
head(two) | ||
tail(two) | ||
head(ttwo) | ||
tail(ttwo) | ||
#quick look at the variables | ||
str(two) | ||
str(ttwo) | ||
#closer look at key variables | ||
summary(two$`Daily Mean PM2.5 Concentration`) | ||
head(two[order(two$`Daily Mean PM2.5 Concentration`), ]) | ||
tail(two[order(two$`Daily Mean PM2.5 Concentration`), ]) | ||
ttwo <- ttwo[ttwo$`Daily Mean PM2.5 Concentration` > 0, ] | ||
summary(ttwo$`Daily Mean PM2.5 Concentration`) | ||
head(ttwo[order(ttwo$`Daily Mean PM2.5 Concentration`), ]) | ||
tail(ttwo[order(ttwo$`Daily Mean PM2.5 Concentration`), ]) | ||
dim(ttwo) | ||
``` | ||
|
||
When looking more closely at the Daily Mean PM2.5 Concentration variable in both data sets, I noticed that the 2022 data set had PM2.5 values less than 0, which shouldn't be possible. I removed these observations from the data. Additionally, both data sets have maximum PM 2.5 values that are much higher than the 3rd quartile values. However, I could not find anything online that would prompt me to remove these high observations. There are 15976 observations for the 2002 data and the mean Daily Mean PM2.5 Concentration is 16.12. For the 2022 data, after removing the observations below zero, there are 59413 observations and the mean Daily Mean PM2.5 Concentration is 8.48. As the means are higher than the medians for both data sets, the data seems to be skewed right. |
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
--- | ||
title: "lab-03" | ||
author: "Ellya Gholmieh" | ||
format: html | ||
editor: visual | ||
embed-resources: true | ||
theme: hpstr | ||
--- | ||
|
||
|
||
## 1.Read the Data | ||
|
||
|
||
```{r} | ||
file_path <- "C:/Users/ellya/OneDrive/Desktop/PM566labs/met_all.gz" | ||
met <- data.table::fread(file_path) | ||
``` | ||
|
||
|
||
## 2. Check the dimensions, headers, and footers. | ||
|
||
|
||
```{r} | ||
dim(met) | ||
``` | ||
|
||
|
||
There are 2,377,343 rows and 30 columns in this data set. | ||
|
||
## 3. Take a look at the variables. | ||
|
||
|
||
```{r} | ||
str(met) | ||
``` | ||
|
||
|
||
As the objectives of this lab are to find the weather station with the highest elevation and to look at patterns in the time series of its wind speed and temperature, the key variables are USAFID, elev (elevation), wind.sp (wind speed), temp (temperature), year, month, day, and hour. | ||
|
||
## 4. Take a closer look at the key variables. | ||
|
||
|
||
```{r} | ||
table(met$USAFID) | ||
table(met$year) | ||
table(met$month) | ||
table(met$day) | ||
table(met$hour) | ||
summary(met$temp) | ||
summary(met$elev) | ||
summary(met$wind.sp) | ||
met[met$elev==9999.0, ] <- NA | ||
summary(met$elev) | ||
``` | ||
|
||
|
||
### At what elevation is the highest weather station? | ||
|
||
|
||
```{r} | ||
tail(met[order(met$elev), ]) | ||
``` | ||
|
||
|
||
The highest weather station is at an elevation of 4113 feet. | ||
|
||
### How many missing values are there in the wind.sp variable? | ||
|
||
|
||
```{r} | ||
summary(met$wind.sp) | ||
``` | ||
|
||
|
||
There are 31582 missing values in the wind.sp variable. | ||
|
||
## 5. Check the data against an external data source. | ||
|
||
### Where was the location for the coldest temperature readings (-17.2C)? Do these seem reasonable in context? | ||
|
||
|
||
```{r} | ||
met <- met[met$temp > -40, ] | ||
mettemp <- met[order(met$temp),] | ||
head(mettemp)[,c(1,8:10,24)] | ||
|
||
``` | ||
|
||
|
||
The location for the coldest temperature readings corresponds to Yoder, Colorado. These readings do not seem reasonable for August considering for the last 14 years, the minimum temperature in August has not gone below 10 degrees Celsius (<https://www.worldweatheronline.com/v2/weather-averages.aspx?q=80864>) | ||
|
||
### Does the range of values for elevation make sense? Why or why not? | ||
|
||
|
||
```{r} | ||
met <- met[order(met$elev),] | ||
head(met)[,c(1,8:10,24)] | ||
tail(met)[,c(1,8:10,24)] | ||
``` | ||
|
||
|
||
The lowest elevation (-13) corresponds to the Naval Air Facility in Imperial, California. According to AirNav.com, the Naval Air Facility is approximately -13m below sea level. The highest elevation (4113) corresponds to Colorado Mines Peak. However, according to Google maps, the elevation at these coordinates is only 3572m. Thus, the reported maximum elevation is too high and does not make sense. | ||
|
||
## 6. Calculate summary statistics | ||
|
||
|
||
```{r} | ||
|
||
elev1 <- met[which(met$elev == max(met$elev, na.rm = TRUE)), ] | ||
summary(elev1) | ||
cor(elev1$temp, elev1$wind.sp, use="complete") | ||
cor(elev1$temp, elev1$hour, use="complete") | ||
cor(elev1$wind.sp, elev1$day, use="complete") | ||
cor(elev1$wind.sp, elev1$hour, use="complete") | ||
cor(elev1$temp, elev1$day, use="complete") | ||
``` | ||
|
||
|
||
## 7. Exploratory graphs | ||
|
||
|
||
```{r} | ||
hist(met$elev) | ||
hist(met$temp) | ||
hist(met$wind.sp) | ||
library(leaflet) | ||
leaflet(elev1) %>% | ||
addProviderTiles('OpenStreetMap') %>% | ||
addCircles(lat=~lat,lng=~lon, opacity=1, fillOpacity=1, radius=100) | ||
library(lubridate) | ||
elev1$date <- with(elev1, ymd_h(paste(year, month, day, hour, sep= ' '))) | ||
summary(elev1$date) | ||
elev1 <- elev1[order(elev1$date), ] | ||
head(elev1) | ||
``` | ||
|
||
|
||
### Use the plot function to make line graphs of temperature vs. date and wind speed vs. date | ||
|
||
|
||
```{r} | ||
plot(elev1$date, elev1$temp) | ||
plot(elev1$date, elev1$wind.sp) | ||
``` | ||
|
||
|
||
From the temperature vs. date plot, we can see that temperatures fluctuated throughout the month with a few days of high temperatures from August 19-21. We see that there were also fluctuations in wind speed throughout August, with a highs between August 12 and 19, lows from August 19-24, and highs again between August 25-27. | ||
|
||
## 8. Ask Questions | ||
|
||
Why is some data inputted incorrectly? Do higher latitude locations have lower temperatures? | ||
|
||
|
||
```{r} | ||
plot(met$lat, met$temp) | ||
``` | ||
|
||
|
||
Yes, higher latitudes have more variation in their temperatures and have a lower average temperature. | ||
|
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.