Lab 3 - HTML
KodzuKenma101 committed Sep 20, 2024
1 parent 59a7e98 commit a0b1458
Showing 8 changed files with 9,723 additions and 0 deletions.
49 changes: 49 additions & 0 deletions Assignment01-ExploratoryDataAnalysis.qmd
@@ -0,0 +1,49 @@
---
title: "Assignment 01 - Exploratory Data Analysis"
author: "Ellya Gholmieh"
format: html
editor: visual
embed-resources: true
theme: hpstr
---

## Exploratory Analysis

```{r}
#read in the data
library(data.table)
two <- data.table::fread(file.path('C:/Users/ellya/OneDrive/Desktop/PM566labs/2002_2.5PM_data.csv'),
header = TRUE, sep = ',')
ttwo <- data.table::fread(file.path('C:/Users/ellya/OneDrive/Desktop/PM566labs/2022_2.5PM_data.csv'),
header = TRUE, sep = ',')
#Check the dimensions, headers, and footers
dim(two)
dim(ttwo)
head(two)
tail(two)
head(ttwo)
tail(ttwo)
#quick look at the variables
str(two)
str(ttwo)
#closer look at key variables
summary(two$`Daily Mean PM2.5 Concentration`)
head(two[order(two$`Daily Mean PM2.5 Concentration`), ])
tail(two[order(two$`Daily Mean PM2.5 Concentration`), ])
#keep only positive concentrations (note: `> 0` also drops readings equal to 0)
ttwo <- ttwo[ttwo$`Daily Mean PM2.5 Concentration` > 0, ]
summary(ttwo$`Daily Mean PM2.5 Concentration`)
head(ttwo[order(ttwo$`Daily Mean PM2.5 Concentration`), ])
tail(ttwo[order(ttwo$`Daily Mean PM2.5 Concentration`), ])
dim(ttwo)
```

When looking more closely at the Daily Mean PM2.5 Concentration variable in both data sets, I noticed that the 2022 data set had PM2.5 values less than 0, which shouldn't be possible for a concentration. I removed these observations from the data. Additionally, both data sets have maximum PM2.5 values that are much higher than their 3rd quartile values. However, I could not find anything online that would justify removing these high observations, so I kept them. There are 15976 observations in the 2002 data, and the mean Daily Mean PM2.5 Concentration is 16.12. For the 2022 data, after removing the observations below zero, there are 59413 observations and the mean Daily Mean PM2.5 Concentration is 8.48. As the means are higher than the medians for both data sets, both distributions appear to be skewed right.
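
The mean-versus-median comparison used above can be sketched on simulated data. This is a minimal illustration, not the PM2.5 analysis itself: the vector `pm` and its log-normal parameters are assumptions chosen only to produce a right-skewed sample.

```r
# Hypothetical illustration: a simulated right-skewed sample, not the PM2.5 data
set.seed(1)
pm <- rlnorm(1000, meanlog = 2, sdlog = 0.6)

# Keep only non-missing, non-negative readings (a no-op for this sample;
# shown to mirror the cleaning step described in the text)
pm <- pm[!is.na(pm) & pm >= 0]

# A mean that sits above the median is a quick hint of right skew
mean_pm <- mean(pm)
median_pm <- median(pm)
right_skewed_hint <- mean_pm > median_pm

print(c(mean = mean_pm, median = median_pm))
print(right_skewed_hint)
```

A mean above the median is only a rough indicator; a histogram of the variable would give a more direct view of the shape.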
9,514 changes: 9,514 additions & 0 deletions lab-03.html

Large diffs are not rendered by default.

160 changes: 160 additions & 0 deletions lab-03.rmarkdown
@@ -0,0 +1,160 @@
---
title: "lab-03"
author: "Ellya Gholmieh"
format: html
editor: visual
embed-resources: true
theme: hpstr
---


## 1. Read the Data


```{r}
file_path <- "C:/Users/ellya/OneDrive/Desktop/PM566labs/met_all.gz"
met <- data.table::fread(file_path)
```


## 2. Check the dimensions, headers, and footers.


```{r}
dim(met)
```


There are 2,377,343 rows and 30 columns in this data set.

## 3. Take a look at the variables.


```{r}
str(met)
```


As the objectives of this lab are to find the weather station with the highest elevation and to look at patterns in the time series of its wind speed and temperature, the key variables are USAFID, elev (elevation), wind.sp (wind speed), temp (temperature), year, month, day, and hour.

## 4. Take a closer look at the key variables.


```{r}
table(met$USAFID)
table(met$year)
table(met$month)
table(met$day)
table(met$hour)
summary(met$temp)
summary(met$elev)
summary(met$wind.sp)
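# 9999 is treated as a missing-value code for elevation; note that this
# assignment sets every column of the matching rows to NA, not only elev
met[met$elev==9999.0, ] <- NA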
summary(met$elev)
```
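
If the intent is to blank only the elevation while keeping the other measurements in those rows, the recoding can be limited to the one column. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical) and assumes 9999 is the missing-value code for `elev`.

```r
# Hypothetical toy data standing in for `met`
toy <- data.frame(
  USAFID = c(1, 2, 3),
  elev   = c(120, 9999, 4113),
  temp   = c(21.5, 18.0, 9.3)
)

# Recode the assumed missing-value code 9999 in the elev column only;
# which() skips any elevations that are already NA
toy$elev[which(toy$elev == 9999)] <- NA

# The other columns in the affected row are left untouched
print(toy)
summary(toy$elev)
```

Whether to recode one column or drop the whole row depends on whether the rest of the record is trusted; this sketch only shows the narrower option.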


### At what elevation is the highest weather station?


```{r}
tail(met[order(met$elev), ])
```


The highest weather station is at an elevation of 4113 meters.

### How many missing values are there in the wind.sp variable?


```{r}
summary(met$wind.sp)
```


There are 31582 missing values in the wind.sp variable.
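
The missing-value count can also be computed directly rather than read off the `summary()` output. A minimal sketch on a made-up vector (`wind` and its values are hypothetical):

```r
# Hypothetical wind-speed readings with some missing values
wind <- c(2.1, NA, 0.0, 5.7, NA, 3.6)

# is.na() returns TRUE for each missing entry, so its sum is the count
n_missing <- sum(is.na(wind))

# The mean of the same logical vector is the share of missing entries
prop_missing <- mean(is.na(wind))

print(n_missing)
print(prop_missing)
```

Applied to the lab data, the same pattern would be `sum(is.na(met$wind.sp))`.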

## 5. Check the data against an external data source.

### Where was the location for the coldest temperature readings (-17.2C)? Do these seem reasonable in context?


```{r}
met <- met[met$temp > -40, ]
mettemp <- met[order(met$temp),]
head(mettemp)[,c(1,8:10,24)]

```


The coldest temperature readings come from a location corresponding to Yoder, Colorado. These readings do not seem reasonable for August: over the last 14 years, the minimum temperature there in August has not gone below 10 degrees Celsius (<https://www.worldweatheronline.com/v2/weather-averages.aspx?q=80864>).

### Does the range of values for elevation make sense? Why or why not?


```{r}
met <- met[order(met$elev),]
head(met)[,c(1,8:10,24)]
tail(met)[,c(1,8:10,24)]
```


The lowest elevation (-13) corresponds to the Naval Air Facility in Imperial, California. According to AirNav.com, the Naval Air Facility is approximately 13 m below sea level, so this value is plausible. The highest elevation (4113) corresponds to Colorado Mines Peak. However, according to Google Maps, the elevation at these coordinates is only 3572 m. Thus, the reported maximum elevation is too high and does not make sense.

## 6. Calculate summary statistics


```{r}

elev1 <- met[which(met$elev == max(met$elev, na.rm = TRUE)), ]
summary(elev1)
# pairwise correlations, using only rows where both variables are observed
cor(elev1$temp, elev1$wind.sp, use = "complete.obs")
cor(elev1$temp, elev1$hour, use = "complete.obs")
cor(elev1$wind.sp, elev1$day, use = "complete.obs")
cor(elev1$wind.sp, elev1$hour, use = "complete.obs")
cor(elev1$temp, elev1$day, use = "complete.obs")
```


## 7. Exploratory graphs


```{r}
hist(met$elev)
hist(met$temp)
hist(met$wind.sp)
library(leaflet)
leaflet(elev1) %>%
addProviderTiles('OpenStreetMap') %>%
addCircles(lat=~lat,lng=~lon, opacity=1, fillOpacity=1, radius=100)
library(lubridate)
elev1$date <- with(elev1, ymd_h(paste(year, month, day, hour, sep= ' ')))
summary(elev1$date)
elev1 <- elev1[order(elev1$date), ]
head(elev1)
```


### Use the plot function to make line graphs of temperature vs. date and wind speed vs. date


```{r}
# type = "l" draws lines; the default would draw points only
plot(elev1$date, elev1$temp, type = "l")
plot(elev1$date, elev1$wind.sp, type = "l")
```


From the temperature vs. date plot, we can see that temperatures fluctuated throughout the month, with a few days of high temperatures from August 19-21. Wind speed also fluctuated throughout August, with highs between August 12 and 19, lows from August 19-24, and highs again between August 25-27.

## 8. Ask Questions

Why were some data entered incorrectly? Do locations at higher latitudes have lower temperatures?


```{r}
plot(met$lat, met$temp)
```


Yes, higher latitudes have more variation in their temperatures and have a lower average temperature.
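
A scatter plot of this size can be hard to read, so one way to back up a statement like this is to summarize temperature within latitude bands. The sketch below runs on simulated data only: `lat`, `temp`, the 6-degree band width, and the built-in negative slope are all assumptions for illustration, so its output says nothing about the real `met` data.

```r
# Hypothetical simulated data: temperature is constructed to fall with latitude
set.seed(42)
lat  <- runif(500, min = 25, max = 49)
temp <- 40 - 0.5 * lat + rnorm(500, sd = 3)

# Group latitudes into 6-degree bands: [25,31], (31,37], (37,43], (43,49]
band <- cut(lat, breaks = seq(25, 49, by = 6), include.lowest = TRUE)

# Mean and spread of temperature within each band
band_mean <- tapply(temp, band, mean)
band_sd   <- tapply(temp, band, sd)

# Overall linear association between latitude and temperature
r <- cor(lat, temp)

print(round(band_mean, 2))
print(round(band_sd, 2))
print(round(r, 2))
```

With the lab data, the same calls would take `met$lat` and `met$temp` (adding `na.rm = TRUE` to the summaries and `use = "complete.obs"` to `cor()`), and the band standard deviations would give a direct check on the claim about variation.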

Binary file added lab-03_files/figure-html/unnamed-chunk-10-1.png
Binary file added lab-03_files/figure-html/unnamed-chunk-10-2.png
Binary file added lab-03_files/figure-html/unnamed-chunk-10-3.png
Binary file added lab-03_files/figure-html/unnamed-chunk-11-1.png
Binary file added lab-03_files/figure-html/unnamed-chunk-11-2.png
