GEFS download fixes #3349

Open · wants to merge 11 commits into base: develop
7 changes: 7 additions & 0 deletions modules/data.atmosphere/NEWS.md
@@ -1,5 +1,12 @@
# PEcAn.data.atmosphere 1.8.0.9000

## Fixed
* `download.NOAA_GEFS` is updated to work again with GEFS v12.3,
the current release as of this writing in July 2024 (#3349).

## Changed
* Removed `sitename` and `username` from the formal arguments of `download.NOAA_GEFS`.
Before, they were silently ignored; now they're treated as part of `...` (which is also ignored!).

# PEcAn.data.atmosphere 1.8.0

184 changes: 82 additions & 102 deletions modules/data.atmosphere/R/GEFS_helper_functions.R
@@ -17,21 +17,11 @@ noaa_grid_download <- function(lat_list, lon_list, forecast_time, forecast_date,


download_grid <- function(ens_index, location, directory, hours_char, cycle, base_filename1, vars,working_directory){
#for(j in 1:31){
if(ens_index == 1){
base_filename2 <- paste0("gec00",".t",cycle,"z.pgrb2a.0p50.f")
curr_hours <- hours_char[hours <= 384]
Member Author:

Most of my changes in this hunk are just to simplify the logic, but flagging that I can't figure out why this line was here at all: All ensemble members from a given forecast cycle have the same number of hours available, so my unconfident best guess is someone got confused between cycle ids and ensemble ids. But cycle 00 is the one with more hours, so this would have been wrong anyway.

Member:

I actually think there are some ensemble members that are the full 35 days, and others that are shorter, even for cycle 00

Member Author:

Can you point to any examples of the shorter members, or documentation that mentions them? I've been poking around and haven't found any yet, but if this is an intermittent thing it'd be easy for me to have missed it.

}else{
if((ens_index-1) < 10){
ens_name <- paste0("0",ens_index - 1)
}else{
ens_name <- as.character(ens_index -1)
}
base_filename2 <- paste0("gep",ens_name,".t",cycle,"z.pgrb2a.0p50.f")
curr_hours <- hours_char
}


member_type <- if (ens_index == 1) { "gec" } else { "gep" } # "_c_ontrol", "_p_erturbed"
ens_idxname <- stringr::str_pad(ens_index - 1, width = 2, pad = "0")
base_filename2 <- paste0(member_type,ens_idxname,".t",cycle,"z.pgrb2a.0p50.f")
curr_hours <- hours_char

for(i in 1:length(curr_hours)){
file_name <- paste0(base_filename2, curr_hours[i])

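The simplified member-name logic above can be illustrated as a standalone sketch. `gefs_filename` is a hypothetical wrapper name (the PR builds the string inline), and base R's `formatC` stands in for `stringr::str_pad` so the example has no package dependencies; the zero-padding behavior is the same for these inputs.

```r
# Sketch of the member-name construction above: member 1 is the control
# ("gec"), members 2-31 are perturbed ("gep"), and the on-disk index is
# ens_index - 1, zero-padded to two digits.
gefs_filename <- function(ens_index, cycle) {
  member_type <- if (ens_index == 1) "gec" else "gep"  # "_c_ontrol" vs "_p_erturbed"
  ens_idxname <- formatC(ens_index - 1, width = 2, flag = "0")
  paste0(member_type, ens_idxname, ".t", cycle, "z.pgrb2a.0p50.f")
}

gefs_filename(1, "00")   # "gec00.t00z.pgrb2a.0p50.f"
gefs_filename(12, "06")  # "gep11.t06z.pgrb2a.0p50.f"
```

Note that the control member still gets index "00" from the same padding call, which is what lets the refactor collapse the old if/else branches into one code path.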
@@ -73,37 +63,12 @@ noaa_grid_download <- function(lat_list, lon_list, forecast_time, forecast_date,

model_dir <- file.path(output_directory, model_name_raw)

#Availability: most recent 4 days
curr_time <- lubridate::with_tz(Sys.time(), tzone = "UTC")
curr_date <- lubridate::as_date(curr_time)

noaa_page <- readLines('https://nomads.ncep.noaa.gov/pub/data/nccf/com/gens/prod/')
Member Author:

These pages (here and on deleted line 91 below) load fine in a browser, but inside R and via curl they throw "Stream error in the HTTP/2 framing layer". But all this scraping and extracting can be replaced by knowing that the forecasts are retained for four days -- am I missing cases where data posting isn't reliable enough to assume "today and the three days before that"?

Member:

My experience is there can be delays in when files show up and occasionally gaps, which is why we didn't just assume forecasts are always there.

Member Author (@infotroph, Aug 5, 2024):

That's good to know, but note that the existing code only checked whether there exists a date folder that matches the requested forecast date, not whether all forecasts (or even all cycles) are present in it. How often do delays/gaps occur at the whole-day level?

Each individual file download (line 50 above) is wrapped in a tryCatch, so for both old and new code the likeliest outcome when files are missing will be a series of "skipping" messages (possibly then followed by a failure when downscaling can't bridge the time gap). Is that acceptable for this PR or do we need to dig further in on pre-download availability checks?

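The per-file guard described in the comment above can be sketched as follows. `safe_download` is a hypothetical helper name (the actual call and its "skipping" messages live in unshown parts of this file); the point is only the shape of the tryCatch wrapper that turns a failed download into a logged skip instead of an aborted ensemble loop.

```r
# Sketch: wrap each file download so one missing file logs a "skipping"
# message and returns NA rather than stopping the whole loop.
safe_download <- function(url, destfile) {
  tryCatch(
    utils::download.file(url, destfile, quiet = TRUE),
    error = function(e) {
      message("skipping ", basename(destfile), ": ", conditionMessage(e))
      NA  # signal failure without aborting
    }
  )
}
```

As the comment notes, the likely downstream symptom of missing files is a run of these skip messages, possibly followed by a downscaling failure if the time gap is too wide to bridge.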

potential_dates <- NULL
for(i in 1:length(noaa_page)){
if(stringr::str_detect(noaa_page[i], ">gefs.")){
end <- stringr::str_locate(noaa_page[i], ">gefs.")[2]
dates <- stringr::str_sub(noaa_page[i], start = end+1, end = end+8)
potential_dates <- c(potential_dates, dates)
}
}


last_cycle_page <- readLines(paste0('https://nomads.ncep.noaa.gov/pub/data/nccf/com/gens/prod/gefs.', dplyr::last(potential_dates)))

potential_cycle <- NULL
for(i in 1:length(last_cycle_page)){
if(stringr::str_detect(last_cycle_page[i], 'href=\"')){
end <- stringr::str_locate(last_cycle_page[i], 'href=\"')[2]
cycles <- stringr::str_sub(last_cycle_page[i], start = end+1, end = end+2)
if(cycles %in% c("00","06", "12", "18")){
potential_cycle <- c(potential_cycle, cycles)
}
}
}

potential_dates <- lubridate::as_date(potential_dates)

potential_dates = potential_dates[which(potential_dates == forecast_date)]
potential_dates <- curr_date - lubridate::days(3:0)

potential_dates <- potential_dates[which(potential_dates == forecast_date)]

if(length(potential_dates) == 0){PEcAn.logger::logger.error("Forecast Date not available")}
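The rolling four-day window introduced above is simple enough to show on a fixed example day. This sketch uses base R dates (the real code uses `lubridate` and the current UTC time); `3:0` counts down so the result runs oldest to newest.

```r
# The four most recent days assumed available: today (UTC) and the
# three days before it.
curr_date <- as.Date("2024-07-15")   # example date; real code derives this from Sys.time()
potential_dates <- curr_date - 3:0
format(potential_dates)
# "2024-07-12" "2024-07-13" "2024-07-14" "2024-07-15"
```

The requested `forecast_date` is then matched against this vector, and an empty match triggers the "Forecast Date not available" error.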

@@ -118,7 +83,10 @@ noaa_grid_download <- function(lat_list, lon_list, forecast_time, forecast_date,
floor(min(lat_list)))

base_filename1 <- "https://nomads.ncep.noaa.gov/cgi-bin/filter_gefs_atmos_0p50a.pl?file="
vars <- "&lev_10_m_above_ground=on&lev_2_m_above_ground=on&lev_surface=on&lev_entire_atmosphere=on&var_APCP=on&var_DLWRF=on&var_DSWRF=on&var_PRES=on&var_RH=on&var_TMP=on&var_UGRD=on&var_VGRD=on&var_TCDC=on"
vars <- paste0(
"&lev_10_m_above_ground=on&lev_2_m_above_ground=on&lev_surface=on&lev_entire_atmosphere=on",
"&var_APCP=on&var_DLWRF=on&var_DSWRF=on&var_PRES=on&var_RH=on&var_TMP=on",
"&var_UGRD=on&var_VGRD=on&var_TCDC=on")

for(i in 1:length(potential_dates)){

@@ -143,11 +111,11 @@ noaa_grid_download <- function(lat_list, lon_list, forecast_time, forecast_date,
print(paste("Downloading", forecast_date, cycle))

if(cycle == "00"){
hours <- c(seq(0, 240, 3),seq(246, 384, 6))
hours <- hours[hours<=end_hr]
hours <- c(seq(0, 240, 3),seq(246, 840, 6))
}else{
hours <- c(seq(0, 240, 3),seq(246, min(end_hr, 840) , 6))
hours <- c(seq(0, 240, 3),seq(246, 384 , 6))
}
hours <- hours[hours<=end_hr]
hours_char <- hours
hours_char[which(hours < 100)] <- paste0("0",hours[which(hours < 100)])
hours_char[which(hours < 10)] <- paste0("0",hours_char[which(hours < 10)])
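The two hour sequences in this hunk can be checked in isolation. This sketch wraps them in a hypothetical `gefs_hours` helper and uses `sprintf` in place of the two-step `paste0` padding above (same result: three-digit, zero-padded strings).

```r
# Forecast hours offered by GEFS: 3-hourly through hour 240, then 6-hourly
# out to 840 h (35 days) for cycle 00, or 384 h (16 days) for cycles 06/12/18.
gefs_hours <- function(cycle) {
  max_hr <- if (cycle == "00") 840 else 384
  c(seq(0, 240, 3), seq(246, max_hr, 6))
}

hours <- gefs_hours("06")
hours_char <- sprintf("%03d", hours)  # 3 -> "003", 384 -> "384"
range(hours)         # 0 384
head(hours_char, 2)  # "000" "003"
```

Filtering with `hours <= end_hr` afterwards, as the new code does, applies the caller's end date uniformly to both cycle types.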
@@ -163,12 +131,12 @@ noaa_grid_download <- function(lat_list, lon_list, forecast_time, forecast_date,

parallel::mclapply(X = ens_index,
FUN = download_grid,
location,
directory,
hours_char,
cycle,
base_filename1,
vars,
location = location,
directory = directory,
hours_char = hours_char,
cycle = cycle,
base_filename1 = base_filename1,
vars = vars,
Member Author:

Just for clarity

working_directory = model_date_hour_dir,
mc.cores = 1)
}else{
@@ -177,6 +145,9 @@ noaa_grid_download <- function(lat_list, lon_list, forecast_time, forecast_date,
}
}
}



#' Extract and temporally downscale points from downloaded grid files
#'
#' @param lat_list lat for site
@@ -222,23 +193,13 @@ process_gridded_noaa_download <- function(lat_list,
dlwrfsfc <- array(NA, dim = c(site_length, length(hours_char)))
dswrfsfc <- array(NA, dim = c(site_length, length(hours_char)))

if(ens_index == 1){
base_filename2 <- paste0("gec00",".t",cycle,"z.pgrb2a.0p50.f")
}else{
if(ens_index-1 < 10){
ens_name <- paste0("0",ens_index-1)
}else{
ens_name <- as.character(ens_index-1)
}
base_filename2 <- paste0("gep",ens_name,".t",cycle,"z.pgrb2a.0p50.f")
}
member_type <- if (ens_index == 1) { "gec" } else { "gep" } # "_c_ontrol", "_p_erturbed"
ens_idxname <- stringr::str_pad(ens_index - 1, width = 2, pad = "0")
base_filename2 <- paste0(member_type,ens_idxname,".t",cycle,"z.pgrb2a.0p50.f")

lats <- round(lat_list/.5)*.5
lons <- round(lon_list/.5)*.5
Comment on lines 200 to 201
Member Author (@infotroph, Aug 5, 2024):

Not for this PR, but: GEFS is now also available on a 0.25 degree grid. Would it be of interest to add support for that?


if(lons < 0){
lons <- 360 + lons
}
Member Author:

We're comparing these longitudes to the raw values in the grib file, which have negative values for west lon. I assume this must have changed at some point since these lines were written, but can't find any reference to it in the GEFS changelogs.

curr_hours <- hours_char

for(hr in 1:length(curr_hours)){
@@ -263,8 +224,13 @@ process_gridded_noaa_download <- function(lat_list,
vgrd10m[s, hr] <- grib_data_df$`10[m] HTGL=Specified height level above ground; v-component of wind [m/s]`[index]

if(curr_hours[hr] != "000"){
apcpsfc[s, hr] <- grib_data_df$`SFC=Ground or water surface; 03 hr Total precipitation [kg/(m^2)]`[index]
tcdcclm[s, hr] <- grib_data_df$`RESERVED(10) (Reserved); Total cloud cover [%]`[index]
# total precip alternates being named as 3 or 6 hr total
# TODO: not sure if the contents actually differ or if this is a labeling bug in the grib files
precip_hr <- if ((as.numeric(curr_hours[hr]) %% 2) == 1) { "03" } else { "06" }
precip_name <- paste("SFC=Ground or water surface;", precip_hr, "hr Total precipitation [kg/(m^2)]")
apcpsfc[s, hr] <- grib_data_df[[precip_name]][index]

tcdcclm[s, hr] <- grib_data_df$`EATM=Entire Atmosphere; Total cloud cover [%]`[index]
Comment on lines +227 to +233
Member Author (@infotroph, Jul 31, 2024):

This is weird and I don't understand why the grib files do it (or whether that means the interpretation of the precip totals should change from one time to the next), but I checked multiple sites and forecast days and the alternation is definitely there in the raw gribs.

Member Author:

Note also the change in cloud cover name from "RESERVED..." to "EATM..." -- I would have expected this to show up in a changelog somewhere as well, but haven't been able to find any references to it.

dswrfsfc[s, hr] <- grib_data_df$`SFC=Ground or water surface; Downward Short-Wave Rad. Flux [W/(m^2)]`[index]
dlwrfsfc[s, hr] <- grib_data_df$`SFC=Ground or water surface; Downward Long-Wave Rad. Flux [W/(m^2)]`[index]
}
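The alternating precipitation field name discussed in this hunk can be isolated into a small helper for illustration. `precip_field` is a hypothetical name mirroring the inline logic above; odd multiples of three hours carry the "03 hr" label and even multiples the "06 hr" label, per the observed grib contents.

```r
# Build the grib column name for total precipitation at a given forecast
# hour ("003", "006", ...), following the alternation seen in GEFS v12 files.
precip_field <- function(hour_char) {
  precip_hr <- if ((as.numeric(hour_char) %% 2) == 1) "03" else "06"
  paste("SFC=Ground or water surface;", precip_hr,
        "hr Total precipitation [kg/(m^2)]")
}

precip_field("003")  # "... 03 hr Total precipitation ..."
precip_field("006")  # "... 06 hr Total precipitation ..."
```

Whether the two labels imply different accumulation windows (and thus different interpretation of the totals) is the open question flagged in the TODO above.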
@@ -301,17 +267,15 @@ process_gridded_noaa_download <- function(lat_list,



cycle <-forecast_time
cycle <- forecast_time
curr_forecast_time <- forecast_date + lubridate::hours(cycle)
if(cycle < 10) cycle <- paste0("0",cycle)
if(cycle == "00"){
hours <- c(seq(0, 240, 3),seq(246, 840 , 6))
}else{
hours <- c(seq(0, 240, 3),seq(246, 384 , 6))
cycle <- stringr::str_pad(cycle, width = 2, pad = "0")
if (cycle == "00") {
hours <- c(seq(0, 240, 3),seq(246, 840, 6))
} else {
hours <- c(seq(0, 240, 3),seq(246, 384, 6))
}
hours_char <- hours
hours_char[which(hours < 100)] <- paste0("0",hours[which(hours < 100)])
hours_char[which(hours < 10)] <- paste0("0",hours_char[which(hours < 10)])
hours_char <- stringr::str_pad(hours, width = 3, pad = "0") # 3->"003", 384->"384"

raw_files <- list.files(file.path(model_name_raw_dir,forecast_date,cycle))
hours_present <- as.numeric(stringr::str_sub(raw_files, start = 25, end = 27))
@@ -341,19 +305,21 @@ process_gridded_noaa_download <- function(lat_list,
FUN = extract_sites,
hours_char = hours_char,
hours = hours,
cycle,
site_id,
lat_list,
lon_list,
cycle = cycle,
site_id = site_id,
lat_list = lat_list,
lon_list = lon_list,
working_directory = file.path(model_name_raw_dir,forecast_date,cycle),
mc.cores = 1)


forecast_times <- lubridate::as_datetime(forecast_date) + lubridate::hours(as.numeric(cycle)) + lubridate::hours(as.numeric(hours_char))
forecast_times <- lubridate::as_datetime(forecast_date) +
lubridate::hours(as.numeric(cycle)) +
lubridate::hours(as.numeric(hours_char))



#Convert negetive longitudes to degrees east
#Convert negative longitudes to degrees east
if(lon_list < 0){
lon_east <- 360 + lon_list
}else{
@@ -425,17 +391,18 @@ process_gridded_noaa_download <- function(lat_list,
#Calculate wind speed from east and north components
wind_speed <- sqrt(noaa_data$eastward_wind$value^2 + noaa_data$northward_wind$value^2)

forecast_noaa <- tibble::tibble(time = noaa_data$air_temperature$forecast.date,
NOAA.member = noaa_data$air_temperature$ensembles,
air_temperature = noaa_data$air_temperature$value,
air_pressure= noaa_data$air_pressure$value,
relative_humidity = noaa_data$relative_humidity$value,
surface_downwelling_longwave_flux_in_air = noaa_data$surface_downwelling_longwave_flux_in_air$value,
surface_downwelling_shortwave_flux_in_air = noaa_data$surface_downwelling_shortwave_flux_in_air$value,
precipitation_flux = noaa_data$precipitation_flux$value,
specific_humidity = specific_humidity,
cloud_area_fraction = noaa_data$cloud_area_fraction$value,
wind_speed = wind_speed)
forecast_noaa <- tibble::tibble(
time = noaa_data$air_temperature$forecast.date,
NOAA.member = noaa_data$air_temperature$ensembles,
air_temperature = noaa_data$air_temperature$value,
air_pressure= noaa_data$air_pressure$value,
relative_humidity = noaa_data$relative_humidity$value,
surface_downwelling_longwave_flux_in_air = noaa_data$surface_downwelling_longwave_flux_in_air$value,
surface_downwelling_shortwave_flux_in_air = noaa_data$surface_downwelling_shortwave_flux_in_air$value,
precipitation_flux = noaa_data$precipitation_flux$value,
specific_humidity = specific_humidity,
cloud_area_fraction = noaa_data$cloud_area_fraction$value,
wind_speed = wind_speed)

forecast_noaa$cloud_area_fraction <- forecast_noaa$cloud_area_fraction / 100 #Convert from % to proportion
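The two derived quantities in this hunk are straightforward, shown here on toy values: wind speed is the vector magnitude of the u/v components, and cloud cover is rescaled from percent to a proportion.

```r
# Wind speed from the east/north components (Pythagorean magnitude),
# and cloud cover converted from % to a 0-1 proportion.
eastward <- 3
northward <- -4
wind_speed <- sqrt(eastward^2 + northward^2)
wind_speed  # 5

cloud_area_fraction <- 85 / 100
cloud_area_fraction  # 0.85
```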

@@ -455,14 +422,10 @@ process_gridded_noaa_download <- function(lat_list,
for (ens in 1:31) { # i is the ensemble number

#Turn the ensemble number into a string
if(ens-1< 10){
ens_name <- paste0("0",ens-1)
}else{
ens_name <- ens - 1
}
ens_name <- stringr::str_pad(ens - 1, width = 2, pad = "0")

forecast_noaa_ens <- forecast_noaa %>%
dplyr::filter(NOAA.member == ens) %>%
dplyr::filter(.data$NOAA.member == ens) %>%
dplyr::filter(!is.na(.data$air_temperature))

end_date <- forecast_noaa_ens %>%
@@ -525,6 +488,15 @@ process_gridded_noaa_download <- function(lat_list,
return(results_list)
} #process_gridded_noaa_download










#' @title Downscale NOAA GEFS from 6hr to 1hr
#' @return None
#'
@@ -645,6 +617,14 @@ temporal_downscale <- function(input_file, output_file, overwrite = TRUE, hr = 1











##' @title Write NOAA GEFS netCDF
##' @name write_noaa_gefs_netcdf
##' @param df data frame of meterological variables to be written to netcdf. Columns
@@ -711,4 +691,4 @@ write_noaa_gefs_netcdf <- function(df, ens = NA, lat, lon, cf_units, output_file

ncdf4::nc_close(nc_flptr) #Write to the disk/storage
}
}
}
47 changes: 26 additions & 21 deletions modules/data.atmosphere/R/download.NOAA_GEFS.R
@@ -6,36 +6,43 @@
##' @references https://www.ncdc.noaa.gov/crn/measurements.html
##'
##' @section NOAA_GEFS General Information:
##' This function downloads NOAA GEFS weather data. GEFS is an ensemble of 21 different weather forecast models.
##' A 16 day forecast is avaliable every 6 hours. Each forecast includes information on a total of 8 variables.
##' These are transformed from the NOAA standard to the internal PEcAn
##' standard.
##'
##' @section Data Avaliability:
##' NOAA GEFS weather data is avaliable on a rolling 12 day basis; dates provided in "start_date" must be within this range. The end date can be any point after
##' that, but if the end date is beyond 16 days, only 16 days worth of forecast are recorded. Times are rounded down to the previous 6 hour forecast. NOAA
##' GEFS weather data isn't always posted immediately, and to compensate, this function adjusts requests made in the last two hours
##' back two hours (approximately the amount of time it takes to post the data) to make sure the most current forecast is used.
Comment on lines -17 to -18
Member Author:

I do not think this two-hour adjustment was being done in the current code, and am sure it isn't being done after this PR. Deleted it from the docs.

##' This function downloads NOAA GEFS weather data. GEFS is an ensemble of 31 different weather forecast models.
##' A 16 day forecast is available every 6 hours and a 35 day forecast is available every 24 hours.
##' Both are at 3-hour frequency for the first 10 days of the forecast and 6-hour frequency beyond that.
##' Each forecast includes information on a total of 8 variables.
##' These are transformed from the NOAA standard to the internal PEcAn standard.
##'
##' @section Data Availability:
##' NOAA GEFS weather data is available on a rolling 4 day basis.
##' Dates provided in "start_date" must be within this range.
##' The end date can be any point after that, but if the end date is beyond 16 days
##' (35 days for the midnight UTC forecast), only 16 (35) days worth of forecast are retrieved.
##' Times are rounded down to the previous 6 hour forecast.
##'
##' NOAA GEFS weather data isn't always posted immediately. Each 16-day forecast takes
##' approximately three hours to run, and the once-a-day forecasts for days 17-35 are
##' posted much later (up to 21 hours) than the forecasts for days 0 to 16.
##' See the [GEFS v12 release announcement](https://www.weather.gov/media/notification/pdf2/scn20-75gefs_v12_changes.pdf)
##' for details.
##'
##' @section Data Save Format:
##' Data is saved in the netcdf format to the specified directory. File names reflect the precision of the data to the given range of days.
##' Data is saved in the netcdf format to the specified directory.
##' File names reflect the precision of the data to the given range of days.
##' NOAA.GEFS.willow creek.3.2018-06-08T06:00.2018-06-24T06:00.nc specifies the forecast, using ensemble number 3 at willow creek on
##' June 6th, 2018 at 6:00 a.m. to June 24th, 2018 at 6:00 a.m.
##'
##' @return A list of data frames is returned containing information about the data file that can be used to locate it later. Each
##' data frame contains information about one file.
##'
##' @param outfolder Directory where results should be written
##' @param start_date, Range of dates/times to be downloaded (default assumed to be time that function is run)
##' @param end_date, end date for range of dates to be downloaded (default 16 days from start_date)
##' @param site_id The unique ID given to each site. This is used as part of the file name.
##' @param lat.in site latitude in decimal degrees
##' @param lon.in site longitude in decimal degrees
##' @param site_id The unique ID given to each site. This is used as part of the file name.
##' @param sitename Site name
##' @param username username from pecan workflow
##' @param outfolder Directory where results should be written
##' @param start_date Range of dates/times to be downloaded (default assumed to be time that function is run)
##' @param end_date end date for range of dates to be downloaded (default 16 days from start_date)
##' @param downscale logical, assumed True. Indicates whether data should be downscaled to hourly
##' @param overwrite logical. Download a fresh version even if a local file with the same name already exists?
##' @param downscale logical, assumed True. Indicated whether data should be downscaled to hourly
##' @param ... Additional optional parameters
##' @param ... Additional optional parameters, currently ignored
##'
##' @export
##'
@@ -50,8 +57,6 @@
##' @author Quinn Thomas, modified by K Zarada
##'
download.NOAA_GEFS <- function(site_id,
sitename = NULL,
username = 'pecan',
lat.in,
lon.in,
outfolder,