Weather Data QA #72
Big Questions Answers
|
|
Hey Rob,
|
Excellent - thanks for the update! Glad to see I misinterpreted the precip status!
http://deq1.bse.vt.edu:81/met/ - web address for the /backup/meteorology directory
Various docs/resources that we have used for QA
|
Helpful links for null values: [here](https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt)
|
7/15/2021 Update
When batch running the land segments, we discovered that we had missed some grids when using the
All of the missing grids are currently being extracted for 1984-2020 (I used the handy nohup trick Rob showed us yesterday, so it should run all night even after I get signed out of deq4). I will probably log in for a couple of minutes tomorrow when the grids have finished downloading and batch run the
UPDATE ON MISSING VALUES IN DATA
|
We ran into a couple land segments with missing data:
The problem seems to be that some of the grid data didn't finish downloading in 2008. We are currently redownloading and updating deq4 with complete timeseries data for both grids and corresponding land segments. The fact that our function caught the missing data is a good sign, and we should be able to fix the issue and have all the ET csv files on deq4 by the meeting on Monday.
7/26/2021 Update:
|
9/20 update on the issue that was previously being tracked in #122
The missing-data issue we were dealing with over the summer involved two land segments that are not near the land segments in the new problem we have been discussing. Therefore, the possibility that the previous NLDAS2_ASCII_to_LSegs run used bad grid data is not what caused the bad precip time series data.
Here is the side-by-side comparison of the same time period between the old and new data. There seems to be no pattern from what I can see.
Plot:
Line plots:
Log plots:
After reviewing the summary stats, the values in the precip_annual and 90_day_max_precip columns are extremely high. Searching and filtering each land segment by these columns will be a QA test to run.
Another way of visualizing anomalies in the data: find the upper and lower quartiles of the data set and compute the IQR; any value more than 1.5*IQR below the lower quartile or above the upper quartile is flagged as an outlier.
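For reference, a minimal R sketch of the IQR rule described above. The function name `flag_iqr_outliers` and the sample values are made up for illustration; this is not the code from the QA scripts.

```r
# Flag values outside the Tukey fences:
# below Q1 - k*IQR or above Q3 + k*IQR (k = 1.5 by convention).
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  # NA values are never flagged
  !is.na(x) & (x < lower | x > upper)
}

# Made-up annual precipitation totals (inches) for a single land segment
precip_annual <- c(38.2, 41.5, 44.0, 39.7, 42.3, 40.1, 310.6, 43.8)
outliers <- precip_annual[flag_iqr_outliers(precip_annual)]
print(outliers)
```

With hourly data the fences would probably need to be computed per metric and per land segment, since a single fence across all segments would mostly flag the wettest locations.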
|
Searching through all of the land segment data and flagging yearly precipitation values greater than 150 inches turned up 30 flagged year and land segment combinations. For whatever reason, 2008 seems to have been a problem year. However, this uses the data from before we reran the function, which fixed the 2 land segments we have been looking at. It will be interesting to see if the data is now fixed for every single one of these land segments too (this is also a reminder for me to do that tomorrow). Here is the .txt with year and land segment:
@kylewlowe great outcome ^^. Eagerly anticipating the re-run that you do and see if that fixes many of these. |
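A rough sketch of the 150-inch annual precipitation check discussed above, assuming a data frame with one row per land segment per time step. The column names (`landseg`, `year`, `precip_in`), the segment ID `LSEG_X`, and the toy values are all hypothetical.

```r
# Sum precipitation per land segment and year, then keep the totals
# that exceed the threshold (150 inches in the check discussed here).
flag_high_annual_precip <- function(df, threshold_in = 150) {
  annual <- aggregate(precip_in ~ landseg + year, data = df, FUN = sum)
  names(annual)[names(annual) == "precip_in"] <- "precip_annual"
  annual[annual$precip_annual > threshold_in, , drop = FALSE]
}

# Toy input: two segments, two years, two time steps per year
toy <- data.frame(
  landseg   = rep(c("A37135", "LSEG_X"), each = 4),
  year      = rep(c(2007, 2007, 2008, 2008), times = 2),
  precip_in = c(20, 21, 19, 22, 20, 20, 90, 95)
)
flagged <- flag_high_annual_precip(toy)
print(flagged)
```

The resulting rows could be written out with `write.table()` to produce a year and land segment list like the .txt mentioned above.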
Update on rerunning the flagged segments function: All of the 2008 values appear to have been fixed by the rerun. However, the two 1985 land segments did not change. We checked the grid data for the corresponding grids to see whether the raw meteorological data downloaded incorrectly for 1985, and found a grid that only has data up until the 10th hour of June 10th. The grid is x382y101, which is in both land segments. This was probably a result of either the grib_to_ascii function not finishing or the raw data download from NLDAS not finishing while running over the summer. We will continue working to figure out which of these is the problem and will redownload the necessary data tomorrow.
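One way to catch a truncated grid like this automatically is to compare the timestamps present against every hour expected in the year. The sketch below assumes hourly data with UTC timestamps already parsed to POSIXct; the function name `check_year_complete` and the simulated series are illustrative only.

```r
# Compare the hours present in a series against every hour of a given year.
check_year_complete <- function(timestamps, year) {
  year <- as.integer(year)
  expected <- seq(
    from = as.POSIXct(sprintf("%d-01-01 00:00:00", year), tz = "UTC"),
    to   = as.POSIXct(sprintf("%d-12-31 23:00:00", year), tz = "UTC"),
    by   = "hour"
  )
  # Compare as seconds since the epoch to avoid class-specific matching
  present <- as.numeric(expected) %in% as.numeric(timestamps)
  list(
    n_expected   = length(expected),
    n_missing    = sum(!present),
    last_present = if (any(present)) max(expected[present]) else NA
  )
}

# Simulated series that stops at hour 10 on June 10th, 1985
truncated <- seq(
  from = as.POSIXct("1985-01-01 00:00:00", tz = "UTC"),
  to   = as.POSIXct("1985-06-10 10:00:00", tz = "UTC"),
  by   = "hour"
)
result <- check_year_complete(truncated, 1985)
print(result$n_missing)
print(format(result$last_present, "%Y-%m-%d %H:%M", tz = "UTC"))
```

Running this per grid and per year would show whether other grids stopped partway through a download, independent of whether the precipitation totals happen to look unusual.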
Update on Timeseries QA after reimporting data
All data checked out, with nothing overly unusual. Flagged segment txt files for each metric (DPT, PRC, etc.) are located in the /backup/meteorology directory for viewing individual values. The number of flagged data points is as follows:
The test values used were the same values used before the database reset. They are as follows:
|
R Code examine equations.
R Examine Data.
|
Still had a problem, some grid cells fixed, others not.
Batch process:
|
Big questions:
nldas_datasets: om-model-info/6863472/dh_properties
1984010100-2020123123: om-model-info/6863473/dh_properties
Rscript R/lseg_qa_test_timeseries.R hydrocode dataset ftype model_code
Rscript R/lseg_qa_test_timeseries.R A37135 1984010100-2020123123 cbp532_landseg cbp-5.3.2
nldas_feature_dataset_prop()
DDPT, x385y94, 1986 , -40.3535233
DDPT, x386y94, 1986 , -40.3506203
DDPT, x386y95, 1986 , -40.3687019
DDPT, x387y95, 1986 , -40.2827759
DDPT, x388y95, 1986 , -40.0947647
DDPT, x388y96, 1986 , -40.0809441
DDPT, x389y96, 1986 , -39.9123192
QA Scripts/Code Samples
Find -9999 in any file in downloaded and parsed grid cell data
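A minimal R sketch of such a search, assuming the parsed grid cell data are plain-text files under one directory. The function name `find_sentinel_files`, the demo directory, and the file names and contents are hypothetical.

```r
# Return the paths of all files under `dir` that contain the sentinel string.
# Note: a fixed-string match on "-9999" also matches longer tokens such as
# "-99990", so hits should be reviewed by hand.
find_sentinel_files <- function(dir, sentinel = "-9999", pattern = NULL) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE, recursive = TRUE)
  hits <- vapply(files, function(f) {
    any(grepl(sentinel, readLines(f, warn = FALSE), fixed = TRUE))
  }, logical(1))
  files[hits]
}

# Demo with two made-up files in a temporary directory
demo_dir <- file.path(tempdir(), "grid_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1985 6 10 9 0.12", "1985 6 10 10 -9999"),
           file.path(demo_dir, "cell_a.txt"))
writeLines(c("1985 6 10 9 0.12", "1985 6 10 10 0.00"),
           file.path(demo_dir, "cell_b.txt"))
bad_files <- basename(find_sentinel_files(demo_dir))
print(bad_files)
```

The `pattern` argument is passed to `list.files()`, so the search can be limited to one file type or one grid cell if the directory is large.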