
Conceptual foundation for precipitation fidelity analyses #1279

Open
rburghol opened this issue Jun 13, 2024 · 2 comments
rburghol commented Jun 13, 2024

Goals

  • We want to create metrics that allow us to analyze multiple precipitation data sources to determine which source is superior over a given time period.
  • We want to use the precip fidelity metrics to assemble a best total aggregate dataset, an amalgamation of the spatially and temporally varying best-fit sets.
  • Fidelity metrics must answer:
    • What aspect of precip fidelity does this relationship represent?
    • How can we apply this to improve our models' precip inputs and flow outputs?
    • What quantitative evaluations characterize this improvement?
    • What visualizations accompany the quantitative analysis to demonstrate actual improvement (case studies of improvement)?
    • How does this fit into our workflow framework? See Work Flows Component/Process Outline for Met Mashups #1282

Conceptual Questions:

  • How can we identify periods of anomalous precip?
    • Find "phantom storms": precip data is non-zero, but no storm pulse is evident in the hydrograph
    • Find "missing storms": a storm pulse is evident in the hydrograph, but precip data is zero
    • Find values outside the bounds of the expected $Q = f(P)$ relationship
      • Ex: Use weekly mean Q and P to find monthly varying regression relationships
        • If a given week's value is outside the 95% confidence interval, consider it suspect
        • If a given dataset has a better or worse relationship for a month, use the better one (we could mash up data sources on a monthly basis across the board)
  • What are our overall hypotheses about precip vs. flow anomalies, i.e., "this is a likely precip error"?
    • What causes a "true" anomaly? Two potential types:
      • Precip data is lower than the resulting flow would suggest (beyond soil moisture deficit/storage)
      • Precip data is higher than the resulting flow would suggest (beyond soil moisture deficit/storage)
      • Should we evaluate stream response as $Q(t) - Q(t-1)$, since the change in streamflow due to precip is easier to estimate than the full $Q = f(P)$ relationship?
    • Hypothesis: magnitude of precip is the most likely and impactful error.
      • The daily/period total is under- or over-estimated, leading to a mismatch between rainfall and stream response.
    • Alternate hypothesis: intensity of precip is the most likely and impactful error.
      • The temporal distribution of rainfall is erroneous, perhaps due to poor radar algorithms or kriging/orographic/other spatial distribution algorithms
      • Faulty/clogged rain gage?
    • Anomalous peaks in human-modified systems: flow releases from dams (for flood prep, drought response, spawning, or whitewater recreation)
    • Monthly relationships between P and Q should be much stronger than the relationship for the dataset examined as a whole
      • $Q = f(P)$: the overall relationship should have weak correlations
      • $Q_m = f(P_m)$: the monthly relationship should be stronger?
  • How do we plan to disaggregate the daily data to the hourly model input timescale?
    • Use whatever values are in NLDAS2 to disaggregate PRISM/daymet/etc.
    • Use complex travel-time back-calculation from USGS gage to disaggregate precip for individual flow events
    • Design storm "hyetograph"
    • Monte-carlo like randomized storm, compared to gage record
  • What timescale should we use for stream flow records?
    • Daily:
      • Hourly data shows a constant diurnal fluctuation due to riparian evapotranspiration, creating a peak and trough every day.
      • Our alternative precipitation data is mostly daily, so our ability to look at hourly signatures is limited.
  • Validation: an incredibly important aspect of this project will be finding case studies that present visual evidence that our methods are in fact identifying both erroneous periods and well-fitting periods. Storm separation will, at minimum, play a crucial role here. Good-looking numbers alone don't guarantee that we're achieving our goal, and achieving the goal is what is essential.
    • Are model results skewed by the dataset that was used for calibration?
    • Phase6 was calibrated with NLDAS2; does NLDAS2 therefore perform better?
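The anomaly checks above (phantom storms, missing storms, and weeks outside the expected $Q = f(P)$ regression bounds) could be sketched roughly as below. This is a Python sketch (the issue's lm() examples are R); the function names, the precip/flow thresholds, and the use of a residual-spread z-score as a stand-in for the 95% confidence interval are all illustrative assumptions, not an agreed design:

```python
import numpy as np

def phantom_storms(p, dq, p_min=1.0, dq_min=0.0):
    """Precip recorded, but no rise in flow (dq = Q(t) - Q(t-1)). Thresholds are illustrative."""
    return (np.asarray(p) > p_min) & (np.asarray(dq) <= dq_min)

def missing_storms(p, dq, dq_min=1.0):
    """Flow pulse observed, but recorded precip is zero."""
    return (np.asarray(p) == 0) & (np.asarray(dq) > dq_min)

def flag_anomalous_weeks(p_week, q_week, months, z=1.96):
    """Fit a per-month linear regression Q = a + b*P on weekly means and flag
    weeks whose residual falls outside ~95% of the residual spread (z = 1.96).
    A rough proxy for the 95% confidence-interval check described above."""
    p = np.asarray(p_week, dtype=float)
    q = np.asarray(q_week, dtype=float)
    months = np.asarray(months)
    flags = np.zeros(p.shape, dtype=bool)
    for m in np.unique(months):
        idx = months == m
        if idx.sum() < 3:
            continue  # too few weeks to fit a line for this month
        b, a = np.polyfit(p[idx], q[idx], 1)  # slope, intercept
        resid = q[idx] - (a + b * p[idx])
        sd = resid.std(ddof=2)  # residual std dev, 2 params estimated
        if sd > 0:
            flags[idx] = np.abs(resid) > z * sd
    return flags
```

A dataset whose weekly values are repeatedly flagged for a given month would become "suspect" for that month, feeding the monthly mash-up idea above.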

rburghol commented Jun 17, 2024

Draft Workflow 1: Simple lm() analysis:

  • Metric: $R^2$ from lm() of $Q_{week} = f(P_{week,month})$
    • For each dataset, 12 monthly lm() fits are created to compare weekly mean P with weekly mean Q
  • Application: datasets are ranked according to $R^2$; the amalgamated dataset contains all monthly data from the best-performing base dataset for that month.
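The ranking step could be sketched as follows. This is a Python approximation of the lm() comparison (for a simple one-predictor linear model, $R^2$ equals the squared Pearson correlation); dataset names and the input layout are hypothetical:

```python
import numpy as np

def rank_datasets_by_month(q_week, p_by_dataset, months):
    """For each month, compute R^2 of the weekly-mean Q ~ P regression for
    each candidate precip dataset and return the best dataset per month.
    p_by_dataset maps a dataset name (e.g. 'nldas2') to weekly mean P values
    aligned with q_week; months labels each week's month."""
    q = np.asarray(q_week, dtype=float)
    months = np.asarray(months)
    best = {}
    for m in np.unique(months):
        idx = months == m
        scores = {}
        for name, p in p_by_dataset.items():
            p = np.asarray(p, dtype=float)[idx]
            if p.std() == 0 or q[idx].std() == 0:
                scores[name] = 0.0  # degenerate: no variance, no fit
                continue
            r = np.corrcoef(p, q[idx])[0, 1]
            scores[name] = r * r  # R^2 of the simple linear regression
        best[m] = max(scores, key=scores.get)
    return best
```

The amalgamated dataset would then take each month's data from `best[m]`.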


rburghol commented Jun 17, 2024

Draft Workflow 2: Weekly lm() analysis:

Moved to: HARPgroup/model_meteorology#59

  • Metric: $R^2$ from lm() of $Q_{ds} = f(P_{ds(week,month)})$
    • For each dataset, 12 monthly lm() fits are created to compare weekly mean P with weekly mean Q
  • Application:
    • Datasets are ranked according to $R^2$
    • Iterate through weekly datasets, calculating estimated $Q = f(P_{ds})$ and assessing $E_{Q,ds} = |Q_{usgs} - Q_{ds}| / R_{ds}^2$ for that week using the corresponding monthly lm()
    • For each week, use the $ds$ with the lowest $E_Q$
  • Data model
    • Storing selection criteria and data:
      • Raster record for each week, cell value = $E_Q$
      • Raster record for each week, cell value = selected method
        • Use the selected method's varid as the cell value
      • Raster record for each $ds$/week, $tsvalue = E_Q$, $tscode = 0, 1, 2, ...$
      • For each USGS watershed, 1 timeseries record for each $ds$/week, with $E_Q$ for that week.
        • varkey: metq_lm_week_nldas2
        • varkey: metq_lm_week_daymet
        • varkey: metq_lm_week_prism
    • How do we flatten the values for all watersheds into a raster?
  • Visualizations
    • 1 raster weekly for each $ds$ with the $E_Q$ for that $ds$/week
    • 1 raster weekly with the minimum $E_Q$ for that week
    • 1 raster weekly with the best-fit data source ID value (maybe use varid? or a simple 0, 1, 2, 3 index scheme for selections?)
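The per-week selection step could be sketched as below. Again a Python stand-in for the R workflow; the input layout (estimated Q per dataset, monthly $R^2$ per dataset from the lm() fits) and all names are illustrative assumptions:

```python
import numpy as np

def select_best_per_week(q_usgs, q_est_by_ds, r2_by_ds, months):
    """For each week, compute E_Q = |Q_usgs - Q_ds| / R^2_ds using that
    dataset's monthly lm() R^2, then pick the dataset with the lowest E_Q.
    q_est_by_ds maps dataset name -> estimated weekly Q from Q = f(P_ds);
    r2_by_ds maps dataset name -> {month: R^2}."""
    q_usgs = np.asarray(q_usgs, dtype=float)
    months = np.asarray(months)
    names = sorted(q_est_by_ds)
    choice, e_min = [], []
    for i in range(len(q_usgs)):
        errs = {}
        for name in names:
            r2 = r2_by_ds[name][months[i]]
            # weight the absolute error by the inverse of the fit quality
            errs[name] = abs(q_usgs[i] - q_est_by_ds[name][i]) / r2 if r2 > 0 else np.inf
        best = min(errs, key=errs.get)
        choice.append(best)
        e_min.append(errs[best])
    return choice, np.array(e_min)
```

The returned per-week selections and minimum $E_Q$ values map directly onto the weekly rasters described above (one raster of $E_Q$, one of the selected data source ID).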
