One-step sfc variables time alignment #214
Conversation
Thanks, Brian. I agree the surface data should have a "step" dimension just like the other variables. Some pretty minor comments below.
workflows/one_step_jobs/runfile.py (Outdated)

```diff
     return (
-        sfc[list(SFC_VARIABLES)]
+        sfc[list(SFC_VARIABLES + GRID_VARIABLES)]
```
This is fine, but I would use the `_safe_get_variables` I defined below; see its docstring for why.
I do realize I wrote the original `list` here, but it's worth a refactor now.
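For context, since the helper's definition falls outside the quoted diff: below is a minimal sketch of what a helper like `_safe_get_variables` could look like. The body and the rationale in the docstring are assumptions, not the actual fv3net implementation; the real docstring in runfile.py carries the authoritative reasoning.

```python
from typing import Hashable, Sequence

import xarray as xr


def _safe_get_variables(ds: xr.Dataset, variables: Sequence[Hashable]) -> xr.Dataset:
    """Subset ``ds`` to ``variables``, failing loudly on missing names.

    Hypothetical rationale: plain ``ds[[...]]`` raises a hard-to-read error
    when any requested name is absent, so check membership up front and
    report exactly which variables are missing.
    """
    missing = [name for name in variables if name not in ds]
    if missing:
        raise KeyError(f"Dataset is missing requested variables: {missing}")
    return ds[list(variables)]
```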
workflows/one_step_jobs/runfile.py (Outdated)

```diff
@@ -224,6 +240,8 @@ def post_process(
     )

     merged = xr.merge([sfc, ds])
+    # assign step coordinate dtype, otherwise it's cast to object and zarr will choke
+    merged = merged.assign_coords({"step": merged["step"].values.astype("<U14")})
```
We should keep track of useful conversions like this somewhere so we can reuse them.
Like as a util function?
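As a sketch of what such a util could look like, assuming the goal is making object-dtype string coordinates zarr-safe. The function name is hypothetical, and using `dtype=str` (which lets numpy pick the minimal `<U{n}` width rather than hard-coding `<U14`) is a suggestion, not existing fv3net code:

```python
import numpy as np
import xarray as xr


def cast_str_coord(ds: xr.Dataset, name: str) -> xr.Dataset:
    """Cast an object-dtype string coordinate to fixed-width unicode.

    zarr cannot serialize object arrays without an explicit codec, so
    string-valued coordinates need a concrete numpy dtype before writing.
    ``dtype=str`` makes numpy infer the minimal ``<U{n}`` width.
    """
    return ds.assign_coords({name: np.asarray(ds[name].values, dtype=str)})
```

Usage would then be a one-liner before writing, e.g. `merged = cast_str_coord(merged, "step")`.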
```diff
 )

+GRID_VARIABLES = ("lat", "lon", "latb", "lonb", "area")


 def rename_sfc_dt_atmos(sfc: xr.Dataset) -> xr.Dataset:
```
This function does more than just renaming at this point. I would refactor the aligning logic into its own function, which is called from the calling routine.
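A sketch of the suggested split, with the rename and the alignment as separate functions composed by the caller. The function names, the rename mapping, and the placeholder bodies below are illustrative, not the actual runfile.py code:

```python
import xarray as xr


def rename_sfc_dt_atmos(sfc: xr.Dataset) -> xr.Dataset:
    """Renaming only; no step alignment in here."""
    return sfc.rename({"grid_xt": "x", "grid_yt": "y"})  # illustrative mapping


def align_sfc_step_dim(sfc: xr.Dataset) -> xr.Dataset:
    """Step-alignment logic, split out of the rename function."""
    return sfc  # placeholder body


def load_sfc(sfc_pattern: str) -> xr.Dataset:
    # the calling routine composes the two steps explicitly
    return (
        xr.open_mfdataset(sfc_pattern, concat_dim="tile", combine="nested")
        .pipe(rename_sfc_dt_atmos)
        .pipe(align_sfc_step_dim)
    )
```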
workflows/one_step_jobs/runfile.py (Outdated)

```diff
+def _align_sfc_step_dim(da: xr.DataArray) -> xr.DataArray:
+
+    da_shift = da.shift(shifts={"time": 1})
+    da_begin = da_shift.expand_dims({"step": ["begin"]})
```
Following comments by @AnnaKwa, I took pains to move all the hard-coded names like "after_dynamics" into the "paths" dict at line 282, so please use the names from that dict, which are the keys to `monitor_paths` in `post_process`.
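A hedged sketch of how the helper could take its step labels from that dict instead of hard-coding strings. The begin/end semantics below extrapolate from the two diff lines shown above, and the `monitor_paths` usage in the comment is only indicative:

```python
import xarray as xr


def _align_sfc_step_dim(
    da: xr.DataArray, begin_label: str, end_label: str
) -> xr.DataArray:
    """Duplicate a surface variable into begin/end steps, taking the step
    labels as arguments instead of hard-coding names like "begin"."""
    da_begin = da.shift(time=1).expand_dims({"step": [begin_label]})
    da_end = da.expand_dims({"step": [end_label]})
    return xr.concat([da_begin, da_end], dim="step")


# labels would come from the keys of the "paths" dict, e.g. (hypothetical):
# steps = list(monitor_paths)
# aligned = _align_sfc_step_dim(da, steps[0], steps[-1])
```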
workflows/one_step_jobs/runfile.py (Outdated)

```diff
@@ -215,7 +231,7 @@ def post_process(
     sfc = xr.open_mfdataset(sfc_pattern, concat_dim="tile", combine="nested").pipe(
         rename_sfc_dt_atmos
     )
-    sfc = _safe_get_variables(sfc, SFC_VARIABLES)
+    sfc = _safe_get_variables(sfc, SFC_VARIABLES + GRID_VARIABLES)
```
Deleted this since it's getting called twice
This LGTM, Brian. Thanks. Feel free to merge if you have verified that the one-step jobs do actually run with these changes.
* Feature/one step save baseline (#193): adds several features to the one-step pipeline: big zarr (everything is stored as one zarr file), saving of physics outputs, and some refactoring of the job submission. Sample output: https://gist.github.com/nbren12/84536018dafef01ba5eac0354869fb67
* Save lat/lon grid variables from sfc_dt_atmos (#204)
* Feature/use onestep zarr train data (#207): use the big zarr from the one-step workflow as input to the create-training-data pipeline.
* One-step sfc variables time alignment (#214): makes the diagnostics variables appended to the big zarr have the appropriate step and forecast_time dimensions, just as the variables extracted by the wrapper do.
* One step zarr fill values (#223): accomplishes two things: 1) prevents true model zeros from being cast to NaNs in the one-step big zarr output, and 2) initializes big zarr arrays with NaNs via `full` so that arrays left unfilled due to a failed timestep or other reason are more apparent than with `empty`, which produces arbitrary output.
* Adjustments to be able to run workflows in dev branch (#218): removes references to hard-coded dims and data variables, and imports from vcm.cubedsphere.constants, replacing them with arguments. Coords and dims can be provided as args for a mappable var.
* One steps start index (#231): allows starting the one-step jobs at a specified index in the timestep list, for testing and for avoiding spin-up timesteps.
* Dev fix/integration tests (#234): changes the order of required args so output is last, fixes the arg for one-step input to be the dir containing the big zarr, updates the end-to-end integration test ymls, and makes prognostic run adjustments.
* Improved fv3 logging (#225): introduces several improvements to the logging capability of our prognostic run image:
  - includes upstream changes to disable output capturing in `fv3config.fv3run`
  - adds a `capture_fv3gfs_func` function; when called, this captures the raw fv3gfs output and re-emits it as DEBUG-level logging statements that can be filtered more easily
  - refactors `runtime` to `external/runtime/runtime`; this was easy since it did not depend on any other module in fv3net (except implicitly the code in `fv3net.regression`, which is imported when loading the sklearn model with pickle)
  - updates fv3config to master
* Manually merge in the refactor from master while keeping new names from develop (#237)
* lint; remove logging from testing
* Dev fix/arg order (#238): update history, fix positional args, fix function args
* update history; linting

Co-authored-by: Anna Kwa <[email protected]>
Co-authored-by: brianhenn <[email protected]>
This makes the diagnostics variables appended to the big zarr have the appropriate step and forecast_time dimensions, just as the variables extracted by the wrapper do.
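As a minimal illustration of that target layout, the sketch below gives a plain (tile, y, x) diagnostic variable singleton step and forecast_time dimensions; the shapes, the "begin" label, and the forecast_time value are invented for the example:

```python
import numpy as np
import xarray as xr

# a stand-in diagnostics variable with tile/y/x dims only
da = xr.DataArray(np.zeros((6, 48, 48)), dims=["tile", "y", "x"])

# prepend step and forecast_time dims so it lines up with the
# wrapper-extracted variables in the big zarr
aligned = da.expand_dims({"step": ["begin"], "forecast_time": [0.0]})
print(aligned.dims)  # ('step', 'forecast_time', 'tile', 'y', 'x')
```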