Add option to plot utilities to generate PNG output rather than inline plots #42

mnlevy1981 · 2021-01-26T19:23:05Z

This PR will generate images and metadata for @andersy005's dashboard. The first commit produces a separate JSON file for each generated plot, but @andersy005 will follow behind with a tool to clean that all up :) Additional commits are needed to generate PNG files from other plot types (time series, etc).

First pass at generating image files (and a JSON file with associated metadata) rather than including images inline in the notebook. Additional commits on this branch will address different types of plots.

notebooks/utils/utils.py

savefig() is part of the parent class, get_filename() is part of the child class. This current maintains the current API for summary_plots, but next commit will try to make that API more extensible. Still need to get casename into the SummaryMapClass to use in file name.

Use **plot_options for arguments beyond da and diag_metadata. Also updated get_filepaths() to return both the png path and the json path (this let me put output in images/summary_maps/CASENAME/... instead of requiring images/CASENAME to be the root)

Note that this doesn't update the notebooks yet, it's just the utils package

Note that I noticed a pretty big bug to fix in the next commit -- I'm not accounting for isel_dict in the file name, so plotting the same variable at two vertical levels will create a plot at the first level and then over-write it with the plot at the second level. I'll fix that tomorrow.

E.g. SiO3.0095-01-16_0104-12-16.z_t--500.00.png SiO3.0095-01-16_0104-12-16.z_t--35109.35.png

The issue only affected time series, and was due to a bad argument in self.get_timeseries_files() [wanted files for a specific variable, but was getting file names for ALL variables]

Added two new child classes of _PlotTypeBaseClass for trend maps and trend histograms; modified the two trend_maps notebooks to write files instead of generating plots.

mnlevy1981 · 2021-01-29T19:58:36Z

I'm going to turn off draft status because at this point it looks Sanity Check.ipynb is the only notebook with inline plots still, and it's only 133 KB on disk. I've verified that the number of plots I'm creating matches the number of plots being removed from each of the following notebooks (comments in the code block indicate how many png files exist in each directory; totals should sum to the number of plots removed when diffing the notebook against master):

plot_suite_003.ipynb

# histogram/g.e22.G1850ECO_JRA_HR.TL319_t13.003: 160
# time_series/g.e22.G1850ECO_JRA_HR.TL319_t13.003: 24
$ git diff master plot_suite_003.ipynb | grep ^-.*png | wc -l
184

plot_suite_004.ipynb

# histogram/g.e22.G1850ECO_JRA_HR.TL319_t13.003: 680
# time_series/g.e22.G1850ECO_JRA_HR.TL319_t13.003: 24
$ git diff master plot_suite_004.ipynb | grep ^-.*png | wc -l
704

plot_suite_1deg.ipynb

# histogram/g.e22b05.G1850ECOIAF_JRA.TL319_g17.cocco.001: 400
# time_series/g.e22b05.G1850ECOIAF_JRA.TL319_g17.cocco.001: 24
$ git diff master plot_suite_1deg.ipynb | grep ^-.*png | wc -l
184

plot_suite_maps_0001_003.ipynb

# summary_map/g.e22.G1850ECO_JRA_HR.TL319_t13.003: 480
$ git diff master plot_suite_maps_0001_003.ipynb | grep ^-.*png | wc -l
480

plot_suite_maps_0001_004.ipynb

# summary_map/g.e22.G1850ECO_JRA_HR.TL319_t13.004: 480
$ git diff master plot_suite_maps_0001_004.ipynb | grep ^-.*png | wc -l
480

plot_suite_maps_0095_1deg.ipynb

# summary_map/g.e22b05.G1850ECOIAF_JRA.TL319_g17.cocco.001: 480
$ git diff master plot_suite_maps_0095_1deg.ipynb | grep ^-.*png | wc -l
480

trend_maps.003.ipynb

# trend_hist/g.e22.G1850ECO_JRA_HR.TL319_t13.003: 30
# trend_map/g.e22.G1850ECO_JRA_HR.TL319_t13.003: 30
$ git diff trend_maps.003.ipynb | grep ^-.*png | wc -l
60

trend_maps.004.ipynb

# trend_hist/g.e22.G1850ECO_JRA_HR.TL319_t13.004: 30
# trend_map/g.e22.G1850ECO_JRA_HR.TL319_t13.004: 30
$ git diff trend_maps.004.ipynb | grep ^-.*png | wc -l
60

Still, I think there are several open questions we should discuss before merging:

File naming conventions: for now, running any of the notebooks that produce png output will create an images/ directory. Inside that directory, there are subdirectories for each type of plot (histogram/, summary_map/, time_series/, trend_hist/, and trend_map/). Each of those contains subdirectories for each of the cases, and then there is a pattern defined for each type of plot in utils/PlotTypeClass.py. Subdirectories for each case is not necessary, and I'm happy to just include the case name in the file name instead
Metadata conventions
- As a first pass, I am just creating one json file per plot... so the plot plot_type/case/filename.png is associated with plot_type/case/metadata/filename.json. I haven't tied in Anderson's generate_plot_catalog() function yet
- We should look closely at what is currently going into the json files and make sure I'm not omitting anything
Should we do some refactoring to bring routines from Plotting.py into PlotTypeClass.py? Perhaps the notebooks should be importing PlotTypeClass and the plots themselves can be generated by class method objects?
- Perhaps we would have an additional level of inheritance in PlotTypeClass classes... rather than _PlotTypeBaseClass -> SummaryMapClass & TrendMapClass we could have _PlotTypeBaseClass -> _MapBaseClass -> SummaryMapClass & TrendMapClass and have the actual plotting be defined in _MapBaseClass? I haven't actually sat down and thought about what this would look like, but it seems like SummaryMapClass and TrendMapClass would have more in common with each other than, say TrendMapClass and TrendHistClass

mnlevy1981 · 2021-02-01T16:16:47Z

@andersy005 -- I should have mentioned this in the previous comment, but png files are in /glade/work/mlevy/hi-res_BGC_JRA/plots_from_analysis/ if you want to play with those in your dashboard; I think the best way to figure out what metadata is missing is to look at a mock-up and see what we wish we could sort by :)

mnlevy1981 · 2021-02-01T22:06:10Z

Also on the to-do list: I need to add an argument to allow control over the figure size / resolution when writing to disk.

klindsay28 · 2021-02-16T21:03:02Z

I suggest that the __init__ and get_filepaths methods in class _PlotTypeBaseClass have *args and **kwargs arguments.

That way, calls to these methods, that should never be called, will actually get to the raise NotImplementedError statements.

As is, a user of a derived class that doesn't implement these methods will get errors about unexpected keyword arguments or too many positional arguments when the __init__ and get_filepaths methods in _PlotTypeBaseClass are called (, assuming that the user passes arguments to the methods). These argument error messages aren't as helpful as the message in the raise NotImplementedError statements.

1. Address Keith's comment about *args and **kwargs 2. Use time_bounds to determine interval for histograms and time series 3. Reorganize output so that case name is the top-level directory rather than plot type -- this will get clunky when comparing multiple cases, but it lines up better with the directory structure of webext.cgd.ucar.edu. When we have plots comparing multiple cases, we can figure out a good naming scheme 4. generate_plot_catalog() can print relative paths (and the filepath in the json file is relative to the casename rather than the notebooks directory)

mnlevy1981 · 2021-03-12T22:45:39Z

eefc145 introduces a few updates that are mostly related:

 A few changes to plot generation:

1. Address Keith's comment about *args and **kwargs
2. Use time_bounds to determine interval for histograms and time series
3. Reorganize output so that case name is the top-level directory rather than
   plot type -- this will get clunky when comparing multiple cases, but it
   lines up better with the directory structure of webext.cgd.ucar.edu. When we
   have plots comparing multiple cases, we can figure out a good naming scheme
4. generate_plot_catalog() can print relative paths (and the filepath in the
   json file is relative to the casename rather than the notebooks directory)

The first bullet point is just some low-hanging fruit; the rest ties into making data available for the dashboard. I've started posting some PNG files (and associated CSV catalogs) on webext; so now we can inline images directly from that server:

Also, @mgrover1 pointed out that we should use time_bounds instead of time to determine the beginning and end of the interval -- that's why the image above is [correctly] labelled from 0001-01-01 to 0017-12-31 where previous versions claimed it was a time series from 0001-01-16 to 0017-12-16

I'm waiting for one more notebook to finish running, then I'll make a commit with those updates.

edit: I guess the dates don't actually appear in the plot; they're in the file name (POC_FLUX_100m.0001-01-01_0017-12-31.png) as well as the csv catalog

edit2: inline image is now a link to original (I think github cached a version when I originally posted)

These notebooks put images in {casename}/{plottype}/ rather than {plottype}/{casename}/; there is also a gen_csv.ipynb notebook that creates csv catalogs for each case (the catalog is {casename}/png_catalog.csv)

mnlevy1981 · 2021-03-14T17:00:56Z

I noticed that png_catalog.csv has the index in the first column... but when I set index_label=False it doesn't go away. I'll play with it more during the week.

mnlevy1981 · 2021-03-16T18:34:27Z

From chatting with @mgrover1, it looks like we pull useful information from isel_dict for the filename (e.g. basins--Pacific and z_t--35109.35) but the metadata contains entry like "isel_dict": {"basins": 0} and "isel_dict": {"z_t": 0}. Those should be replaced with the actual dimension values... and maybe instead of isel_dict I can call it dim_reduction or something?

For this commit, I only pass savefig_kwargs to summary_plot_histogram() and summary_plot_global_ts(); for the plot_suite_{003,004,1deg}.ipynb files, I use kwargs to set dpi=72 (which matches the default value). The next commit(s) should update the rest of the plotting functions to also use plot_options['savefig_kwargs'].

A few other small changes snuck in with this commit 1. run_notebooks.sh uses PBS rather than SLURM 2. the location of the zonally-averaged output was moved to campaign 2b. the old datasets were missing year 2 from 003, so I regenerated all the za output files -- I did this based on time series rather than history files because that is what we have on disk -- I used a slightly updated region mask that doesn't seem to have changed the plots but did introduce O(1e-4) relative error in the Atlantic and Pacific basins (I believe it moved some grid cells from the Atlantic to the Pacific while leaving the global and Indian basins untouched)

I replaced metadata['isel_dict'] with metadata['reduced_dims'] and instead of including the index of the dim reduction, it lists the actual value (e.g. instead of basins=0, it contains basins="Global") I had also switched from ncar-jobqueue to dask-jobqueue in a previous commit, but with ncar-jobqueue now recognizing the PBS nodes on casper I was able to switch back.

I regenerated the csv files for each case, and this notebook reflects the change from isel_dict -> reduced_dims

mnlevy1981 · 2021-04-16T15:01:58Z

@mgrover1 and @klindsay28 -- this is ready for review.

@mgrover1 -- I've moved the latest output to webext, hopefully the dashboard reflects the change from isel_dict to reduced_dims.

@klindsay28 -- for now, I've left isel_dict in diag_metadata.yaml; we talked earlier this week about changing that so (for example) that the code uses sel_dict(z_t=35000, method="nearest") instead of isel_dict(z_t=28) but I'd rather address that in a separate issue / PR.

I believe all the other comments mentioned in meetings / this PR thread have been addressed.

1. use "sel_dict" instead of "reduced_dims" for name of dictionary in metadata 2. no longer pass casename = ds.attrs["title"] to plotting functions; all functions receive ds, and can determine casename itself (updated trend_map notebooks to copy time series attrs to the zonally-averaged datasets) 3. use ds[ds["time"].attrs["bounds"]] instead of ds["time_bound"] or ds.time_bound 4. clean up gen_csv.ipynb: * use relative path for finding pngs / reading metadata * use relative path for writing csv file * don't explicitly skip trend_map and trend_hist for the 1 degree run; * generate_plot_catalog() can handle non-existent directories 5. Add png_catalog.csv to the repository

mnlevy1981 and others added 2 commits January 26, 2021 12:19

Add option summary_plot_maps() to save png files

a3a1a73

First pass at generating image files (and a JSON file with associated metadata) rather than including images inline in the notebook. Additional commits on this branch will address different types of plots.

generate plot catalog

ebb17db

andersy005 reviewed Jan 26, 2021

View reviewed changes

notebooks/utils/utils.py Outdated Show resolved Hide resolved

Use absolute paths for the time being

7ffe097

andersy005 reviewed Jan 26, 2021

View reviewed changes

notebooks/utils/utils.py Outdated Show resolved Hide resolved

mnlevy1981 added 7 commits January 27, 2021 11:10

Refactor summary_plot_maps() API

d3994cd

Use **plot_options for arguments beyond da and diag_metadata. Also updated get_filepaths() to return both the png path and the json path (this let me put output in images/summary_maps/CASENAME/... instead of requiring images/CASENAME to be the root)

Add support for writing time series as pngs

a1f1c7e

Note that this doesn't update the notebooks yet, it's just the utils package

Different isel_dict leads to different file names

4986914

E.g. SiO3.0095-01-16_0104-12-16.z_t--500.00.png SiO3.0095-01-16_0104-12-16.z_t--35109.35.png

bugfix: gen_dataset failed reading multiple vars

43be393

The issue only affected time series, and was due to a bad argument in self.get_timeseries_files() [wanted files for a specific variable, but was getting file names for ALL variables]

Add support for PNGs from trend_maps notebooks

ce4ec17

Added two new child classes of _PlotTypeBaseClass for trend maps and trend histograms; modified the two trend_maps notebooks to write files instead of generating plots.

mnlevy1981 marked this pull request as ready for review January 29, 2021 19:58

Updated notebooks changing image dirtree

a94142f

These notebooks put images in {casename}/{plottype}/ rather than {plottype}/{casename}/; there is also a gen_csv.ipynb notebook that creates csv catalogs for each case (the catalog is {casename}/png_catalog.csv)

mnlevy1981 added 4 commits April 13, 2021 09:08

Re-ran gen_csv

00274a3

I regenerated the csv files for each case, and this notebook reflects the change from isel_dict -> reduced_dims

mnlevy1981 merged commit 01626d7 into marbl-ecosys:master Apr 22, 2021

mnlevy1981 mentioned this pull request Apr 22, 2021

Add interactive notebook example and update environment #44

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to plot utilities to generate PNG output rather than inline plots #42

Add option to plot utilities to generate PNG output rather than inline plots #42

mnlevy1981 commented Jan 26, 2021

mnlevy1981 commented Jan 29, 2021

mnlevy1981 commented Feb 1, 2021

mnlevy1981 commented Feb 1, 2021

klindsay28 commented Feb 16, 2021

mnlevy1981 commented Mar 12, 2021 •

edited

Loading

mnlevy1981 commented Mar 14, 2021

mnlevy1981 commented Mar 16, 2021

mnlevy1981 commented Apr 16, 2021

Add option to plot utilities to generate PNG output rather than inline plots #42

Add option to plot utilities to generate PNG output rather than inline plots #42

Conversation

mnlevy1981 commented Jan 26, 2021

mnlevy1981 commented Jan 29, 2021

mnlevy1981 commented Feb 1, 2021

mnlevy1981 commented Feb 1, 2021

klindsay28 commented Feb 16, 2021

mnlevy1981 commented Mar 12, 2021 • edited Loading

mnlevy1981 commented Mar 14, 2021

mnlevy1981 commented Mar 16, 2021

mnlevy1981 commented Apr 16, 2021

mnlevy1981 commented Mar 12, 2021 •

edited

Loading