feature: self-contained notebooks, API for example models #1872

Open
wpbonelli opened this issue Jul 13, 2023 · 3 comments

@wpbonelli
Member

wpbonelli commented Jul 13, 2023

Heavily edited after more consideration

Is your feature request related to a problem? Please describe.

This repo contains example model data in examples/data/. A download link was recently added to the example notebooks rendered on ReadTheDocs, but many notebooks are not immediately runnable after download because they rely on that example data. Running them first requires cloning the repo, downloading files from the GitHub web UI, or similar. This adds friction for first-time users and (from a maintainer perspective) complicates the docs build.

Describe the solution you'd like

One option is a module (and maybe a CLI) providing access to example models. Small models could be included in the package, and larger ones downloaded and cached on demand; projects like PyVista and scikit-image do this. Alternatively, model input files could be generated on demand. Models in the following repos could be included:

  • modflowpy/flopy, in examples/data/
  • MODFLOW-USGS/modflow6-examples
  • MODFLOW-USGS/modflow6-testmodels
  • MODFLOW-USGS/modflow6-largetestmodels

PyVista usage looks like:

```python
from pyvista import examples
teapot = examples.download_teapot()
```

skimage looks like:

```python
from skimage import data
coins = data.coins()
```

The latter seems mildly preferable: it is briefer, and the name need not imply a download, since the data may already be cached (if downloaded) or generated locally (no download at all).

In flopy's case, this might look like:

```python
from flopy import examples
sim = examples.freyberg_mf6()
```

Alternatively, it could return a Path to the model/simulation directory instead of the Modflow/MFSimulation/etc. object itself. Returning the model/simulation seems preferable, since the path is retrievable from it anyway. To avoid polluting the cache with output files, and to support the common case of loading a model and then switching to a new workspace before rewriting and running it, a workspace or sim_path option may be convenient, perhaps defaulting to a temporary directory.
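A minimal sketch of the workspace option, assuming the example's input files are already cached on disk. `copy_example`, `freyberg_mf6`, and the cache location are hypothetical names for illustration, not an existing flopy API. To keep the sketch runnable without flopy it returns the workspace Path; with flopy installed, the last step could instead return `flopy.mf6.MFSimulation.load(sim_ws=str(sim_path))`, the variant preferred above.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional, Union

# Hypothetical cache location; a real module might use pooch.os_cache() instead.
CACHE_ROOT = Path.home() / ".cache" / "flopy-examples"


def copy_example(
    name: str,
    sim_path: Optional[Union[str, Path]] = None,
    cache_root: Path = CACHE_ROOT,
) -> Path:
    """Copy cached input files for example `name` into a fresh workspace.

    The cache is treated as read-only, so running the model never writes
    output files into it.
    """
    source = cache_root / name
    if not source.is_dir():
        raise FileNotFoundError(f"example {name!r} not found in {cache_root}")
    if sim_path is None:
        # Default to a new temporary directory.
        sim_path = Path(tempfile.mkdtemp(prefix=f"{name}_"))
    sim_path = Path(sim_path)
    # dirs_exist_ok (Python 3.8+) allows copying into the existing temp dir.
    shutil.copytree(source, sim_path, dirs_exist_ok=True)
    # With flopy installed, this could instead return
    # flopy.mf6.MFSimulation.load(sim_ws=str(sim_path)).
    return sim_path


def freyberg_mf6(sim_path=None) -> Path:
    """Hypothetical accessor in the style of skimage.data.coins()."""
    return copy_example("freyberg_mf6", sim_path)
```

Copying rather than loading in place is what keeps the cache clean: any output the model writes lands in the new workspace.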

Notebooks and tests could then use the example model interface. Removing implicit filesystem expectations leaves notebooks dependent only on a Python environment with flopy installed and the MODFLOW binaries.

PyVista uses Pooch for fetching and caching, and skimage appears to vendor parts of Pooch's source into its own implementation. If we generate model input files instead of downloading them, this would not be necessary.
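For a sense of what Pooch handles, here is a rough stdlib-only sketch of the fetch, verify, and cache cycle. `fetch_cached` and `sha256_of` are hypothetical helpers, not Pooch's API; Pooch itself adds features such as a hash registry, retries, and archive unpacking via processors.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(url: str, known_hash: str, cache_dir: Path) -> Path:
    """Return a local copy of `url`, downloading only if missing or corrupt."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if target.exists() and sha256_of(target) == known_hash:
        return target  # cache hit: no network access
    # Download to a temporary name so a failed transfer never looks cached.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    if sha256_of(partial) != known_hash:
        partial.unlink()
        raise ValueError(f"hash mismatch for {url}")
    partial.replace(target)  # move into place only once verified
    return target
```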

@wpbonelli wpbonelli changed the title "feature: distribute and/or download example data" to "feature: self-contained notebooks, API for example models" Oct 3, 2023
@wpbonelli
Member Author

wpbonelli commented Oct 3, 2023

A few more considerations. Example models are currently defined directly as input files. Is this the right basis for programmatic access to models? If so, which models should be bundled and which downloaded?

The largest subdirectories of examples/data, in increasing order of size, are:

```
> du -sh * | sort -h
...
1.0M	ssm_load_test
1.7M	swr_test
2.5M	mp6_examples
2.7M	preserve_unitnums
2.9M	mf2005_test
3.6M	freyberg_usg
5.0M	options
5.0M	uzf_examples
5.6M	swtv4_test
7.7M	mp6
8.2M	mt3d_example_sft_lkt_uzt
 17M	mfusg_test
 23M	mnw2_examples
 23M	pcgn_test
 33M	mf6
 51M	mt3d_test
 54M	zonbud_examples
 62M	secp
 86M	freyberg_multilayer_transient
```
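A small sketch of turning a listing like this into a bundle/download split. `partition_examples` is a hypothetical helper and the threshold is an arbitrary assumption, not a proposed cutoff. It sums apparent file sizes, so totals will differ somewhat from du, which reports disk usage.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def dir_size(path: Path) -> int:
    """Sum the apparent sizes (bytes) of all regular files under `path`."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def partition_examples(
    data_root: Path, threshold_bytes: int
) -> Tuple[List[str], List[str]]:
    """Split example subdirectories into (bundle, download) name lists.

    Directories at or below the threshold are candidates for bundling with
    the package; larger ones would be fetched and cached on demand. The
    download list is sorted by size, smallest first, like `sort -h`.
    """
    sizes: Dict[str, int] = {
        d.name: dir_size(d) for d in data_root.iterdir() if d.is_dir()
    }
    bundle = sorted(name for name, size in sizes.items() if size <= threshold_bytes)
    download = sorted(
        (name for name, size in sizes.items() if size > threshold_bytes),
        key=lambda name: sizes[name],
    )
    return bundle, download
```

For example, `partition_examples(Path("examples/data"), threshold_bytes=5 * 1024**2)` would run it against the repo checkout with a 5 MiB cutoff.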

An alternative may be to define the models in flopy and generate/cache their input files on first request. I'm not sure how straightforward it is to convert existing input files to flopy code.

It seems like an examples module could offer the same API either way. It may be worth experimenting to see whether it is generally faster to download big models or to write them fresh (perhaps the recent pandas speedup helps here).
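The download-versus-generate experiment could start from a simple timing harness like this one. The strategy callables are placeholders (assumptions): one would download a model with the cache cleared beforehand, the other would build it with flopy and write its input files. Taking the median over a few repeats damps out network jitter.

```python
import statistics
import time
from typing import Callable, Dict


def median_seconds(func: Callable[[], object], repeats: int = 5) -> float:
    """Median wall-clock time of `func` over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(
    strategies: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, float]:
    """Return median times per strategy, ordered fastest first."""
    timings = {name: median_seconds(func, repeats) for name, func in strategies.items()}
    return dict(sorted(timings.items(), key=lambda item: item[1]))
```

Usage would look like `compare({"download": download_freyberg, "generate": write_freyberg})`, where both functions are hypothetical stand-ins for the two approaches.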

For files distributed with flopy, the examples module could internally use importlib.resources.files.
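A sketch of that, assuming models are shipped as package data in a hypothetical `flopy.examples_data` package (`importlib.resources.files` needs Python 3.9+). It walks the resource tree through the Traversable interface (`is_dir`, `iterdir`, `read_bytes`) rather than assuming a real filesystem path, so it should also work for a zipped install.

```python
from importlib.resources import files
from pathlib import Path


def copy_resource_tree(node, dest: Path) -> None:
    """Recursively copy a Traversable (file or directory) to `dest`."""
    if node.is_dir():
        dest.mkdir(parents=True, exist_ok=True)
        for child in node.iterdir():
            copy_resource_tree(child, dest / child.name)
    else:
        dest.write_bytes(node.read_bytes())


def bundled_example(
    name: str, dest: Path, package: str = "flopy.examples_data"
) -> Path:
    """Copy a bundled example model out of the installed package into `dest`.

    The default package name is hypothetical.
    """
    source = files(package) / name
    if not source.is_dir():
        raise FileNotFoundError(f"no bundled example named {name!r} in {package}")
    target = dest / name
    copy_resource_tree(source, target)
    return target
```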

@wpbonelli
Member Author

wpbonelli commented Oct 6, 2023

An example models module would also simplify flopy and mf6 autotests, by removing the need for custom fixtures to fetch/prepare models for testing. There has been some effort to standardize the approach for this but it's still patchy.

Maybe devtools could provide programmatic access to examples, and flopy could take on a hard devtools dependency and pass the same API through for convenience. This seems reasonable because

  • devtools is dependency-free, so it would be a lightweight addition
  • mf6.utils.generate_classes (perhaps wrongly) already depends on devtools; this seemed justifiable since class generation has until now been considered a developer task
  • it would allow deduplicating HTTP client code in utils.get_modflow (maybe the get-modflow implementation could move to devtools too)

@wpbonelli wpbonelli added this to the 3.7.0 milestone Mar 2, 2024
@wpbonelli wpbonelli modified the milestones: 3.7.0, 3.8.0 May 23, 2024
wpbonelli added a commit that referenced this issue Jul 10, 2024
Small step towards #1872. Move geometry info to a YAML file under examples/data/ and move utils into scripts to remove common/ module import and sys.path manipulation. A later PR may introduce pooch as we have done for the mf6 examples:

- MODFLOW-USGS/modflow6-examples#137
- MODFLOW-USGS/modflow6-examples#153
@wpbonelli wpbonelli modified the milestones: 3.8.0, 3.9.0 Aug 5, 2024
wpbonelli added a commit that referenced this issue Aug 8, 2024
#2264 removed the notebook_utils.py module but neglected to remove this dependency from 3 examples:

* export_vtk_tutorial.py
* plot_cross_section_example.py
* plot_map_view_example.py

This is why these are missing from the develop version of the RTD site.

This PR inlines the shared models which previously lived in notebook_utils.py. This duplication will go away once we have a models API as proposed in #1872.
@wpbonelli wpbonelli modified the milestones: 3.9, 3.10 Sep 4, 2024
@wpbonelli wpbonelli modified the milestones: 3.10, 4.0 Nov 5, 2024
@wpbonelli wpbonelli self-assigned this Nov 5, 2024
wpbonelli added a commit that referenced this issue Dec 12, 2024
First step towards #1872. Use pooch for data access. This is ugly, but it makes notebooks runnable out of the box (provided the executables and a Python environment are available). Local files will be used if detected, otherwise downloaded, following the pattern in the mf6 example models.

An eventual models API could hide all the details of model access.

Also mention the optional dependencies requirement on the tutorials and examples gallery pages.
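The local-first pattern described in that commit can be sketched in a few lines. `find_local_data` and `resolve_example_file` are hypothetical names; `fetch` stands in for whatever downloads a file (for example a wrapper around `pooch.retrieve(url, known_hash)`), and walking up parent directories to find `examples/data` is an assumption about how a repo checkout would be detected.

```python
from pathlib import Path
from typing import Callable, Optional


def find_local_data(start: Path, marker: str = "examples/data") -> Optional[Path]:
    """Walk up from `start` looking for a repository's examples/data directory."""
    for parent in [start, *start.parents]:
        candidate = parent / marker
        if candidate.is_dir():
            return candidate
    return None


def resolve_example_file(
    relpath: str,
    fetch: Callable[[str], Path],
    start: Optional[Path] = None,
) -> Path:
    """Use a file from a local checkout if present, otherwise download it."""
    local_root = find_local_data(start or Path.cwd())
    if local_root is not None:
        candidate = local_root / relpath
        if candidate.is_file():
            return candidate
    return fetch(relpath)
```

Injecting `fetch` keeps the resolution logic independent of the download library, which is roughly what a models API would hide from notebook authors.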
@wpbonelli
Member Author

wpbonelli commented Dec 16, 2024

Another variation on what this could look like, from xugrid:

```python
import xugrid as xu
ds = xu.data.adh_san_diego(xarray=True)
```

https://deltares.github.io/xugrid/examples/quick_overview.html#from-xarray-dataset
