feature: self-contained notebooks, API for example models #1872

wpbonelli · 2023-07-13T17:12:54Z

Heavily edited after more consideration

Is your feature request related to a problem? Please describe.

This repo contains example model data in examples/data/. A download link was recently added to example notebooks rendered on ReadTheDocs, but many are not immediately runnable after download because they rely on that example data. Running them first requires cloning the repo or downloading files from the GitHub web UI. This adds friction for first-time users and (from a maintainer perspective) complicates the docs build.

Describe the solution you'd like

One option is a module (and maybe CLI) providing access to example models. Small models could be included in the package, larger ones downloaded/cached on demand. Projects like PyVista and scikit-image do this. Or model input files could be generated on demand. Models in the following repos could be included:

modflowpy/flopy, in examples/data/
MODFLOW-USGS/modflow6-examples
MODFLOW-USGS/modflow6-testmodels
MODFLOW-USGS/modflow6-largetestmodels

PyVista usage looks like:

from pyvista import examples
teapot = examples.download_teapot()

skimage looks like:

from skimage import data
coins = data.coins()

The latter seems mildly preferable for brevity and because the data may already be cached (if downloaded) or generated (no downloads).

In flopy's case, maybe e.g.

from flopy import examples
sim = examples.freyberg_mf6()

Alternatively, it could return a Path to the model/simulation directory instead of the Modflow/MFSimulation/etc itself. The model/simulation seems preferable as the path is retrievable from it anyway. To avoid polluting the cache with output files and to support the common case of loading then switching to a new workspace before rewriting/running, a workspace or sim_path option may be convenient, perhaps defaulting to a temporary directory.
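
The workspace idea could be sketched roughly as follows. This is a stdlib-only illustration, not flopy's API: `load_example`, `CACHE_DIR`, and the cache layout are hypothetical names, and the function returns a path rather than a loaded model so that it runs without flopy installed.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical cache location; a real implementation might pick a
# platform-specific cache directory instead.
CACHE_DIR = Path.home() / ".cache" / "flopy-examples"


def load_example(
    name: str,
    workspace: Optional[Path] = None,
    cache_dir: Path = CACHE_DIR,
) -> Path:
    """Copy a cached example model into a workspace and return its path.

    The cached copy is never modified, so output files written by a
    model run land in the workspace instead of the cache.
    """
    source = Path(cache_dir) / name
    if not source.is_dir():
        raise FileNotFoundError(f"no cached example model named {name!r}")
    if workspace is None:
        # Default to a temporary directory, as suggested above.
        workspace = Path(tempfile.mkdtemp(prefix=f"{name}-"))
    else:
        workspace = Path(workspace)
    # dirs_exist_ok lets the copy target an existing (e.g. temp) directory.
    shutil.copytree(source, workspace, dirs_exist_ok=True)
    return workspace
```

A real version would presumably go one step further and load the copied files into a model or simulation object before returning it.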

Notebooks and tests would then be able to use the example model interface. Removing implicit filesystem expectations leaves notebooks dependent only on a python/flopy env and modflow binaries.

PyVista uses Pooch for fetching and caching; skimage appears to vendor some of Pooch's source in its own implementation. If we generate model input files instead of downloading them, neither would be necessary.
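
For reference, the download-on-demand pattern that a library like Pooch provides looks roughly like this. The sketch below is stdlib-only and is not Pooch's API; `fetch`, `sha256_of`, and their parameters are hypothetical names chosen for illustration.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(url: str, fname: str, known_hash: str, cache_dir: Path) -> Path:
    """Return the cached file, downloading it first if missing or corrupt."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / fname
    if target.is_file() and sha256_of(target) == known_hash:
        return target  # cache hit: no network access needed
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    actual = sha256_of(target)
    if actual != known_hash:
        # Do not leave a corrupt file behind in the cache.
        target.unlink()
        raise ValueError(f"hash mismatch for {fname}: got {actual}")
    return target
```

Verifying a known hash is what makes the cache trustworthy: a partial or stale download is detected and fetched again rather than silently reused.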


wpbonelli · 2023-10-03T17:10:46Z

A few more considerations. Example models are currently defined directly as input files. Is this the right way for programmatic access to models? If so, which models to bundle and which to download?

The largest subdirs of examples/data, in increasing order of size, are

> du -sh * | sort -h
...
1.0M	ssm_load_test
1.7M	swr_test
2.5M	mp6_examples
2.7M	preserve_unitnums
2.9M	mf2005_test
3.6M	freyberg_usg
5.0M	options
5.0M	uzf_examples
5.6M	swtv4_test
7.7M	mp6
8.2M	mt3d_example_sft_lkt_uzt
 17M	mfusg_test
 23M	mnw2_examples
 23M	pcgn_test
 33M	mf6
 51M	mt3d_test
 54M	zonbud_examples
 62M	secp
 86M	freyberg_multilayer_transient
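
One way to turn a listing like this into a bundle-versus-download decision is a simple size threshold. The sketch below assumes hypothetical helper names (`dir_size`, `partition_by_size`) and leaves the threshold to the caller; it sums apparent file sizes, so its numbers may differ slightly from `du`, which reports disk usage.

```python
from pathlib import Path
from typing import List, Tuple


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def partition_by_size(
    root: Path, max_bundled_bytes: int
) -> Tuple[List[str], List[str]]:
    """Split the subdirectories of root into (bundle, download) name lists."""
    bundle: List[str] = []
    download: List[str] = []
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        if dir_size(sub) <= max_bundled_bytes:
            bundle.append(sub.name)
        else:
            download.append(sub.name)
    return bundle, download
```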

An alternative may be to define the models in flopy code and generate/cache the input files on first request. I'm not sure how straightforward it is to convert existing input files to flopy code.

It seems like an examples module could offer the same API either way. Maybe it is worth experimenting to see if it is generally faster to pull big models over the wire or write them fresh (maybe the recent pandas speedup helps here).

For files distributed with flopy, the examples module could internally use importlib.resources.files.
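
A minimal sketch of that lookup, assuming a hypothetical helper named `bundled_file`. Since no data package layout for flopy has been settled here, the demo uses the stdlib `json` package as a stand-in for a package that ships data files.

```python
from contextlib import contextmanager
from importlib.resources import as_file, files
from pathlib import Path
from typing import Iterator


@contextmanager
def bundled_file(package: str, *parts: str) -> Iterator[Path]:
    """Yield a real filesystem path to a resource shipped inside a package."""
    resource = files(package)
    for part in parts:
        resource = resource.joinpath(part)
    # as_file extracts to a temporary file if the package is not on the
    # filesystem (e.g. imported from a zip), and cleans up on exit.
    with as_file(resource) as path:
        yield path


# Stand-in demo: locate a file inside the stdlib json package.
with bundled_file("json", "__init__.py") as path:
    print(path.name)
```

Going through `as_file` rather than building paths from `__file__` keeps the lookup working regardless of how the package was installed.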

wpbonelli · 2023-10-06T19:47:46Z

An example models module would also simplify flopy and mf6 autotests, by removing the need for custom fixtures to fetch/prepare models for testing. There has been some effort to standardize the approach for this but it's still patchy.

Maybe devtools could provide programmatic access to examples, flopy could add a hard devtools dependency, and pass the same API through for convenience. This seems reasonable because

devtools is dependency free so it would be a light-weight addition
mf6.utils.generate_classes (perhaps wrongly) already depends on devtools — this seemed justifiable since class generation was up to now considered a developer task
it would allow deduplicating http client code in utils.get_modflow (maybe get-modflow implementation could move to devtools too)

Small step towards #1872. Move geometry info to a YAML file under examples/data/ and move utils into scripts to remove common/ module import and sys.path manipulation. A later PR may introduce pooch as we have done for the mf6 examples:

MODFLOW-USGS/modflow6-examples#137
MODFLOW-USGS/modflow6-examples#153

#2264 removed the notebook_utils.py module but neglected to remove this dependency from 3 examples:

export_vtk_tutorial.py
plot_cross_section_example.py
plot_map_view_example.py

This is why these are missing from the develop version of the RTD site. This PR inlines the shared models which previously lived in notebook_utils.py. This duplication will go away once we have a models API as proposed in #1872.

First step towards #1872. Use pooch for data access. This is ugly, but it makes notebooks runnable (provided exes and python environment) out of the box. Local files will be used if detected, otherwise downloaded, following the pattern in the mf6 example models. An eventual models API could hide all the details of model access. Also mention the optional dependencies requirement on the tutorials and examples gallery pages.

wpbonelli · 2024-12-16T20:25:39Z

Another variation on what this could look like, from xugrid

import xugrid as xu
ds = xu.data.adh_san_diego(xarray=True)

https://deltares.github.io/xugrid/examples/quick_overview.html#from-xarray-dataset

wpbonelli added the enhancement label Jul 13, 2023

wpbonelli mentioned this issue Sep 13, 2023

Self-contained example notebooks MODFLOW-USGS/modflow6#1348

Closed

3 tasks

wpbonelli added the documentation label Sep 14, 2023

wpbonelli changed the title ~~feature: distribute and/or download example data~~ feature: self-contained notebooks, API for example models Oct 3, 2023

This was referenced Oct 16, 2023

Bringing this into flopy bdestombe/python-flopy-parser#3

Open

feature: expand benchmarking, try ASV #1989

Open

wpbonelli mentioned this issue Jan 11, 2024

Models API (demo) MODFLOW-USGS/modflow-devtools#134

Open

wpbonelli added this to the 3.7.0 milestone Mar 2, 2024

wpbonelli modified the milestones: 3.7.0, 3.8.0 May 23, 2024

wpbonelli mentioned this issue Jul 10, 2024

docs(examples): move geometry data to yml file, inline utilities #2264

Merged

wpbonelli modified the milestones: 3.8.0, 3.9.0 Aug 5, 2024

wpbonelli mentioned this issue Aug 8, 2024

fix(examples): restore example notebooks skipped after #2264 #2286

Merged

wpbonelli modified the milestones: 3.9, 3.10 Sep 4, 2024

wpbonelli modified the milestones: 3.10, 4.0 Nov 5, 2024

wpbonelli self-assigned this Nov 5, 2024

wpbonelli mentioned this issue Dec 10, 2024

docs(examples): use pooch for data access in tutorials/examples #2392

Merged

34 tasks
