-
Notifications
You must be signed in to change notification settings - Fork 1
Use venvs to support multiple pangeo-forge-recipes
versions
#115
Comments
@cisaacstern and I circled up today to discuss options Short termAn approach which allows @rabernat to easily run full integration tests with experimental tags of
I have not yet experimented with how runtime usage of Medium termTo address this dependency issue, minimize issues around base Docker image management and allow version backward compatibility we propose
Long term@cisaacstern and @yuvipanda have expressed some valid concerns about the security implications of installing packages into a
|
IMO, medium term we should move everything completely to Docker and not use venv at all! I've an alternate suggestion, just getting back into things after taking the day off yesterday - will respond by end of day. |
The core of the problem is this:
That's because pangeo-forge-runner executes the recipe.py file with an Here's what I think is a short-term solution:
I think this should be reasonably low hanging fruit to implement from both orchestrator as well as in the runner, and gives us what we want now ( |
I also think this can all be done soon enough to help @rabernat with the beam refactor, and we can then work on ways to figure out how to containerize this properly. That would involve definitely moving away from heroku to something else, and probably more first-class support for a |
@yuvipanda thanks for the thoughtful response.
To explore the pathway proposed by @sharkinsspatial a bit further, I just wanted to highlight that the proposal was to remove install time dependency on def install_pangeo_forge_recipes(version):
pkg = (
f"pangeo-forge-recipes=={version}"
if not version.startswith("@")
else f"git+https://github.com/pangeo-forge/pangeo-forge-recipes.git{version}"
)
cmd = [sys.executable, "-m", "pip", "install", "-q", pkg]
logger.info(f"\nInstalling {pkg} with pip in '--quiet' mode. This may take a few moments...\n")
subprocess.check_call(cmd)
logger.info(f"Installed {pkg}!") So this would not mean |
An advantage I see here, entirely apart from the possible usefulness for |
I had misunderstood what dependency you wanted removed, thanks for clarifying that @cisaacstern! IMO, I want us to do the least amount of work possible right now to unblock @rabernat, and then move towards containerization asap. An additional complication with installing
IMO, this can also be very conflicting - as a user, I don't want me calling a tool from a package to modify the environment it is in (unless it is explicitly built for that purpose, like pip). This is especially going to be problematic if my local environment has different versions of dependencies (like xarray) than what the pangeo-forge-recipes version depends on. |
This also makes me have opinions on pangeo-forge/pangeo-forge-runner#24, and I think the appropriate name for |
Based on @yuvipanda and my offline chat just now:
|
Thanks for the very productive conversation, @cisaacstern! <3 working with this group! I opened pangeo-forge/pangeo-forge-runner#27 to discuss splitting the app. It's a decent refactor that'll take a while. In the meantime, to unblock @rabernat maybe we can work on providing instructions so he can run |
Yes, IMO this is the best path forward. Ryan can use |
Per conversation at yesterday's Coordination meeting, we should use venvs to support multiple versions of
pangeo-forge-recipes
.To do so, we'll need to:
pangeo_forge_version
passed inmeta.yaml
as the venv for subprocess calls topangeo-forge-runner
.A circularity challenge: we need to call
pangeo-forge-runner expand-meta
to determine thepangeo_forge_version
. That happens in two places: here (for test runs) and here (for prod runs). This is problematic because, in the case ofdict_objects
,pangeo-forge-runner expand-meta
imports the recipe, in order to determine the names of the recipes.This means that we probably need a PR to
pangeo-forge-runner
which adds a--dont-expand-dict
flag toexpand-meta
, or alternatively simply a different command such asget-meta
, which does the same thing asexpand-meta
, without expanding dict objects. In either case, what we need is a way to ascertain thepangeo_forge_version
frommeta.yaml
without any risk of triggering a recipe import inpangeo-forge-runner
.An alternate path would be to lift this
pangeo_forge_version
concern into the orchestrator itself (via a GitHub API call to get the contents ofmeta.yaml
). My sense is that this feature (using a dynamically determined venv as the running environment forpangeo-forge-runner
) is useful for all users ofpangeo-forge-runner
(not just the orchestrator), so it is probably best kept at that layer?One further thought on venvs: one possibility would be to pre-build venvs for all supported
pangeo-forge-recipes
releases in the Dockerfile. Another option would be to havepangeo-forge-runner
dynamically provision a venv (perhaps even in a tempdir, if that's possible?) based on themeta.yaml
. The latter option seems more elegant to me, and more broadly useful for other users ofpangeo-forge-runner
, though I'm not totally clear how challenging it may be.cc @rabernat @yuvipanda @sharkinsspatial
The text was updated successfully, but these errors were encountered: