Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing: tc4 installs python-netCDF4 via venv #1166

Merged
merged 12 commits into from
Jul 27, 2020

Conversation

marshallward
Copy link
Collaborator

Currently, users are expected to have numpy and netcdf4 python modules
in order to generate the necessary netCDF input files. This fails in
environments where these modules are unavailable.

This patch now installs the modules into a virtual environment which are
accessible when generating the tc4 inputs.

This solution is local to tc4 but could be extended to other tests as
needed.

Currently, users are expected to have numpy and netcdf4 python modules
in order to generate the necessary netCDF input files.  This fails in
environments where these modules are unavailable.

This patch now installs the modules into a virtual environment which are
accessible when generating the tc4 inputs.

This solution is local to tc4 but could be extended to other tests as
needed.
The numpy and netCDF4 packages are no longer needed since tc4 now
installs these locally.
tc4 was using the venv module which appears to be python3-specific, and
Travis Ubuntu defaults to python2.  Also virtualenv was not installed in
either case.

This patch adds python-virtualenv to the install packages and uses the
virtualenv module.
Travis ARM nodes need to build numpy natively when installed by pip, and
thus require Python headers.  These are provided by the python-dev
package.
@marshallward
Copy link
Collaborator Author

Two other comments:

  1. This only touches .testing so does not require regression testing on Gaea
  2. This would be a good candidate to trial squash commit.

@adcroft
Copy link
Collaborator

adcroft commented Jul 22, 2020

On gaea

/usr/bin/python2: No module named virtualenv

@@ -1,3 +1,6 @@
ocean_hgrid.nc topog.nc temp_salt_ic.nc sponge.nc:
python build_grid.py
python build_data.py
python2 -m virtualenv local-env
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convenient those these are for a user anything that is platform specific (these do not work everywhere) needs to be higher up. I think the local-env creation belongs in the .travis.yml file so we can put something different in the .gitlab-ci.yml file.

What you could do is move this step into it's own target for .testing/local-env and make these targets depend on .testing/local-env. That way it can be created by the travis/gitlab scripts but it will also be created for a user's convenience. It just won't work on gaea or everywhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is a problem. It seems like python3 -m venv ${dir} works on Gaea but python2 -m virtualenv ${dir} does not? I agree this needs to be resolved.

But I don't think the python3 -m venv local-env step (or py2 equivalent above) belongs in .travis.yml. This is not much different from apt install python-netcdf and arguably solves nothing, replacing one dependency with another.

I'm not really worried about Travis or Gaea's Gitlab, since those are controlled environments already. I'm more concerned about the user who has neither one set up.

Arguably the current scripts build_grid.py and build_data.py are also platform-specific, since they assume numpy and python-netCDF4 exist on the platform. So I concede that this PR really just shifts the requirements around without solving the problem.

We cannot call pip install netCDF without also calling virtualenv (or conda or whatever). So this command should only be called in conjunction with python -m venv local-env.

One suggestion is to do some autoconf-like testing to determine which one works.

But maybe there is just no way to do this unless the user has numpy and python netCDF4 installed.

Copy link
Collaborator Author

@marshallward marshallward Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other thoughts:

  • I had mistakenly thought virtualenv was in the Standard Library. It is not, which is arguably the source of our problems.
  • However, venv is part of the Python 3 Standard Library. So if we can assume Python 3, then we can assume venv.
  • So I suggest we do a bit of testing before running this Makefile:
    1. Check if python 3 is available. If so, no worries.
    2. Check if python 2 is being used. If so, then
      a. Test for virtualenv. If so, then replace venv with virtualenv
      b. If not, then test for python-netCDF4. If so, then don't even bother running virtualenv.
    3. Finally, if there is no 3 and 2 is unsuitable (for the reasons above) then we abandon the test.

While 2.x is still pervasive, 2.7 has been tagged for deprecation and its days are numbered. It won't be long until everyone is on Python 3. So I think we go with 3 using the proposed (and amended) solution above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I don't disagree that we want this to work for a user but the current form will be hard to make work in our pipeline. If you do this we can work with it:

local-env:
	python2 -m virtualenv local-env \
	&& . local-env/bin/activate \
	&& pip install netCDF4
ocean_hgrid.nc topog.nc temp_salt_ic.nc sponge.nc: local-env
	. local-env/bin/activate \
	&& python build_grid.py \
	&& python build_data.py
```
because we can pre-build tc4/local-env.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just switching to Python 3 would solve our problems (after adding it to the Travis instance)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh BTW I agree on splitting the rule, I had meant to do that at some point.

Reverting the python 2 support (default for Travis) to use Python 3
syntax.

The main reason is that Python 3 includes venv (equivalent to
virtualenv) as its standard library, and is therefore guaranteed to
exist if Python 3 exists.

Python 3's virtualenv must be independently installed, which cannot be
confirmed.

This will cause problems for people without Python 3, but this is
probably the best solution, or at least the starting point for a more
general solution.
Ubuntu apparently requires an explicit install of python3-venv despite
it being part of the standard library.  Go figure...
Arm64 Ubuntu environments require explicit installations which are
otherwise provided on x86 Ubuntu:

* Python 3 Pip must be installed (python3-pip)
* Wheel installation must be explicitly installed
* Cython is required for numpy
* Numpy must be explicitly built before installing python-netCDF4
@codecov-commenter
Copy link

codecov-commenter commented Jul 23, 2020

Codecov Report

Merging #1166 into dev/gfdl will decrease coverage by 0.27%.
The diff coverage is 32.16%.

Impacted file tree graph

@@             Coverage Diff              @@
##           dev/gfdl    #1166      +/-   ##
============================================
- Coverage     46.08%   45.81%   -0.28%     
============================================
  Files           214      224      +10     
  Lines         69399    70244     +845     
============================================
+ Hits          31984    32179     +195     
- Misses        37415    38065     +650     
Impacted Files Coverage Δ
...g_src/external/GFDL_ocean_BGC/FMS_coupler_util.F90 0.00% <0.00%> (ø)
...fig_src/external/GFDL_ocean_BGC/generic_tracer.F90 0.00% <0.00%> (ø)
...c/external/GFDL_ocean_BGC/generic_tracer_utils.F90 0.00% <0.00%> (ø)
config_src/external/ODA_hooks/kdtree.f90 0.00% <0.00%> (ø)
config_src/external/ODA_hooks/ocean_da_core.F90 0.00% <0.00%> (ø)
config_src/external/ODA_hooks/ocean_da_types.F90 0.00% <0.00%> (ø)
config_src/external/ODA_hooks/write_ocean_obs.F90 0.00% <0.00%> (ø)
config_src/solo_driver/MESO_surface_forcing.F90 0.00% <0.00%> (ø)
config_src/solo_driver/MOM_driver.F90 68.72% <ø> (ø)
config_src/solo_driver/user_surface_forcing.F90 0.00% <0.00%> (ø)
... and 144 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b0f83d...da5938e. Read the comment docs.

Build times for setting up the virtual environments can be very
expensive on the Arm64 Ubuntu nodes, so we now create a shared directory
for launching the environments.
@marshallward
Copy link
Collaborator Author

This is going nowhere fast. The latest problem is that Arm64 has no access to cached binaries of numpy, so is rebuilding it. This is causing timeouts with Travis, which is killing the job.

As re-building numpy was always my original worry about this approach, I am going to close this issue until something simpler can be done.

We have reworked the Makefile to conditionally test for required Python
modules in tc4.  If unavailable, we install these in a virtual
environment.

This does not address many scenarios, such as if Python 3 is missing,
venv is missing (as in Ubuntu), or handle the situation if they do not
exist.  It assumes that either the modules exist, or that they can be
installed by venv.

This should be seen as an iterative step to get things working on Travis
x86 and Arm64, as well as GFDL's Gaea and most user Linux platforms.
This resolves some issues with python2/3 resolution and limited support
of various platforms for module support.

Specifically, older platform with basic Python 3 support may not also
have numpy support.  In this case, we can defer back to Python 2 (or
whatever the system Python may be).
This patch moves the Python virtual environment configuration to the
main Makefile, which is setup at build time, rather than in the model
configuration Makefile, which will typically not have internet access if
run on a compute node.

As before, the venv will only be setup when the numpy and netCDF4
modules are unavailable.  A minor bug in the logic of the check has
also been fixed.
@marshallward marshallward reopened this Jul 24, 2020
@marshallward
Copy link
Collaborator Author

This PR has been reworked to conditionally set up the virtual environment when the numpy and netCDF4 Python modules are unavailable.

While this is not going to be tested in the Travis nodes, it should streamline usage on Gaea or platforms with minimal Python module support.

@Hallberg-NOAA could you test that the tc4 tests run for you?

@marshallward marshallward requested a review from adcroft July 24, 2020 00:27
Copy link
Collaborator

@adcroft adcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked this works on gaea with a mere module load cray-python.

@adcroft adcroft merged commit 6529752 into mom-ocean:dev/gfdl Jul 27, 2020
@marshallward marshallward deleted the tc4_venv branch July 27, 2020 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants