Avoid using subprocess.run() in FSURDATMODIFYCTSM #2125

Merged (21 commits) on Sep 19, 2023

Conversation

samsrabin
Collaborator

@samsrabin samsrabin commented Aug 29, 2023

Issues #2109 and #2111 seem to stem from idiosyncrasies of different users' Cheyenne environments that become important when subprocess.run() is called. This PR removes the use of subprocess.run() from the FSURDATMODIFYCTSM test, improving robustness and resolving those issues.

Description of changes

Instead of starting a new subprocess shell in which fsurdat_modifier is called, this PR makes it so the command is called directly by Python itself. This does require that all the Python dependencies are loaded, which can be accomplished by activating the ctsm_pylib environment before calling run_sys_tests or cime/scripts/create_test.
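
As a rough illustration of the difference (not the PR's actual code), the sketch below does the same work once through subprocess.run() and once as a direct function call. Here tool_main is a hypothetical stand-in for a tool entry point such as fsurdat_modifier; the real function name and its arguments are not given in this description.

```python
import subprocess
import sys


def tool_main(argv):
    """Hypothetical stand-in for a tool entry point such as fsurdat_modifier."""
    return "modified " + argv[0]


def run_in_subprocess(arg):
    # Starts a second Python interpreter. Which modules that interpreter can
    # import depends on the environment the child process ends up with.
    code = "import sys; print('modified ' + sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", code, arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_in_process(arg):
    # Calls the function in the current interpreter, so it sees exactly the
    # modules provided by the environment the caller already activated.
    return tool_main([arg])
```

The direct call has no child environment to get wrong, which is the robustness gain described above; the cost is that the calling interpreter itself must already have every dependency available.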

Unfortunately, this method can't be used to fix RXCROPMATURITY failing for some users, even though that's also due to environment. Hopefully that will resolve itself once we move to Derecho.

Specific notes

Remaining tasks (not including testing):

  • Make FSURDATMODIFYCTSM call fsurdat_modifier directly.

Contributors other than yourself, if any:

CTSM Issues Fixed:

Are answers expected to change (and if so in what way)? No.

Any User Interface Changes (namelist or namelist defaults changes)? No.

Testing

FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel:

For tests that invoke cmds_to_setup_conda(), manually calling the script that invokes that function (e.g., case.build for FSURDATMODIFYCTSM) could fail if a conda environment is already activated. The problem is that
    conda run -n ctsm_pylib
seems not to actually use ctsm_pylib if, for instance, the conda base environment is active. Instead, doing
    CONDA_PREFIX=
    conda run -n ctsm_pylib
seems to work.
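
For anyone who wants to reproduce that workaround from Python, here is a minimal sketch. The helper names env_without_active_conda and conda_run_args are hypothetical (they are not functions in CTSM), and the sketch only prepares the environment and argument list; it assumes, as the commit message suggests, that an empty CONDA_PREFIX is enough for conda run -n to pick up the requested environment.

```python
import os


def env_without_active_conda(base_env=None):
    """Return a copy of the environment with CONDA_PREFIX blanked out.

    Hypothetical helper: similar in spirit to prefixing a shell command
    with "CONDA_PREFIX=".
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CONDA_PREFIX"] = ""
    return env


def conda_run_args(env_name, command):
    """Build the argument list for "conda run -n <env_name> <command...>"."""
    return ["conda", "run", "-n", env_name] + list(command)
```

These could then be passed to something like subprocess.run(conda_run_args("ctsm_pylib", ["python", "--version"]), env=env_without_active_conda(), check=True) on a machine where conda and that environment exist.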
This avoids using subprocess.run(), which should hopefully reduce issues related to user environment. However, it does require that all the Python dependencies are loaded. This can be accomplished by activating the ctsm_pylib environment before calling run_sys_tests or cime/scripts/create_test.
@samsrabin samsrabin added the code health (improving internal code structure to make easier to maintain (sustainability)), bug (something is working incorrectly), and testing (additions or changes to tests) labels Aug 29, 2023
@samsrabin samsrabin self-assigned this Aug 29, 2023
@samsrabin
Collaborator Author

Before I continue down this path, I want to make sure FSURDATMODIFYCTSM is working for everyone. It does work for me, but I didn't experience either of the previous issues.

@adrifoster @glemieux Would one of y'all be willing to test this to see if you hit issue #2109?
@billsacks @ekluzek @rgknox @slevis-lmwg Would one of y'all be willing to test this to see if you hit issue #2111?

Specifically, what we need to see is that fsurdat.nc and done_FSURDATMODIFYCTSM_setup.txt are generated in the case directory of test FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel. This happens early in the SHAREDLIB_BUILD phase, so it should be pretty quick to tell one way or the other.
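
If it helps, a quick way to check for those two files is sketched below. The function name missing_setup_files is made up for this example, and case_dir is whatever your test's case directory happens to be.

```python
import os

# File names taken from the comment above.
EXPECTED_FILES = ("fsurdat.nc", "done_FSURDATMODIFYCTSM_setup.txt")


def missing_setup_files(case_dir, expected=EXPECTED_FILES):
    """Return the expected files that are not present in case_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(case_dir, name))
    ]
```

An empty list means both files were generated; anything else names what is still missing.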

Note that you must have all the requisite Python dependencies loaded. To ensure this, please test with the ctsm_pylib conda environment active when calling run_sys_tests or cime/scripts/create_test. If you could test with both of those methods, that'd be great. Thanks!

…sts.

Specifically, FSURDATMODIFYCTSM and RXCROPMATURITY.
@ekluzek
Collaborator

ekluzek commented Sep 1, 2023

I tried it again and the build phase works!

However, it failed at the run phase, because numpy wasn't loaded. This is because the run phase is sent to the share queue as a separate thing, so it probably needs to load the conda env again. I also tried running case.submit with --no-batch, but that didn't work either. I think it might still spawn off a separate process even in that case, although it doesn't go into the queue. I'm actually also suspicious that there is a bug in cime for no-batch.

I also tried Derecho, but that had multiple problems at this point.

My case for Cheyenne is here:

/glade/scratch/erik/tests_0901-095751ch/FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel.0901-095751ch

@samsrabin
Collaborator Author

Unfortunately I'm seeing this as well. I guess the run phase re-imports fsurdatmodifyctsm.py, even though it doesn't extend run_phase(). (Of course, there's no way for CIME to know that, so this behavior makes sense.)
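
One general way to keep a build-only dependency from breaking a later phase is to import it inside the function that needs it, rather than at the top of the module. The sketch below shows that pattern only; it is not taken from fsurdatmodifyctsm.py, the function names are hypothetical, and json stands in for a heavier dependency such as numpy.

```python
import importlib


def build_phase(module_name="json"):
    """Hypothetical build step that needs an extra module."""
    # Deferred import: a missing module raises here, when the step is called,
    # and not when the file containing this function is merely imported.
    module = importlib.import_module(module_name)
    return module.__name__


def run_phase():
    """Hypothetical run step that never touches the extra module."""
    return "ran without importing the build-only dependency"
```

With this layout, a process that imports the file but only calls run_phase() does not need the build-only dependency to be installed.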

I've unchecked "works for me" in the PR description.

@ekluzek
Collaborator

ekluzek commented Sep 1, 2023

By the way @samsrabin thanks for working on this. It's tricky for all of us, but important to get figured out. If it's helpful to get some of us together to brainstorm let us know.

This avoids "numpy not found" error for FSURDATMODIFYCTSM, but this isn't a
solution for RXCROPMATURITY, because that test actually does need the right
conda environment during the run phase (which is when generate_gdds.py is
called).
@ekluzek
Collaborator

ekluzek commented Sep 1, 2023

Awesome. Latest update now works for me!

PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel CREATE_NEWCASE
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel XML
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel SETUP
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel SHAREDLIB_BUILD time=252
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel MODEL_BUILD time=48
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel SUBMIT
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel RUN time=135
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel MEMLEAK insuffiencient data for memleak test
PASS FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel SHORT_TERM_ARCHIVER

Comment on lines 8 to 13
# Import the CTSM Python utilities
_CTSM_PYTHON = os.path.join(
    os.path.dirname(os.path.realpath(__file__)), os.pardir, os.pardir, os.pardir, "python"
)
sys.path.insert(1, _CTSM_PYTHON)
import ctsm.crop_calendars.cropcal_utils as utils
Member

Can you speak to why this is needed? In other places we can import stuff throughout the ctsm python library without doing path manipulation - e.g., see job_launcher_qsub.py (for an example in a subdirectory of python/ctsm, like this one is). Is it possible that you need an __init__.py in the crop_calendars directory to enable this?

Collaborator Author

Good point; not necessary! I had that bit in my head from adding it to fsurdatmodifyctsm.py, but that was only necessary because it's not in python/. My latest commit removes these unneeded lines.
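
To illustrate the distinction in this thread: a script that lives outside a package tree has to make the package's parent directory searchable before it can import from it, whereas modules inside the tree can import each other directly. The sketch below builds a throwaway package to show this; the package name demo_pkg_for_path_sketch and both function names are invented for the example and have nothing to do with the ctsm package itself.

```python
import importlib
import os
import sys
import tempfile


def import_with_path(parent_dir, module_name):
    """Import module_name after making parent_dir searchable."""
    if parent_dir not in sys.path:
        sys.path.insert(1, parent_dir)
    # Make sure the import system notices directories created at runtime.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


def demo():
    """Create a throwaway package and import it as an outside script would."""
    parent_dir = tempfile.mkdtemp()
    package_dir = os.path.join(parent_dir, "demo_pkg_for_path_sketch")
    os.makedirs(package_dir)
    # An __init__.py marks the directory as a regular package.
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    with open(os.path.join(package_dir, "utils.py"), "w") as handle:
        handle.write("ANSWER = 7\n")
    module = import_with_path(parent_dir, "demo_pkg_for_path_sketch.utils")
    return module.ANSWER
```

Without the sys.path.insert() call, the import in demo() would fail because the temporary directory is not on the search path.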

Member

Cool, thanks.

Collaborator

Wait, I still see the path logic. Am I missing something?

Collaborator

This is the thing we should discuss as a group.

@samsrabin
Collaborator Author

samsrabin commented Sep 6, 2023

It's looking like this solution can't be applied to RXCROPMATURITY because it calls external Python scripts during the run phase; see discussion here. I'm leaning towards bringing in the FSURDATMODIFYCTSM solution now, hoping that Derecho magically fixes things for RXCROPMATURITY—I'm not sure banging my head against this any more is worth it for a test that hardly anyone will run.

@ekluzek
Collaborator

ekluzek commented Sep 6, 2023

That makes sense to me, @samsrabin. The RxCropMaturity test will only be run in the ctsm_sci test list, right? Or maybe in another special test list?

We also might want to try this out on Derecho and see if it's a problem there. If it isn't, there's no need to worry about this for Cheyenne.

Some of these things can be difficult to figure out, so there's a point when we should just punt and move on...

@samsrabin
Collaborator Author

That's right @ekluzek, it's only run in the ctsm_sci suite.

I agree that it's worth testing on Derecho, but I guess we have some work to do before that's possible.

@samsrabin samsrabin requested review from ekluzek and removed request for ekluzek September 14, 2023 17:11
This commit improves organization of cmds_to_setup_conda() and tries to fall back to the original "conda activate" method if "conda run" fails.
Those are necessary for when the crop calendar scripts are being called on their own, from outside the CTSM repo.
This reverts commit 9893c80.
@samsrabin samsrabin marked this pull request as ready for review September 15, 2023 14:39
@samsrabin samsrabin requested a review from ekluzek September 15, 2023 14:39
@samsrabin samsrabin changed the title Avoid using subprocess.run() in SystemTests Avoid using subprocess.run() in FSURDATMODIFYCTSM Sep 15, 2023
Collaborator

@ekluzek ekluzek left a comment

OK, I have at least one suggestion about a comment. And a question about the test_conda_retry logical. I also reopened the conversation Bill had about the CTSM path stuff, which looks like it's still there. And I think it can be removed in that place as well as others. In any case I'm looking forward to our discussion to go over this.

@samsrabin
Collaborator Author

clm_pymods tests still pass after latest commits.

@samsrabin samsrabin requested a review from ekluzek September 18, 2023 21:51
Collaborator

@ekluzek ekluzek left a comment

@samsrabin and I went over this earlier. And he made changes according to my suggestions.

The one thing that I thought we should discuss as a group is how to handle setting the path for python for these types of system tests that now need to manipulate the path. We didn't have this before because the top-level tool skeleton handled it, so there it's in one place. Here we need a better way to put it in one place. One way to do that would be to set the path for python using an env variable. I can think of other ways to do it as well. We should decide as a group and then make an issue to change it to that method.
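
As a starting point for that discussion, the env-variable option could look roughly like the sketch below. The variable name CTSM_PYTHON_DIR and the function add_python_dir_from_env are placeholders made up for this example; nothing has been decided, and no such variable exists in CTSM today.

```python
import os
import sys


def add_python_dir_from_env(var_name="CTSM_PYTHON_DIR", environ=None):
    """Add the directory named by an environment variable to sys.path.

    Returns True if a directory was configured, False if the variable is
    unset or empty. The variable name is a placeholder for illustration.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(var_name)
    if not path:
        return False
    path = os.path.abspath(path)
    if path not in sys.path:
        sys.path.insert(1, path)
    return True
```

Each system test would then call one shared helper instead of repeating its own os.path.join(...) logic, which keeps the path handling in a single place.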

samsrabin added a commit that referenced this pull request Sep 19, 2023
* Add system and unit tests for making fsurdat with all crops everywhere (#2081)
* Rework master_list* files etc. (#2087)
* Fixes to methane Tech Note (#2091)
* Add is_doy_in_interval() function (#2158)
* Avoid using subprocess.run() in FSURDATMODIFYCTSM (#2125)

Closes issues:
* Add unit test for making fsurdat with all crops everywhere (#2079)
* Rework master_list_(no)?fates.rst? (#2083)
* conda run -n can fail if a conda environment is already active (#2109)
* conda fails to load for SystemTests (#2111)
@samsrabin samsrabin merged commit a207713 into ESCOMP:master Sep 19, 2023
samsrabin added a commit to samsrabin/CTSM that referenced this pull request Sep 19, 2023
b4b changes to Python scripts, history lists, tech note, and clm_time_manager.

* Add system and unit tests for making fsurdat with all crops everywhere (ESCOMP#2081)
* Rework master_list* files etc. (ESCOMP#2087)
* Fixes to methane Tech Note (ESCOMP#2091)
* Add is_doy_in_interval() function (ESCOMP#2158)
* Avoid using subprocess.run() in FSURDATMODIFYCTSM (ESCOMP#2125)

Closes issues:
* Add unit test for making fsurdat with all crops everywhere (ESCOMP#2079)
* Rework master_list_(no)?fates.rst? (ESCOMP#2083)
* conda run -n can fail if a conda environment is already active (ESCOMP#2109)
* conda fails to load for SystemTests (ESCOMP#2111)
samsrabin added a commit to samsrabin/CTSM that referenced this pull request Dec 23, 2023
b4b changes to Python scripts, history lists, tech note, and clm_time_manager.

* Add system and unit tests for making fsurdat with all crops everywhere (ESCOMP#2081)
* Rework master_list* files etc. (ESCOMP#2087)
* Fixes to methane Tech Note (ESCOMP#2091)
* Add is_doy_in_interval() function (ESCOMP#2158)
* Avoid using subprocess.run() in FSURDATMODIFYCTSM (ESCOMP#2125)

Closes issues:
* Add unit test for making fsurdat with all crops everywhere (ESCOMP#2079)
* Rework master_list_(no)?fates.rst? (ESCOMP#2083)
* conda run -n can fail if a conda environment is already active (ESCOMP#2109)
* conda fails to load for SystemTests (ESCOMP#2111)

# Conflicts:
#	src/biogeochem/CNBalanceCheckMod.F90
#	src/biogeochem/CNCIsoFluxMod.F90
#	src/biogeochem/CNDriverMod.F90
#	src/biogeochem/CNPhenologyMod.F90
#	src/biogeochem/CNProductsMod.F90
#	src/biogeochem/CNVegCarbonFluxType.F90
#	src/biogeochem/CNVegNitrogenFluxType.F90
#	src/biogeochem/EDBGCDynMod.F90
#	src/main/clm_initializeMod.F90
#	src/main/controlMod.F90
#	src/soilbiogeochem/SoilBiogeochemDecompCascadeBGCMod.F90
@samsrabin samsrabin added simple bfb bit-for-bit labels Aug 8, 2024