conda fails to load for SystemTests #2111
Comments
Note this is something that @rgknox and I see, but @samsrabin and @adrifoster don't, so it works for some and not for others. Still a good thing to figure out. Also, the error checking for these situations could be better so that it's more obvious what is going on. |
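To make the error-checking point concrete, here is a minimal sketch of what a more explicit failure message could look like. It is not the code in the CTSM system tests; the function name `run_in_conda_env` and the message wording are made up for illustration. The only real interfaces assumed are `shutil.which`, `subprocess.run`, and the `conda run -n` command mentioned in this issue.

```python
import shutil
import subprocess


def run_in_conda_env(env_name, command):
    """Run `command` (a list of strings) through `conda run -n env_name`.

    Hypothetical helper: raises RuntimeError with an explicit message when
    conda cannot be found or the command fails, rather than leaving the
    user with only a stack trace.
    """
    conda_exe = shutil.which("conda")
    if conda_exe is None:
        # This is the situation described in this issue: conda never loaded.
        raise RuntimeError(
            "conda was not found on PATH; the conda module may have failed to load"
        )

    result = subprocess.run(
        [conda_exe, "run", "-n", env_name] + list(command),
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"'conda run -n {env_name}' exited with code {result.returncode}:\n"
            f"{result.stderr}"
        )
    return result.stdout
```

With a check like this, a missing conda would show up as a one-line message in the test log instead of a failure further down the call chain.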
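I am seeing this error in my testing for ctsm5.1.dev136. The test FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel fails in the SHAREDLIB_BUILD phase. |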
This was the same error I got because I had a conda environment loaded.
Can you try doing "conda deactivate" and then rebuilding?
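One way to spot this situation before building is to look at the environment variable that conda sets when an environment is activated. The sketch below assumes `CONDA_DEFAULT_ENV` is set on activation (note that it is also set for the base environment); the function names are hypothetical and not part of CTSM or CIME.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active.

    Assumes conda's activation scripts set CONDA_DEFAULT_ENV.
    """
    environ = os.environ if environ is None else environ
    name = environ.get("CONDA_DEFAULT_ENV")
    return name if name else None


def conda_active_warning(environ=None):
    """Return a warning string if a conda environment is active, else None."""
    name = active_conda_env(environ)
    if name is None:
        return None
    return (
        f"conda environment '{name}' is active; "
        "try 'conda deactivate' and then rebuild"
    )
```

Passing a dictionary in place of `os.environ` makes the check easy to exercise without touching the real shell environment.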
On Tue, Aug 22, 2023 at 11:34 AM Bill Sacks wrote:
I am seeing this error in my testing for ctsm5.1.dev136. The test
FSURDATMODIFYCTSM_D_Mmpi-serial_Ld1.5x5_amazon.I2000Clm50SpRs.cheyenne_intel
fails in the SHAREDLIB_BUILD phase.
|
@adrifoster I think your error was actually the one given in #2109, no? |
@ekluzek and I worked on this a few days ago, and the conclusion (among other notes) was that there's probably something up with his environment. This may be biting all users with old Cheyenne accounts, as Adrianna and I don't experience the issue. This may end up taking a long time to debug. In the interim, I think I'll add a fallback to the original behavior if the new behavior fails. |
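As a rough illustration of the fallback idea, the sketch below tries one command and, if it cannot be started or exits with an error, runs a second one. This is a generic pattern, not the change that was made in CTSM; `run_with_fallback` and the labels it returns are hypothetical.

```python
import subprocess
import sys


def run_with_fallback(primary_cmd, fallback_cmd):
    """Run primary_cmd; if it cannot start or exits non-zero, run fallback_cmd.

    Returns "primary" or "fallback" to record which command succeeded.
    A failure of the fallback itself still raises, so real errors are not hidden.
    """
    try:
        subprocess.run(primary_cmd, check=True)
        return "primary"
    except (OSError, subprocess.CalledProcessError) as err:
        # OSError covers a missing executable; CalledProcessError a non-zero exit.
        print(f"Primary command failed ({err}); falling back", file=sys.stderr)

    subprocess.run(fallback_cmd, check=True)
    return "fallback"
```

Printing the reason for the fallback keeps the original failure visible in the logs, which matters while the underlying cause is still unknown.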
I don't think this was the case for me. I agree with the idea of doing a quick-ish workaround for now, not spending a lot of time on this. Time would probably better be spent figuring out what needs to be done to get this working on derecho, if it doesn't work out-of-the-box. |
A perplexing new development from some troubleshooting just now with @slevis-lmwg… I had him use the @billsacks @ekluzek @rgknox: Would any of y'all be willing to try this to confirm? |
I tried 4 things: From both out-of-the-box ctsm5.1.dev129, and from master but with fsurdatmodifyctsm.py backed out to the version in ctsm5.1.dev129, I tried both running
|
Could it be that something changed in the Cheyenne software stack? I wonder if it'd work on Izumi or Derecho. For now, it seems clear that this isn't actually an issue that I introduced with |
@samsrabin I'm not answering your question. I'm confirming that the test first failed in dev134, so after your changes had already passed. |
Or alternatively… perhaps I could make it so that |
@samsrabin yes, this is a point where we are stuck on a problem and not sure how to proceed. But, now I saw your new suggestion that is probably worth trying to see if it fixes it. Anyway, when we are stuck on an issue and don't know what to do, I think it's reasonable (and probably best practice) to remove that issue from a PR and leave it open until later -- but bring that PR in. We still want to fix this issue, but we can sideline it until we have a new idea on something to try. |
I'm a bit hesitant to wade more deeply into this since I've only been half-following, but I'd be interested in the group exploring whether we can use a different – and in my mind more robust and easier to maintain – solution as we transition to derecho: Rather than having individual system tests try to do something with your environment, instead require that the user has set up their python environment appropriately before running the given test. This could be done manually by the user, or maybe could be built into run_sys_tests – so moving any conda setup to run_sys_tests as part of the subprocess where we kick off the create_test job. See also ESMCI/cime#4059 |
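A pre-flight check along these lines might look like the sketch below: before any test is launched, confirm that the Python modules the tests need can be imported in the current environment, and stop with a clear message if not. The function names and the idea of passing a list of required module names are assumptions for illustration; nothing here reflects how run_sys_tests is actually implemented.

```python
import importlib.util


def missing_modules(required):
    """Return the top-level module names in `required` that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def check_python_environment(required):
    """Raise RuntimeError listing any required modules that are unavailable."""
    missing = missing_modules(required)
    if missing:
        raise RuntimeError(
            "Python environment is not set up for these tests; missing modules: "
            + ", ".join(missing)
        )
```

Doing the check once, up front, means individual system tests would not need to modify the environment themselves.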
Adding another data point: I tested out-of-the box dev129 and dev136 and saw similar build failures and the same stacktrace as what @billsacks reported above. The only difference between the two tags were the preambles leading up to the stacktrace; the dev136 failure message didn't have the
UPDATE: adding full
Per @samsrabin request, the contents of the dev136 fsurdat_modifier.log:
Location of test run folders:
|
@glemieux Good news! It looks like your |
Note that @johnpaulalex ran into this as well. But I suspect that deactivating his conda environment before running run_sys_tests might have gotten it to work for him. He tried a bunch of things, which I'll put below. Hopefully the fix @samsrabin has will also work for him. These problems that "work for me, but not thee" are a real pain to figure out, and everyone's environment seems to be different enough that we run into different issues. Here's John's experience. I don't think there's anything new here, but I'm including it in case it's helpful...
# This file is for user convenience only and is not used by the model
# Changes to this file will be ignored and overwritten
# Changes to the environment should be made in env_mach_specific.xml
# Run ./case.setup --reset to regenerate this file
. /glade/u/apps/ch/opt/lmod/7.5.3/lmod/lmod/init/sh
module purge
module load ncarenv/1.3 python/3.7.9 cmake/3.22.0 intel/19.1.1 esmf_libs mkl mpi-serial/2.3.0
module use /glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/intel/19.1.1/
module load esmf-8.4.1b02-ncdfio-mpiuni-g ncarcompilers/0.5.0 netcdf/4.9.0 pio/2.5.10d
export OMP_STACKSIZE=1024M
export TMPDIR=/glade/scratch/jpalex
export MPI_TYPE_DEPTH=16
export MPI_USE_ARRAY=None
export ESMF_RUNTIME_PROFILE=ON
export ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY
export UGCSINPUTPATH=/glade/work/turuncu/FV3GFS/benchmark-inputs/2012010100/gfs/fcst
export UGCSFIXEDFILEPATH=/glade/work/turuncu/FV3GFS/fix_am
export UGCSADDONPATH=/glade/work/turuncu/FV3GFS/addon
export OMP_WAIT_POLICY=PASSIVE
export MPI_DSM_VERBOSE=true
|
@johnpaulalex, my fix is at #2125. Would you be able to give that a try? |
Hey Sam, I'm not confident in my choice of git/manage_externals commands, but in this case I did a git merge of your fix onto my branch, then ran that one test: and its TestStatus.log says:
...which is at least a different error :) but I'm not sure if it means it got past the 'conda activate' command or not. |
Looks like it did, hurray! I can tell because of the presence of |
b4b changes to Python scripts, history lists, tech note, and clm_time_manager.
* Add system and unit tests for making fsurdat with all crops everywhere (#2081)
* Rework master_list* files etc. (#2087)
* Fixes to methane Tech Note (#2091)
* Add is_doy_in_interval() function (#2158)
* Avoid using subprocess.run() in FSURDATMODIFYCTSM (#2125)
Closes issues:
* Add unit test for making fsurdat with all crops everywhere (#2079)
* Rework master_list_(no)?fates.rst? (#2083)
* conda run -n can fail if a conda environment is already active (#2109)
* conda fails to load for SystemTests (#2111)
Brief summary of bug
For some users, FSURDATMODIFYCTSM is failing because conda (called in a subprocess from Python) doesn't load. Note that this is a different issue (and gives a different sort of error) from #2109.
General bug information
CTSM version you are using: ctsm5.1.dev133-47-g6925f8cc9
Does this bug cause significantly incorrect results in the model's science? No
Configurations affected: FSURDATMODIFYCTSM and RXCROPMATURITY tests.
Details of bug
@rgknox and @ekluzek (#1959), as well as @slevis-lmwg (#2106), are all affected. This doesn't happen for me or, I think, @adrifoster. I think it's thus something to do with our shell environments.
We suspect this stems from changes introduced in cime_config/SystemTests/ in ctsm5.1.dev131 (as was the case for #2109). The "module unload python; module load conda;" step fails to load conda, so then the call of "conda run -n" fails and ends the test. However, that module un/loading step was present before, so there's something subtle going on.
Important details of your setup / configuration so we can reproduce the bug
Unknown.
Important output or errors that show the problem
From TestStatus.log:
And fsurdat_modifier.log: