More mira port #258

Merged
merged 12 commits into from
Jul 19, 2016

jedwards4b
Contributor

@jedwards4b jedwards4b commented Jul 14, 2016

This change gets a basic case setup working on mira again. However, the workflow of running st_archive and then resubmitting does not yet work, which means that ERR and ERI tests will not work. mpi-serial is also not supported on this platform, so most of the scripts_regression_tests will also fail.

This PR also includes machine file updates for rarely used or tested machines: the batch system type is updated for babbage, brutus, eos, erebus, hera, pleiades-wes, sierra, and titan.

In addition, build output is now logged by streaming it to a file instead of buffering it in memory.
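A minimal sketch of the logging change, assuming a Python build driver. The function name `run_and_log` is hypothetical and not CIME's actual API. The child process writes directly into an open log file, so its output is never held in memory.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd and stream its stdout and stderr into log_path.

    Hypothetical helper: the child writes straight to the file
    descriptor, so output is never accumulated in Python memory.
    Returns the process exit code.
    """
    with open(log_path, "w") as log:
        # stderr is redirected into stdout so both land in one log file.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, `run_and_log(["make", "-j8"], "build.log")` would return make's exit code while the full build output lands in `build.log`. Passing the file object as `stdout` avoids a read loop in Python entirely.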

Test suite:
Test baseline:
Test namelist changes:
Test status: [bit for bit, roundoff, climate changing]

Closes #245
User interface changes?:

Code review: jayesh

@jayeshkrishna
Contributor

I am going to test this change using CIME_MODEL=cesm with the following case:

./create_newcase -case cime_sanity_check_mira -compset X -res f19_g16

@jayeshkrishna
Contributor

This PR also seems to have changes for other (not mira) machines. Is that intentional?

@@ -142,21 +143,21 @@
</batch_system>

<!-- babbage is PBS -->
<batch_system MACH="babbage" version="x.y">
Contributor Author

I updated config_batch.xml for some machines that are rarely used or tested.

@jayeshkrishna
Contributor

@jedwards4b: I updated the PR description with the information above (the changes not related to mira). Please review it and feel free to modify the new text.

@jedwards4b
Contributor Author

Looks good, thank you.

@jayeshkrishna
Contributor

I have submitted a job on Cetus for CIME_MODEL=cesm (create/setup/build/submit succeeds).

The ./create_newcase command failed with CIME_MODEL=acme. This will need to be pursued as part of issue #245. The output from create_newcase (CIME_MODEL=acme) is shown below:

[jayesh@miralac1 scripts (master>)]$ ./create_newcase -case acme_sanity_check_mira -compset X -res f19_g16                       
Compset longname is 2000_XATM_XLND_XICE_XOCN_XROF_XGLC_XWAV 
Compset specification file is /gpfs/mira-home/jayesh/acme/ESMCI_cime_merge/scripts/Tools/../../driver_cpl/cime_config/config_compsets.xml
Pes     specification file is /gpfs/mira-home/jayesh/acme/ESMCI_cime_merge/scripts/Tools/../../driver_cpl/cime_config/config_pes.xml
Pes setting: grid          is a%1.9x2.5_l%1.9x2.5_oi%gx1v6_r%r05_m%gx1v6_g%null_w%null 
Pes setting: compset       is 2000_XATM_XLND_XICE_XOCN_XROF_XGLC_XWAV 
Pes setting: grid match    is any 
Pes setting: machine match is mira 
Pes setting: compset_match is [DX]ATM 
Pes setting: pesize match  is any 
ERROR: Command: '/usr/bin/xmllint --format --output /gpfs/mira-home/jayesh/acme/ESMCI_cime_merge/scripts/acme_sanity_check_mira/env_mach_pes.xml -' failed. See terminal output

@jedwards4b
Contributor Author

The PR has been updated with a fix for the above, please try again.

@jayeshkrishna
Contributor

I got the following error with CIME_MODEL=cesm (in the job error output):

Not able to fully resolve item '/usr/bin/runjob   $LOCARGS  --envs OMP_STACKSIZE=64M  --envs OMP_NUM_THREADS=8  --envs BG_THREADLAYOUT=1  --label short  -p 8  -n 8 :  /projects/ClimateEnergy/usr/jayesh/cesm_sanity_check_mira/bld/acme.exe  >> acme.log.160714-213314 2>&1 '
ERROR: Command: '/usr/bin/runjob   $LOCARGS  --envs OMP_STACKSIZE=64M  --envs OMP_NUM_THREADS=8  --envs BG_THREADLAYOUT=1  --label short  -p 8  -n 8 :  /projects/ClimateEnergy/usr/jayesh/cesm_sanity_check_mira/bld/acme.exe  >> acme.log.160714-213314 2>&1 ' failed from dir '/projects/ClimateEnergy/usr/jayesh/cesm_sanity_check_mira/run'. See terminal output

@jedwards4b: Can you build and run any case on Mira/Cetus with CIME_MODEL=cesm? I could try the case that you are using in your tests to see if the fix works.

@jayeshkrishna
Contributor

I have submitted a job on Mira for CIME_MODEL=cesm (the case is created, built, and submitted correctly).
This PR still does not fix the issues for CIME_MODEL=acme. I can work on that now that CIME_MODEL=cesm is working.

@jedwards4b
Contributor Author

I'm sorry, I thought that it would work for acme. What problem are you seeing?

@jedwards4b jedwards4b merged commit 84ebf2d into ESMCI:master Jul 19, 2016
@jayeshkrishna
Contributor

No worries, I will start looking at the issues (the first one was due to a missing PES_PER_NODE tag). I will try to get it working and let you know the details.

@jayeshkrishna
Contributor

I got the following error at runtime (CIME_MODEL=cesm) after merging the branch into master locally:

Not able to fully resolve item '/usr/bin/runjob   $LOCARGS  --envs OMP_STACKSIZE=64M  --envs OMP_NUM_THREADS=8  --envs BG_THREADLAYOUT=1  --label short  -p 8  -n 8 :  /projects/ClimateEnergy/usr/jayesh/cesm_sanity_check_mira/bld/acme.exe  >> acme.log.160719-210632 2>&1 '
ERROR: Command: '/usr/bin/runjob   $LOCARGS  --envs OMP_STACKSIZE=64M  --envs OMP_NUM_THREADS=8  --envs BG_THREADLAYOUT=1  --label short  -p 8  -n 8 :  /projects/ClimateEnergy/usr/jayesh/cesm_sanity_check_mira/bld/acme.exe  >> acme.log.160719-210632 2>&1 ' failed from dir '/projects/ClimateEnergy/usr/jayesh/cesm_sanity_check_mira/run'. See terminal output

I will try it with master and see how it goes.

@jedwards4b
Contributor Author

The "not able to fully resolve" message is not an error; it is expected on mira because of the LOCARGS flag.
Look in acme.log.160719-210632 for the real problem.
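To illustrate why that message is harmless, here is a minimal sketch of a resolver that substitutes the `$NAME` references it knows and leaves the rest in place. The `resolve` function and the `NTASKS` value are hypothetical. The assumption, consistent with the runjob log above, is that `$LOCARGS` is intentionally left for a later stage (such as the job's shell) to expand.

```python
import re

# Matches $NAME references such as $LOCARGS or $NTASKS.
_VAR = re.compile(r"\$(\w+)")


def resolve(item, values):
    """Substitute known $NAME references in item.

    Unknown names are left untouched so a later stage can expand them.
    Returns (resolved_item, sorted list of unresolved names).
    """
    unresolved = set()

    def replace(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        unresolved.add(name)
        return match.group(0)  # keep "$NAME" verbatim

    return _VAR.sub(replace, item), sorted(unresolved)


cmd, missing = resolve("/usr/bin/runjob $LOCARGS -p $NTASKS -n $NTASKS : acme.exe",
                       {"NTASKS": 8})
if missing:
    # Informational only: the command is still run.
    print("Not able to fully resolve item:", ", ".join(missing))
print(cmd)
```

The design choice is that an unresolved name is reported but not treated as fatal, which is why the real failure has to be found in the run log.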

@jayeshkrishna
Contributor

jayeshkrishna commented Jul 20, 2016

The error message above is with CIME_MODEL=cesm, and I only have JOBID.{output|error|cobaltlog} in the case directory.
(PS: I also built it on Mira and ran it on Cetus; not sure if that makes a difference.)

@jedwards4b
Contributor Author

The acme.log file will be in the run directory.

@jayeshkrishna
Contributor

This looks like an issue in my environment: although I had set CIME_MODEL=cesm while building the model, my shell startup files set CIME_MODEL=acme. After changing it to CIME_MODEL=cesm, the job ran successfully.
I am going to try the ACME side now.
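Because a stale CIME_MODEL from shell startup files caused this, a small guard could catch the mismatch early. This is a hypothetical sketch: `check_cime_model` is not part of CIME, and the expected model would have to come from wherever the case records it.

```python
import os


def check_cime_model(expected, environ=None):
    """Compare CIME_MODEL in the environment with the model a case expects.

    Hypothetical helper. Returns None when they match, otherwise a
    message describing the mismatch.
    """
    env = os.environ if environ is None else environ
    actual = env.get("CIME_MODEL")
    if actual is None:
        return "CIME_MODEL is not set; expected %s" % expected
    if actual != expected:
        return ("CIME_MODEL=%s in the environment (check shell startup files), "
                "but this case expects CIME_MODEL=%s" % (actual, expected))
    return None


problem = check_cime_model("cesm")
if problem:
    print("WARNING:", problem)
```

Accepting an optional `environ` mapping keeps the check testable without modifying the real process environment.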

@rljacob
Member

rljacob commented Jul 20, 2016

Did this also fix #253?

@jedwards4b
Contributor Author

I believe so.

jayeshkrishna added a commit that referenced this pull request Jul 21, 2016
Several changes to get Mira working were added in
1f9f0b3. However, some build
and runtime issues remained with CIME_MODEL=acme. This commit
adds the remaining fixes required to get the code working on
Mira for CIME_MODEL=acme.

* Fixed the batch submit command
* Added the PES_PER_NODE tag
* Fixed the name of the default project (now called
  HiRes_EarthSys_2)
* Modified the locargs argument for runjob (--block)

These fixes are already present in cesm (CIME_MODEL=cesm).

Also see Issue #245 and PR #258
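The runjob invocation that these fixes target can be sketched as an argument list. This is an illustration under stated assumptions, not the actual CIME template. `runjob_args` is a hypothetical helper. The `--envs`, `--label`, `-p` and `-n` arguments are copied from the runjob logs earlier in this thread. The block name is assumed to come from Cobalt's `COBALT_PARTNAME` variable; check the ALCF documentation for your system.

```python
import os
import shlex


def runjob_args(exe, ntasks, ranks_per_node, nthreads, block):
    """Build a Blue Gene/Q runjob command line as a list of arguments.

    Hypothetical helper; the --block value stands in for what LOCARGS carries.
    """
    envs = [
        "OMP_STACKSIZE=64M",
        "OMP_NUM_THREADS=%d" % nthreads,
        "BG_THREADLAYOUT=1",
    ]
    args = ["/usr/bin/runjob", "--block", block]
    for env in envs:
        args += ["--envs", env]
    args += ["--label", "short",
             "-p", str(ranks_per_node),  # ranks per node
             "-n", str(ntasks),          # total MPI ranks
             ":", exe]
    return args


# Assumption: Cobalt exports the booted block name as COBALT_PARTNAME.
block = os.environ.get("COBALT_PARTNAME", "UNKNOWN-BLOCK")
args = runjob_args("./acme.exe", ntasks=8, ranks_per_node=8, nthreads=8, block=block)
print(" ".join(shlex.quote(a) for a in args))
```

Building a list rather than a string avoids the partial-substitution problem discussed above, since every value is known before the command is assembled.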
@jedwards4b jedwards4b deleted the more_mira_port branch July 26, 2016 20:32
pesieber pushed a commit to pesieber/cime that referenced this pull request Mar 15, 2023
This simply uses github's suggested code of conduct, which looks good to me.

Fixes ESMCI#258