get_from_HPSS* test cases fail. #349

Closed
danielabdi-noaa opened this issue Sep 13, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@danielabdi-noaa
Collaborator

danielabdi-noaa commented Sep 13, 2022

Expected behavior

get_from_HPSS* test cases should run to completion the first time they are run in a WE2E regression test. Currently they often fail when run together with other test cases, but re-running them individually later succeeds.

Current behavior

The test cases fail to run to completion during a WE2E regression test.

Machines affected

Tested on Hera, but the issue is most likely present on other platforms as well.

Steps To Reproduce

Run the comprehensive WE2E regression test.

Detailed Description of Fix (optional)

None

Additional Information (optional)

None

Possible Implementation (optional)

None

Output (optional)

get_extrn_ics fails with a variety of errors. Sometimes it hits the job time limit, like this:

INFO: Running command
 htar -xvf /NCEPPROD/hpssprod/runhistory/rh2019/201907/20190701/gpfs_dell1_nco_ops_com_gfs_prod_gfs.20190701_00.gfs_nemsioa.tar ./gfs.20190701/00/gfs.t00z.atmanl.nemsio ./gfs.20190701/00/gfs.t00z.sfcanl.nemsio

[connecting to hpsscore1.fairmont.rdhpcs.noaa.gov/1217]
slurmstepd: error: *** JOB 35521415 ON hfe12 CANCELLED AT 2022-09-08T17:09:30 DUE TO TIME LIMIT ***
_______________________________________________________________
Start Epilog v20.08.28 on node hfe12 for job 35521415 :: Thu Sep 8 17:09:30 UTC 2022
Job 35521415 (serial) finished for user Michael.Kavulich in partition service with exit code 0:15
_______________________________________________________________
End Epilogue v20.08.28 Thu Sep 8 17:09:30 UTC 2022
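For anyone reproducing the hang by hand, the archive path and member names in the htar command above appear to follow a simple date pattern. The sketch below rebuilds that command for a given cycle. The templates are inferred from this single log line and are an assumption, not read from retrieve_data.py or its data-location settings; other cycles or data eras may use different archive names. build_htar_command is a hypothetical helper.

```python
from datetime import datetime
from typing import List

# Templates inferred from the one htar command in the log above (hypothetical;
# not read from retrieve_data.py or its data-location configuration).
ARCHIVE_TEMPLATE = (
    "/NCEPPROD/hpssprod/runhistory/rh{yyyy}/{yyyymm}/{yyyymmdd}/"
    "gpfs_dell1_nco_ops_com_gfs_prod_gfs.{yyyymmdd}_{hh}.gfs_nemsioa.tar"
)
MEMBER_TEMPLATES = (
    "./gfs.{yyyymmdd}/{hh}/gfs.t{hh}z.atmanl.nemsio",
    "./gfs.{yyyymmdd}/{hh}/gfs.t{hh}z.sfcanl.nemsio",
)


def build_htar_command(cycle: datetime) -> List[str]:
    """Return the htar extract command for one GFS cycle as an argv list."""
    fields = {
        "yyyy": cycle.strftime("%Y"),
        "yyyymm": cycle.strftime("%Y%m"),
        "yyyymmdd": cycle.strftime("%Y%m%d"),
        "hh": cycle.strftime("%H"),
    }
    archive = ARCHIVE_TEMPLATE.format(**fields)
    members = [template.format(**fields) for template in MEMBER_TEMPLATES]
    # -x extract, -v verbose, -f names the archive file on HPSS.
    return ["htar", "-xvf", archive, *members]


if __name__ == "__main__":
    # Prints the same command that appears in the get_extrn_ics log.
    print(" ".join(build_htar_command(datetime(2019, 7, 1, 0))))
```

Running the printed command interactively on a front-end node is one way to check whether the HPSS connection itself stalls, independent of the workflow.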

Other times it fails like this:

Traceback (most recent call last):
  File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 998, in <module>
    main(sys.argv[1:])
  File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 830, in main
    unavailable = hpss_requested_files(
  File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 565, in hpss_requested_files
    clean_up_output_dir(
  File "/scratch2/BMC/gsd-hpcs/Daniel.Abdi/ufs-srweather-app/ush/retrieve_data.py", line 64, in clean_up_output_dir
    os.removedirs(expected_subdir)
  File "/contrib/miniconda3/4.5.12/envs/regional_workflow/lib/python3.8/os.py", line 239, in removedirs
    rmdir(name)
OSError: [Errno 39] Directory not empty: './gfs.20190701/00'
+ print_err_msg_exit 'Call to retrieve_data.py failed with a non-zero exit status.
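The second traceback comes from os.removedirs(), which raises OSError when the leaf directory is not empty (Errno 39, ENOTEMPTY, on Linux); only errors on the parent directories are ignored. Below is a minimal sketch of a cleanup that tolerates a leaf still holding files, for example ones written by another task sharing the directory. remove_dirs_if_empty is a hypothetical name, and whether leaving the directory in place is the right behavior for retrieve_data.py is an assumption.

```python
import errno
import os


def remove_dirs_if_empty(path: str) -> bool:
    """Remove `path` and any parents that become empty, like os.removedirs().

    Hypothetical helper: unlike a bare os.removedirs() call, it returns False
    instead of raising when `path` itself still contains entries.
    """
    try:
        os.removedirs(path)
    except OSError as err:
        # os.removedirs() only propagates errors from the leaf directory.
        # ENOTEMPTY is Errno 39 on Linux; some systems report EEXIST instead.
        if err.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        raise
    return True


if __name__ == "__main__":
    import tempfile

    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            os.makedirs("gfs.20190701/00")
            # Simulate a file left behind by another task.
            open("gfs.20190701/00/other_task.nemsio", "w").close()
            print(remove_dirs_if_empty("./gfs.20190701/00"))  # False
            os.remove("gfs.20190701/00/other_task.nemsio")
            print(remove_dirs_if_empty("./gfs.20190701/00"))  # True
        finally:
            os.chdir(old_cwd)
```

With a relative path such as './gfs.20190701/00', os.removedirs() only climbs through the components named in the path, so it will not remove anything above ./gfs.20190701. Tolerating the error only hides the symptom, though; two tasks writing into the same directory is still a problem.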

danielabdi-noaa added the bug (Something isn't working) label on Sep 13, 2022
@danielabdi-noaa
Collaborator Author

The second error turns out to be caused by the ICS and LBCS tasks using the same locations in NCO mode, so the only remaining issue is the timeout.
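Since the remaining failure is the Slurm time limit being hit while htar is still connecting to HPSS, one possible mitigation is to bound each htar attempt with its own timeout, comfortably below the task's wall-clock limit, and retry. This is a minimal sketch, assuming the HPSS hang is transient (not confirmed in this issue); run_with_retries and its default values are hypothetical and not part of retrieve_data.py.

```python
import subprocess
import time
from typing import Optional, Sequence


def run_with_retries(cmd: Sequence[str], attempt_timeout: float,
                     max_attempts: int = 3, wait_seconds: float = 60.0) -> int:
    """Run `cmd`, killing and retrying any attempt that runs past attempt_timeout.

    Hypothetical helper. Returns the exit status of the first attempt that
    finishes within the timeout; if every attempt times out, re-raises the
    last subprocess.TimeoutExpired so the caller fails loudly.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_timeout: Optional[subprocess.TimeoutExpired] = None
    for attempt in range(1, max_attempts + 1):
        try:
            # On timeout, subprocess.run() kills the child before raising.
            return subprocess.run(list(cmd), timeout=attempt_timeout).returncode
        except subprocess.TimeoutExpired as exc:
            last_timeout = exc
            print(f"attempt {attempt}/{max_attempts} timed out after "
                  f"{attempt_timeout} s", flush=True)
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    assert last_timeout is not None
    raise last_timeout
```

Only timeouts are retried; a non-zero exit status is returned unchanged, since retrying a missing-file error would just burn wall-clock time. subprocess.run() kills only the direct child, and a killed htar may leave partially extracted files behind, so a retry should start from a clean output directory.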
