Update E3SM-Project submodule #326

Merged
xylar merged 1 commit into MPAS-Dev:master from update_e3sm_project_submodule
Mar 20, 2022

@xylar xylar commented Mar 18, 2022

This merge updates the E3SM-Project submodule to match today's E3SM/master
Commit hash: 7b87d1faa545f7da4e792058401ad3bc04434c85

Merges of interest to compass:

closes #308

@xylar xylar self-assigned this Mar 18, 2022
@xylar xylar requested a review from mark-petersen March 18, 2022 21:21
@xylar xylar added the ocean, python package, and E3SM PR finished labels Mar 18, 2022

xylar commented Mar 18, 2022

Testing

I ran the pr test suite on Anvil with Intel and Intel-MPI (optimized). I used master with the previous E3SM-Project submodule as a baseline.

As expected, I'm seeing non-bit-for-bit changes in many global_ocean tests (those that use Redi).

Unfortunately, I'm also seeing non-bit-for-bit changes in a lot of other tests where none were expected. All tests run successfully, but the baseline comparison fails for many:

00:08 PASS ocean_baroclinic_channel_10km_default
00:15 FAIL ocean_baroclinic_channel_10km_threads_test
00:14 FAIL ocean_baroclinic_channel_10km_decomp_test
00:16 FAIL ocean_baroclinic_channel_10km_restart_test
03:15 PASS ocean_global_convergence_cosine_bell
00:39 PASS ocean_global_ocean_QU240_mesh
00:31 PASS ocean_global_ocean_QU240_PHC_init
00:25 FAIL ocean_global_ocean_QU240_PHC_performance_test
00:51 FAIL ocean_global_ocean_QU240_PHC_restart_test
00:49 FAIL ocean_global_ocean_QU240_PHC_decomp_test
00:50 FAIL ocean_global_ocean_QU240_PHC_threads_test
00:36 FAIL ocean_global_ocean_QU240_PHC_analysis_test
00:55 FAIL ocean_global_ocean_QU240_PHC_dynamic_adjustment
01:46 PASS ocean_global_ocean_QU240_PHC_files_for_e3sm
00:29 FAIL ocean_global_ocean_QU240_PHC_RK4_performance_test
00:52 FAIL ocean_global_ocean_QU240_PHC_RK4_restart_test
00:48 FAIL ocean_global_ocean_QU240_PHC_RK4_decomp_test
00:50 FAIL ocean_global_ocean_QU240_PHC_RK4_threads_test
00:00 PASS ocean_global_ocean_QUwISC240_mesh
00:00 PASS ocean_global_ocean_QUwISC240_PHC_init
00:26 FAIL ocean_global_ocean_QUwISC240_PHC_performance_test
00:00 PASS ocean_global_ocean_EC30to60_mesh
00:02 PASS ocean_global_ocean_EC30to60_PHC_init
01:10 FAIL ocean_global_ocean_EC30to60_PHC_performance_test
00:00 PASS ocean_global_ocean_ECwISC30to60_mesh
00:02 PASS ocean_global_ocean_ECwISC30to60_PHC_init
01:12 FAIL ocean_global_ocean_ECwISC30to60_PHC_performance_test
00:33 FAIL ocean_ice_shelf_2d_5km_z-star_restart_test
00:34 PASS ocean_ice_shelf_2d_5km_z-level_restart_test
01:36 FAIL ocean_isomip_plus_2km_z-star_Ocean0
00:12 PASS ocean_ziso_20km_default
00:11 FAIL ocean_ziso_20km_with_frazil

On my Ubuntu laptop, the nightly test suite with Gnu and MPICH passed all tests except the global ocean ones that use Redi, so these failures may be specific to Intel, and perhaps to specific Intel versions.
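
For anyone triaging a log like the one above, here is a minimal Python sketch that separates expected failures from unexpected ones. It is not part of compass. The function names are hypothetical, and the `ocean_global_ocean_` prefix is only an assumed stand-in for "tests that use Redi", since the log itself does not say which tests those are.

```python
import re

# Matches result lines such as
# "00:15 FAIL ocean_baroclinic_channel_10km_threads_test"
LINE_RE = re.compile(r"^\s*(\d{2}:\d{2})\s+(PASS|FAIL)\s+(\S+)\s*$")


def parse_results(log_text):
    """Return {test_name: "PASS" or "FAIL"} for every result line in the log."""
    results = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            _, status, name = match.groups()
            results[name] = status
    return results


def unexpected_failures(results, expected_prefix="ocean_global_ocean_"):
    """List failing tests whose names do not start with the expected prefix.

    The prefix is an assumption standing in for "uses Redi"; adjust as needed.
    """
    return sorted(
        name for name, status in results.items()
        if status == "FAIL" and not name.startswith(expected_prefix)
    )


# A few lines copied from the Anvil log above
sample = """
00:08 PASS ocean_baroclinic_channel_10km_default
00:15 FAIL ocean_baroclinic_channel_10km_threads_test
00:25 FAIL ocean_global_ocean_QU240_PHC_performance_test
00:12 PASS ocean_ziso_20km_default
00:11 FAIL ocean_ziso_20km_with_frazil
"""

print(unexpected_failures(parse_results(sample)))
```

Running the same two functions on logs from two machines and comparing the returned lists would show which failures are machine-specific.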


xylar commented Mar 18, 2022

@sbrus89, I may need to bust out git bisect to track these down. Or it might even be time to write compass bisect...
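
The idea behind bisecting can be sketched in a few lines of Python. This is only an illustration of the binary search, not the implementation of git bisect or of any existing compass command. The `is_bad` callback is a hypothetical placeholder for "check out the commit, build, run the suite, and compare with the baseline", and the sketch assumes that every commit after the first bad one is also bad.

```python
def first_bad_commit(commits, is_bad):
    """Return the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad all later commits are bad too.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Toy usage with made-up commit labels; "f" introduces the change
history = ["a", "b", "c", "d", "e", "f", "g", "h"]
print(first_bad_commit(history, lambda commit: commit >= "f"))
```

Each call to `is_bad` would be expensive in practice (a full build and test run), which is why halving the range at each step matters.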

@mark-petersen mark-petersen left a comment

I tested the nightly suite on grizzly with both gnu and intel 19. I compared the commit just before the Redi column one (E3SM-Project/E3SM@11f02f2aa7) against the previous baseline on this PR (E3SM-Project/E3SM@44814ae), and it passed the comparison on all tests. So it looks like @xylar's intel test failures were due to something in his setup.

Then I compared the head (E3SM-Project/E3SM@7b87d1fa) to just before the Redi column one (E3SM-Project/E3SM@11f02f2aa7) and the tests with Redi did not match, as expected:

00:10 PASS ocean_baroclinic_channel_10km_default
00:16 PASS ocean_baroclinic_channel_10km_threads_test
00:15 PASS ocean_baroclinic_channel_10km_decomp_test
00:17 PASS ocean_baroclinic_channel_10km_restart_test
00:41 PASS ocean_global_ocean_QU240_mesh
00:31 PASS ocean_global_ocean_QU240_PHC_init
00:31 FAIL ocean_global_ocean_QU240_PHC_performance_test
01:04 FAIL ocean_global_ocean_QU240_PHC_restart_test
01:02 FAIL ocean_global_ocean_QU240_PHC_decomp_test
01:03 FAIL ocean_global_ocean_QU240_PHC_threads_test
00:48 FAIL ocean_global_ocean_QU240_PHC_analysis_test
00:30 FAIL ocean_global_ocean_QU240_PHC_RK4_performance_test
01:04 FAIL ocean_global_ocean_QU240_PHC_RK4_restart_test
01:02 FAIL ocean_global_ocean_QU240_PHC_RK4_decomp_test
01:02 FAIL ocean_global_ocean_QU240_PHC_RK4_threads_test
00:00 PASS ocean_global_ocean_QUwISC240_mesh
00:00 PASS ocean_global_ocean_QUwISC240_PHC_init
00:33 FAIL ocean_global_ocean_QUwISC240_PHC_performance_test
00:33 PASS ocean_ice_shelf_2d_5km_z-star_restart_test
00:34 PASS ocean_ice_shelf_2d_5km_z-level_restart_test
00:15 PASS ocean_ziso_20km_default
00:13 PASS ocean_ziso_20km_with_frazil

but all tests executed successfully.


xylar commented Mar 20, 2022

On Chrysalis with Intel and Intel-MPI, comparing the updated E3SM-Project submodule with the current submodule as a baseline, I'm seeing:

00:04 PASS ocean_baroclinic_channel_10km_default
00:07 PASS ocean_baroclinic_channel_10km_threads_test
00:07 PASS ocean_baroclinic_channel_10km_decomp_test
00:08 PASS ocean_baroclinic_channel_10km_restart_test
02:02 PASS ocean_global_convergence_cosine_bell
00:33 PASS ocean_global_ocean_QU240_mesh
00:24 PASS ocean_global_ocean_QU240_PHC_init
00:22 FAIL ocean_global_ocean_QU240_PHC_performance_test
00:44 FAIL ocean_global_ocean_QU240_PHC_restart_test
00:43 FAIL ocean_global_ocean_QU240_PHC_decomp_test
00:43 FAIL ocean_global_ocean_QU240_PHC_threads_test
00:29 FAIL ocean_global_ocean_QU240_PHC_analysis_test
00:47 FAIL ocean_global_ocean_QU240_PHC_dynamic_adjustment
01:02 PASS ocean_global_ocean_QU240_PHC_files_for_e3sm
00:23 FAIL ocean_global_ocean_QU240_PHC_RK4_performance_test
00:43 FAIL ocean_global_ocean_QU240_PHC_RK4_restart_test
00:42 FAIL ocean_global_ocean_QU240_PHC_RK4_decomp_test
00:42 FAIL ocean_global_ocean_QU240_PHC_RK4_threads_test
00:00 PASS ocean_global_ocean_QUwISC240_mesh
00:00 PASS ocean_global_ocean_QUwISC240_PHC_init
00:22 FAIL ocean_global_ocean_QUwISC240_PHC_performance_test
00:00 PASS ocean_global_ocean_EC30to60_mesh
00:01 PASS ocean_global_ocean_EC30to60_PHC_init
00:52 FAIL ocean_global_ocean_EC30to60_PHC_performance_test
00:00 PASS ocean_global_ocean_ECwISC30to60_mesh
00:02 PASS ocean_global_ocean_ECwISC30to60_PHC_init
00:57 FAIL ocean_global_ocean_ECwISC30to60_PHC_performance_test
00:17 PASS ocean_ice_shelf_2d_5km_z-star_restart_test
00:17 PASS ocean_ice_shelf_2d_5km_z-level_restart_test
01:01 PASS ocean_isomip_plus_2km_z-star_Ocean0
00:08 PASS ocean_ziso_20km_default
00:06 PASS ocean_ziso_20km_with_frazil

So these are the expected results. I will run the same test with OpenMPI (with both Intel and Gnu), but everything looks good. It seems that Intel and Intel-MPI on Anvil is simply producing non-bit-for-bit results, whereas every other configuration is happy. I think we can let it be.


xylar commented Mar 20, 2022

@mark-petersen, thanks very much for testing and approving!


xylar commented Mar 20, 2022

Everything looks good on Chrysalis (with Intel and Intel-MPI; Intel and OpenMPI; and Gnu and OpenMPI). Merging...

@xylar xylar merged commit b8aabe0 into MPAS-Dev:master Mar 20, 2022
@xylar xylar deleted the update_e3sm_project_submodule branch March 20, 2022 13:51
Successfully merging this pull request may close these issues.

SST in WC14 stand-alone spin-up warms to 60C