Build error with new test SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu on gcp12 #3036

Closed
ndkeen opened this issue Oct 10, 2024 · 3 comments · Fixed by E3SM-Project/E3SM#6676

Comments

@ndkeen
Contributor

ndkeen commented Oct 10, 2024

This may just require some machine configuration, as this is the first time the test has been tried here.

SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu

In the e3sm build log, I do see:

No macro file found: /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/cmake_macros/gcp12.cmake
CMake Error at cmake/build_eamxx.cmake:34 (include):
  include could not find requested file:

    /home/ndk/E3SM/components/eamxx/cmake/machine-files/gcp12.cmake
Call Stack (most recent call first):
  CMakeLists.txt:125 (build_eamxx)

but as there are so many errors/warnings, I'm not sure if it's the actual issue. For example, from:

SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeError.log

Determining if the Fortran sgemm exists failed with the following output:
Change Dir: /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_42ddf/fast && make  -f CMakeFiles/cmTC_42ddf.dir/build.make CMakeFiles/cmTC_42ddf.dir/build
make[1]: Entering directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2'
Building Fortran object CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/openmpi-4.1.4-lg57hjqli32cbgtyryq7cw6omdxfjtzy/bin/mpif90    -c /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2/testFortranCompiler.f -o CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o
Linking Fortran executable cmTC_42ddf
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/cmake-3.25.1-7z33y6jx4xrph64rva2louj3r3s6oaae/bin/cmake -E cmake_link_script CMakeFiles/cmTC_42ddf.dir/link.txt --verbose=1
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/openmpi-4.1.4-lg57hjqli32cbgtyryq7cw6omdxfjtzy/bin/mpif90 CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o -o cmTC_42ddf 
CMakeFiles/cmTC_42ddf.dir/testFortranCompiler.f.o: In function `MAIN__':
testFortranCompiler.f:(.text+0xa): undefined reference to `sgemm_'
collect2: error: ld returned 1 exit status
make[1]: *** [cmTC_42ddf] Error 1
make[1]: Leaving directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-wKzUi2'
gmake: *** [cmTC_42ddf/fast] Error 2



Determining if the MPICH_VERSION exist failed with the following output:
Change Dir: /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj

Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_33807/fast && make  -f CMakeFiles/cmTC_33807.dir/build.make CMakeFiles/cmTC_33807.dir/build
make[1]: Entering directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj'
Building C object CMakeFiles/cmTC_33807.dir/CheckSymbolExists.c.o
/opt/apps/spack/opt/spack/linux-centos7-zen2/gcc-12.2.0/openmpi-4.1.4-lg57hjqli32cbgtyryq7cw6omdxfjtzy/bin/mpicc   -mcmodel=medium  -o CMakeFiles/cmTC_33807.dir/CheckSymbolExists.c.o -c /home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c
/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c: In function 'main':
/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c:8:19: error: 'MPICH_VERSION' undeclared (first use in this function); did you mean 'MPI_VERSION'?
    8 |   return ((int*)(&MPICH_VERSION))[argc];
      |                   ^~~~~~~~~~~~~
      |                   MPI_VERSION
/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj/CheckSymbolExists.c:8:19: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [CMakeFiles/cmTC_33807.dir/CheckSymbolExists.c.o] Error 1
make[1]: Leaving directory `/home/ndk/e3sm/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu.20241010_003548_ss6ai3/bld/cmake-bld/CMakeFiles/CMakeScratch/TryCompile-cyzuxj'

@ndkeen
Contributor Author

ndkeen commented Oct 10, 2024

I think this is simply a case of needing a gcp12.cmake, which can be created by renaming gcp.cmake. When I try that, it builds, but I get a runtime error:

42: e3sm.exe: /home/ndk/E3SM/components/homme/src/share/cxx/GllFvRemapImpl.cpp:832: void Homme::GllFvRemapImpl::remap_tracer_dyn_to_fv_phys(int, int, const CPhys3T&, const Phys3T&): Assertion `qs_fv.extent_int(0) >= nelemd && qs_fv.extent_int(1) >= nf2 && qs_fv.extent_int(2) >= nq && qs_fv.extent_int(3) % packn == 0' failed.
42:
42: Program received signal SIGABRT: Process abort signal.

i.e., the rename was:
mv components/eamxx/cmake/machine-files/gcp.cmake components/eamxx/cmake/machine-files/gcp12.cmake

@ambrad
Member

ambrad commented Oct 10, 2024

This error suggests that somehow the test has inconsistent compile-time and run-time sizes, either of number of tracers or number of levels. You could put a printf right above that assert that prints out all the numbers that are being used in that assert.
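
As a rough sketch of the debugging print suggested above: the snippet below collects the quantities named in the assertion and prints them in one line. The struct `RemapSizes` and the helpers `sizes_consistent` and `print_sizes` are hypothetical names introduced only for illustration; they are not part of Homme. In the real code the values would come from `qs_fv.extent_int(0..3)`, `nelemd`, `nf2`, `nq`, and `packn` at the point just above the assert in GllFvRemapImpl.cpp.

```cpp
#include <cstdio>

// Hypothetical container for the values the assertion reads.
// In the real code these would be filled from qs_fv.extent_int(i)
// and the local nelemd, nf2, nq, packn.
struct RemapSizes {
  int extent[4];  // qs_fv.extent_int(0), (1), (2), (3)
  int nelemd;     // number of local elements
  int nf2;        // nf*nf physics columns per element
  int nq;         // number of tracers
  int packn;      // pack (vector) length
};

// Same condition as the failing assertion, written over plain ints.
constexpr bool sizes_consistent(const RemapSizes& s) {
  return s.extent[0] >= s.nelemd && s.extent[1] >= s.nf2 &&
         s.extent[2] >= s.nq && s.extent[3] % s.packn == 0;
}

// Print every number used in the assertion so that a mismatch between
// compile-time and run-time sizes is visible in the log.
inline void print_sizes(const RemapSizes& s) {
  std::fprintf(stderr,
               "qs_fv extents = (%d, %d, %d, %d); "
               "nelemd = %d, nf2 = %d, nq = %d, packn = %d\n",
               s.extent[0], s.extent[1], s.extent[2], s.extent[3],
               s.nelemd, s.nf2, s.nq, s.packn);
}
```

Calling `print_sizes` right before the assert (on the failing rank, 42 in the log above) should show which of the four conditions is violated.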

@ndkeen
Contributor Author

ndkeen commented Oct 11, 2024

Thanks.
I tacked this onto an unrelated PR, but it only addresses the name of that cmake file.
I don't see anything obvious in that file that's different from the other machine files.
I was just thinking of making progress here to avoid the build error; I can then open another issue for the runtime error.

ndkeen added a commit to E3SM-Project/E3SM that referenced this issue Oct 14, 2024
…ts' into next (PR #6676)

For ne4 cases, use only 96 tasks, as scream requires no more MPI tasks than the number of elements.
SMS.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.pm-cpu_intel
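
To spell out where 96 comes from: a cubed-sphere grid with ne elements along each cube-face edge has 6 * ne * ne elements, so ne4 gives 6 * 4 * 4 = 96. A minimal sketch of that arithmetic follows; the function names are hypothetical and are not taken from E3SM or CIME.

```cpp
// Number of spectral elements on a cubed-sphere grid with `ne`
// elements along each edge of each of the 6 cube faces.
constexpr int cubed_sphere_elements(int ne) { return 6 * ne * ne; }

// Hypothetical helper: cap a requested MPI task count at the number
// of elements, since no more tasks than elements can be used.
constexpr int clamp_tasks(int requested, int ne) {
  return requested < cubed_sphere_elements(ne) ? requested
                                               : cubed_sphere_elements(ne);
}

// ne4 has 96 elements, hence the 96-task limit in this commit.
static_assert(cubed_sphere_elements(4) == 96, "ne4 has 96 elements");
```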

Unrelated: Rename a machine file to reflect the machine name for gcp12 builds with scream.
This change fixes E3SM-Project/scream#3036 (at least the build issue)

[bfb]