intel-oneapi/2024.0.2 ICE #3

Closed
jedwards4b opened this issue Aug 13, 2024 · 5 comments

@jedwards4b
Collaborator

Opening an issue here to track the problem. It first appeared in a CESM build using the dev/ncar_0.0.3 tag, but since I can reproduce it on master, that is what I reported it against.

Here is a reproducer:

Currently Loaded Modules:

  1) cesmdev/1.0 (H,S)   4) cmake/3.26.3            7) ncarcompilers/1.0.0  10) netcdf-mpi/4.9.2        13) esmf/8.6.0
  2) ncarenv/23.09 (S)   5) intel-oneapi/2024.0.2   8) cray-mpich/8.1.27    11) parallel-netcdf/1.12.3
  3) craype/2.7.31       6) mkl/2024.0.0            9) hdf5-mpi/1.14.3      12) parallelio/2.6.2

Where:
S: Module is Sticky, requires --force to unload or purge
H: Hidden Module

git clone https://github.com/ESCOMP/FMS
mkdir bld
cd bld
cmake ../FMS
make

cmake ../FMS/
-- The C compiler identification is IntelLLVM 2024.0.2
-- The Fortran compiler identification is IntelLLVM 2024.0.2
-- Cray Programming Environment 2.7.31 C
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /glade/u/apps/derecho/23.09/spack/opt/spack/ncarcompilers/1.0.0/oneapi/2024.0.2/3szf/bin/icx - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Cray Programming Environment 2.7.31 Fortran
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /glade/u/apps/derecho/23.09/spack/opt/spack/ncarcompilers/1.0.0/oneapi/2024.0.2/3szf/bin/ifx - skipped
-- Setting build type to 'Release' as none was specified.
-- Found MPI_C: /glade/u/apps/derecho/23.09/spack/opt/spack/ncarcompilers/1.0.0/oneapi/2024.0.2/3szf/bin/icx (found version "3.1")
-- Found MPI_Fortran: /glade/u/apps/derecho/23.09/spack/opt/spack/ncarcompilers/1.0.0/oneapi/2024.0.2/3szf/bin/ifx (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: C Fortran
-- Found NetCDF: /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/oneapi/2024.0.2/fmc5/include (found version "4.9.2") found components: C Fortran
-- FindNetCDF defines targets:
-- - NetCDF_VERSION [4.9.2]
-- - NetCDF_PARALLEL [TRUE]
-- - NetCDF_C_CONFIG_EXECUTABLE [/glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/oneapi/2024.0.2/fmc5/bin/nc-config]
-- - NetCDF::NetCDF_C [SHARED] [Root: /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/oneapi/2024.0.2/fmc5] Lib: /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/oneapi/2024.0.2/fmc5/lib/libnetcdf.so
-- - NetCDF_Fortran_CONFIG_EXECUTABLE [/glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/oneapi/2024.0.2/fmc5/bin/nf-config]
-- - NetCDF::NetCDF_Fortran [SHARED] [Root: /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/oneapi/2024.0.2/fmc5] Lib: /glade/u/apps/derecho/23.09/spack/opt/spack/netcdf/4.9.2/cray-mpich/8.1.27/oneapi/2024.0.2/fmc5/lib/libnetcdff.so
-- Looking for gettid
-- Looking for gettid - found
-- Configuring done (14.2s)
-- Generating done (1.0s)
-- Build files have been written to: /glade/derecho/scratch/jedwards/fmsbug/bld
derecho3: /glade/derecho/scratch/jedwards/fmsbug/bld
:) make
[ 1%] Building C object CMakeFiles/fms_r4_c.dir/affinity/affinity.c.o
[ 2%] Building C object CMakeFiles/fms_r4_c.dir/fms/fms_stacksize.c.o
[ 3%] Building C object CMakeFiles/fms_r4_c.dir/mosaic/create_xgrid.c.o
/glade/derecho/scratch/jedwards/fmsbug/FMS/mosaic/create_xgrid.c:42:1: warning: '/*' within block comment [-Wcomment]
   42 | /******************************************************************************
      | ^
1 warning generated.
[ 4%] Building C object CMakeFiles/fms_r4_c.dir/mosaic/gradient_c2l.c.o
[ 5%] Building C object CMakeFiles/fms_r4_c.dir/mosaic/interp.c.o
[ 6%] Building C object CMakeFiles/fms_r4_c.dir/mosaic/mosaic_util.c.o
[ 7%] Building C object CMakeFiles/fms_r4_c.dir/mosaic/read_mosaic.c.o
[ 8%] Building C object CMakeFiles/fms_r4_c.dir/mpp/mpp_memuse.c.o
[ 9%] Building C object CMakeFiles/fms_r4_c.dir/parser/yaml_parser_binding.c.o
[ 10%] Building C object CMakeFiles/fms_r4_c.dir/parser/yaml_output_functions.c.o
[ 11%] Building C object CMakeFiles/fms_r4_c.dir/string_utils/fms_string_utils_binding.c.o
[ 11%] Built target fms_r4_c
[ 12%] Building Fortran object CMakeFiles/fms_r4_f.dir/platform/platform.F90.o
Using 8-byte addressing
Using pure routines.
Using allocatable derived type array members.
Using cray pointers.
[ 13%] Building Fortran object CMakeFiles/fms_r4_f.dir/mpp/mpp_parameter.F90.o
[ 14%] Building Fortran object CMakeFiles/fms_r4_f.dir/mpp/mpp_data.F90.o
[ 15%] Building Fortran object CMakeFiles/fms_r4_f.dir/mpp/mpp.F90.o
[ 16%] Building Fortran object CMakeFiles/fms_r4_f.dir/constants/fmsconstants.F90.o
[ 17%] Building Fortran object CMakeFiles/fms_r4_f.dir/constants/constants.F90.o
[ 18%] Building Fortran object CMakeFiles/fms_r4_f.dir/string_utils/fms_string_utils.F90.o
[ 19%] Building Fortran object CMakeFiles/fms_r4_f.dir/mpp/mpp_efp.F90.o
[ 20%] Building Fortran object CMakeFiles/fms_r4_f.dir/mpp/mpp_memutils.F90.o
[ 21%] Building Fortran object CMakeFiles/fms_r4_f.dir/mpp/mpp_domains.F90.o
[ 22%] Building Fortran object CMakeFiles/fms_r4_f.dir/fms2_io/fms_io_utils.F90.o
[ 23%] Building Fortran object CMakeFiles/fms_r4_f.dir/fms2_io/netcdf_io.F90.o
#0 0x000000000232d4ea
#1 0x0000000002394d07
#2 0x0000000002394e30
#3 0x00007f568f22fd50
#4 0x00000000028270fa
#5 0x000000000282243c
#6 0x0000000002821a05
#7 0x000000000281e9af
#8 0x00000000027db423
#9 0x00000000026a36dd
#10 0x00000000026aa9df
#11 0x00000000026a4e49
#12 0x00000000022c9ffb
#13 0x00000000022c7c67
#14 0x0000000002270263
#15 0x0000000002452dbe
#16 0x00007f568f21a29d __libc_start_main + 239
#17 0x00000000020ab129

/glade/derecho/scratch/jedwards/tmp/ifx1589109924PeUwvK/ifxGFgVwM.i90: error #5633: Internal compiler error: segmentation violation signal raised Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
compilation aborted for /glade/derecho/scratch/jedwards/fmsbug/FMS/fms2_io/netcdf_io.F90 (code 3)
make[2]: *** [CMakeFiles/fms_r4_f.dir/build.make:699: CMakeFiles/fms_r4_f.dir/fms2_io/netcdf_io.F90.o] Error 3
make[1]: *** [CMakeFiles/Makefile2:113: CMakeFiles/fms_r4_f.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
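To narrow down which optimization level triggers the ICE, the failing compile can be retried at each level in turn. The following is a minimal sketch, not something from the FMS or CESM build systems: `first_failing_opt` is a hypothetical helper, and it assumes the last optimization flag on the command line takes precedence, which is worth verifying for your compiler.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FMS or CESM): run the given compile
# command once per optimization level, appending the level as the last
# argument, and print the first level at which the command fails.
# Prints "none" if every level compiles.
first_failing_opt() {
    local level
    for level in -O0 -O1 -O2 -O3; do
        if ! "$@" "$level" >/dev/null 2>&1; then
            echo "$level"
            return 0
        fi
    done
    echo "none"
}
```

A typical use would be to copy the exact failing `ifx` command for netcdf_io.F90 from the output of `make VERBOSE=1` and pass it as the arguments to `first_failing_opt`.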

@jedwards4b
Collaborator Author

I found a workaround for CESM.
In Depends.intel-oneapi:

# FMS objects that ICE with -O2
REDUCED_OPT_OBJS=\
netcdf_io.o \
fms_netcdf_domain_io.o \
fms_netcdf_unstructured_domain_io.o

$(REDUCED_OPT_OBJS): %.o: %.F90
	$(FC) -c $(INCLDIR) $(INCS) $(FFLAGS) $(FREEFLAGS) -O0 $<
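A list like REDUCED_OPT_OBJS could be assembled by compiling each candidate source and recording the ones whose compiler output contains the "Internal compiler error" text shown above. This is only a sketch under that assumption: `list_ice_sources` is a hypothetical helper, not part of CESM or FMS, and the compile command you pass in must carry whatever include paths and flags the real build uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CESM or FMS): print every source file
# for which the given compile command reports "Internal compiler error".
# Usage: list_ice_sources "<compile command with flags>" file1.F90 file2.F90 ...
list_ice_sources() {
    local compile_cmd=$1 src
    shift
    for src in "$@"; do
        # $compile_cmd is left unquoted on purpose so that it is split
        # into the command name and its flags.
        if $compile_cmd "$src" 2>&1 | grep -q "Internal compiler error"; then
            echo "$src"
        fi
    done
}
```

The printed names can then be turned into the matching .o entries for the Depends file.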

@dphow

dphow commented Aug 14, 2024

Copying from the HPC Jira ticket response

Was this a recent observation that originated in the latest commit, i.e. 18cb810, or has this likely been an ongoing problem when using the intel-oneapi LLVM compilers?

I see the suggested workaround is to use -O0 (i.e. no optimization). Was there a previous commit where -O2 did work with intel-oneapi? Have you tried any other oneAPI releases to test this as well?

I am not sure to what extent the Intel compiler folks will want a report that uses the whole model as a reproducer, but if there is a smaller test case, or if this issue lies solely within a NetCDF interface as the error currently suggests, then perhaps we can share a smaller MRE with them?

That is assuming there wasn't some code change that can be explicitly identified as the cause. Nonetheless, these types of errors are often best reported to the Intel compiler folks. Hopefully, we can find better workarounds in the interim.

@jedwards4b
Collaborator Author

@dphow It's been an ongoing problem; the same issue occurs in intel-oneapi/2023.2.1. This is not an entire model, it is one support library.

@dphow

dphow commented Aug 19, 2024

Copying here from Jira ticket...

Can you try the new intel/2024.2.1 module to see if this issue remains in this updated version?

We had the thought that the newer compilers Intel has released may have already addressed this, as in this past issue of theirs: https://community.intel.com/t5/Intel-Fortran-Compiler/Internal-compiler-error-segmentation-violation-signal-raised-WRF/td-p/1575801

@jedwards4b
Collaborator Author

@dphow It does seem to have solved the problem. Thanks!
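Given that outcome, a build script could guard against the affected compilers with a version check. The sketch below makes several assumptions: 2024.2.1 is used as the threshold only because it is the version reported working in this thread (releases between 2024.0.2 and 2024.2.1 were not tested here), `sort -V` from GNU coreutils is available, and the compiler's version banner contains a `YYYY.N.N` string. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: first version reported in this thread to compile FMS cleanly.
FIXED_VERSION=2024.2.1

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reads a compiler version banner on stdin and prints the first
# YYYY.N.N style version string found in it.
extract_version() {
    grep -oE '[0-9]{4}\.[0-9]+\.[0-9]+' | head -n 1
}

# Example use (assumes ifx is on PATH):
#   ver=$(ifx --version | extract_version)
#   version_at_least "$ver" "$FIXED_VERSION" || echo "ifx $ver may hit the ICE"
```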
