Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standalone MPAS-Ocean builds broken #5598

Closed
xylar opened this issue Apr 13, 2023 · 16 comments · Fixed by #5600
Closed

Standalone MPAS-Ocean builds broken #5598

xylar opened this issue Apr 13, 2023 · 16 comments · Fixed by #5600
Assignees

Comments

@xylar
Copy link
Contributor

xylar commented Apr 13, 2023

...presumably by #5583

When I try to build standalone MPAS-Ocean on Chrysalis with Intel, OpenMPI, and OpenMP (see below for details), I see:

src/libdycore.a(mpas_ocn_time_integration_si.o): In function `ocn_time_integration_si_mp_si_precond_':
mpas_ocn_time_integration_si.F:(.text+0x1d7c9): undefined reference to `dspmv_'
make[1]: *** [Makefile:1032: mpas] Error 1
make[1]: Leaving directory '/gpfs/fs1/home/ac.xylar/compass/add_prescribed_melt_fluxes/e3sm_master/components/mpas-ocean'
make: *** [Makefile:174: ifort] Error 2

This was not seen until recently (not in 0273cfa when #5343 was merged, the current location of the E3SM-Project submodule in compass).

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

Here's what I'm loading on Chrysalis to build from the current master:

echo Loading Spack environment...
module purge
source /lcrc/soft/climate/compass/chrysalis/spack/spack_for_mache_1.14.0/share/spack/setup-env.sh
spack env activate dev_compass_1_2_0-alpha_5_intel_openmpi
module load perl/5.32.0-bsnc6lt
module load intel/20.0.4-kodw73g
module load openmpi/4.1.3-pin4k7o
module load intel-mkl/2020.4.304-g2qaxzf
module load hdf5/1.10.7-eewgp6v
module load netcdf-c/4.4.1-ihoo4zq
module load netcdf-fortran/4.4.4-tplolxh
module load parallel-netcdf/1.11.0-gvcfihh
echo Done.
echo

export MPAS_EXTERNAL_LIBS=""
export NETCDF=$(dirname $(dirname $(which nc-config)))
export NETCDFF=$(dirname $(dirname $(which nf-config)))
export PNETCDF=$(dirname $(dirname $(which pnetcdf-config)))
export PIO=/lcrc/soft/climate/compass/chrysalis/spack/spack_for_mache_1.14.0/var/spack/environments/dev_compass_1_2_0-alpha_5_intel_openmpi/.spack-env/view

export USE_PIO2=true
export OPENMP=true
export HDF5_USE_FILE_LOCKING=FALSE

git submodule update --init --recursive
cd components/mpas-ocean
make clean
make ifort

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

@hyungyukang, can you have a look at this?

@philipwjones
Copy link
Contributor

@xylar dspmv is a LAPACK function so this would need to be built with LAPACK. We probably need to add either our own reference version to use if LAPACK isn't linked or make sure there's a guard around this when LAPACK isn't linked (and an error if SI is turned on without building with LAPACK)

@jonbob
Copy link
Contributor

jonbob commented Apr 13, 2023

@xylar -- we can revert that PR today if necessary and give the developers some time to come up with a solution

@hyungyukang
Copy link
Contributor

@xylar , I will look into the error. As @philipwjones mentioned, the error comes from compiling the standalone MPAS-O without LAPACK but trying to use the SI solver. Before having this error, did you define LAPACK directory in your shell ENV and add USE_LAPACK=true when you compile the standalone MPAS-O?

When you turn on config_time_integrator = 'split_implicit', you should compile the MPAS-O with LAPACK again by adding USE_LAPACK=true.

I actually didn't change anything on LAPACK-related functions in #5583. DSPMV has been always there since the semi-implicit added to MPAS-O. Please make sure your configurations and let me know.

BTW, I don't have account on Chicoma, but Can I have my account?

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

@philipwjones and @hyungyukang, first, I have not got as far as running the model since it doesn't build. So I have not selected a particular solver but I will not be running with split-implicit, as so far none of our compass tests cases use it.

We typically build standalone MPAS-Ocean without LAPACK. This is the default mode, so it needs to work but it is currently broken. I realize I could build with LAPACK but this is not really a solution for us because we have been able to run without it up to now (or at least we should have been given some warning that it was going to be a requirement so we could make it part of our build process).

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

BTW, I don't have account on Chicoma, but Can I have my account?

Sorry, I mistyped. This was on Chrysalis. (You don't want an account on Chicoma, trust me.)

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

I actually didn't change anything on LAPACK-related functions in #5583. DSPMV has been always there since the semi-implicit added to MPAS-O. Please make sure your configurations and let me know.

I'm sure nothing was intentionally changed related to DSPMV in #5583 but I can see that there are code changes that involve that function. It seems likely to me that something in #5583 must have taken what was an optional call to DSPMV depending on compiler flags and accidentally made it happen even when not setting USE_LAPACK=true, or something along those lines. I can double check but I don't think any other recent merge can account for this problem, and it emerged in the last week at the most.

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

@xylar -- we can revert that PR today if necessary and give the developers some time to come up with a solution

@jonbob, if a fix can be found and merged soon, that would be better. @philipwjones and @hyungyukang, what do you think?

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

A bit more detail. After #5583, we have:

! LAPACK on CPU
! (also if no MAGMA & CUBLAS for GPU ; Data staging)
!$acc update host(SIvec_in)
call DSPMV('U', nPrecVec, 1.0_RKIND, prec_ivmat(:,1), &
SIvec_in(1:nPrecVec) , 1, 0.0_RKIND, &
SIvec_out(1:nPrecVec), 1)

This is in a block that gets called if we haven't defined USE_MAGMA or USE_CUBLAS, which is currently going to be true in standalone for the vast majority of testing we do.

Before #5583, the DSPMV were wrapped in #if USE_LAPACK checks:

#ifdef USE_LAPACK
call DSPMV('U', nPrecVec, 1.0_RKIND, prec_ivmat, &
SIvec_in(1:nPrecVec) , 1, 0.0_RKIND, &
SIvec_out(1:nPrecVec), 1)
#endif
elseif ( trim(config_btr_si_preconditioner) == &
'block_jacobi' ) then
! Block-Jacobi preconditioning: Use BLAS for the symmetric
! matrix-vector multiplication
#ifdef USE_LAPACK
call DSPMV('U', nPrecVec, 1.0_RKIND, prec_ivmat, &
SIvec_in(1:nPrecVec) , 1, 0.0_RKIND, &
SIvec_out(1:nPrecVec), 1)
#endif

Which is why it worked well in standalone runs without LAPACK.

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

@xylar dspmv is a LAPACK function so this would need to be built with LAPACK. We probably need to add either our own reference version to use if LAPACK isn't linked or make sure there's a guard around this when LAPACK isn't linked (and an error if SI is turned on without building with LAPACK)

@philipwjones, yes, that's my reading too.

@philipwjones
Copy link
Contributor

@xylar beat me to it - just discovered the same thing. We just need to add the ifdef USE_LAPACK around these calls again. Just a little more complicated because it's in an else clause for other solver packages.

@hyungyukang
Copy link
Contributor

hyungyukang commented Apr 13, 2023

@xylar , thanks a lot for finding those! My bad. I made mistake when I add OpenACC directives. I'm going to work on it!

@jonbob , could you please revert #5583? Or would it be better to open another PR?

@jonbob
Copy link
Contributor

jonbob commented Apr 13, 2023

If you can come up with a fix quickly, I'd prefer to merge another PR

@xylar
Copy link
Contributor Author

xylar commented Apr 13, 2023

@hyungyukang, thanks, I appreciate it!

I agree with @jonbob, a PR to fix this would be much better than reverting.

It isn't affecting most of our testing yet since we haven't update the E3SM-Project submodule in compass yet to bring in #5583. It's just when we try to test branches that have been rebased onto master and include #5583 that we notice the problem. For now, these branches can just be based off of the previous merge commit before #5583 and we're good.

@hyungyukang
Copy link
Contributor

If you can come up with a fix quickly, I'd prefer to merge another PR

@jonbob , thanks. I will fix it soon and open a PR.
@xylar , thanks again and sorry for making a trouble. I will let you know through the new PR once it's finished.

hyungyukang added a commit that referenced this issue Apr 14, 2023
- The standalone MPAS-O build was broken due to a missing 'USE_LAPACK' directive in the SI solver code.
jonbob added a commit that referenced this issue Apr 14, 2023
Fix a bug in the SI solver

Fixes a bug in the standalone MPAS-O build that was broken due to a
missing 'USE_LAPACK' directive in the SI solver code (issued by #5598).

Fixes #5598

[BFB]
@jonbob jonbob closed this as completed in 4b3e611 Apr 17, 2023
crterai pushed a commit that referenced this issue May 8, 2023
- The standalone MPAS-O build was broken due to a missing 'USE_LAPACK' directive in the SI solver code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants