When is the next release ? #3708

Open
mboisson opened this issue Nov 15, 2023 · 31 comments

Comments

@mboisson

The last release of libMesh is 1.5 years old, and there has been a lot of development since. I have projects that depend on specific commits, but installing by commit is not appropriate on a large cluster (we want specific versions instead).

@jwpeterson
Member

Not sure what @roystgnr prefers, but IMO we could start a release branch now. I was doing somewhat regular releases for a while there, but it wasn't clear that anyone was actually using them... as far as I know most big projects are doing some form of a git submodule for libmesh at this time, and updating it as needed.

@mboisson
Author

Speaking as a package manager/software installer for large HPC clusters, we very much prefer releases (with sane versioning schemes) to commit hashes that force us to make up our own release scheme. All software on HPC clusters is installed through modules, and a sane versioning scheme with actual releases makes things a lot simpler for everyone.

@roystgnr
Member

It would be a good time to start a release branch, IMHO.

@ostueker
Contributor

ostueker commented Oct 24, 2024

Any idea when we might see a new release?

I see there's now a branch_v1.7.2 but no release has been tagged at https://github.com/libMesh/libmesh/releases

Is branch_v1.7.2 still work-in-progress or does 8cc62e9 represent an un-tagged 1.7.2 release?

Edit: Okay, I see that there is actually a 1.7.2 tag, but no release was made on GitHub.
Could you please create a release and upload a complete tarball that contains the code from the submodules, just as was done for 1.7.1?

@jwpeterson
Member

jwpeterson commented Oct 28, 2024

There was an issue with the 1.7.2 tag where the version number was not incremented correctly before the tag was created. So, there won't be a 1.7.2 release tarball, but we can make a 1.7.3 release tarball. Note, though, that not much has been changed since 1.7.1 was tagged, although a lot of time has gone by.

After discussing things a bit with @roystgnr, our plan is to tag a 1.8.0 release and create tarballs once we get a passing devel -> master merge here.

@jwpeterson
Member

Would you mind trying out one of the 1.7.3 release tarballs here? As I mentioned before, it won't be hugely different from 1.7.1, but if there's something wrong with the files, it would be good to know before we start in on the 1.8.x series.

@ostueker
Contributor

ostueker commented Oct 29, 2024

Thanks for the 1.7.3 release.

When I try to build it, it fails because it can't find the exodus_config.h:

In file included from src/ex_create_par.c:126:
./include/exodusII.h:18:10: fatal error: exodus_config.h: No such file or directory
   18 | #include "exodus_config.h"
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.

and sure enough, I find several *_config.h files but the one for exodus is missing:

$ find /tmp/stuekero/avx2/libMesh/1.7.3/foss-2023a -name "*_config.h"
/tmp/stuekero/avx2/libMesh/1.7.3/foss-2023a/libmesh-1.7.3/contrib/metaphysicl/src/utilities/include/metaphysicl/metaphysicl_config.h
/tmp/stuekero/avx2/libMesh/1.7.3/foss-2023a/libmesh-1.7.3/contrib/laspack/laspack_config.h
/tmp/stuekero/avx2/libMesh/1.7.3/foss-2023a/libmesh-1.7.3/contrib/timpi/src/utilities/include/timpi/timpi_config.h
/tmp/stuekero/avx2/libMesh/1.7.3/foss-2023a/libmesh-1.7.3/include/libmesh/libmesh_config.h
/tmp/stuekero/avx2/libMesh/1.7.3/foss-2023a/libmesh-1.7.3/include/libmesh_config.h

Nothing in the output of "./configure" that would explain this. Full configure line:

./configure --prefix=/home/stuekero/.local/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/libmesh/1.7.3  \
  --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu CXX=mpic++ CC=mpicc --enable-curl \
  --disable-strict-lgpl  --enable-parmesh --enable-distmesh --enable-xdr --enable-hdf5 \
  --with-hdf5=$EBROOTHDF5  --with-eigen-include=$EBROOTEIGEN/include  \
  --with-glpk-include=$EBROOTGENTOO/include --with-glpk-lib=$EBROOTGENTOO/lib64  \
  --with-nlopt-include=$EBROOTNLOPT/include --with-nlopt-lib=$EBROOTNLOPT/lib  \
  --with-curl-include=$EBROOTGENTOO/include/curl --with-curl-lib=$EBROOTGENTOO/lib  \
  --with-tbb=$EBROOTTBB/tbb

What is the most recent version of HDF5 that you have tested with libMesh?
Before trying 1.7.3, I had created my own tarball of 1.7.2 by doing a recursive clone of branch_v1.7.2, deleting the .git directory, and then tarring it up.
But when using our current default version of HDF5 (1.14.2) I got errors when running make check:

tst_h_files4.c: In function op_func:
tst_h_files4.c:54:43: error: const union <anonymous> has no member named address
   54 |    if ((id = H5Oopen_by_addr(g_id, info->u.address)) < 0) ERR;
      |                                           ^
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2/include/H5public.h:31,
                 from /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2/include/hdf5.h:21,
                 from tst_h_files4.c:11:
tst_h_files4.c: In function main:
/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2/include/H5version.h:934:30: error: too few arguments to function H5Oget_info_by_idx3
  934 |   #define H5Oget_info_by_idx H5Oget_info_by_idx3
      |                              ^~~~~~~~~~~~~~~~~~~
tst_h_files4.c:189:14: note: in expansion of macro H5Oget_info_by_idx
  189 |          if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
      |              ^~~~~~~~~~~~~~~~~~
In file included from /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2/include/H5Apublic.h:21,
                 from /cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2/include/hdf5.h:22:
/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2/include/H5Opublic.h:599:15: note: declared here
  599 | H5_DLL herr_t H5Oget_info_by_idx3(hid_t loc_id, const char *group_name, H5_index_t idx_type,
      |               ^~~~~~~~~~~~~~~~~~~
make[4]: *** [Makefile:703: tst_h_files4.o] Error 1

These went away when I replaced it with hdf5/1.10.11.

I would like to avoid having to install another version of HDF5 because users can't have two different hdf5 modules (versions) loaded at the same time and there are so many other modules that depend on HDF5 (and are using our default version).
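
The hand-made 1.7.2 tarball mentioned above (recursive clone of the release branch, drop the Git metadata, tar it up) can be sketched roughly as follows. This is only an illustration: the helper name, directory names, and archive name are assumptions, and it is not necessarily how the official release tarballs are produced.

```shell
# Sketch: pack an already-cloned source tree into a tarball without Git metadata.
# make_source_tarball is a hypothetical helper, not part of libMesh.
# The checkout is assumed to come from something like:
#   git clone --recursive --branch branch_v1.7.2 https://github.com/libMesh/libmesh.git libmesh-1.7.2
make_source_tarball() (
  src_dir=$1   # path to the recursive checkout
  out=$2       # absolute path of the archive to write
  parent=$(cd "$(dirname "$src_dir")" && pwd) || exit 1
  base=$(basename "$src_dir")
  # Excluding ".git" skips the top-level .git directory as well as the
  # .git entries inside each submodule, while keeping files like .gitmodules.
  tar -C "$parent" --exclude='.git' -czf "$out" "$base"
)
```

For example, `make_source_tarball "$PWD/libmesh-1.7.2" "$PWD/libmesh-1.7.2.tar.gz"` would write the archive next to the checkout.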

@jwpeterson
Member

When I try to build it, it fails because it can't find the exodus_config.h:

And this doesn't happen when you try building the 1.7.1 tar archive? Nothing has changed with our bundled Exodus between releases 1.7.1 and 1.7.3, so I don't know why it would work for one but not the other. I found an exodus_config.h file in our source tree (./contrib/exodusii/v8.11/exodus/sierra/exodus_config.h) but it doesn't contain anything important (just a bunch of defines inside #if 0 blocks) so a quick workaround might be to comment out the inclusion in exodusII.h
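
As a stopgap, the "comment out the inclusion" workaround could look something like the sketch below. The helper name is made up, and the path to exodusII.h inside the unpacked tarball is left as an argument because its exact location is not spelled out here.

```shell
# Sketch: comment out the '#include "exodus_config.h"' line in a given header.
# comment_out_exodus_config is a hypothetical helper; pass it the path to exodusII.h.
comment_out_exodus_config() (
  header=$1
  tmp=$(mktemp) || exit 1
  # Wrap the include in a C comment; all other lines are copied unchanged.
  sed 's|^\([[:space:]]*#include "exodus_config.h"\).*$|/* \1 */|' "$header" > "$tmp" &&
    cat "$tmp" > "$header"
  status=$?
  rm -f "$tmp"
  exit "$status"
)
```

Writing the result back with `cat` rather than moving the temporary file keeps the original file's permissions intact.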

What is the most recent version of HDF5 that you have tested with libMesh?

I'm using HDF 1.10.0 myself.

But when using our current default version of HDF5 (1.14.2) I got errors when running make check

Those errors are coming from the bundled NetCDF sources, which are on v4.6.2. So, in order to use newer HDF5 versions, we'd first have to update our internal NetCDF to something newer.

@jwpeterson
Member

OK, I confirmed that I get the same error as you regarding the exodus_config.h file not being found while trying to build from the 1.7.3 release tarball. We figured out that our internal make distcheck testing does not enable HDF5, and is only testing the Exodus v5.22 code path, so we never realized there was an issue with building v8.11 from the release tarball.

We'll look into a fix for this and create a new release if/when we find one.

@jwpeterson
Member

We realized there was already a fix for this issue on master (69d99e0); it just needed to be cherry-picked to the 1.7.x release branch. I have now done that and created new tarballs here. Would you mind giving them a try? Unfortunately, I don't think we can fix the "does not work with recent HDF" issue on this old release branch, and likely not on 1.8.x either since we are trying to tag that very soon, but we'll try to keep it on the radar for future releases.

@ostueker
Contributor

The exodus_config.h error is gone with 1.7.4.
Configure and compilation run through, but EasyBuild still fails in the "make check" stage. I haven't had time yet to find out why (there are no clues in the output of make check).

@jwpeterson
Member

jwpeterson commented Oct 31, 2024

I tested running make check with the 1.7.4 tarball locally with HDF enabled, and I got several errors from the unit testing stage:

1) test: MeshInputTest::testExodusCopyNodalSolutionDistributed (E) 
uncaught exception of type std::exception (or derived).
- Error creating ExodusII/Nemesis mesh file.


2) test: MeshInputTest::testExodusCopyElementSolutionDistributed (E) 
uncaught exception of type std::exception (or derived).
- Error creating ExodusII/Nemesis mesh file.


3) test: MeshInputTest::testExodusCopyNodalSolutionReplicated (E) 
uncaught exception of type std::exception (or derived).
- Error creating ExodusII/Nemesis mesh file.

...

and 23 others. All of the errors come from the call to ExodusII_IO_Helper::create(). So far, I have not figured out what causes this, but I did confirm that a build with Exodus v5.22 and HDF disabled passes make check. It would be helpful if you could confirm whether you see the same output from make check.

Thanks,
John

@ostueker
Contributor

I'm currently in the "make check" phase for my build of libmesh-1.7.5, but haven't reached the ExodusII tests yet.

When I first noticed the crashes two days ago (compiling with HDF5 1.10.11), I looked through the make check logs and didn't see any failing tests.

@jwpeterson
Member

I recompiled in dbg mode and reran the failing tests, and this time I got more useful information. It looks like there is some issue with the way we have configured NetCDF on the 1.7.x branch that prevents it from writing "netcdf-4" format files, which I believe is what we try to do when HDF is enabled:

Exodus Library Warning/Error: [ex__handle_mode]
    EXODUS: ERROR: File format specified as netcdf-4, but the NetCDF library being used was not configured to enable this format


Exodus Library Warning/Error: [ex_create_int]
    ERROR: file create failed for dist_with_nodal_soln.e in NETCDF4 mode.
    This library does not support netcdf-4 files.
    NetCDF: Invalid argument
Error creating ExodusII/Nemesis mesh file.

@ostueker
Contributor

ostueker commented Oct 31, 2024

For me it suddenly fails with:

[...]
***************************************************************
* Done Running Example optimization_ex1:
*   ./example-opt -tao_monitor -tao_view -tao_type nls 
***************************************************************
PASS: run.sh
=============
1 test passed
=============
make[4]: Leaving directory '/tmp/stuekero/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex1'
make[3]: Leaving directory '/tmp/stuekero/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex1'
make[2]: Leaving directory '/tmp/stuekero/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex1'
Making check in optimization/optimization_ex2
make[2]: Entering directory '/tmp/stuekero/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex2'
make  check-am
make[3]: Entering directory '/tmp/stuekero/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex2'
make  example-dbg example-devel example-opt   run.sh
make[4]: Entering directory '/tmp/stuekero/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex2'
  CX

I don't even reach the EXODUS tests.

Do the optimization tests need a lot of memory? I'm running on a build cluster and the build job has 8 cores and about 28 GiB of memory available.

@jwpeterson
Member

jwpeterson commented Oct 31, 2024

OK, sorry for the red herring. I'm pretty sure the examples run after the unit tests, though, so if you made it to them you likely passed all the unit testing.

The optimization_ex2 test runs on a 10x10 mesh by default, so it should not require much memory at all. On my system it runs in about 1-2 seconds, but I noted that it doesn't converge well with the default arguments we are using, and the solve stops because it reaches the maximum number of function evaluations:

Tao optimization solver convergence/divergence reason: DIVERGED_MAXFCN

This is with PETSc 3.17, I guess you might see other behavior with different versions of PETSc.

@ostueker
Contributor

ostueker commented Nov 1, 2024

Thanks to the help of a colleague, I now know that the crash is related to PETSc.

I'm just not sure whether it's due to the fact that I'm trying to build against petsc/3.20.0 (the only version in our current environment) or that petsc/3.20.0 was compiled against hdf5/1.14.2 but the netCDF that you include requires an hdf5/1.10.x.

Also:

  1. Why do you even include netcdf in libMesh? We have a perfectly good netcdf/4.9.2 in our software stack that we could use. Or if you at least could include the option to not use the one you include in the sources but allow to supply our own instead.

  2. Why doesn't the netcdf-c-4.6.2 work with hdf5/1.14.2?
    contrib/netcdf/netcdf-c-4.6.2/CMakeLists.txt sets H5_USE_16_API which should even switch HDF5 1.14.x back to the 1.6.x API.

I noticed that you already have a 1.8.0 branch that includes netcdf-4.9.2 (the same version as our current default). Next week I'm going to create a tarball from that branch and test it.

Oliver

make[4]: Entering directory '/tmp/oldeman/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex2'
***************************************************************
* Running Example optimization_ex2:
*   ./example-dbg -tao_monitor -tao_view -tao_type ipm -pc_type jacobi -ksp_max_it 100 
***************************************************************
 
 Mesh Information:
  elem_dimensions()={2}
  spatial_dimension()=2
  n_nodes()=441
    n_local_nodes()=441
  n_elem()=100
    n_local_elem()=100
    n_active_elem()=100
  n_subdomains()=1
  n_partitions()=1
  n_processors()=1
  n_threads()=1
  processor_id()=0
  is_prepared()=true
  is_replicated()=false

 EquationSystems
  n_systems()=1
   System #0, "Optimization"
    Type "Optimization"
    Variables="u" 
    Finite Element Types="LAGRANGE" 
    Approximation Orders="SECOND" 
    n_dofs()=441
    n_local_dofs()=441
    n_constrained_dofs()=0
    n_local_constrained_dofs()=0
    n_vectors()=4
    n_matrices()=2
    DofMap Sparsity
      Average  On-Processor Bandwidth <= 14.8776
      Average Off-Processor Bandwidth <= 0
      Maximum  On-Processor Bandwidth <= 25
      Maximum Off-Processor Bandwidth <= 0
    DofMap Constraints
      Number of DoF Constraints = 0

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Not for unassembled vector, did you call VecAssemblyBegin()/VecAssemblyEnd()?
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-tao_view (no value) source: command line
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.20.0, Sep 28, 2023 
[0]PETSC ERROR: /tmp/oldeman/avx2/libMesh/1.7.5/foss-2023a/libmesh-1.7.5/examples/optimization/optimization_ex2/.libs/lt-example-dbg on a  named node2.int.archimedes.c3.ca by oldeman Fri Nov  1 16:01:45 2024
[0]PETSC ERROR: Configure options --prefix=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/petsc/3.20.0 --with-hdf5=1 --with-hdf5-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2 --with-cxx-dialect=C++14 --with-memalign=64 --with-python=no --with-mpi4py=no --download-mumps=1 --download-metis=1 --download-SuiteSparse=1 --download-triangle=1 --download-strumpack=1 --download-spooles=1 --download-ptscotch=1 --download-spai=1 --download-superlu_dist=1 --download-prometheus=1 --download-parmetis=1 --download-party=1 --download-superlu=1 --download-hypre=1 --download-slepc=1 --download-chaco=1 --download-hpddm=1 --download-ml=1 --download-mumps-shared=0 --download-ptscotch-shared=0 --download-superlu-shared=0 --download-superlu_dist-shared=0 --download-parmetis-shared=0 --download-metis-shared=0 --download-ml-shared=0 --download-SuiteSparse-shared=0 --download-hypre-shared=0 --download-prometheus-shared=0 --download-spooles-shared=0 --download-chaco-shared=0 --download-slepc-shared=0 --download-spai-shared=0 --download-party-shared=0 --with-cc=mpicc --with-cxx=mpicxx --with-c++-support --with-fc=mpifort --CFLAGS="-O2 -ftree-vectorize -march=x86-64-v3 -fno-math-errno -fPIC" --CXXFLAGS="-O2 -ftree-vectorize -march=x86-64-v3 -fno-math-errno -fPIC -DOMPI_SKIP_MPICXX -DMPICH_SKIP_MPICXX" --FFLAGS="-O2 -ftree-vectorize -march=x86-64-v3 -fno-math-errno -fPIC" --with-mpi=1 --with-build-step-np=8 --with-shared-libraries=1 --with-debugging=0 --with-pic=1 --with-x=0 --with-windows-graphics=0 --with-scalapack=1 --with-scalapack-lib="[/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/scalapack/2.2.0/lib/libscalapack.a,libflexiblas.a,libgfortran.a]" --with-blaslapack-lib="[/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Core/flexiblascore/3.3.1/lib/libflexiblas.a,libgfortran.a]" --with-hdf5=1 
--with-hdf5-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/hdf5-mpi/1.14.2 --with-fftw.mpi=1 --with-fftw.mpi-dir=/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/MPI/gcc12/openmpi4/fftw-mpi/3.3.10
[0]PETSC ERROR: #1 VecAXPYAsync_Private() at /tmp/ebuser/avx2/PETSc/3.20.0/foss-2023a/petsc-3.20.0/src/vec/vec/interface/rvector.c:560
[0]PETSC ERROR: #2 VecAXPY() at /tmp/ebuser/avx2/PETSc/3.20.0/foss-2023a/petsc-3.20.0/src/vec/vec/interface/rvector.c:608
[0]PETSC ERROR: #3 IPMUpdateAi() at /tmp/ebuser/avx2/PETSc/3.20.0/foss-2023a/petsc-3.20.0/src/tao/constrained/impls/ipm/ipm.c:765
[0]PETSC ERROR: #4 IPMEvaluate() at /tmp/ebuser/avx2/PETSc/3.20.0/foss-2023a/petsc-3.20.0/src/tao/constrained/impls/ipm/ipm.c:632
[0]PETSC ERROR: #5 TaoSolve_IPM() at /tmp/ebuser/avx2/PETSc/3.20.0/foss-2023a/petsc-3.20.0/src/tao/constrained/impls/ipm/ipm.c:45
[0]PETSC ERROR: #6 TaoSolve() at /tmp/ebuser/avx2/PETSc/3.20.0/foss-2023a/petsc-3.20.0/src/tao/interface/taosolver.c:164
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
In: PMI_Abort(1, N/A)
slurmstepd: error: *** STEP 3219.4 ON node2 CANCELLED AT 2024-11-01T16:01:45 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: node2: task 0: Killed

@ostueker
Contributor

ostueker commented Nov 4, 2024

How do I tell ./configure to use the netcdf-c v4.9.2 instead of the default netcdf-c-4.6.2 ?

@jwpeterson
Member

jwpeterson commented Nov 4, 2024

How do I tell ./configure to use the netcdf-c v4.9.2 instead of the default netcdf-c-4.6.2 ?

Are you talking about in the 1.7.x release series? I don't think we bundled NetCDF 4.9.2 with libmesh at that time.

Or if you at least could include the option to not use the one you include in the sources but allow to supply our own instead.

There was an attempt to let the user select a system NetCDF installation in the past, but it wasn't merged. I don't recall exactly why, but I don't think it worked with all our supported platforms. I think it would be a good improvement, but I don't have time to work on it myself.

Why doesn't the netcdf-c-4.6.2 work with hdf5/1.14.2

No idea, that would probably be a good bug report for the NetCDF/HDF people if you can narrow down exactly what the issue is.

@ostueker
Contributor

ostueker commented Nov 4, 2024

How do I tell configure to use the netcdf-c v4.9.2 instead of the default netcdf-c-4.6.2 ?
Are you talking about in the 1.7.x release series? I don't think we bundled NetCDF 4.9.2 with libmesh at that time.

No, today I was giving branch_v1.8.0 a try. It has netcdf-c v4.9.2 as a submodule and I found mentions of LIBMESH_ENABLE_NETCDF_V492 in configure but ./configure --help didn't yield any hints, hence my question.

I'll try again tomorrow.

@jwpeterson
Member

On master and branch_v1.8.0, you should be able to get it by doing --enable-netcdf=v492. The relevant logic for this is in m4/netcdf.m4, I agree it doesn't show up in the output of ./configure --help which is not great.
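
A minimal sketch of how that could be wired into a build script is below. Only `--enable-netcdf=v492` comes from this thread; the wrapper name is hypothetical and every other argument is simply passed through to ./configure.

```shell
# Sketch: run ./configure in a branch_v1.8.0 (or master) source tree with the
# bundled NetCDF 4.9.2 selected. configure_with_netcdf492 is a hypothetical
# wrapper; any extra arguments are passed through unchanged.
configure_with_netcdf492() (
  src_dir=$1
  shift
  cd "$src_dir" || exit 1
  if [ ! -x ./configure ]; then
    echo "no executable ./configure found in $src_dir" >&2
    exit 1
  fi
  ./configure --enable-netcdf=v492 "$@"
)
```

With the options already used earlier in this thread, a call might look like `configure_with_netcdf492 ./libmesh --enable-hdf5 --with-hdf5="$EBROOTHDF5"`, where `./libmesh` stands in for wherever the source tree lives.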

@ostueker
Contributor

ostueker commented Nov 6, 2024

Why doesn't the netcdf-c-4.6.2 work with hdf5/1.14.2

No idea, that would probably be a good bug report for the NetCDF/HDF people if you can narrow down exactly what the issue is.

Turns out that it was only the netCDF tests that were failing with HDF5-1.12.0 or newer.
This was first raised in Unidata/netcdf-c#1628 and addressed in Unidata/netcdf-c#1671.

I've ported that patch so that it can be applied to libMesh-1.7.5 and now the netcdf-tests all pass even when using our current default module HDF5/1.14.2.

Patch libMesh's NetCDF tests for compatibility with HDF5 >= 1.12.

The below patch has been taken from [1] and modified so that it can be
applied to the netcdf-c-4.6.2 sources included with libMesh.

[1]: https://github.com/Unidata/netcdf-c/commit/9f9b125028b28d8e94f2c990c8d92a7df76fde78

diff --git a/contrib/netcdf/v4/h5_test/tst_h_atts3.c b/contrib/netcdf/v4/h5_test/tst_h_atts3.c
index 7976821b0..4fb672798 100644
--- a/contrib/netcdf/v4/h5_test/tst_h_atts3.c
+++ b/contrib/netcdf/v4/h5_test/tst_h_atts3.c
@@ -46,7 +46,11 @@ main()
       hid_t file_typeid1[NUM_OBJ], native_typeid1[NUM_OBJ];
       hid_t file_typeid2, native_typeid2;
       hsize_t num_obj;
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       char obj_name[STR_LEN + 1];
       hsize_t dims[1] = {ATT_LEN}; /* netcdf attributes always 1-D. */
       struct s1
@@ -148,8 +152,14 @@ main()
       for (i = 0; i < num_obj; i++)
       {
 	 /* Get the name, and make sure this is a type. */
+
+#if H5_VERSION_GE(1,12,0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
+				 i, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
 				i, &obj_info, H5P_DEFAULT) < 0) ERR;
+#endif
 	 if (H5Lget_name_by_idx(grpid, ".", H5_INDEX_NAME, H5_ITER_INC, i,
 				obj_name, STR_LEN + 1, H5P_DEFAULT) < 0) ERR;
 	 if (obj_info.type != H5O_TYPE_NAMED_DATATYPE) ERR;
@@ -267,8 +277,13 @@ main()
       for (i = 0; i < num_obj; i++)
       {
 	 /* Get the name, and make sure this is a type. */
+#if H5_VERSION_GE(1,12,0)
+       	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
+                                 i, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR; 
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
 				i, &obj_info, H5P_DEFAULT) < 0) ERR;
+#endif
 	 if (H5Lget_name_by_idx(grpid, ".", H5_INDEX_NAME, H5_ITER_INC, i,
 				obj_name, STR_LEN + 1, H5P_DEFAULT) < 0) ERR;
 	 if (obj_info.type != H5O_TYPE_NAMED_DATATYPE) ERR;
diff --git a/contrib/netcdf/v4/h5_test/tst_h_atts4.c b/contrib/netcdf/v4/h5_test/tst_h_atts4.c
index 6228dd661..d70f4a497 100644
--- a/contrib/netcdf/v4/h5_test/tst_h_atts4.c
+++ b/contrib/netcdf/v4/h5_test/tst_h_atts4.c
@@ -49,7 +49,11 @@ main()
       hid_t file_typeid1[NUM_OBJ_2], native_typeid1[NUM_OBJ_2];
       hid_t file_typeid2, native_typeid2;
       hsize_t num_obj;
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       char obj_name[STR_LEN + 1];
       hsize_t dims[1] = {ATT_LEN}; /* netcdf attributes always 1-D. */
       struct s1
@@ -139,8 +143,13 @@ main()
       for (i = 0; i < num_obj; i++)
       {
 	 /* Get the name, and make sure this is a type. */
+#if H5_VERSION_GE(1,12,0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
+				 i, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
 				i, &obj_info, H5P_DEFAULT) < 0) ERR;
+#endif
 	 if (H5Lget_name_by_idx(grpid, ".", H5_INDEX_NAME, H5_ITER_INC, i,
 				obj_name, STR_LEN + 1, H5P_DEFAULT) < 0) ERR;
 	 if (obj_info.type != H5O_TYPE_NAMED_DATATYPE) ERR;
diff --git a/contrib/netcdf/v4/h5_test/tst_h_compounds2.c b/contrib/netcdf/v4/h5_test/tst_h_compounds2.c
index 2f885a57a..9707b801d 100644
--- a/contrib/netcdf/v4/h5_test/tst_h_compounds2.c
+++ b/contrib/netcdf/v4/h5_test/tst_h_compounds2.c
@@ -48,7 +48,11 @@ main()
       hsize_t dims[1];
       hsize_t num_obj, i_obj;
       char obj_name[STR_LEN + 1];
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       hid_t fapl_id, fcpl_id;
       htri_t equal;
       char file_in[STR_LEN * 2];
@@ -131,8 +135,13 @@ main()
       if (H5Gget_num_objs(grpid, &num_obj) < 0) ERR;
       for (i_obj = 0; i_obj < num_obj; i_obj++)
       {
+#if H5_VERSION_GE(1, 12, 0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, 
+				i_obj, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, 
 				i_obj, &obj_info, H5P_DEFAULT) < 0) ERR;
+#endif
 	 if (H5Lget_name_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, 
 				i_obj, obj_name, STR_LEN + 1, H5P_DEFAULT) < 0) ERR;
 
@@ -194,8 +203,13 @@ main()
       if (H5Gget_num_objs(grpid, &num_obj) < 0) ERR;
       for (i_obj = 0; i_obj < num_obj; i_obj++)
       {
+#if H5_VERSION_GE(1,12,0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, i_obj, &obj_info, 
+				H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, i_obj, &obj_info, 
 				H5P_DEFAULT) < 0) ERR;
+#endif
 	 if (H5Lget_name_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, i_obj, obj_name, 
 				STR_LEN + 1, H5P_DEFAULT) < 0) ERR;
 
diff --git a/contrib/netcdf/v4/h5_test/tst_h_files4.c b/contrib/netcdf/v4/h5_test/tst_h_files4.c
index eef3e1608..7ea991acd 100644
--- a/contrib/netcdf/v4/h5_test/tst_h_files4.c
+++ b/contrib/netcdf/v4/h5_test/tst_h_files4.c
@@ -44,14 +44,23 @@ with the H5Lvisit function call
 
 */
 herr_t
-op_func (hid_t g_id, const char *name, const H5L_info_t *info, 
+op_func (hid_t g_id, const char *name, 
+#if H5_VERSION_GE(1,12,0)
+         const H5L_info2_t *info,
+#else
+         const H5L_info_t *info,
+#endif
 	 void *op_data)  
 {
    hid_t id;
    H5I_type_t obj_type;
 
    strcpy((char *)op_data, name);
+#if H5_VERSION_GE(1,12,0)
+   if ((id = H5Oopen_by_token(g_id, info->u.token)) < 0) ERR;
+#else
    if ((id = H5Oopen_by_addr(g_id, info->u.address)) < 0) ERR;
+#endif
 
 /* Using H5Ovisit is really slow. Use H5Iget_type for a fast
  * answer. */
@@ -169,7 +178,11 @@ main()
    {
       hid_t fapl_id, fileid, grpid;
       H5_index_t idx_field = H5_INDEX_CRT_ORDER;
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       hsize_t num_obj;
       ssize_t size;
       char obj_name[STR_LEN + 1];
@@ -186,8 +199,13 @@ main()
       if (H5Gget_num_objs(grpid, &num_obj) < 0) ERR;
       for (i = 0; i < num_obj; i++)
       {
+#if H5_VERSION_GE(1,12,0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, 
+                                 i, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT)) ERR;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, 
 				i, &obj_info, H5P_DEFAULT)) ERR;
+#endif
 	 if ((size = H5Lget_name_by_idx(grpid, ".", idx_field, H5_ITER_INC, i,
 					NULL, 0, H5P_DEFAULT)) < 0) ERR;
 	 if (H5Lget_name_by_idx(grpid, ".", idx_field, H5_ITER_INC, i,
diff --git a/contrib/netcdf/v4/h5_test/tst_h_vars2.c b/contrib/netcdf/v4/h5_test/tst_h_vars2.c
index 49158ba86..2b731b3c9 100644
--- a/contrib/netcdf/v4/h5_test/tst_h_vars2.c
+++ b/contrib/netcdf/v4/h5_test/tst_h_vars2.c
@@ -31,7 +31,11 @@ main()
       hsize_t num_obj;
       hid_t fileid, grpid, spaceid;
       int i;
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       char names[NUM_ELEMENTS][MAX_SYMBOL_LEN + 1] = {"H", "He", "Li", "Be", "B", "C"};
       char name[MAX_SYMBOL_LEN + 1];
       ssize_t size;
@@ -79,8 +83,13 @@ main()
       if (num_obj != NUM_ELEMENTS) ERR;
       for (i = 0; i < num_obj; i++)
       {
+#if H5_VERSION_GE(1,12,0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
+                                 i, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
 				i, &obj_info, H5P_DEFAULT) < 0) ERR;
+#endif
 	 if (obj_info.type != H5O_TYPE_DATASET) ERR;
 	 if ((size = H5Lget_name_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, i,
 					NULL, 0, H5P_DEFAULT)) < 0) ERR;
@@ -106,7 +115,11 @@ main()
       hid_t fileid, grpid;
       hsize_t num_obj;
       int i;
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       char names[NUM_DIMSCALES][MAX_SYMBOL_LEN + 1] = {"b", "a"};
       char name[MAX_SYMBOL_LEN + 1];
       hid_t dimscaleid;
@@ -152,8 +165,13 @@ main()
       if (num_obj != NUM_DIMSCALES) ERR;
       for (i = 0; i < num_obj; i++)
       {
+#if H5_VERSION_GE(1,12,0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
+				 i, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
 				i, &obj_info, H5P_DEFAULT) < 0) ERR;
+#endif
 	 if (obj_info.type != H5O_TYPE_DATASET) ERR;
 	 if ((size = H5Lget_name_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, i,
                                 	 NULL, 0, H5P_DEFAULT)) < 0) ERR;
@@ -178,7 +196,11 @@ main()
       hsize_t num_obj;
       hid_t fileid, grpid, spaceid;
       float val = 3.1495;
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       char name[MAX_NAME_LEN + 1];
       ssize_t size;
 
@@ -238,8 +260,14 @@ main()
 
       if (H5Gget_num_objs(grpid, &num_obj) < 0) ERR;
       if (num_obj != 1) ERR;
+
+#if H5_VERSION_GE(1,12,0)
+      if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
+			      0, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR;
+#else
       if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
 			     0, &obj_info, H5P_DEFAULT) < 0) ERR;
+#endif
       if (obj_info.type != H5O_TYPE_DATASET) ERR;
       if ((size = H5Lget_name_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC, 0,
 				     NULL, 0, H5P_DEFAULT)) < 0) ERR;
diff --git a/contrib/netcdf/v4/nc_test4/tst_xplatform2.c b/contrib/netcdf/v4/nc_test4/tst_xplatform2.c
index 6b6e1ab24..acefe1807 100644
--- a/contrib/netcdf/v4/nc_test4/tst_xplatform2.c
+++ b/contrib/netcdf/v4/nc_test4/tst_xplatform2.c
@@ -564,7 +564,11 @@ main(int argc, char **argv)
       hid_t file_typeid1[NUM_OBJ], native_typeid1[NUM_OBJ];
       hid_t file_typeid2, native_typeid2;
       hsize_t num_obj, i;
+#if H5_VERSION_GE(1,12,0)
+      H5O_info2_t obj_info;
+#else
       H5O_info_t obj_info;
+#endif
       char obj_name[NC_MAX_NAME + 1];
 
       /* Open one of the netCDF test files. */
@@ -579,8 +583,13 @@ main(int argc, char **argv)
       for (i = 0; i < num_obj; i++)
       {
 	 /* Get the name. */
+#if H5_VERSION_GE(1,12,0)
+	 if (H5Oget_info_by_idx3(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
+                                 i, &obj_info, H5O_INFO_BASIC, H5P_DEFAULT) < 0) ERR_RET;
+#else
 	 if (H5Oget_info_by_idx(grpid, ".", H5_INDEX_CRT_ORDER, H5_ITER_INC,
 				i, &obj_info, H5P_DEFAULT) < 0) ERR_RET;
+#endif
 	 if (H5Lget_name_by_idx(grpid, ".", H5_INDEX_NAME, H5_ITER_INC, i,
 				obj_name, NC_MAX_NAME + 1, H5P_DEFAULT) < 0) ERR_RET;
 	 printf(" reading type %s ", obj_name);
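
The pattern repeated throughout the patch above is a compile-time switch. With HDF5 1.12.0 or newer the tests declare `H5O_info2_t` and call `H5Oget_info_by_idx3` with the extra `H5O_INFO_BASIC` argument; otherwise they keep `H5O_info_t` and `H5Oget_info_by_idx`. Below is a minimal, self-contained sketch of that guard. It does not use HDF5 itself: every `DEMO_*` macro and `demo_*` function is a hypothetical stand-in invented for illustration, and the comparison macro only mirrors the general shape of a "version >= (major, minor, release)" check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the version constants a library header defines. */
#define DEMO_VERS_MAJOR   1
#define DEMO_VERS_MINOR   14
#define DEMO_VERS_RELEASE 2

/* True when the compiled-against version is >= (maj, min, rel). */
#define DEMO_VERSION_GE(maj, min, rel)                          \
  ((DEMO_VERS_MAJOR > (maj)) ||                                 \
   (DEMO_VERS_MAJOR == (maj) && DEMO_VERS_MINOR > (min)) ||     \
   (DEMO_VERS_MAJOR == (maj) && DEMO_VERS_MINOR == (min) &&     \
    DEMO_VERS_RELEASE >= (rel)))

/* Toy "old" and "new" info structs and a toy field selector. */
typedef struct { int type; } demo_info1_t;
typedef struct { int type; } demo_info2_t;
#define DEMO_INFO_BASIC 1u

/* Toy old-style query: no field selector. */
int demo_get_info_v1(int idx, demo_info1_t *info)
{
  assert(info != NULL);
  info->type = idx % 2;
  return 0;
}

/* Toy new-style query: the caller must say which fields to fill. */
int demo_get_info_v3(int idx, demo_info2_t *info, unsigned fields)
{
  assert(info != NULL);
  if (!(fields & DEMO_INFO_BASIC))
    return -1;
  info->type = idx % 2;
  return 0;
}

/* Caller code uses the same guard shape as the patch:
   both the declaration and the call switch on the version. */
int demo_object_type(int idx)
{
#if DEMO_VERSION_GE(1, 12, 0)
  demo_info2_t info;
  if (demo_get_info_v3(idx, &info, DEMO_INFO_BASIC) < 0)
    return -1;
#else
  demo_info1_t info;
  if (demo_get_info_v1(idx, &info) < 0)
    return -1;
#endif
  return info.type;
}
```

Because both the variable declaration and the call sit behind the same condition, the code after the guard (here `info.type`) stays identical for old and new versions, which is why the patch leaves the `obj_info.type` checks untouched.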

@ostueker
Contributor

ostueker commented Nov 6, 2024

Two libMesh tests/examples are still crashing:

  • examples/optimization/optimization_ex2
  • examples/transient/transient_ex2

So far I see the same two examples crash whether I use PETSc 3.19.2 or 3.20.0, though in both cases I've used HDF5 1.14.2 for compiling both libMesh and PETSc.
I'm now testing whether using 1.10.11 for libMesh and PETSc 3.19 makes any difference. I'll update the result of that later.

libMesh  HDF5     PETSc   Pass/Fail/logs
1.7.5    1.14.2   3.19.6  ❌ 🗒️
1.7.5    1.14.2   3.20.0
1.7.5    1.10.11  3.19.6

I'm attaching the configure log and the output from the two failing examples from the test with HDF5 1.14.2 & PETSc 3.19.6:

Let me know if I should create a separate issue for that.

Edit:
The same examples also fail when using HDF5 1.10.11 & PETSc 3.19.6.

@jwpeterson
Member

I had some issues applying your patch because the file paths contain symbolic links (git apply error: affected file is beyond a symbolic link) but I fixed those and it applied cleanly. Attaching here for future reference.

0001-Fixes-for-NetCDF-4-6-2-tests-run-with-HDF-1-12.txt

@ostueker
Contributor

ostueker commented Nov 6, 2024

> I had some issues applying your patch because the file paths contain symbolic links

Ahh sorry, I forgot to mention that. Yesterday I had the reverse issue because the tar-ball (which I use for the build) doesn't have the contrib/netcdf/netcdf-c-4.6.2/ directory, but instead the files are under contrib/netcdf/v4/.

@ostueker
Contributor

ostueker commented Nov 6, 2024

Confirmed! The examples optimization_ex2 and transient_ex2 crash with the error below, no matter whether I use HDF5 1.10.11 or 1.14.2, and PETSc 3.19.6 or 3.20.0 (see the test matrix a few comments up).

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Not for unassembled vector, did you call VecAssemblyBegin()/VecAssemblyEnd()?
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-tao_view (no value) source: command line
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.

@jwpeterson
Member

OK, that makes it seem like we are missing a close() somewhere in the 1.7.x branch. I can't test libmesh-1.7.x with PETSc 3.19 or 3.20 easily, but I could test it with PETSc 3.17.

PETSc 3.20 came out in September 2023, so it did not even exist when libmesh 1.7.0 was first released. It's definitely possible that these examples worked OK with PETSc from around the same time, but then more/better error checking in PETSc was added subsequently. If you can get a stack trace leading up to that "Object is in wrong state" error, that would be helpful in pointing us to the place where (I assume) we already made the fix in master. As far as I'm aware, those examples work fine with libmesh master and new PETSc.
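
The "missing close()" hypothesis can be illustrated with a toy model. This is a sketch, not PETSc's or libMesh's actual implementation: the `toy_vec_*` names and the error code are hypothetical. It only shows the general idea that a vector records whether it has been finalized after the last write, and that a stricter library version may start refusing reads on a vector that was never closed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VEC_MAX 8
#define TOY_ERR_WRONG_STATE 1 /* hypothetical error code */

typedef struct {
  double values[TOY_VEC_MAX];
  size_t size;
  int assembled; /* 0 after any write, 1 after toy_vec_close() */
} toy_vec_t;

/* Create a zero-filled vector; an untouched vector counts as assembled. */
void toy_vec_init(toy_vec_t *v, size_t n)
{
  size_t i;
  assert(v != NULL && n <= TOY_VEC_MAX);
  v->size = n;
  for (i = 0; i < n; i++)
    v->values[i] = 0.0;
  v->assembled = 1;
}

/* Writing an entry marks the vector as unassembled. */
void toy_vec_set(toy_vec_t *v, size_t i, double x)
{
  assert(v != NULL && i < v->size);
  v->values[i] = x;
  v->assembled = 0;
}

/* Finalize pending writes; plays the role of a close()/assembly step. */
void toy_vec_close(toy_vec_t *v)
{
  assert(v != NULL);
  v->assembled = 1;
}

/* Read-type operation: refuses to run on an unassembled vector. */
int toy_vec_sum(const toy_vec_t *v, double *out)
{
  size_t i;
  double s = 0.0;
  assert(v != NULL && out != NULL);
  if (!v->assembled)
    return TOY_ERR_WRONG_STATE;
  for (i = 0; i < v->size; i++)
    s += v->values[i];
  *out = s;
  return 0;
}
```

In this model, calling `toy_vec_sum` right after `toy_vec_set` returns `TOY_ERR_WRONG_STATE`, and the same call succeeds once `toy_vec_close` has run. If the state check were added only in a later library version, older builds would appear to work while newer ones report an error for the same calling code.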

@jwpeterson
Member

OK, I tested my 1.7.5 build with PETSc 3.17 and these two examples definitely don't fail for me. I tested transient_ex2 in both serial and parallel, and didn't get any "Object is in wrong state" errors.

@ostueker
Contributor

ostueker commented Nov 6, 2024

How can I run them to get a full stack trace?
So far I've been running them with "make check" in the same directory and redirected the output into a file (see the two examples attached to the post with the table).

@jwpeterson
Member

> How can I run them to get a full stack trace?

You can run any of the examples from the build directory via make check. For example:

cd examples/transient/transient_ex2
make check

You might be able to get more information by running the executable in the debugger. For transient_ex2, for example, one could do (while in the example build directory):

gdb --args .libs/example-opt pipe-mesh.unv

Once in gdb, type r to run the program, and when it crashes, type bt to see a backtrace. This might not give much helpful information since it's an opt-mode executable, so ideally the error would still be present in a '--with-methods=dbg' build and we could use this approach to get a full stack trace from that.

@ostueker
Contributor

ostueker commented Nov 8, 2024

That unfortunately didn't work. I've compiled libMesh with --with-methods=dbg and when I asked gdb for the bt it just said No Stack.

I will be off for the next few weeks but will try to get back to this later.
