Fix broken Julia CI #4539

Closed
derobins opened this issue Jun 2, 2024 · 9 comments

Comments

@derobins
Member

derobins commented Jun 2, 2024

The Julia GitHub CI actions have been broken for the past week or two, both in Autotools and CMake. There were no obvious changes that could have caused these failures. They will often pass when re-run.

We will need to investigate why they are failing. Since it's random, it may be a memory issue, either in the Julia wrappers or the HDF5 library.
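Because the failures are intermittent, it may help to measure how often a given test command fails before and after a candidate fix. The following is only a rough sketch: `count_failures` is a hypothetical helper, and the command shown is a placeholder that fails at random, not the actual HDF5.jl test invocation.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited with a nonzero status."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Placeholder command (an assumption for illustration): exits with status 1
# about half the time. Replace it with the real test command when investigating.
flaky_cmd = [
    sys.executable,
    "-c",
    "import random, sys; sys.exit(1 if random.random() < 0.5 else 0)",
]

if __name__ == "__main__":
    runs = 10
    print(f"{count_failures(flaky_cmd, runs)} of {runs} runs failed")
```

A failure rate that stays well above zero across many runs would support the idea that the problem is nondeterministic rather than tied to a specific change.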

@derobins derobins added the labels Priority - 1. High 🔼, Component - Testing, and Type - Bug / Bugfix Jun 2, 2024
@derobins derobins added this to the 1.14.5 milestone Jun 2, 2024
@derobins
Member Author

derobins commented Jun 2, 2024

Sample test failure output:

Test Summary:                      | Pass  Fail  Broken  Total
HDF5.jl                            | 1497     2       3   1502
  plain                            |  151             1    152
  complex                          |   13                   13
  undefined and null               |    4                    4
  abstract arrays                  |    2                    2
  empty and 0-size arrays          |   39                   39
  generic read of native types     |   17                   17
  show                             |   44                   44
  split1                           |   13                   13
  haskey                           |   18                   18
  AbstractString                   |   51                   51
  opaque data                      |    7                    7
  FixedStrings and FixedArrays     |   18                   18
  Object Exists                    |    8                    8
  HDF5 existance                   |    4                    4
  bounds                           |    2                    2
  create_dataset                   |  264                  264
  Strings                          |    8                    8
  h5a_iterate                      |    7     1              8
  h5l_iterate                      |    7     1              8
  h5dchunk_iter                    |    3                    3
  compound                         |   10                   10
  create_dataset (compound)        |    4                    4
  write_compound                   |   27                   27
  custom                           |    6                    6
  reference                        |    6                    6
  null dataspace                   |   13                   13
  scalar dataspace                 |   15                   15
  simple dataspaces                |   98                   98
  BlockRange                       |   42                   42
  hyperslab                        |    6                    6
  Datatypes                        |   15                   15
  hyperslab                        |    5                    5
  read 0-length arrays: issue #859 |                     No tests
  attrs interface                  |   92                   92
  variable length strings          |    1                    1
  readremote                       |   23                   23
  extend                           |   29                   29
  gc                               |  101                  101
  external                         |    6                    6
  swmr                             |    4                    4
  mmap                             |    9                    9
  properties                       |   46             1     47
  filter                           |   80                   80
  Raw Chunk I/O                    |   80                   80
  fileio                           |    6                    6
  track order                      |   18                   18
  h5f_get_dset_no_attrs_hint       |    6                    6
  non-allocating methods           |   11             1     12
  Compression Filter Unit Tests    |    6                    6
  Object API                       |   38                   38
  virtual dataset                  |    5                    5
  mpio                             |    1                    1
ERROR: LoadError: Some tests did not pass: 1497 passed, 2 failed, 0 errored, 3 broken.
in expression starting at /home/runner/work/hdf5/hdf5/test/runtests.jl:34
ERROR: LoadError: Package HDF5 errored during testing
Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Types.jl:55
 [2] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, julia_args::Cmd, test_args::Cmd, test_fn::Nothing)
   @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/Operations.jl:1712
 [3] test(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; coverage::Bool, test_fn::Nothing, julia_args::Vector{String}, test_args::Cmd, kwargs::Base.Iterators.Pairs{Symbol, IOContext{Base.PipeEndpoint}, Tuple{Symbol}, NamedTuple{(:io,), Tuple{IOContext{Base.PipeEndpoint}}}})
   @ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:343
 [4] test(pkgs::Vector{Pkg.Types.PackageSpec}; io::IOContext{Base.PipeEndpoint}, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:coverage, :julia_args), Tuple{Bool, Vector{String}}}})
   @ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:80
 [5] test(; name::Nothing, uuid::Nothing, version::Nothing, url::Nothing, rev::Nothing, path::Nothing, mode::Pkg.Types.PackageMode, subdir::Nothing, kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol}, NamedTuple{(:coverage, :julia_args), Tuple{Bool, Vector{String}}}})
   @ Pkg.API /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Pkg/src/API.jl:96
 [6] top-level scope
   @ ~/work/_actions/julia-actions/julia-runtest/latest/test_harness.jl:15
 [7] include(fname::String)
   @ Base.MainInclude ./client.jl:444
 [8] top-level scope
   @ none:1
in expression starting at /home/runner/work/_actions/julia-actions/julia-runtest/latest/test_harness.jl:7
Error: Process completed with exit code 1.

@derobins
Member Author

derobins commented Jun 2, 2024

@mkitti - Any ideas?

@mkitti
Contributor

mkitti commented Jun 2, 2024

Could you point me to the CI output?

These both point to issues with the callback mechanism for the iteration functions. I'm not sure which exact test is failing yet though.

derobins added a commit to derobins/hdf5 that referenced this issue Jun 2, 2024
These have been failing for a week or two for unclear reasons, both
in the Autotools and CMake. No obvious library changes triggered
this.

See GitHub issue HDFGroup#4539 for more info/discussion

The Julia tests will be disabled until the root cause is found.
@mkitti
Contributor

mkitti commented Jun 2, 2024

Incidentally, we also seem to be having some issues with Windows builds lately:
JuliaPackaging/Yggdrasil#8588

@derobins
Member Author

derobins commented Jun 2, 2024

Error output (Autotools) here:

https://github.com/HDFGroup/hdf5/actions/runs/9333687611/job/25707113092

Any recent test failure in HDF5 will likely be a Julia failure.

@derobins
Member Author

derobins commented Jun 2, 2024

> Could you point me to the CI output?
>
> These both point to issues with the callback mechanism for the iteration functions. I'm not sure which exact test is failing yet though.

Yeah, with the randomness of the error, my guess is that there is some uninitialized memory usage someplace. Maybe -fsanitize=memory on clang would help.

derobins added a commit that referenced this issue Jun 2, 2024
@mkitti
Contributor

mkitti commented Jun 2, 2024

Yes, I'm noticing the randomness as well. The issue appears to involve an error being thrown within the Julia callback function. The error gets caught by a Julia try-catch and the callback returns -1.

The problem is that after iteration stops, we are not receiving the error code upon return of H5Aiterate2.

The CI test that is failing checks to see that an error is received when the callback throws an error. The test fails because the error is not detected.

The Julia error reference itself is returned via opdata.
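To make the expected behavior concrete, here is a minimal sketch of the contract described above, assuming the documented HDF5 convention for iteration callbacks: zero continues, a positive value stops early with success, and a negative value stops with failure that the iterate call passes back to its caller. Nothing here is HDF5.jl code; `IterState`, `wrap_callback`, `mock_iterate`, `iterate_checked`, and `fail_on_b` are hypothetical names, and `mock_iterate` only stands in for `H5Aiterate2`.

```python
class IterState:
    """Hypothetical stand-in for the state passed through op_data."""

    def __init__(self):
        self.visited = []   # names the callback handled successfully
        self.error = None   # exception captured inside the callback, if any


def wrap_callback(user_func):
    """Wrap user_func so that exceptions never cross the library boundary."""

    def op(name, op_data):
        try:
            user_func(name)
            op_data.visited.append(name)
            return 0                 # zero: keep iterating
        except Exception as exc:
            op_data.error = exc      # hand the error back through op_data
            return -1                # negative: stop iteration with failure

    return op


def mock_iterate(names, op, op_data):
    """Mock iterator: stop at the first nonzero return and propagate it."""
    for name in names:
        ret = op(name, op_data)
        if ret != 0:
            return ret
    return 0


def iterate_checked(names, user_func):
    """Iterate, then rethrow the captured error if a failure code came back."""
    state = IterState()
    ret = mock_iterate(names, wrap_callback(user_func), state)
    if ret < 0:
        raise state.error if state.error is not None else RuntimeError("iteration failed")
    return state.visited


def fail_on_b(name):
    """Example callback that throws on one particular item."""
    if name == "b":
        raise ValueError("callback failed on " + name)
```

In these terms, the failure in CI corresponds to `ret` coming back non-negative even though the callback returned -1, so the stored error is never rethrown and the test that expects an error sees none.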

@mkitti
Contributor

mkitti commented Jun 2, 2024

I'm preparing to disable the affected tests here:
JuliaIO/HDF5.jl#1155

I will merge shortly.

@mkitti
Contributor

mkitti commented Jun 2, 2024

I have a successful CI run here:
https://github.com/JuliaIO/HDF5.jl/actions/runs/9341332687/attempts/1

I'm running it one more time before I merge to make sure that there are no stochastic errors now.

lrknox pushed a commit to lrknox/hdf5 that referenced this issue Jun 7, 2024
derobins added a commit that referenced this issue Jun 8, 2024
* Fix daily-build CI and correct use of *_FOUND settings for filters (#4504)

* Correct examples tests to just run under dynamic analysis (#4505)

* Remove trailing extra whitespace in hyperlink (#4509)

* Set H5 specific vars immediately if legacy find (#4512)

* Set H5 specific vars immediately if legacy find

* Correct find process vars (vs in-line build)

* Correct SZIP find

* Everything is libaec 1.0.6 or newer

* Correct option help text

* Don't update 'pos' and 'op' fields when using pread/pwrite (#4492)

Instead of reading the absolute minimal possible, use the likely value of
a v2+ superblock w/8-byte addresses & lengths.

* Fix spelling (#4522)

* Fix typo in DAPL callback documentation (#4523)

* Move/rename libhdf5.settings input files (#4525)

Move without other changes:

src/libhdf5.settings.in -> src/libhdf5.settings.autotools.in
config/cmake/libhdf5.settings.cmake.in -> src/libhdf5.settings.cmake.in

* Disable UNITY_BUILD for now - globally (#4515)

* Fix function name in USAGE for H5Pencode2() (#4519)

* Allow HDF5_LIB_INFIX to work with DLL (#4500)

* Allow HDF5_LIB_INFIX to work with DLL

* Separate individual library name into parts and add suffix option

* Java cannot use alternative names and removed extra setting

* Incorporate the underscore into the CORE name

* Fix typos in property callback documentation (#4532)

* Fix wrong int type as some systems have int as 64-bit wide (#4534)

* H5FDquery return value (#4530)

* Switch H5FDquery() return values to use library's FAIL / SUCCEED macros

* Update return value also

* Refactor to reduce code duplication (#4531)

* Update error output w/new routine name

* Fix a few function names in USAGE comments that don't match the actual (#4533)

* Fix a few function names in USAGE comments that don't match the actual
function names.

* Remove typo '['

* Switch to working url for api-compatibility-macros.html.

* Remove julia CI actions (#4540)

These have been failing for a week or two for unclear reasons, both
in the Autotools and CMake. No obvious library changes triggered
this.

See GitHub issue #4539 for more info/discussion

The Julia tests will be disabled until the root cause is found.

* Bump the github-actions group with 3 updates (#4538)

Bumps the github-actions group with 3 updates: [softprops/action-gh-release](https://github.com/softprops/action-gh-release), [ossf/scorecard-action](https://github.com/ossf/scorecard-action) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `softprops/action-gh-release` from 2.0.4 to 2.0.5
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](softprops/action-gh-release@9d7c94c...69320db)

Updates `ossf/scorecard-action` from 2.3.1 to 2.3.3
- [Release notes](https://github.com/ossf/scorecard-action/releases)
- [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md)
- [Commits](ossf/scorecard-action@0864cf1...dc50aa9)

Updates `github/codeql-action` from 3.25.3 to 3.25.7
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@d39d31e...f079b84)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
- dependency-name: ossf/scorecard-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix various mistakes in doxygen docs (#4541)

* Fix a dead link and example file names

* Add the missing content of a section

* Export HDF5 parallel status for CMake FetchContent'ed VOL connectors (#4542)

* Remove an unnecessary check for parallel and thread-safety from examples (#4543)

* Add option to use zlib-ng as zlib library (#4487)

* Export HDF5 version for CMake FetchContent'ed VOL connectors (#4548)

* Adjust h5repack userblock option to allow reserve size (#4544)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Allen Byrne <[email protected]>
Co-authored-by: H. Joe Lee <[email protected]>
Co-authored-by: Quincey Koziol <[email protected]>
Co-authored-by: jhendersonHDF <[email protected]>
Co-authored-by: mattjala <[email protected]>
Co-authored-by: Dana Robinson <[email protected]>
Co-authored-by: Peter Chang <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: bmribler <[email protected]>
Co-authored-by: Scot Breitenfeld <[email protected]>
lrknox pushed a commit to lrknox/hdf5 that referenced this issue Aug 27, 2024
derobins added a commit that referenced this issue Aug 27, 2024
* Use gfortran 14 for cmake-ctest.yml on mac (#4739)

* Use gfortran 14 for cmake-test on mac

* Remove notarization step

* Address @byrnHDF review

* Fix enum type mismatch warning (#4741)

* Fix macro redefined warnings (#4744)

Removes a duplicated HDopen macro from the performance testing programs

* Update nvhpc CI version to 24.7 (#4740)

* Return basic HTTP range GET logging to ROS3 (#4738)

* Add minimal amount of S3 request logging to ROS3

* Fix ROS3 logging ifdef conditions

* Replace non-VOL calls with VOL calls - part 1 (#4745)

This PR is part of the incremental switching H5I_object() and H5I_object_verify()
to their VOL counterparts, H5VL_object() and H5VL_vol_object_verify(), a newly added internal function.

Fixes GH-4730 partially.

* Fix inconsistent documentation of get_name functions (#4715)

- Verified that the listed functions do not include null terminator in the returned length
- Improved some of the tests
- Corrected documentation

Fixes GH-4704

* Casted a positive int to size_t

* Remove HDF-EOS5 CI action (#4750)

The code can't be downloaded due to changes that put it behind an
EarthData login. We'll disable this while we figure out a work-around.

* Replace non-VOL calls with VOL calls - part 2 (#4748)

This PR switches H5I_object_verify() to H5VL_vol_object_verify() in the H5F API
and fixes documentation of H5Fmount and H5Funmount.

* More on H5F API

* Restore rand_r in a few parallel tests (#4749)

The t_pmulti_dset and t_select_io_dset tests rely on the behavior
of the previous private rand_r-like implementation to get the
correct sequence of random numbers to pass. This has been restored
using a fully private rand_r-like implementation that doesn't
rely on rand_r and will work on Windows and other platforms
where rand_r doesn't exist.

* Don't run AOCC parallel tests with -j2 (#4752)

Don't run parallel tests in both Autotools and CMake with multiple
processes. ph5diff still runs with -j2 w/ Autotools since the test
script is in the tools/test/h5diff directory.

* Split off AOCC CMake parallel tests

* Remove unnecessary NPROCS env vars

* Put NPROCS back in serial tests

We run ph5diff tests there

* Replace non-VOL calls with VOL calls - part 3 (#4756)

This PR switches H5I_object_verify() to H5VL_vol_object_verify() in the H5G API
and removes unnecessary casts.

* Turn on parallel CI tests in Autotools & CMake (#4573)

* Fix typo in H5Centry.c (#4762)

* Set/Unset VOL wrapping context in H5VL_attr_close (#4759)

* Add missing C++ and Fortran to Intel oneAPI CI (#4761)

* Add Fortran and C++ to Autotools
* Add Fortran and C++ to Linux CMake
* Add C++ to Windows CMake
* Fix bad GitHub workspace variable

* Remove early test exit (#4757)

* Don't skip file tests

* Remove test with invalid flag for H5Fopen

* Verify that create/open of unseekable file fails

* Remove failure verification

* Restore Julia CI (#4763)

Fixes #4539

* Capitalize f in (#4766)

* Add testing to NVHPC CI actions (CMake & Autotools) (#4760)

Turns on testing, both serial and parallel, but skips:
* dt_arith and dtransform in CMake
* All main library tests in the Autotools
Due to dt_arith and dtransform segfaults when handling long doubles.

* Fix typo in H5T_order_t enum (#4773)

'bit endian' --> 'big endian'

* Correct julia workflows name for hdf5_1_14 branch.

---------

Co-authored-by: H. Joe Lee <[email protected]>
Co-authored-by: Aleksandar Jelenak <[email protected]>
Co-authored-by: bmribler <[email protected]>
Co-authored-by: Dana Robinson <[email protected]>
Co-authored-by: jhendersonHDF <[email protected]>
Co-authored-by: mattjala <[email protected]>