t_cache_image hanging on some machines #71
Labels: Component - Parallel, Component - Testing, Priority - 1. High 🔼, Type - Bug / Bugfix
Date: Fri, 23 Oct 2020 07:28:52 -0600
From: Orion Poplawski [email protected]
To: HDF Helpdesk [email protected]
Subject: t_cache_image hanging on some machines
When building hdf5 1.10.6 or 1.10.7 for Fedora Rawhide on the Fedora builders, t_cache_image hangs when run with openmpi
on some architectures (including x86_64). Unfortunately, we cannot reproduce the hang locally, which limits our ability to debug
the issue. Here is the output of the test:
============================
Testing: t_cache_image
#000: ../../src/H5D.c line 298 in H5Dopen2(): unable to open dataset
major: Dataset
minor: Can't open object
#1: ../../src/H5Dint.c line 1429 in H5D__open_name(): not found
major: Dataset
minor: Object not found
#2: ../../src/H5Gloc.c line 420 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#3: ../../src/H5Gtraverse.c line 848 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#4: ../../src/H5Gtraverse.c line 579 in H5G__traverse_real(): can't look up component
major: Symbol table
minor: Object not found
#5: ../../src/H5Gobj.c line 1118 in H5G__obj_lookup(): can't check for link info message
major: Symbol table
minor: Can't get value
#6: ../../src/H5Gobj.c line 324 in H5G__obj_get_linfo(): unable to read object header
major: Symbol table
minor: Can't get value
#7: ../../src/H5Omessage.c line 873 in H5O_msg_exists(): unable to protect object header
major: Object header
minor: Unable to protect metadata
#8: ../../src/H5Oint.c line 1056 in H5O_protect(): unable to load object header
major: Object header
minor: Unable to protect metadata
#9: ../../src/H5AC.c line 1517 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#10: ../../src/H5C.c line 2378 in H5C_protect(): Can't load cache image
major: Object cache
minor: Unable to load metadata into cache
#11: ../../src/H5Cimage.c line 1164 in H5C__load_cache_image(): Can't reconstruct cache contents from image block
major: Object cache
minor: Unable to decode value
#12: ../../src/H5Cimage.c line 3137 in H5C__reconstruct_cache_contents(): reconstruction of cache entry failed
major: Object cache
minor: Internal error detected
#13: ../../src/H5Cimage.c line 3408 in H5C__reconstruct_cache_entry(): invalid entry size
major: Object cache
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.10.7) MPI-process 1:
#000: ../../src/H5D.c line 298 in H5Dopen2(): unable to open dataset
major: Dataset
minor: Can't open object
#1: ../../src/H5Dint.c line 1429 in H5D__open_name(): not found
major: Dataset
minor: Object not found
#2: ../../src/H5Gloc.c line 420 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#3: ../../src/H5Gtraverse.c line 848 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#4: ../../src/H5Gtraverse.c line 579 in H5G__traverse_real(): can't look up component
major: Symbol table
minor: Object not found
#5: ../../src/H5Gobj.c line 1118 in H5G__obj_lookup(): can't check for link info message
major: Symbol table
minor: Can't get value
#6: ../../src/H5Gobj.c line 324 in H5G__obj_get_linfo(): unable to read object header
major: Symbol table
minor: Can't get value
#7: ../../src/H5Omessage.c line 873 in H5O_msg_exists(): unable to protect object header
major: Object header
minor: Unable to protect metadata
#8: ../../src/H5Oint.c line 1056 in H5O_protect(): unable to load object header
major: Object header
minor: Unable to protect metadata
#9: ../../src/H5AC.c line 1517 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#10: ../../src/H5C.c line 2378 in H5C_protect(): Can't load cache image
major: Object cache
minor: Unable to load metadata into cache
#11: ../../src/H5Cimage.c line 1164 in H5C__load_cache_image(): Can't reconstruct cache contents from image block
major: Object cache
minor: Unable to decode value
#12: ../../src/H5Cimage.c line 3137 in H5C__reconstruct_cache_contents(): reconstruction of cache entry failed
major: Object cache
minor: Internal error detected
#13: ../../src/H5Cimage.c line 3408 in H5C__reconstruct_cache_entry(): invalid entry size
major: Object cache
minor: Bad value
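For reading traces like the one above, the frame with the highest number is the innermost one; here that is H5C__reconstruct_cache_entry() reporting "invalid entry size". The following is a minimal sketch of pulling those frames out of captured test output, assuming frame lines have the shape `#N: <file> line <n> in <function>(): <message>` as shown in this report. `Frame`, `parse_frames` and `innermost` are hypothetical helper names, not part of HDF5.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    depth: int      # frame number after '#'; higher means deeper in the call chain
    source: str     # source file path as printed in the trace
    line: int       # line number within that source file
    function: str   # HDF5 function that pushed the error
    message: str    # description printed after "():"


# Matches lines such as:
#   "#13: ../../src/H5Cimage.c line 3408 in H5C__reconstruct_cache_entry(): invalid entry size"
FRAME_RE = re.compile(r"^\s*#(\d+):\s+(\S+)\s+line\s+(\d+)\s+in\s+(\w+)\(\):\s*(.*)$")


def parse_frames(text: str) -> List[Frame]:
    """Return every error-stack frame found in the text, in order of appearance."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            depth, source, line, function, message = match.groups()
            frames.append(Frame(int(depth), source, int(line), function, message.strip()))
    return frames


def innermost(frames: List[Frame]) -> Optional[Frame]:
    """Return the deepest frame, or None if no frames were parsed."""
    return max(frames, key=lambda f: f.depth) if frames else None


# Three lines copied from the report, used only to exercise the parser.
SAMPLE = """\
#000: ../../src/H5D.c line 298 in H5Dopen2(): unable to open dataset
major: Dataset
#9: ../../src/H5AC.c line 1517 in H5AC_protect(): H5C_protect() failed
#13: ../../src/H5Cimage.c line 3408 in H5C__reconstruct_cache_entry(): invalid entry size
"""

if __name__ == "__main__":
    deepest = innermost(parse_frames(SAMPLE))
    print(deepest.function, "-", deepest.message)
```

The `major:` and `minor:` lines are skipped on purpose; only the numbered frame lines carry the file, line and function.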
It would be helpful to know what the developers think of this and what we could do to further debug the issue.
–
Orion Poplawski
Manager of NWRA Technical Systems 720-772-5637
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane [email protected]
Boulder, CO 80301 https://www.nwra.com/
It's inside a VM, on an XFS filesystem.
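Because the hang only shows up on the builders, one way to keep a stuck run from blocking the build while still keeping its output is to bound the test's run time. The following is a minimal sketch, assuming Python 3.7 or later is available on the builder; `run_with_timeout` is a hypothetical helper and not part of HDF5 or its test harness.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[bool, str]:
    """Run cmd and return (timed_out, combined stdout/stderr text)."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave the error stack with normal output
            timeout=timeout_s,
            text=True,
        )
        return False, done.stdout
    except subprocess.TimeoutExpired as exc:
        # Whatever was captured before the child was killed; may be None or bytes.
        out = exc.output or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return True, out


if __name__ == "__main__":
    # Stand-in for a hanging test: a child process that sleeps past the limit.
    timed_out, output = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1.0
    )
    print("timed out:", timed_out)
```

On a builder, the command list would be whatever launches the test under MPI, for example something like `["mpirun", "-np", "2", "./t_cache_image"]`; that exact command line is an assumption and is not taken from this report. Killing the launcher does not necessarily terminate every MPI rank, so leftover processes may still need to be cleaned up separately.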