As requested, I am creating this issue to document a problem we have observed in some of the HDF5 tests that create a file in a parallel execution. The effect of the problem is incorrect contents in the resulting file under Unify (i.e., the contents of the file produced by a Unify execution differ from those produced by a non-Unify execution). The program attached here is a simplified version of the CCHUNK5 test in the HDF5 test suite.
The current workaround for this problem is one of the following:
(a) insert a call to H5Fflush(file_id) at an appropriate location in the source file t_chunk.c, or
(b) run the program with Unify's setting UNIFYFS_CLIENT_WRITE_SYNC=1.
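For illustration, here is a minimal sketch of workaround (a); the surrounding identifiers (write_and_flush, dset_id, memspace, filespace, dxpl_id, data) are placeholders for the sketch, not the exact code in t_chunk.c, and note that the full H5Fflush API takes a scope argument:

#include "hdf5.h"

/* Sketch of workaround (a), not the exact t_chunk.c code: flush the
 * file after the collective chunked write so that the written data
 * is made visible before the file is later closed or read. */
static void write_and_flush(hid_t file_id, hid_t dset_id, hid_t memspace,
                            hid_t filespace, hid_t dxpl_id, const int *data)
{
    /* Collective write of this rank's portion of the dataset. */
    H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, dxpl_id, data);

    /* Workaround: force buffered data out to the file under UnifyFS. */
    H5Fflush(file_id, H5F_SCOPE_GLOBAL);
}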
The program creates an HDF5 file named ParaTest.h5. With Unify, when neither of the workarounds above is applied, the resulting file shows the following differences from the file produced without Unify:
[mendes3@catalyst160:UNIFY]$ cmp -b -l ParaTest.h5 ../MPI/ParaTest.h5
949 124 T 220 M-^P
950 63 3 62 2
4865 0 ^@ 1 ^A
4869 0 ^@ 2 ^B
5057 0 ^@ 3 ^C
5061 0 ^@ 4 ^D
In this cmp output, each line shows the byte position followed by the differing byte values (octal and character) in the Unify and non-Unify files. The first two differing bytes (949 and 950) do not matter: they belong to a timestamp in the HDF5 file, so they are expected to differ. However, bytes 4865, 4869, 5057, and 5061 are genuinely wrong.
I am attaching the three source files (testchunk.c, t_chunk.c, and t_ds.c), plus a Makefile. The Makefile builds two versions of the program: one without Unify (testchunk) and one with Unify (testchunk-gotcha). These executables are copied to the sub-directories MPI/ and UNIFY/, respectively, so that they can be run from there.
Note that the Makefile defines two locations:
UNIFYFS=/g/g12/mendes3/UnifyFS-581/UnifyFS/install
HDF5=/g/g12/mendes3/HDF5-1.10.2/hdf5-1.10.2/hdf5
UNIFYFS is where Unify is installed; HDF5 is where the sources of HDF5-1.10.2 are located. The build requires the HDF5 sources because it needs HDF5 include files that are not available in the system installation. The h5pcc command used in the Makefile is obtained on the Catalyst system with the command module load hdf5-parallel:
$ which h5pcc
/usr/tce/packages/hdf5/hdf5-parallel-1.10.2-intel-19.0.4-mvapich2-2.3/bin/h5pcc
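For context, here is a hedged sketch of how the Makefile plausibly uses these two locations; the -I$(HDF5)/src include path and the -lunifyfs_gotcha link line follow the usual UnifyFS build instructions but are assumptions, not a copy of the attached Makefile:

UNIFYFS = /g/g12/mendes3/UnifyFS-581/UnifyFS/install
HDF5    = /g/g12/mendes3/HDF5-1.10.2/hdf5-1.10.2/hdf5
SRCS    = testchunk.c t_chunk.c t_ds.c

# Plain MPI/HDF5 build (assumed form)
testchunk: $(SRCS)
	h5pcc -I$(HDF5)/src -o $@ $(SRCS)

# Unify build with gotcha-based interception (assumed form)
testchunk-gotcha: $(SRCS)
	h5pcc -I$(HDF5)/src -I$(UNIFYFS)/include -o $@ $(SRCS) -L$(UNIFYFS)/lib -lunifyfs_gotcha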
The program has been tested with 4 processes on 2 Catalyst nodes (i.e., 2 processes per node).
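For reference, workaround (b) amounts to a run along these lines (the exact launch line is an assumption; Catalyst uses Slurm, and starting the UnifyFS server beforehand is omitted here):
$ export UNIFYFS_CLIENT_WRITE_SYNC=1
$ srun -N 2 -n 4 ./testchunk-gotcha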
The three source files and the Makefile are in the gzip file below (chunk5.gz).
chunk5.gz