Lots of runs failed before compute_halo_properties.py finished with the following message:
mca_fbtl_posix_pwritev: error in (p)write(v):No space left on device
dynamic_gen2_write_all: fbtl_pwritev failed
Depending on the point at which the run fails, SOAP produces halo_properties_XXXX.hdf5 with different file sizes. It is unclear which properties/datasets are incomplete in the final file. The compression does not fail on the next step.
jemme07 changed the title from s8 to: compute_halo_properties.py fails on snap8 due to "No space left on device" (May 24, 2023)
Do you have the stdout from one of these runs? I'd like to know if SOAP crashed. The worst case scenario would be if MPI-IO failed to write data without raising an exception.
Assuming that's not the case, we could write the output to a temporary filename and then rename it after everything is written successfully so that the output file is only present if it was fully written.
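A minimal sketch of the write-then-rename idea, using only the Python standard library. The helper name `write_then_rename` and the `.tmp` suffix are hypothetical and not part of SOAP. It only helps when the failed write actually raises an exception in Python.

```python
import os
from contextlib import contextmanager


@contextmanager
def write_then_rename(final_path, suffix=".tmp"):
    """Yield a temporary path; move it to final_path only if the body succeeds.

    Hypothetical helper for illustration, not part of SOAP.
    """
    tmp_path = final_path + suffix
    try:
        yield tmp_path
    except BaseException:
        # The write failed: delete the partial file so it cannot be
        # mistaken for a complete output, then re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # Reached only if the body finished without raising.
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)
```

The caller would open the yielded temporary path for writing (for example with `h5py.File(tmp, "w")`) inside the `with write_then_rename(...) as tmp:` block. In an MPI run, presumably only one rank should do the rename, and only after all ranks have closed the file.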
I don't think we have a good fix for this. It looks like a collective write fails because the disk is full, but the error code gets lost somewhere before it reaches Python. As a result, h5py doesn't raise an exception and we can't detect the failure.