I'm waiting for pyiron to submit a large number of jobs right now, so I thought I'd have a look at how to make it faster. One of the major bottlenecks is calling `list_all()`/`list_groups()`/`list_nodes()` to check which datasets/groups are inside the HDF5 files. In fact, roughly 75%(!) of the time spent loading a small lammps job goes into `FileHDFio.list_all()`.

This PR makes two changes:
1. Read with `h5io.read_hdf5` directly instead of first checking whether the dataset is there and then reading it. This makes a simple read like `job['output/generic/energy_pot']` faster by about a factor of 5, so that it takes roughly the same amount of time as calling `h5io.read_hdf5` directly on a file.
2. Call `list_all()` instead of `list_nodes()` and `list_groups()` together. This saves opening the HDF5 file once.

Both changes together make loading a lammps job about 10% faster.
I want to mention that this is still about twice as slow as doing the read directly, so there's still room for improvement.
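For the second change, the point is that groups and nodes can be collected in a single pass over one open file handle. A rough sketch with `h5py` (illustrative only, not the actual `list_all()` implementation):

```python
import h5py

def list_all(filename, h5_path="/"):
    """Collect sub-groups and datasets under h5_path with one file
    open, rather than opening the file once for list_groups() and
    again for list_nodes(). Illustrative sketch, not the pyiron code."""
    groups, nodes = [], []
    with h5py.File(filename, "r") as f:
        for name, obj in f[h5_path].items():
            if isinstance(obj, h5py.Group):
                groups.append(name)
            else:
                nodes.append(name)
    return {"groups": groups, "nodes": nodes}
```

Opening an HDF5 file has a fixed cost (file locking, superblock parsing), so halving the number of opens per listing is a cheap win even before any caching.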