Speed up st.select_runs by factor ~100 with strict "_find" in files-storage #371
What is the problem / what does the code in this PR do & can you briefly describe how it works?
You might have noticed that `st.select_runs` has become slow lately. After some profiling, I traced the issue to the operation I'm changing in this PR. For each run that we cannot find in the rundb (rucio), we fall back to the `strax.DataDirectory` storages in the context, which in turn look for the keys that have not been found yet. However, `DataDirectory._find` contains a very expensive check that loops over all the files in the `DataDirectory` for each of the missing keys! This makes everything very slow, and unnecessarily so: the check only makes sense for a fuzzy search. Requiring that the search is actually fuzzy before running this loop speeds things up a lot, see the comparison below.

E.g. due to a recent version bump in `PulseProcessing`, we noticed that it takes forever to execute `st.select_runs` (not much data is available if you version bump at a low level). This meant a lot of redundant looping over files that were clearly not the file that was requested.

Can you give a minimal working example (or illustrate with a figure)?
before:
after:
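
To illustrate the idea behind the change, here is a minimal, self-contained sketch. It is not the actual strax code: `find_folder`, `DataNotAvailable`, the `fuzzy_prefix` argument, the `stats` counter and the folder names are all hypothetical stand-ins. It only shows the pattern described above, where a strict lookup stops after a single existence check and only a fuzzy lookup scans every folder.

```python
import os
import tempfile


class DataNotAvailable(Exception):
    """Raised when no stored folder satisfies the request (hypothetical name)."""


def find_folder(path, key, fuzzy_prefix=None, stats=None):
    """Return the folder for `key` inside `path`.

    A strict lookup costs one existence check. Only a fuzzy lookup
    (fuzzy_prefix given) falls back to scanning every subfolder.
    """
    exact = os.path.join(path, key)
    if os.path.isdir(exact):
        return exact
    if fuzzy_prefix is None:
        # Strict search: a scan could only ever match `key` itself,
        # which we already know is absent, so stop here.
        raise DataNotAvailable(key)
    if stats is not None:
        # Count full directory scans so the cost is visible.
        stats["scans"] = stats.get("scans", 0) + 1
    for name in sorted(os.listdir(path)):
        candidate = os.path.join(path, name)
        if name.startswith(fuzzy_prefix) and os.path.isdir(candidate):
            return candidate
    raise DataNotAvailable(key)


def demo():
    """Look up a missing key strictly, then fuzzily, in a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "012345-peaks-aaaa"))
        stats = {"scans": 0}
        try:
            find_folder(root, "012345-peaks-bbbb", stats=stats)
        except DataNotAvailable:
            pass
        strict_scans = stats["scans"]
        found = find_folder(
            root, "012345-peaks-bbbb", fuzzy_prefix="012345-peaks-", stats=stats
        )
        return strict_scans, os.path.basename(found)
```

With many missing keys and a large directory, skipping the scan for strict lookups removes a cost that grows with the number of missing keys times the number of stored folders, which is the kind of redundant looping described above.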