Speed up st.select_runs by factor ~100 with strict "_find" in files-storage #371

Merged
JoranAngevaare merged 1 commit into master from increase_select_runs_performance
Dec 29, 2020

Conversation

JoranAngevaare
Contributor

What is the problem / what does the code in this PR do & can you briefly describe how it works?
You might have noticed that st.select_runs has become slow lately. After some profiling, I traced the issue to the operation I'm changing in this PR. For each run we cannot find in the rundb (rucio), we fall back to the strax.DataDirectory storages in the context, which in turn look for the keys not found yet. However, there is a very expensive check in DataDirectory._find that loops over all the files in the DataDirectory for each of the missing keys! This makes everything very slow, and needlessly so: the check only makes sense for a fuzzy search. Requiring that the search is actually fuzzy before running the check speeds things up a lot; see the comparison below.

For example, after a recent version bump in PulseProcessing we noticed that st.select_runs takes forever to execute (not much data is available if you bump a version at a low level). That meant a lot of redundant looping over files that were clearly not the requested file.
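The idea can be sketched roughly as below. This is a simplified illustration, not the actual strax code: `find_key`, `same_run_and_type`, the `run-datatype-hash` naming and the exact-match-first ordering are all assumptions made for the example. The point is only that the full directory scan is skipped unless a fuzzy search was requested.

```python
import os
import tempfile


def same_run_and_type(key, name):
    """Hypothetical fuzzy rule: ignore the trailing hash, compare the rest."""
    return key.rsplit("-", 1)[0] == name.rsplit("-", 1)[0]


def find_key(directory, key, fuzzy=False, matches_fuzzily=same_run_and_type):
    """Return the path stored for `key`, or None if it is not available.

    Hypothetical stand-in for a files-storage lookup; NOT the real
    strax.DataDirectory._find signature.
    """
    # Strict lookup: one existence check, no directory listing needed.
    exact = os.path.join(directory, key)
    if os.path.exists(exact):
        return exact

    # The fix: without a fuzzy search no other entry can match,
    # so return right away instead of looping over every file.
    if not fuzzy:
        return None

    # Only a fuzzy search justifies scanning the whole directory.
    for name in sorted(os.listdir(directory)):
        if matches_fuzzily(key, name):
            return os.path.join(directory, name)
    return None


# Small usage example on a temporary directory with one stored entry.
storage = tempfile.mkdtemp()
os.mkdir(os.path.join(storage, "000001-peaks-aaaa"))

strict_hit = find_key(storage, "000001-peaks-aaaa")
strict_miss = find_key(storage, "000001-peaks-bbbb")
fuzzy_hit = find_key(storage, "000001-peaks-bbbb", fuzzy=True)
```

With many missing keys (for instance after a low-level version bump), every strict miss now costs a single existence check rather than a pass over all files in the directory.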

Can you give a minimal working example (or illustrate with a figure)?
before:
[image]

after:
[image]

Collaborator

@WenzDaniel WenzDaniel left a comment

A nice Christmas present. Thanks a lot Joran.

@JoranAngevaare JoranAngevaare merged commit feb7e61 into master Dec 29, 2020
@JoranAngevaare JoranAngevaare deleted the increase_select_runs_performance branch December 29, 2020 14:55