-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve mongo-backend-loading #336
improve mongo-backend-loading #336
Conversation
This reverts commit 6f5814b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DataNotAvailable
exception needs to be moved if we want it to be throwable.
strax/storage/mongo.py
Outdated
chunks_registry = self.db[self.col_name].find( | ||
{**query, **{"chunk_i": {'$exists':True}}}, | ||
{"chunk_i": 1, "data": 1}) | ||
if chunks_registry is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find
returns a Cursor, which never evaluates to None
, even if the cursor is empty. The pymongo documentation doesn't really give us much to go on for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, changed it
strax/storage/mongo.py
Outdated
f'Projection failed, got doc with no "chunk_i":\n{doc}') | ||
# Update our registry with this chunks info. Use chunk_i as | ||
# chunk_key | ||
self.chunks_registry[chunk_key] = doc.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integers as dictionary keys can cause issues later if you try to json-encode it. Probably not an issue here, but something to remember.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it just to be on the safe side
What is the problem / what does the code in this PR do
Slightly tweak w.r.t. #335 (and #315). Only query the database for chunks once.
Can you briefly describe how it works?
Use
find
to find all chunks, save it in ram and load it from there instead of doing up to 600 queries.Performance
Using
Before
23 s ± 175 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
After
1.91 s ± 70.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)