-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better queries in online monitor #375
Conversation
# might be getting a previously failed document. | ||
# might be getting a previously failed document. Sort argument | ||
# should be obsolete (due to the self.col.delete_many in the | ||
# MongoSaver) but rather safe than sorry. | ||
doc = self.db[self.col_name].find_one({ | ||
**query, 'metadata': {"$exists": True}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The $exists
operator is apparently rather slow. Even though the remainder of the query is indexed, it might be worth thinking how to adjust this so we don't use this operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Darryl, thanks, this PR is designed to do exactly that:
"We should distinguish between metadata and chunk documents. -> To this end I propose to add a provides_meta boolean to each document. This prevents $exists-operations."
But I first need to have a set of documents with this entry in the document before I change the query. I want to change it to:
{**query, 'provides_meta': True}, # <-change to this after TTL has flushed
As far as querying against a boolean versus an integer, that is probably going to depend on the details. Finding one 0/false among a sea of integers/trues is simple, and finding a specific integer in a sea of different integers is also going to be easier than one specific true in a sea of true. |
Alright, thanks Darryl. Given your response I don't see a reason to change True to 1 and False to 0. |
What is the problem / what does the code in this PR do
As explained in #365 and #374 there are two things that need addressing:
collection.delete_many
operation at the init of the saver. This should only happen when the data was not correctly stored. If it is already stored, you won't be accessing theMongoSaver
anyways. This only happens when during progressing that there occurred an error, after this, there is an exception added to the metadata. We should delete this stuff if we try to process the same run again.metadata
andchunk
documents. -> To this end I propose to add aprovides_meta
boolean to each document. This prevents$exists
-operations.Implementation
Please notice that I'm planning on a two stage PR, there are two places that need to be uncommented after a week when all the documents are updated with the
provides_meta
boolean.Question to mongo expert (Darryl)
Does querying against a number provide improved performance w.r.t. querying against a boolean?