Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better queries in online monitor #375

Merged
merged 3 commits into from
Jan 7, 2021
Merged

Conversation

JoranAngevaare
Copy link
Contributor

@JoranAngevaare JoranAngevaare commented Jan 4, 2021

What is the problem / what does the code in this PR do
As explained in #365 and #374 there are two things that need addressing:

  • When re-making data, we should clear previous entries. -> To this end I added a collection.delete_many operation at the init of the saver. This should only happen when the data was not correctly stored. If it is already stored, you won't be accessing the MongoSaver anyways. This only happens when during progressing that there occurred an error, after this, there is an exception added to the metadata. We should delete this stuff if we try to process the same run again.
  • We should distinguish between metadata and chunk documents. -> To this end I propose to add a provides_meta boolean to each document. This prevents $exists-operations.

Implementation
Please notice that I'm planning on a two stage PR, there are two places that need to be uncommented after a week when all the documents are updated with the provides_meta boolean.

Question to mongo expert (Darryl)
Does querying against a number provide improved performance w.r.t. querying against a boolean?

@JoranAngevaare JoranAngevaare linked an issue Jan 4, 2021 that may be closed by this pull request
# might be getting a previously failed document.
# might be getting a previously failed document. Sort argument
# should be obsolete (due to the self.col.delete_many in the
# MongoSaver) but rather safe than sorry.
doc = self.db[self.col_name].find_one({
**query, 'metadata': {"$exists": True}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $exists operator is apparently rather slow. Even though the remainder of the query is indexed, it might be worth thinking how to adjust this so we don't use this operator.

Copy link
Contributor Author

@JoranAngevaare JoranAngevaare Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Darryl, thanks, this PR is designed to do exactly that:

"We should distinguish between metadata and chunk documents. -> To this end I propose to add a provides_meta boolean to each document. This prevents $exists-operations."

But I first need to have a set of documents with this entry in the document before I change the query. I want to change it to:

{**query, 'provides_meta': True},  # <-change to this after TTL has flushed

@darrylmasson
Copy link
Contributor

As far as querying against a boolean versus an integer, that is probably going to depend on the details. Finding one 0/false among a sea of integers/trues is simple, and finding a specific integer in a sea of different integers is also going to be easier than one specific true in a sea of true.

@JoranAngevaare
Copy link
Contributor Author

Alright, thanks Darryl. Given your response I don't see a reason to change True to 1 and False to 0.

@JoranAngevaare JoranAngevaare merged commit e2e65e0 into master Jan 7, 2021
@JoranAngevaare JoranAngevaare deleted the better_om_queries branch January 7, 2021 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Increase mongo performance
2 participants