-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove timeout on MessageOutboxDeliveryService #437
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #437 +/- ##
==========================================
+ Coverage 61.73% 61.76% +0.03%
==========================================
Files 232 232
Lines 11825 11827 +2
Branches 1510 1510
==========================================
+ Hits 7300 7305 +5
+ Misses 3998 3995 -3
Partials 527 527 ☔ View full report in Codecov by Sentry. |
@Enkidu93, do you have a reference to any documentation that uses the pattern that you are recommending? |
That is, do you have documentation that this solution will work, or did you test it out empirically? |
Also, can you change the dot net code to correct the pre-translation indexing as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a section where the documentation discusses techniques for resuming/starting change streams: https://www.mongodb.com/docs/manual/changeStreams/#resumeafter-for-change-streams
In the same area, you can see information about the method we're using which is to start from a particular time. If you look at the logs on Atlas, you can see that the change stream command is running from the same start time every time until an insertion has happened. And the command reruns because of the timeout. Inasmuch as I can verify locally, I have. Of course, it's not quite as convenient to see into things on the local instance as it is via Atlas.
Note: Since the outbox code is only on QA for the time being, the issue is only visible there. I'm concerned about pushing current QA code to production given the strain this puts on Mongo, but at the same time, this issue should actually be less problematic the more activity there is in the outbox.
In regards to updating the .NET in some way to do with indexing, I'm not sure what you mean? I created the index in the db. What do I need to do in the code?
Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @ddaspit and @johnml1135)
The removal of the timeout appears like the right choice - I didn't understand the rational behind the timeout (it was only used in Machine for the Locks to not grind the whole software to a halt). As for indexing, I think we would want a compond index. That way we can sort and index by (at least) the EngineId and CorpusId, if not also the Scripture reference. Here is how to implement it in C#. I don't know the best way to add it - would we be able to add a new parameter of type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)
Check that - CreateIndexModels are already being created for the pretranslations - but they might not be ideal. I pushed some code - what do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
Yes, I already created a compound index in Atlas on pretranslations on the engineRef and modelRevision (the recommended index given the queries from SF although it looks like maybe a compound index that includes the textId as well could help so I'll go ahead and make that as well). But I see what you're saying: You are wanting me to verify that those indices exist programmatically in the the mongo instance on startup - I can do that. For Atlas, it probably isn't as necessary, but it won't hurt.
Reviewable status: 1 of 4 files reviewed, all discussions resolved (waiting on @ddaspit)
Can you review the code that I pushed and see if it resolves the issues? I also checked for other places in the code that look like they should have a multi-index but don't. |
…ful compound indices on pretranslations (and remove redundant index on engineRef).
I think this is the arrangement we want (although I'd like to spend some more time peeking at all the indices at some point and giving it more thought). We don't want to remove the simple indices even if we do add compound indices unless the simple index is a prefix of the compound index. We also want to avoid unnecessary indices because they take up memory and make writes more expensive. I went ahead and looked at how frequently the indices are being used and removed ones that have never been used. I removed redundant simple indices and added the suggested compound index on pretranslations. |
Please test on docker-compose with Atlas. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 5 files reviewed, all discussions resolved (waiting on @ddaspit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)
Fixes #422
I've gone ahead and added the recommended index on pretranslations which should speed up queries there significantly, but ultimately, it wasn't the queries on pretranslations that were taxing mongo and causing it to send alerts.
Rather, it has to do with a bug in the
MessageOutboxDeliveryService
's subscription. A timeout of 10 seconds is used when waiting for a new message to appear. However, when looping back to wait again, it's looking for changes from the same time onward since the timestamp which it uses as a placeholder only updates when there's been a change (i.e., not when it's timed out). For now, I've simply removed the timeout which fits with usage elsewhere. Now it should wait until there's been a change without rerunning the changeStream command. In recent versions of Mongo, it's also possible to resume changeStreams. It may be better (if we have situations in which we want a timeout but will also re-wait for changes) to use the resumeToken strategy to resume rather than restart.This change is