Fix the loop variable scheduler issue #4468
cc: @leonlnj
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #4468 +/- ##
===========================================
+ Coverage 58.98% 76.39% +17.41%
===========================================
Files 488 18 -470
Lines 41772 1521 -40251
===========================================
- Hits 24639 1162 -23477
+ Misses 15172 295 -14877
+ Partials 1961 64 -1897
Flags with carried forward coverage won't be shown.
I'm actually pretty lost... can you help me understand please? https://goplay.tools/snippet/mNyKXEvyv4q I understand that the pointer inside the iterator ( can you also add a comment where you do schedule := s? Feels like something I would just delete if I came across it.
@wild-endeavor Because the jobStore store the
Ohh, we're saving it, got it, didn't see that. Thank you @leonlnj
@pmahindrakar-oss could you still remove that weird file? I added a comment.
@wild-endeavor this might be a better example to showcase the issue: https://goplay.tools/snippet/J1rE_i5BJxv I have also removed the delve debugger file.
Signed-off-by: pmahindrakar-oss <[email protected]>
Tracking issue
#4285
Describe your changes
All kudos to @leonlnj, who discovered this bug in the Gojek production deployment and helped debug this nasty loop variable issue, which caused schedules to run with incorrect times in production.
The issue was observed when Gojek redeployed their scheduler: the scheduler ran its regular catch-up routine to recover the schedule times missed during the redeploy and found schedules it needed to trigger.
This is well-tested behavior in the fleet, and we had never encountered any issues with it.
In Gojek's case, however, it mysteriously ran schedules that shouldn't have been run, and ran them with an incorrect catch-up from time.
There is much more detail in the attached issue.
To summarize: the unit tests written by the Gojek team capture this behavior nicely.
I have updated those tests with assertions and added the loop variable fix.
There is an open Go issue, golang/go#56010, to fix this gotcha, which many Go developers have complained about or run into.
Hopefully they fix that soon.
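For readers who have not hit this gotcha before, here is a minimal sketch of the pattern discussed in this PR. It is not the scheduler's actual code: `Schedule`, `registerAliased`, and `registerCopied` are hypothetical names, and a plain map stands in for the job store. The assumption is that the store keeps a pointer to each schedule, which is why the `schedule := s` copy matters. With the loop semantics in effect before Go 1.22, the range variable is a single variable reused across iterations, so every stored `&s` points at the same value; Go 1.22 and later create a fresh variable per iteration for modules that declare that language version, so the aliased variant only misbehaves on older settings.

```go
package main

import "fmt"

// Schedule is a hypothetical stand-in for the scheduler's schedule model.
type Schedule struct {
	Name string
}

// registerAliased stores the address of the range variable itself.
// Under pre-Go 1.22 loop semantics, every stored pointer refers to the
// same variable, so all entries end up describing the last schedule.
func registerAliased(schedules []Schedule) map[string]*Schedule {
	store := make(map[string]*Schedule)
	for _, s := range schedules {
		store[s.Name] = &s
	}
	return store
}

// registerCopied makes an explicit per-iteration copy before taking the
// address, so each stored pointer refers to its own Schedule value
// regardless of the Go version in use.
func registerCopied(schedules []Schedule) map[string]*Schedule {
	store := make(map[string]*Schedule)
	for _, s := range schedules {
		schedule := s // copy the loop variable; do not delete this line
		store[schedule.Name] = &schedule
	}
	return store
}

func main() {
	schedules := []Schedule{{Name: "hourly"}, {Name: "daily"}}

	for key, s := range registerCopied(schedules) {
		fmt.Printf("copied:  key=%s -> %s\n", key, s.Name)
	}
	// What this prints depends on the Go language version of the module.
	for key, s := range registerAliased(schedules) {
		fmt.Printf("aliased: key=%s -> %s\n", key, s.Name)
	}
}
```

The explicit copy looks redundant at first glance, which is why a comment next to it is worth having: without one, a later cleanup could remove it and reintroduce the bug on older toolchains.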
Check all the applicable boxes