-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduled flows accumulate sleep overhead between long appointment delta #1214
Comments
I refactored the scheduling to not do any sleep inside the scheduled class. It relies on the sleep in @petersilva , @MagikEh , @andreleblanc11 and I discussed on Teams whether the scheduled next time to gather should be calculated from the current time when the gather actually runs + scheduled_interval, or from the previously scheduled time + scheduled_interval. My example:
@MagikEh brought up good points about what would happen if polling (or other work) takes longer than the scheduled_interval. If every poll takes longer than the scheduled interval, the next poll would always happen immediately. If the poll takes longer than the scheduled interval occasionally, then using the last scheduled time to calculate the next gather time means that every subsequent gather after the long one would happen immediately, until the lost time is made up. Using the current time for scheduling the next gather, the next scheduled gather after the long poll would happen immediately, but then the next gather after that would be scheduled for a future time. This gives a bit of a cool down period. @petersilva said "An argument against precise intervals is it tends to synchronize work... that we don't want synchronized... if you have 150 polls... is there a good reason for them all to start polling at exactly the same time? It's not synchronized swimming." Calculating the next scheduled time from the last scheduled time would cause a lot of synchronization, leading to load spikes at common intervals (30s, 1m, 15m, 30m, etc.), so it seems like calculating based on the current gather time is the better option for us. |
fwiw, what james wants already works:
will run pretty close to every fifteen minutes starting at exactly those times. This is new in 3.0.55, because prior to that you would have had to also specify scheduled_hour 0,1,2,3... (all of them.) which was a bit ugly, now it just defaults to all of them if omitted. |
I re-read the original description from Andre, and I think the problem was partially caused by the housekeeping needing to run at roughly the same time as the appointment. It would sleep until housekeeping was required, which coincided with when the next appointment was scheduled. So it would run housekeeping, but by the time that finished and the scheduled gather was called again, the appointment was already in the past. But I added likely fixed because we merged #1223, that refactors the scheduling so all sleeping happens in the main flow run loop. Now the scheduled code doesn't need to worry about housekeeping at all. And if it is too late to the next scheduled appointment, it will still do the poll/gather and only "future" appointments might get skipped if they're already in the past. |
When @andreleblanc11 gets back, he can try to reproduce the issue he had with the new version. I agree that it is likely resolved... I'm just not sure. |
With the refactored code, there doesn't seem to be any skipped appointments like previously. We used to be missing posts because of missed appointments, but now we aren't getting the same scenario.
I also checked if we had similar results earlier (2 weeks ago), and the results were the same. I think this issue can be closed. |
A scheduled flow will sometimes miss appointments when there are big gaps between appointments.
I tried looking at the code to find if there was an area where the sleep interval wasn't being properly adjusted, but couldn't find anything significant.
We already have logic to adjust the total sleep time for the housekeeping interval duration. We also update the sleep time at every call of
wait_until
which should take into account the sleep overhead accounted in the while loop ofwait_until_seconds
sarracenia/sarracenia/flowcb/scheduled/__init__.py
Lines 165 to 170 in 6f25ee6
Regardless, @petersilva and I agreed it would be good that the code checks for a 1 minute buffer before and after the appointment time, so that we don't skip apointments for a little bit of overhead.This should cover most cases where there is some overhead.
That would likely mean changing line 188
sarracenia/sarracenia/flowcb/scheduled/__init__.py
Lines 185 to 190 in 6f25ee6
The text was updated successfully, but these errors were encountered: