client: always wait 200ms before sending updates #9435
Conversation
In looking at the old code vs. the new, it's not clear to me how it is functionally different. It looks like the same behavior occurs, with what used to be a ticker Stop/re-initialization being converted to a Reset.
It is possible I am missing something, however. The difference between Stop/re-initialization and Reset is lost on me, and the godoc doesn't help.
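For context on Stop-plus-reinitialize versus Reset: for this purpose the two patterns behave the same; Ticker.Reset (added in Go 1.15) simply reuses the existing ticker and its channel instead of allocating new ones. A minimal illustration with made-up function names, not the Nomad code:

```go
package client

import "time"

// Two roughly equivalent ways to change a ticker's interval (illustration
// only): stopping and re-creating it versus resetting it in place.

// adjustByRecreate stops the old ticker and allocates a replacement; any
// code reading the old ticker's channel must switch to the returned ticker.
func adjustByRecreate(t *time.Ticker, d time.Duration) *time.Ticker {
	t.Stop()
	return time.NewTicker(d)
}

// adjustByReset (Go 1.15+) changes the interval on the same ticker; the
// ticker and its channel are reused, so existing readers of t.C are
// unaffected.
func adjustByReset(t *time.Ticker, d time.Duration) {
	t.Reset(d)
}
```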
Always wait 200ms before calling the Node.UpdateAlloc RPC to send allocation updates to servers.

Prior to this change we only reset the update ticker when an error was encountered. This meant the 200ms ticker was running while the RPC was being performed. If the RPC was slow due to network latency or server load and took >=200ms, the ticker would tick during the RPC. Then on the next loop the select would randomly choose between the two viable cases: receive an update or fire the RPC again. If the RPC case won it would immediately loop again due to there being no updates to send. When the update chan receive is selected, a single update is added to the slice. The odds are then 50/50 that the subsequent loop will send that single update instead of receiving any more updates.

This could cause a couple of problems:
1. Since only a small number of updates are sent, the chan buffer may fill, applying backpressure and slowing down other client operations.
2. The small number of updates sent may already be stale and not represent the current state of the allocation locally.

A risk here is that it's hard to reason about how this will interact with the 50ms batches on servers when the servers are under load.

A further improvement would be to completely remove the alloc update chan and instead use a mutex to build a map of alloc updates. I wanted to test the lowest-risk possible change on loaded servers first before making more drastic changes.
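To make the timing change concrete, here is a heavily simplified sketch of a batching loop with the proposed behavior. It is illustrative only: the names (syncLoop, updatesCh, sendUpdates) and the use of a plain string for an update are stand-ins, not Nomad's actual client code.

```go
package client

import "time"

// syncLoop batches updates and sends them at most once per syncInterval,
// restarting the wait only after the send returns.
func syncLoop(updatesCh <-chan string, sendUpdates func([]string) error, stop <-chan struct{}) {
	const syncInterval = 200 * time.Millisecond

	timer := time.NewTimer(syncInterval)
	defer timer.Stop()

	var pending []string
	for {
		select {
		case <-stop:
			return

		case id := <-updatesCh:
			// Batch updates until the timer fires.
			pending = append(pending, id)

		case <-timer.C:
			if len(pending) > 0 {
				// Under load this call may itself take >=200ms.
				if err := sendUpdates(pending); err == nil {
					pending = nil
				}
			}

			// Restart the 200ms wait only *after* the send has returned,
			// so RPC latency never eats into the next batching window.
			// The old behavior was a free-running ticker that kept firing
			// while the RPC was still in flight.
			timer.Reset(syncInterval)
		}
	}
}
```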
Force-pushed from a65d7c8 to e6fd258
It's super subtle and my commit message is probably far too verbose. The case I ran into was:
If the Node.UpdateAlloc RPC takes >=200ms, the ticker has already fired by the time the RPC returns, so the next loop iteration is a coin flip between receiving another update and immediately firing the RPC again with the little it has.

But it gets worse! At this point not only might the update chan buffer be full (64 updates), but it's plausible many of the updates are now stale and should be replaced by allocs deeper in the chan's buffer (or blocking on sending to the buffer!). So this timing approach causes a cascading load effect when servers are under such load that the RPC takes longer than the 200ms interval.

The fix

Stop counting the RPC's own runtime against the 200ms wait: only start the next 200ms window after Node.UpdateAlloc returns.

That's it. That's the whole fix. Now it doesn't matter how long the Node.UpdateAlloc RPC takes.

The downside is that now we're delaying alloc updates further -- I observed this RPC taking 1-2.5s in high load tests. This means that alloc updates could be delayed by up to 2700ms in that cluster. That affects not only operators monitoring allocation status, but anything else waiting on the servers to learn the latest allocation state.

The reason I think this is an ok tradeoff is because it provides natural backpressure on heavily loaded servers. If the server is under so much load that Node.UpdateAlloc takes seconds, slowing the rate at which clients send further updates gives it room to recover.

Effectively, slow alloc syncs act as a gross form of client-side backpressure. This is not optimal (as alluded to by the "gross" description). Servers should have intrinsic mechanisms for avoiding the 2 pathological overload cases I mention above; they should not rely on well-behaved clients. Ideas here are welcome but outside the scope of this modest improvement. On the client side we could be smarter about when we update allocs. I put those ideas in a new issue here: #9451
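The commit message also floats a further improvement: drop the alloc update chan entirely and coalesce updates into a mutex-guarded map keyed by alloc ID, so a stale update is simply overwritten rather than queued behind newer ones. A rough sketch of that idea follows, with made-up names and a string stand-in for the allocation type; this is not code from this PR or from #9451.

```go
package client

import (
	"sync"
	"time"
)

// updateBatcher coalesces updates by alloc ID instead of queueing them on a
// channel, so only the latest state per allocation is ever sent.
type updateBatcher struct {
	mu      sync.Mutex
	pending map[string]string // allocID -> latest state (stand-in type)
}

// Put records the most recent state for an alloc, overwriting anything stale.
func (b *updateBatcher) Put(allocID, state string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.pending == nil {
		b.pending = make(map[string]string)
	}
	b.pending[allocID] = state
}

// run sends the coalesced batch once per interval, restarting the wait only
// after the send returns (the same "reset after the RPC" idea as above).
func (b *updateBatcher) run(interval time.Duration, send func(map[string]string) error, stop <-chan struct{}) {
	timer := time.NewTimer(interval)
	defer timer.Stop()
	for {
		select {
		case <-stop:
			return
		case <-timer.C:
			b.mu.Lock()
			batch := b.pending
			b.pending = nil
			b.mu.Unlock()

			if len(batch) > 0 {
				// Real code would need retry/error handling; the sketch
				// drops the batch on error to stay short.
				_ = send(batch)
			}
			timer.Reset(interval)
		}
	}
}
```

Coalescing by alloc ID bounds each batch to one entry per allocation no matter how many times it changed during the interval, which removes both the buffered-chan backpressure and the stale-update problems described above.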
The update looks good, and makes it clearer that the critical change is removing the if staggered bit.
Extension of #14673

Once Vault is initially fingerprinted, extend the period, since changes should be infrequent and the fingerprint is relatively expensive because it contacts a central Vault server.

Also move the period timer reset *after* the fingerprint. This is similar to #9435, where the idea is to ensure the retry period starts *after* the operation is attempted. 15s will now be the *minimum* time between fingerprints instead of the *maximum*. In the case of Vault fingerprinting, the original behavior might cause the following:
1. Timer is reset to 15s
2. Fingerprint takes 16s
3. Timer has already elapsed so we immediately fingerprint again

Even if fingerprinting Vault only takes a few seconds, that may very well be due to excessive load, and backing off our fingerprints is desirable. The new behavior ensures we always wait at least 15s between fingerprint attempts and should allow some natural jittering based on server load and network latency.
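The same pattern applied to the fingerprint case might look roughly like the sketch below; periodicFingerprint, fingerprint, and stop are hypothetical stand-ins rather than the real fingerprinter interface.

```go
package fingerprint

import "time"

// periodicFingerprint restarts the wait only after fingerprint() returns, so
// period is the *minimum* gap between attempts rather than the maximum.
func periodicFingerprint(period time.Duration, fingerprint func(), stop <-chan struct{}) {
	timer := time.NewTimer(period)
	defer timer.Stop()

	for {
		select {
		case <-stop:
			return
		case <-timer.C:
			// A slow fingerprint (e.g. a loaded Vault server) no longer
			// triggers an immediate re-fingerprint: the next wait begins
			// only after this call completes.
			fingerprint()
			timer.Reset(period)
		}
	}
}
```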