-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logs: fix missing allocation logs after update to Nomad 1.5.4 #17087
Conversation
d54c610
to
45af110
Compare
45af110
to
775c665
Compare
When the server restarts for the upgrade, it loads the `structs.Job` from the Raft snapshot/logs. The jobspec has long since been parsed, so none of the guards around the default value are in play. The empty field value for `Enabled` is the zero value, which is false. This doesn't impact any running allocation because we don't replace running allocations when either the client or server restart. But as soon as any allocation gets rescheduled (ex. you drain all your clients during upgrades), it'll be using the `structs.Job` that the server has, which has `Enabled = false`, and logs will not be collected. This changeset fixes the bug by adding a new field `Disabled` which defaults to false (so that the zero value works), and deprecates the old field. Fixes #17076
775c665
to
4ef8b96
Compare
@schmichael I've addressed your comments and added some extra tests around those cases. I've also done bench testing across upgrades: both from 1.5.3 to 1.5.5-dev and from 1.5.3 to 1.5.4 to 1.5.5-dev. The nice thing I discovered with the fix is that allocations with missing |
When the server restarts for the upgrade, it loads the
structs.Job
from the Raft snapshot/logs. The jobspec has long since been parsed, so none of the guards around the default value are in play. The empty field value forEnabled
is the zero value, which is false.This doesn't impact any running allocation because we don't replace running allocations when either the client or server restart. But as soon as any allocation gets rescheduled (ex. you drain all your clients during upgrades), it'll be using the
structs.Job
that the server has, which hasEnabled = false
, and logs will not be collected.This changeset fixes the bug by adding a new field
Disabled
which defaults to false (so that the zero value works), and deprecates the old field.Fixes #17076