-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client/fingerprint/cpu: use fallback total compute value if cpu not detected #9589
Conversation
…etected Previously, Nomad would fail to startup if the CPU fingerprinter could not detect the cpu total compute (i.e. cores * mhz). This is common on some EC2 instance types (graviton class), where the env_aws fingerprinter will override the detected CPU performance with a more accurate value anyway. Instead of crashing on startup, have Nomad use a low default for available cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad to get past the useless cpu fingerprinting on those EC2 instances. The crashing error message is now a log statement suggesting the setting of cpu_total_compute in client config. Fixes #7989
Quick spot testing on a t4gsmall: env_aws enabled
env_aws disabled
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This seems like a much nicer user experience.
return fmt.Errorf("cannot detect cpu total compute. "+ | ||
"CPU compute must be set manually using the client config option %q", | ||
"cpu_total_compute") | ||
f.logger.Info("fallback to default cpu total compute, set client config option cpu_total_compute to override") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log line is a little unfortunate in that it's important to see in non-AWS cases but in AWS cases we're going to fix it in the fingerprinter, which logs at debug. But I don't see a good way around that other than dropping this one to debug... which I think would be a bad idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I could be convinced to pick any of the possible combinations of Info/Debug. Realistically the most correct solution to all of this would involve plumbing to enable fingerprinters to coordinate. But for this, that would be like driving a thumbtack with sledgehammer.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Previously, Nomad would fail to startup if the CPU fingerprinter could
not detect the cpu total compute (i.e. cores * mhz). This is common on
some EC2 instance types (graviton class), where the env_aws fingerprinter
will override the detected CPU performance with a more accurate value
anyway.
Instead of crashing on startup, have Nomad use a low default for available
cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad
to get past the ignored cpu fingerprinting on those EC2 instances. The
crashing error message is now a log statement suggesting the setting of
cpu_total_compute in client config.
Fixes #7989