Skip to content

Commit

Permalink
client/fingerprint/cpu: use fallback total compute value if cpu not d…
Browse files Browse the repository at this point in the history
…etected

Previously, Nomad would fail to startup if the CPU fingerprinter could
not detect the cpu total compute (i.e. cores * mhz). This is common on
some EC2 instance types (graviton class), where the env_aws fingerprinter
will override the detected CPU performance with a more accurate value
anyway.

Instead of crashing on startup, have Nomad use a low default for available
cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad
to get past the useless cpu fingerprinting on those EC2 instances. The
crashing error message is now a log statement suggesting the setting of
cpu_total_compute in client config.

Fixes #7989
  • Loading branch information
shoenig committed Dec 9, 2020
1 parent e76add7 commit da1235f
Showing 1 changed file with 14 additions and 5 deletions.
19 changes: 14 additions & 5 deletions client/fingerprint/cpu.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,14 @@ import (
"github.com/hashicorp/nomad/nomad/structs"
)

const (
// defaultCPUTicks is the default amount of CPU resources assumed to be
// available if the CPU performance data is unable to be detected. This is
// common on EC2 instances, where the env_aws fingerprinter will follow up,
// setting an accurate value.
defaultCPUTicks = 1000 // 1 core * 1 GHz
)

// CPUFingerprint is used to fingerprint the CPU
type CPUFingerprint struct {
StaticFingerprinter
Expand Down Expand Up @@ -64,12 +72,13 @@ func (f *CPUFingerprint) Fingerprint(req *FingerprintRequest, resp *FingerprintR
tt = cfg.CpuCompute
}

// Return an error if no cpu was detected or explicitly set as this
// node would be unable to receive any allocations.
// If we cannot detect the cpu total compute, fallback to a very low default
// value and log a message about configuring cpu_total_compute. This happens
// on Graviton instances where CPU information is unavailable. In that case,
// the env_aws fingerprinter updates the value with correct information.
if tt == 0 {
return fmt.Errorf("cannot detect cpu total compute. "+
"CPU compute must be set manually using the client config option %q",
"cpu_total_compute")
f.logger.Info("fallback to default cpu total compute, set client config option cpu_total_compute to override")
tt = defaultCPUTicks
}

resp.AddAttribute("cpu.totalcompute", fmt.Sprintf("%d", tt))
Expand Down

0 comments on commit da1235f

Please sign in to comment.