
Fingerprinting fails to detect cpu total compute in arm64 EC2 instances #9511

Closed
notnoop opened this issue Dec 3, 2020 · 3 comments

Comments

@notnoop
Contributor

notnoop commented Dec 3, 2020

Nomad version

Latest nomad master

Operating system and Environment details

ubuntu@ip-172-31-66-11:~/go/src/github.com/hashicorp/nomad$ uname -a
Linux ip-172-31-66-11 5.4.0-1029-aws #30-Ubuntu SMP Tue Oct 20 10:08:09 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@ip-172-31-66-11:~/go/src/github.com/hashicorp/nomad$ curl http://169.254.169.254/latest/meta-data/instance-type; echo
t4g.large
ubuntu@ip-172-31-66-11:~/go/src/github.com/hashicorp/nomad$ cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

processor       : 1
BogoMIPS        : 243.75
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1
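The listing above has no `cpu MHz` line. On x86 Linux, that line is where /proc/cpuinfo normally reports clock speed. The Go sketch below uses a hypothetical `parseCPUInfo` helper to show the effect. It is not the parser that Nomad or gopsutil actually uses. The two `processor` entries are still counted, but the frequency stays at 0. This sketch assumes total compute is derived as cores × MHz, so that total is also 0.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cpuSummary is a hypothetical summary of /proc/cpuinfo contents.
type cpuSummary struct {
	Cores int     // number of "processor" entries
	MHz   float64 // value of the first "cpu MHz" line, or 0 if absent
}

// parseCPUInfo scans cpuinfo-formatted text (illustrative sketch only).
func parseCPUInfo(text string) cpuSummary {
	var s cpuSummary
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		// Each meaningful line has the form "key<whitespace>: value".
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // blank separator lines between processors
		}
		key = strings.TrimSpace(key)
		value = strings.TrimSpace(value)
		switch key {
		case "processor":
			s.Cores++
		case "cpu MHz":
			// Keep the first frequency seen; arm64 kernels omit this line.
			if s.MHz == 0 {
				if mhz, err := strconv.ParseFloat(value, 64); err == nil {
					s.MHz = mhz
				}
			}
		}
	}
	return s
}

func main() {
	// Trimmed copy of the t4g.large output above: no "cpu MHz" line.
	arm := "processor       : 0\nBogoMIPS        : 243.75\n\nprocessor       : 1\nBogoMIPS        : 243.75\n"
	s := parseCPUInfo(arm)
	fmt.Printf("cores=%d mhz=%.0f total=%.0f\n", s.Cores, s.MHz, float64(s.Cores)*s.MHz)
}
```

Running it prints `cores=2 mhz=0 total=0`. This matches the `detected core count: cores=2` debug line in the log below, while leaving nothing usable for total compute.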

Issue

On some arm64 EC2 instances, Nomad fails to fingerprint the CPU, even though those instance types are listed in our AWS EC2 lookup table.

Consider the t4g.large instance type. It is listed in the lookup table:

"t4g.large": newCPU(2, 2.5),

However, /proc/cpuinfo does not contain a cpu MHz line. As a result, the basic cpu fingerprinter still fails: no CPU frequency is detected, so total compute is 0 when it reaches this check:

	tt := int(stats.TotalTicksAvailable())
	if cfg.CpuCompute > 0 {
		f.logger.Debug("using user specified cpu compute", "cpu_compute", cfg.CpuCompute)
		tt = cfg.CpuCompute
	}

	// Return an error if no cpu was detected or explicitly set as this
	// node would be unable to receive any allocations.
	if tt == 0 {
		return fmt.Errorf("cannot detect cpu total compute. "+
			"CPU compute must be set manually using the client config option %q",
			"cpu_total_compute")
	}
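One way to avoid the failure is to fall back to the EC2 lookup table when the detected value is zero. The sketch below is a hypothetical illustration, not Nomad's actual fix. The names `ec2CPU`, `lookupTable`, and `totalCompute` are invented here. Only the `t4g.large` entry comes from the table quoted above, and the sketch assumes the second argument of `newCPU` is the clock speed in GHz. The precedence mirrors the check above: an explicit `cpu_total_compute` wins, then the detected value, and the table is consulted only as a last resort.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// ec2CPU is a hypothetical stand-in for a lookup-table entry,
// e.g. newCPU(2, 2.5) = 2 cores at an assumed 2.5 GHz.
type ec2CPU struct {
	Cores int
	GHz   float64
}

// TicksMHz returns total compute in MHz: cores × clock speed.
func (c ec2CPU) TicksMHz() int {
	return int(math.Round(float64(c.Cores) * c.GHz * 1000))
}

// One-entry excerpt; the real table covers many instance types.
var lookupTable = map[string]ec2CPU{
	"t4g.large": {Cores: 2, GHz: 2.5},
}

// totalCompute picks the configured value, then the detected value,
// then the table entry for the instance type (hypothetical helper).
func totalCompute(detected, configured int, instanceType string) (int, error) {
	if configured > 0 {
		return configured, nil
	}
	if detected > 0 {
		return detected, nil
	}
	if cpu, ok := lookupTable[instanceType]; ok {
		return cpu.TicksMHz(), nil
	}
	return 0, errors.New("cannot detect cpu total compute")
}

func main() {
	// On the t4g.large, cpuinfo yields 0 MHz and nothing is configured.
	tt, err := totalCompute(0, 0, "t4g.large")
	fmt.Println(tt, err) // 5000 <nil>
}
```

Under these assumptions, a t4g.large resolves to 2 × 2500 MHz = 5000 MHz. That is also a plausible value to set by hand through `cpu_total_compute` until detection works.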

Reproduction steps

Start Nomad 1.0.0-beta3 on a t4g.large instance: nomad agent -dev

Observe:

$ ~/nomad/nomad agent -dev
==> No configuration files loaded
==> Starting Nomad agent...
==> Error starting agent: client setup failed: fingerprinting failed: cannot detect cpu total compute. CPU compute must be set manually using the client config option "cpu_total_compute"
    2020-12-03T17:19:42.390Z [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2020-12-03T17:19:42.390Z [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=
    2020-12-03T17:19:42.391Z [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
    2020-12-03T17:19:42.391Z [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2020-12-03T17:19:42.391Z [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2020-12-03T17:19:42.391Z [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2020-12-03T17:19:42.391Z [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2020-12-03T17:19:42.391Z [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2020-12-03T17:19:42.393Z [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:4647 Address:127.0.0.1:4647}]"
    2020-12-03T17:19:42.394Z [INFO]  nomad.raft: entering follower state: follower="Node at 127.0.0.1:4647 [Follower]" leader=
    2020-12-03T17:19:42.394Z [INFO]  nomad: serf: EventMemberJoin: ip-172-31-66-11.global 127.0.0.1
    2020-12-03T17:19:42.394Z [INFO]  nomad: starting scheduling worker(s): num_workers=2 schedulers=[service, batch, system, _core]
    2020-12-03T17:19:42.394Z [INFO]  client: using state directory: state_dir=/tmp/NomadClient304299959
    2020-12-03T17:19:42.394Z [INFO]  client: using alloc directory: alloc_dir=/tmp/NomadClient820006826
    2020-12-03T17:19:42.394Z [INFO]  nomad: adding server: server="ip-172-31-66-11.global (Addr: 127.0.0.1:4647) (DC: dc1)"
    2020-12-03T17:19:42.395Z [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters=[arch, bridge, cgroup, cni, consul, cpu, host, memory, network, nomad, signal, storage, vault, env_aws, env_gce, env_azure]
    2020-12-03T17:19:42.395Z [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
    2020-12-03T17:19:42.395Z [DEBUG] client.fingerprint_mgr: CNI config dir is not set or does not exist, skipping: cni_config_dir=
    2020-12-03T17:19:42.395Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=cgroup period=15s
    2020-12-03T17:19:42.395Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul period=15s
    2020-12-03T17:19:42.395Z [DEBUG] client.fingerprint_mgr.cpu: detected core count: cores=2
@shoenig
Member

shoenig commented Dec 3, 2020

Duplicate of #7989?

@notnoop
Contributor Author

notnoop commented Dec 3, 2020

Indeed. Sorry, didn't check for existing tickets earlier!

@notnoop notnoop closed this as completed Dec 3, 2020
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 27, 2022