
Memory Usage of Allocation is always 0 bytes #9120

Closed
sbrl opened this issue Oct 17, 2020 · 14 comments
Comments

@sbrl

sbrl commented Oct 17, 2020

Nomad version

Nomad v0.12.5 (514b0d667b57068badb43795103fb7dd3a9fbea7)

Operating system and Environment details

$ uname -a
Linux DEVICE_NAME 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Raspbian
Description:	Raspbian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

4 x Raspberry Pi 4 w/4 GiB RAM (worker nodes), 1 x Raspberry Pi 4 w/2GiB RAM (server node, does not run tasks)

Issue

All allocated tasks (at least with the Docker driver) show a memory usage of 0 bytes. It never used to be this way: one day I checked and it showed 0 bytes for all my allocations, and it hasn't fixed itself since. Example screenshot from the web interface:

[screenshot of allocation resource usage omitted]

Reproduction steps

  1. Run any Nomad job
  2. Open the Nomad web interface
  3. Navigate to the allocation and view the resource usage graphs
  4. See error

Job file (if appropriate)

Any job file will do, but here's one of mine:


job "etherpad" {
	datacenters = ["dc1"]
	priority = 35
task "etherpad" {
	driver = "docker"
	
	config {
		image = "registry.internal.example.com:5000/etherpad"
		labels { group = "services" }
		
		volumes = [
			# /srv/etherpad/var			Main settings directory
			"/mnt/shared/services/etherpad/var:/srv/etherpad/var",
			# /srv/etherpad/APIKEY.txt	Persistent API key
			"/mnt/shared/services/etherpad/APIKEY.txt:/srv/etherpad/APIKEY.txt",
			# /srv/etherpad/SESSIONKEY.txt	Some other kind of key
			"/mnt/shared/services/etherpad/SESSIONKEY.txt:/srv/etherpad/SESSIONKEY.txt"
		]
		
		port_map {
			main = 9001
		}
	}
	
	resources {
		memory = 200 # MiB
		network {
			port "main" {}
		}
	}
	
	service {
		name = "${TASK}"
		tags = [
			"service", "internal",
			"urlprefix-etherpad.example.com/",
			"auth=admin"
		]
		address_mode = "host"
		port = "main"
		
		check {
			type = "http"
			port = "main"
			interval = "60s"
			timeout = "5s"
			path = "/"
		}
	}
}

}

Nomad Client logs (if appropriate)

Logs available upon request, but the logging feature (at least for allocations) isn't working either

Nomad Server logs (if appropriate)

@futuralogic

I am having the same problem - no memory stats for allocations - but only on one of my four Nomad instances.

I'd also add a point of clarification: the allocation doesn't show any RAM usage, but the node itself does.

Node:

[screenshot of node resource usage omitted]

Allocation:

[screenshot of allocation resource usage omitted]

The common variable between our situations is running on ARM (Raspberry Pi). I'll add relevant details, as my OS and architectures vary from the original report. Hopefully the details of my builds will be helpful for further research.

I'm running Nomad/Consul across four Raspberry Pi's. The memory display is only working on three of them.

WORKING:

2x RPI 3 running Hypriot OS 1.11.1 - armhf kernel.

Linux DEVICE 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l GNU/Linux
Docker: 18.06.3-ce
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

1x RPI 4 running Hypriot OS 1.12.3 - arm64 kernel (32-bit userland, i.e. docker)

Linux DEVICE 5.4.83-v8+ #1379 SMP PREEMPT Mon Dec 14 13:15:14 GMT 2020 aarch64 GNU/Linux
Docker: 20.10.2
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

Hypriot is a modified Raspbian build optimized for docker workloads. I've made no modifications to the original Hypriot image other than adding nomad/consul and jobs.

NOT WORKING:

1x RPI 4 running Ubuntu 20.04 Focal ARM64. I manually installed Docker and docker-compose per docker.com and set up nomad/consul. The image was created with the latest Raspberry Pi imager tool, which flashed the card with the Ubuntu 20.04 LTS ARM64 lite edition (i.e. I did not build the OS myself).

Linux DEVICE 5.4.0-1026-raspi #29-Ubuntu SMP PREEMPT Mon Dec 14 17:01:16 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
Docker: 20.10.2
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

@sbrl
Author

sbrl commented Jan 18, 2021

Hrm, that's very interesting that Ubuntu 20.04 doesn't appear to work, but Hypriot OS does. I wonder if Hypriot does something differently by default? It does say that it's tuned for Docker by default.

@futuralogic

I thought it was weird as well.
Based on InspectContainerWithOptions() in container.go - I presume this is how nomad inspects a container - it seems to call the Docker Engine API to inspect. (Sorry, I'm not familiar enough with Go or the codebase to state this authoritatively.)

If that is the case, docker inspect doesn't show any memory used.

docker stats also displays no memory in use:

CONTAINER ID   NAME                                                         CPU %     MEM USAGE / LIMIT   MEM %     NET I/O          BLOCK I/O    PIDS
8adc893e5af2   mc-paper-server-arm64-e389b2dd-945b-6577-a461-df45cb42b7d0   42.06%    0B / 0B             0.00%     48.8MB / 352kB   0B / 303kB   48

It seems docker may be the culprit here?
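
For anyone who wants to rule Nomad out entirely, here's a minimal sketch of querying the Docker Engine API directly over the Unix socket (the same engine API Nomad's docker driver talks to), using the container ID from the output above; jq is assumed to be installed:

# Take a one-shot stats sample from the Docker Engine API and pull out memory_stats.
# If "usage" is missing or zero here too, the problem is below Nomad (Docker/cgroups).
curl -s --unix-socket /var/run/docker.sock \
  "http://localhost/containers/8adc893e5af2/stats?stream=false" | jq '.memory_stats'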

Output of docker inspect:


ubuntu@host:/mnt/shared/nomad/jobs$ docker inspect 8adc
[
    {
        "Id": "8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101",
        "Created": "2021-01-18T02:10:30.703496654Z",
        "Path": "/runner/entrypoint",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 55003,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2021-01-18T02:10:32.16911133Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:1078eb7ec68613029733adaa5c89cb35868f3605fd787f466d8125c41a01c2c0",
        "ResolvConfPath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/hostname",
        "HostsPath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/hosts",
        "LogPath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101-json.log",
        "Name": "/mc-paper-server-arm64-e389b2dd-945b-6577-a461-df45cb42b7d0",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/alloc:/alloc",
                "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/local:/local",
                "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/secrets:/secrets",
                "/mnt/blah:/data"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {
                    "max-file": "2",
                    "max-size": "2m"
                }
            },
            "NetworkMode": "default",
            "PortBindings": {
                "19132/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19132/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19133/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "19133/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "25565/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ],
                "25565/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "host",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 2000,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": -1,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76-init/diff:/var/lib/docker/overlay2/6243e0bd492f6042f43edf4ad483bd7eb5045b9c15317fb471c2cf7df325cbd2/diff:/var/lib/docker/overlay2/f35c410ed29e3a07e4c449362895fadbee1cd84afdec9e0db2ee56f35e2493e1/diff:/var/lib/docker/overlay2/8e863c2778cf15b5c08f8aa1b5a3aa60a2dd5dbf182626ac2150994f79f94109/diff:/var/lib/docker/overlay2/7f97a7d374d260bbaedc954c56c5e449643650464297913063ef29f55bf5aaa6/diff:/var/lib/docker/overlay2/d3f17bbd14f3a136a07f518bc6b085c1e62193f579c2a0f143471642fdedfd4d/diff:/var/lib/docker/overlay2/fd40510034d439f01f86e400cd45947f81941a8fa192a1c807264ebfda34559c/diff",
                "MergedDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76/merged",
                "UpperDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76/diff",
                "WorkDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/alloc",
                "Destination": "/alloc",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/local",
                "Destination": "/local",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/secrets",
                "Destination": "/secrets",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/mnt/blahblahblah",
                "Destination": "/data",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "8adc893e5af2",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "19132/tcp": {},
                "19132/udp": {},
                "19133/tcp": {},
                "19133/udp": {},
                "25565/tcp": {},
                "25565/udp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "JVM_OPTS=-Xms2048M -Xmx5120M",
                "NOMAD_ADDR_mc=192.168.1.191:25806",
                "NOMAD_ADDR_mc_udp_1=192.168.1.191:31938",
                "NOMAD_ADDR_mc_udp_2=192.168.1.191:25484",
                "NOMAD_ALLOC_DIR=/alloc",
                "NOMAD_ALLOC_ID=e389b2dd-945b-6577-a461-df45cb42b7d0",
                "NOMAD_ALLOC_INDEX=0",
                "NOMAD_ALLOC_NAME=minecraft-arm64.mc-server[0]",
                "NOMAD_ALLOC_PORT_mc-udp-1=19132",
                "NOMAD_ALLOC_PORT_mc-udp-2=19133",
                "NOMAD_ALLOC_PORT_mc=25565",
                "NOMAD_CPU_LIMIT=2000",
                "NOMAD_DC=futura",
                "NOMAD_GROUP_NAME=mc-server",
                "NOMAD_HOST_ADDR_mc-udp-1=192.168.1.191:31938",
                "NOMAD_HOST_ADDR_mc-udp-2=192.168.1.191:25484",
                "NOMAD_HOST_ADDR_mc=192.168.1.191:25806",
                "NOMAD_HOST_IP_mc-udp-1=192.168.1.191",
                "NOMAD_HOST_IP_mc-udp-2=192.168.1.191",
                "NOMAD_HOST_IP_mc=192.168.1.191",
                "NOMAD_HOST_PORT_mc=25806",
                "NOMAD_HOST_PORT_mc_udp_1=31938",
                "NOMAD_HOST_PORT_mc_udp_2=25484",
                "NOMAD_IP_mc=192.168.1.191",
                "NOMAD_IP_mc_udp_1=192.168.1.191",
                "NOMAD_IP_mc_udp_2=192.168.1.191",
                "NOMAD_JOB_ID=minecraft-arm64",
                "NOMAD_JOB_NAME=minecraft-arm64",
                "NOMAD_MEMORY_LIMIT=5700",
                "NOMAD_NAMESPACE=default",
                "NOMAD_PORT_mc=25565",
                "NOMAD_PORT_mc_udp_1=19132",
                "NOMAD_PORT_mc_udp_2=19133",
                "NOMAD_REGION=global",
                "NOMAD_SECRETS_DIR=/secrets",
                "NOMAD_TASK_DIR=/local",
                "NOMAD_TASK_NAME=mc-paper-server-arm64",
                "TZ=America/Chicago",
                "PATH=/opt/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "PRODUCT=paper"
            ],
            "Cmd": null,
            "Image": "jcxldn/minecraft-runner:paper-alpine",
            "Volumes": null,
            "WorkingDir": "/data",
            "Entrypoint": [
                "/runner/entrypoint"
            ],
            "OnBuild": null,
            "Labels": {
                "com.hashicorp.nomad.alloc_id": "e389b2dd-945b-6577-a461-df45cb42b7d0"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "8b70f263abc724d678b09d7f4f68acb25cf5f405e48564476dd0a7409d1c2945",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "19132/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19132/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19133/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "19133/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "25565/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ],
                "25565/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ]
            },
            "SandboxKey": "/var/run/docker/netns/8b70f263abc7",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "9e463eae52177976954612f37310ad25026069f1e52cc054e18170c79cf6732c",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "eca62952ca3127291bf24f174e70b776c9e5e58199b2c8ec16a55b7fa7ea86fa",
                    "EndpointID": "9e463eae52177976954612f37310ad25026069f1e52cc054e18170c79cf6732c",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]

@tgross
Member

tgross commented Jan 19, 2021

Based on the container.go in InspectContainerWithOptions() - I presume this is how nomad inspects a container - it seems to call the Docker Engine API to inspect. (Sorry, I am not familiar with Go or the codebase to authoritatively state this.)

Yup, that's exactly how it's done for Docker containers. Do y'all see this same behavior with the exec or raw_exec driver? We use a different method there using the gopsutil library, and I know there's some operating-system dependent code in that path. If that works, it might be worth seeing if we could do some sort of fallback behavior in the docker driver, but it'll be tricky to get all the handles we'd need given that Docker owns the process. We can definitely open an issue with Docker upstream as well to see if they can fix the problem at their end.
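
For comparing the two code paths, here's a hedged sketch of pulling the raw stats the Nomad client reports for an allocation, via the CLI and the HTTP API (the allocation ID is a placeholder and the jq field path is from memory, so treat it as illustrative):

# CLI: include resource usage in the allocation status output
nomad alloc status -stats <alloc-id>

# HTTP API: the same data straight from the client agent
curl -s "http://127.0.0.1:4646/v1/client/allocation/<alloc-id>/stats" | jq '.ResourceUsage.MemoryStats'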

@sbrl
Author

sbrl commented Jan 23, 2021

@tgross: I've tried with exec with this test nomad jobspec, but I get this error and all allocations fail:

failed to launch command with executor: rpc error: code = Unknown desc = container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: cannot set memory limit: container could not join or create cgroup

I would imagine that any method used for exec / raw_exec should work.

Opening an issue upstream with Docker sounds like a good idea too, but I'm unsure whether I'd be able to find the right place to report it. I'm happy to test stuff though.

@tgross
Member

tgross commented Jan 25, 2021

Oh right. If I recall from a previous issue, the Raspberry Pi distros don't have the cgroups you need enabled by default. See the exec driver docs section on Resource Isolation, add a cgroup_enable flag for the missing isolation to your boot command line, and that should work fine.

@sbrl
Author

sbrl commented Jan 31, 2021

@tgross thanks for the reply. I see the problem now, but the link you've provided unfortunately does not contain any useful information about how to resolve it. If I understand it correctly, it tells me how I can check that cgroups are enabled, but not how to change the settings.

@futuralogic

futuralogic commented Feb 1, 2021

@tgross Thanks for the tip - that was the problem. Memory status is now updating. Thanks!

@sbrl I wasn't sure how to enable cgroups either, but the Resource Isolation documentation page linked to does mention "Some Linux distributions do not boot with all required cgroups enabled by default." I Googled cgroups on Ubuntu 20.04 (which is my distro in question), and it mentions modifying the cmdline.txt file used at boot.

It should be found somewhere under /boot I'd guess. I think on a typical Raspberry Pi Debian distro it's directly in the /boot dir, but YMMV.

For Ubuntu 20.04 it's under /boot/firmware/cmdline.txt

Here's the output of the command from the Nomad docs that displays the enabled cgroups:

awk '{print $1 " " $4}' /proc/cgroups
#subsys_name enabled
cpuset 1
cpu 1
cpuacct 1
blkio 1
memory 0
devices 1
freezer 1
net_cls 1
perf_event 1
net_prio 1
pids 1
rdma 1

As you can see "memory" was disabled.

I simply added cgroup_enable=memory to cmdline.txt and rebooted.

Full cmdline.txt for reference:
net.ifnames=0 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc cgroup_enable=memory

I'd suggest this issue can be closed. Thanks for everyone's help and input.

@tgross
Member

tgross commented Feb 1, 2021

Glad to hear that's working and thanks for the assist there @futuralogic!

@3nprob

3nprob commented Feb 9, 2021

I'm actually experiencing this now, since recently (can't tell exactly when, but some weeks ago?). It was always working fine before. Not sure if it's related to an OS upgrade or a Nomad upgrade. I see it across all clients though, all Debian 11.

This is with the docker driver.

Oddly enough, the cgroups seem to be in order:

$ awk '{print $1 " " $4}' /proc/cgroups
#subsys_name enabled
cpuset 1
cpu 1
cpuacct 1
blkio 1
memory 1
devices 1
freezer 1
net_cls 1
perf_event 1
net_prio 1
hugetlb 1
pids 1
rdma 1

EDIT: Hold off, it seems cgroup_enable=memory is missing from the cmdline - not sure why it would suddenly have started requiring that, but either way I'll see if that fixes it. The cgroup is already enabled though, as can be seen above.

# docker inspect $CONTAINER | grep -i 'cgroup'
            "CgroupnsMode": "private",
            "Cgroup": "",
            "CgroupParent": "",
            "DeviceCgroupRules": null,

@sbrl
Author

sbrl commented Apr 5, 2021

It's very delayed (sorry about that!) but I've finally managed to find some time to try out the fix suggested above. On my Raspberry Pis, editing /boot/cmdline.txt and appending the following:

cgroup_enable=memory cgroup_memory=1

...fixed the issue. Note that it has to go on the first line - no line breaks (\n) are allowed.

I implemented the following bash function that applies the fix automatically:

check_cgroups_memory() {
	echo ">>> Checking memory cgroups";
	
	cgroups_enabled="$(awk '/memory/ { print $4 }' < /proc/cgroups)"; # column 4 of /proc/cgroups is the 'enabled' flag
	
	if [[ "${cgroups_enabled}" -ne 0 ]]; then
		echo ">>> memory cgroups already enabled";
		return 0;
	fi
	
	
	filepath_cmdline="/boot/cmdline.txt";
	if [[ ! -e "${filepath_cmdline}" ]]; then
		filepath_cmdline="/boot/firmware/cmdline.txt";
	fi
	if [[ ! -e "${filepath_cmdline}" ]]; then
		echo ">>> Failed to find cmdline.txt; can't check for cgroups";
		return 1;
	fi
	
	if grep -q cgroup_enable=memory "${filepath_cmdline}"; then
		echo ">>> memory cgroups already present in cmdline.txt, a reboot is required to apply the update";
		return 0;
	fi
	
	echo ">>> memory cgroups not present in cmdline.txt, enabling....";
	(tr -d '\n' <"${filepath_cmdline}" && echo " cgroup_enable=memory cgroup_memory=1") | sudo tee "${filepath_cmdline}.new";
	
	sudo mv "${filepath_cmdline}" "${filepath_cmdline}.old-$(date +"%Y-%m-%d")";
	sudo mv "${filepath_cmdline}.new" "${filepath_cmdline}";
	
	echo ">>> New contents of cmdline.txt:";
	cat "${filepath_cmdline}";
	echo ">>> A reboot is required to apply the changes.";
}
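
After rebooting, a quick sanity check (a sketch reusing the same checks from earlier in this thread) to confirm the change took effect:

# the memory controller should now report enabled = 1
awk '/memory/ { print $1, $4 }' /proc/cgroups

# Docker should now report non-zero memory usage for running containers
docker stats --no-stream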

@AlekseyMelikov

AlekseyMelikov commented Feb 13, 2022

I am having this issue now. I don't remember when it first appeared, but I remember the problem was already present in these versions:

Nomad v1.2.1 (719c53ac0ebee95d902faafe59a30422a091bc31)
Consul v1.10.4 Revision 7bbad6fe Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Docker version 20.10.11, build dea9396

I have now updated to

Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)
Consul v1.11.3 Revision e319d7ed Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Docker version 20.10.12, build e91ed57

Linux 5c24b868 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux

No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye

but the problem persists.

Description of the problem - Memory Usage of any Allocation is always 0 bytes
[screenshot omitted]

Host Resource Utilization is showing correctly
[screenshot omitted]

docker stats

CONTAINER ID   NAME            CPU %     MEM USAGE / LIMIT   MEM %     NET I/O   BLOCK I/O         PIDS
406f2[EDITED]   [EDITED]         0.03%     30.83MiB / 100MiB   30.83%    0B / 0B   4.1kB / 13.5MB    8
c667b[EDITED]   [EDITED]         0.00%     212KiB / 3.745GiB   0.01%     0B / 0B   0B / 0B           1
e46fe[EDITED]   [EDITED]         0.02%     40.43MiB / 200MiB   20.21%    0B / 0B   365kB / 369kB     8
47ae4[EDITED]   [EDITED]         0.00%     216KiB / 3.745GiB   0.01%     0B / 0B   0B / 0B           1
c8258[EDITED]   [EDITED]         0.03%     48.54MiB / 200MiB   24.27%    0B / 0B   0B / 8.19kB       11
22961[EDITED]   [EDITED]  	 0.01%     7.527MiB / 50MiB    15.05%    0B / 0B   750kB / 0B        2
21e1b[EDITED]   [EDITED]         0.28%     95.11MiB / 400MiB   23.78%    0B / 0B   58.5MB / 47MB     19
0f64b[EDITED]   [EDITED]         0.05%     58.89MiB / 100MiB   58.89%    0B / 0B   51.7MB / 2.09MB   18
caa34[EDITED]   [EDITED]         0.14%     42.91MiB / 100MiB   42.91%    0B / 0B   34.8MB / 0B       10
d13ea[EDITED]   [EDITED]   	 0.01%     10.52MiB / 50MiB    21.03%    0B / 0B   30.4MB / 0B       2
d3689[EDITED]   [EDITED]         1.87%     246.3MiB / 400MiB   61.58%    0B / 0B   33.7MB / 1.5MB    8
db532[EDITED]   [EDITED]         0.20%     129.3MiB / 600MiB   21.54%    0B / 0B   59MB / 57.1MB     31
60f28[EDITED]   [EDITED]         2.15%     12.92MiB / 100MiB   12.92%    0B / 0B   12.6MB / 6MB      5
e4914[EDITED]   [EDITED]         0.01%     16.39MiB / 50MiB    32.78%    0B / 0B   2.2MB / 69.6kB    7
1as2c[EDITED]   [EDITED]         0.38%     80.51MiB / 400MiB   20.13%    0B / 0B   54.7MB / 373kB    7
dd8bb[EDITED]   [EDITED]         0.12%     39.33MiB / 100MiB   39.33%    0B / 0B   29MB / 0B         12

cat /proc/cgroups

#subsys_name	hierarchy	num_cgroups	enabled
cpuset	0	102	1
cpu	0	102	1
cpuacct	0	102	1
blkio	0	102	1
memory	0	102	1
devices	0	102	1
freezer	0	102	1
net_cls	0	102	1
perf_event	0	102	1
net_prio	0	102	1
hugetlb	0	102	1
pids	0	102	1
rdma	0	102	1
Nomad logs
Feb 13 08:26:58 host-name systemd[1]: Started Nomad.
Feb 13 08:26:59 host-name nomad[697]: ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
Feb 13 08:26:59 host-name nomad[697]: ==> Loaded configuration from /etc/nomad.d/client.hcl, /etc/nomad.d/nomad.hcl, /etc/nomad.d/server.hcl
Feb 13 08:26:59 host-name nomad[697]: ==> Starting Nomad agent...
Feb 13 08:27:00 host-name nomad[697]: ==> Nomad agent configuration:
Feb 13 08:27:00 host-name nomad[697]:        Advertise Addrs: HTTP: 172.16.0.2:4646; RPC: 172.16.0.2:4647; Serf: 172.16.0.2:4648
Feb 13 08:27:00 host-name nomad[697]:             Bind Addrs: HTTP: [0.0.0.0:4646]; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
Feb 13 08:27:00 host-name nomad[697]:                 Client: true
Feb 13 08:27:00 host-name nomad[697]:              Log Level: INFO
Feb 13 08:27:00 host-name nomad[697]:                 Region: global (DC: dc-name)
Feb 13 08:27:00 host-name nomad[697]:                 Server: true
Feb 13 08:27:00 host-name nomad[697]:                Version: 1.2.6
Feb 13 08:27:00 host-name nomad[697]: ==> Nomad agent started! Log data will stream in below:
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.273Z [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.273Z [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.273Z [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.273Z [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.273Z [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.331Z [INFO]  nomad.raft: restored from snapshot: id=61-278615-1644597830311
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.443Z [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:172.16.0.2:4647 Address:172.16.0.2:4647}]"
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.443Z [INFO]  nomad.raft: entering follower state: follower="Node at 172.16.0.2:4647 [Follower]" leader=
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.445Z [INFO]  nomad: serf: EventMemberJoin: host-name.global 172.16.0.2
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.445Z [INFO]  nomad: starting scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"]
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.446Z [INFO]  nomad: serf: Attempting re-join to previously known node: dc-name-host-name.global: 172.16.0.2:4648
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.446Z [INFO]  nomad: started scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"]
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.447Z [INFO]  nomad: serf: Re-joined to previously known node: dc-name-host-name.global: 172.16.0.2:4648
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.447Z [INFO]  client: using state directory: state_dir=/opt/nomad/data/client
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.448Z [INFO]  nomad: adding server: server="host-name.global (Addr: 172.16.0.2:4647) (DC: dc-name)"
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.449Z [INFO]  client: using alloc directory: alloc_dir=/opt/nomad/data/alloc
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.449Z [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.449Z [WARN]  client: could not initialize cpuset cgroup subsystem, cpuset management disabled: error="not implemented for cgroup v2 unified hierarchy"
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.642Z [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.646Z [WARN]  client.fingerprint_mgr.cpu: failed to detect set of reservable cores: error="not implemented for cgroup v2 unified hierarchy"
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.693Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.695Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.699Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.705Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens10
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.720Z [INFO]  client.plugin: starting plugin manager: plugin-type=csi
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.720Z [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:26:59.721Z [INFO]  client.plugin: starting plugin manager: plugin-type=device
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.116Z [ERROR] client.driver_mgr.exec: failed to reattach to executor: driver=exec error="error creating rpc client for executor plugin: Reattachment process not found" task_id=2cdc7213-925b-7b29-8aa1-28f4ad0e03d2/[EDITED]
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.136Z [INFO]  client: started client: node_id=f03bd130-5e77-6809-81f2-5470f161b8d5
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.138Z [INFO]  client.gc: marking allocation for GC: alloc_id=e946c362-18e7-e330-f335-9cbe04ccc5ad
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=32054f2c-da70-2e80-fe9b-6d0ef865fd80
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=78ee19e2-302e-c65e-7dd7-221f911fc9fc
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=ced42f55-4d57-d0de-df46-278440049f0a
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=26d044c4-7388-f49e-4c93-0367b0783bf2
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=5ccc1846-fa40-52d6-4f6f-02d9baa0523b
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=9d3b36fa-7012-9569-d939-b2b102490570
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=a52e1bf7-2c51-a4f4-6bba-43668b8ad84a
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=bb17689f-cc33-e4c0-ef21-abe80f0dd0ac
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.141Z [INFO]  client.gc: marking allocation for GC: alloc_id=0ddd1372-12e8-4c80-260f-b64c712bdee6
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.142Z [INFO]  client.gc: marking allocation for GC: alloc_id=2cdc7213-925b-7b29-8aa1-28f4ad0e03d2
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.142Z [INFO]  client.gc: marking allocation for GC: alloc_id=482d787f-af8a-17fe-74d7-2953d798768e
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.142Z [INFO]  client.gc: marking allocation for GC: alloc_id=27e811a4-9990-8c47-76df-f3806f11bbaa
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.142Z [INFO]  client.gc: marking allocation for GC: alloc_id=cd0083e7-adc0-cb28-f4b4-ad11fde6a550
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.142Z [INFO]  client.gc: marking allocation for GC: alloc_id=9a57aff0-3441-8082-ece8-2ebc4b1ef382
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.987Z [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader=
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.987Z [INFO]  nomad.raft: entering candidate state: node="Node at 172.16.0.2:4647 [Candidate]" term=62
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.992Z [INFO]  nomad.raft: election won: tally=1
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.992Z [INFO]  nomad.raft: entering leader state: leader="Node at 172.16.0.2:4647 [Leader]"
Feb 13 08:27:00 host-name nomad[697]:     2022-02-13T08:27:00.993Z [INFO]  nomad: cluster leadership acquired
Feb 13 08:27:01 host-name nomad[697]:     2022-02-13T08:27:01.079Z [INFO]  client: node registration complete
Feb 13 08:27:09 host-name nomad[697]:     2022-02-13T08:27:09.960Z [INFO]  client: node registration complete
Feb 13 08:27:14 host-name nomad[697]:     2022-02-13T08:27:14.655Z [INFO]  client.fingerprint_mgr.consul: consul agent is available
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.224Z [INFO]  agent: (runner) creating new runner (dry: false, once: false)
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.225Z [INFO]  agent: (runner) creating watcher
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.228Z [INFO]  agent: (runner) starting
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.231Z [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/ea8246f8-ad62-8218-7424-ef2d2f765293[EDITED]
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.232Z [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/ea8246f8-ad62-8218-7424-ef2d2f765293[EDITED]
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.233Z [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/ea8246f8-ad62-8218-7424-ef2d2f765293[EDITED]
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.299Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=92de7609-8869-9feb-5613-1d8f1d818e59 task=oathkeeper @module=logmon path=/opt/nomad/data/alloc/92de7609-8869-9feb-5613-1d8f1d818e59/alloc/logs/[EDITED]
Feb 13 08:27:18 host-name nomad[697]:     2022-02-13T08:27:18.304Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=92de7609-8869-9feb-5613-1d8f1d818e59 task=oathkeeper @module=logmon path=/opt/nomad/data/alloc/92de7609-8869-9feb-5613-1d8f1d818e59/alloc/logs/[EDITED]
Feb 13 08:30:54 host-name nomad[697]:     2022-02-13T08:30:54.615Z [ERROR] http: request failed: method=GET path=/v1/client/allocation/undefined/stats error="alloc lookup failed: index error: UUID must be 36 characters" code=500
Consul logs
Feb 13 08:26:58 host-name systemd[1]: Started "HashiCorp Consul - A service mesh solution".
Feb 13 08:26:59 host-name consul[689]: ==> Starting Consul agent...
Feb 13 08:26:59 host-name consul[689]:            Version: '1.11.3'
Feb 13 08:26:59 host-name consul[689]:            Node ID: '64ad536f-4aca-61cf-a324-f98f0ed1677e'
Feb 13 08:26:59 host-name consul[689]:          Node name: 'host-name'
Feb 13 08:26:59 host-name consul[689]:         Datacenter: 'dc-name' (Segment: '<all>')
Feb 13 08:26:59 host-name consul[689]:             Server: true (Bootstrap: true)
Feb 13 08:26:59 host-name consul[689]:        Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Feb 13 08:26:59 host-name consul[689]:       Cluster Addr: 172.16.0.2 (LAN: 8301, WAN: 8302)
Feb 13 08:26:59 host-name consul[689]:            Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true, Auto-Encrypt-TLS: false
Feb 13 08:26:59 host-name consul[689]: ==> Log data will now stream in as it occurs:
Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.366Z [WARN]  agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.366Z [WARN]  agent: bootstrap = true: do not enable unless necessary
Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.417Z [WARN]  agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode.
Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.417Z [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.442Z [INFO]  agent.server.raft: restored from snapshot: id=48-1409193-1644602007773
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.011Z [INFO]  agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:64ad536f-4aca-61cf-a324-f98f0ed1677e Address:172.16.0.2:8300}]"
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.011Z [INFO]  agent.server.raft: entering follower state: follower="Node at 172.16.0.2:8300 [Follower]" leader=
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.013Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: host-name.dc-name 172.16.0.2
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.014Z [INFO]  agent.server.serf.wan: serf: Attempting re-join to previously known node: dc-name-host-name.dc-name: 172.16.0.2:8302
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.015Z [INFO]  agent.server.serf.wan: serf: Re-joined to previously known node: dc-name-host-name.dc-name: 172.16.0.2:8302
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.018Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: host-name 172.16.0.2
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.018Z [INFO]  agent.router: Initializing LAN area manager
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.018Z [INFO]  agent.server.serf.lan: serf: Attempting re-join to previously known node: dc-name-host-name: 172.16.0.2:8301
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.019Z [INFO]  agent.server.serf.lan: serf: Re-joined to previously known node: dc-name-host-name: 172.16.0.2:8301
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.020Z [INFO]  agent.server: Adding LAN server: server="host-name (Addr: tcp/172.16.0.2:8300) (DC: dc-name)"
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.020Z [INFO]  agent.server: Handled event for server in area: event=member-join server=host-name.dc-name area=wan
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.021Z [WARN]  agent: grpc: addrConn.createTransport failed to connect to {dc-name-172.16.0.2:8300 0 host-name <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp <nil>->172.16.0.2:8300: operation was canceled". Reconnecting...
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.035Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.036Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.036Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.043Z [WARN]  agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.044Z [INFO]  agent: Started gRPC server: address=[::]:8502 network=tcp
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.045Z [INFO]  agent: started state syncer
Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.045Z [INFO]  agent: Consul agent running!
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.232Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.232Z [INFO]  agent.server.raft: entering candidate state: node="Node at 172.16.0.2:8300 [Candidate]" term=50
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.238Z [INFO]  agent.server.raft: election won: tally=1
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.238Z [INFO]  agent.server.raft: entering leader state: leader="Node at 172.16.0.2:8300 [Leader]"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.238Z [INFO]  agent.server: cluster leadership acquired
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.239Z [INFO]  agent.server: New leader elected: payload=host-name
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.722Z [INFO]  agent: Synced node info
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.723Z [INFO]  agent.server: initializing acls
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.723Z [INFO]  agent.leader: started routine: routine="legacy ACL token upgrade"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.723Z [INFO]  agent.leader: started routine: routine="acl token reaping"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.726Z [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.726Z [INFO]  agent.leader: started routine: routine="federation state pruning"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO]  connect.ca: initialized primary datacenter CA from existing CARoot with provider: provider=consul
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO]  agent.leader: started routine: routine="intermediate cert renew watch"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO]  agent.leader: started routine: routine="CA root pruning"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO]  agent.leader: started routine: routine="CA root expiration metric"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO]  agent.leader: started routine: routine="CA signing expiration metric"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO]  agent.leader: started routine: routine="virtual IP version check"
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO]  agent.server: deregistering member: member=c807ea31 partition=default reason=reaped
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.739Z [INFO]  agent: Deregistered service: service=_nomad-task-32054f2c-da70-2e80-fe9b-6d0ef865fd80-group-prometheus-prometheus-prometheus
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.740Z [INFO]  agent: Deregistered service: service=_nomad-task-5ccc1846-fa40-52d6-4f6f-02d9baa0523b-group-envoy-envoy-
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.741Z [INFO]  agent: Synced check: check=_nomad-check-ed7b6ce5bc6c5af7ca61be80e1c836df45b44455
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.743Z [INFO]  agent: Synced check: check=_nomad-check-689ac211032c7cf7fd8003fb2c299ee11fd17a58
Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.745Z [INFO]  agent: Synced check: check=_nomad-check-b305d921ca9bcc3f684e0294bcc862256703ff60
Feb 13 08:27:05 host-name consul[689]: 2022-02-13T08:27:05.017Z [INFO]  agent: Synced check: check=_nomad-check-689ac211032c7cf7fd8003fb2c299ee11fd17a58
Feb 13 08:27:05 host-name consul[689]: 2022-02-13T08:27:05.397Z [INFO]  agent: Synced check: check=_nomad-check-b305d921ca9bcc3f684e0294bcc862256703ff60
Feb 13 08:27:05 host-name consul[689]: 2022-02-13T08:27:05.399Z [INFO]  agent: Synced check: check=_nomad-check-ed7b6ce5bc6c5af7ca61be80e1c836df45b44455
Feb 13 08:27:28 host-name consul[689]: 2022-02-13T08:27:28.952Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: c807ea31 172.16.0.3
Feb 13 08:27:28 host-name consul[689]: 2022-02-13T08:27:28.952Z [INFO]  agent.server: member joined, marking health alive: member=c807ea31 partition=default
Feb 13 08:27:34 host-name consul[689]: 2022-02-13T08:27:34.281Z [ERROR] agent.dns: recurse failed: error="read udp 116.203.25.69:36315->1.1.1.1:53: i/o timeout"
Feb 13 08:27:34 host-name consul[689]: 2022-02-13T08:27:34.291Z [ERROR] agent.dns: recurse failed: error="read udp 116.203.25.69:37048->1.1.1.1:53: i/o timeout"

@sbrl
Author

sbrl commented Feb 17, 2022

@AlekseyMelikov best to open a new issue. This issue is specifically to do with Raspberry Pis / ARM devices. You're using x86_64 there, so while the symptoms are the same the solution is likely to be very different.

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 11, 2022