Nomad version
Output from `nomad version`:
Also `docker version`, in case it's involved:

```
docker version
Client: Docker Engine - Community
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.16
 Git commit:        369ce74a3c
 Built:             Thu Feb 13 01:27:49 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.16
  Git commit:       369ce74a3c
  Built:            Thu Feb 13 01:26:21 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
```
Operating system and Environment details

```
Linux ip-10-0-74-196 4.15.0-1063-aws #67-Ubuntu SMP Mon Mar 2 07:24:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
```
Issue
We're seeing segfaults on Nomad client instances that are running multiple jobs with the default resource settings. Lots of containers get killed as they approach or exceed those default memory limits, so there's generally a lot of churn of jobs being killed and rescheduled in this environment.
As you can see from the dmesg output, it also then fails to restart via systemd.
@notnoop had mentioned, in chat:
Hope that also helps.
Reproduction steps
We're simply allocating memory in a container until it gets killed.
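For illustration, a stand-in for that task could look like the Docker task below (hypothetical image and command, not our actual workload); GNU tail on /dev/zero buffers input without bound, so the container grows until it hits its memory limit and gets killed:

```hcl
task "memhog" {
  driver = "docker"

  config {
    # tail never sees a newline on /dev/zero, so it keeps buffering
    # and the container's memory use grows until the cgroup limit
    # is reached and the task is killed.
    image   = "debian:buster-slim"
    command = "tail"
    args    = ["/dev/zero"]
  }

  # No resources stanza here, so Nomad's default cpu/memory
  # reservations apply, same as in our real jobs.
}
```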
Job file (if appropriate)
It's a pretty generic job file, with count = 1 and no memory/cpu reservations specified. I'm happy to provide it out of band if you think it's helpful.
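To give a rough sense of its shape (job, group, and image names are placeholders, not the real spec), it's essentially:

```hcl
job "example" {
  datacenters = ["dc1"]

  group "app" {
    count = 1

    task "app" {
      driver = "docker"

      config {
        image = "example/app:latest"
      }

      # No resources stanza, so the task falls back to Nomad's
      # default cpu/memory reservations.
    }
  }
}
```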
Nomad Client logs (if appropriate)