system-dockerd segfault at 8 ip #2484
I tried this script to check the kernel config: https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh. Nothing looks abnormal.
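For anyone else who wants to run the same check, this is roughly how I invoked it (the kernel config path here is an assumption; the script also falls back to /proc/config.gz and /boot on its own):

$ wget https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh
$ chmod +x check-config.sh
$ ./check-config.sh /proc/config.gz   # optional argument: path to the kernel config to verify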
I can reproduce this error when I run a container with a bridge network via system-docker.
Every time I start the container with system-docker, I see this log in dmesg. There are two ways to avoid this problem:
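For reference, a minimal reproduction of the bridge-network case looks roughly like this (the image name and command are placeholders I chose, not from the original report):

$ sudo system-docker run --rm --net=bridge busybox true   # starting any container on the bridge network triggers it for me
$ dmesg | grep segfault                                   # the system-dockerd segfault shows up here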
Trying to track down an issue I'm seeing with Rancher on a physical server. I can't say whether I saw this same error on other physical servers, but those servers would actually download the cloud-config I passed in as part of my kernel parameters. For a reason I haven't found yet, I have a server that boots Rancher but does not download the cloud-config passed in as a kernel parameter. The only error I see on the running server is the segfault above. This is occurring with a RancherOS 1.5 ISO.
After making a few port configuration changes on the switch so the server connects faster, RancherOS now grabs the config file as it did on other hardware. It does seem that if it takes too long for RancherOS to get an IP, the download of the cloud-config fails and is not retried during the boot process.
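For anyone hitting the same symptom, the boot-time setting I mean is the cloud-config datasource passed on the kernel command line. The URL below is a placeholder, and the exact datasource syntax should be double-checked against the RancherOS docs:

rancher.cloud_init.datasources=[url:https://example.com/cloud-config.yml]

On the affected machine this URL was simply never fetched when the link came up too slowly.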
@geauxvirtual Please file another issue and show more details; I think what you mentioned is unrelated to this issue.
@niusmallnan We are having this issue as well on ROS 1.4.2; system-docker seems to segfault randomly.
We also have a ticket open at https://support.rancher.com/hc/en-us/requests/3546. It seems to be the same memory location every time; do we know what's in memory at that address?
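One way to answer that, assuming the binary on disk still matches the one that crashed (the path below is a guess), is to translate the faulting instruction pointer from dmesg back to a symbol. Since the mapping starts at 0x400000, the ip value 0x541d26 should be an absolute address in the binary:

$ addr2line -f -e /usr/bin/system-dockerd 0x541d26                 # prints function and file:line if symbols are present
$ gdb -batch -ex 'info symbol 0x541d26' /usr/bin/system-dockerd    # alternative, on a box that has gdb available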
Still seeing this error:

$ dmesg | grep segfault
[ 30.587148] system-dockerd[1232]: segfault at 8 ip 0000000000541d26 sp 000000c421421308 error 4 in system-dockerd[400000+1486000]
[ 31.995174] system-dockerd[1633]: segfault at 8 ip 0000000000541d26 sp 000000c42141f308 error 4 in system-dockerd[400000+1486000]
$ sudo ros -v
version v1.5.5 from os image rancher/os:v1.5.5

Running on bare metal. I am using bonded NICs attached to VLANs; not sure if that matters, but it provides context for the config below:

rancher:
  network:
    interfaces:
      bond1:
        bond_opts:
          downdelay: "200"
          lacp_rate: "1"
          miimon: "100"
          mode: "4"
          updelay: "200"
          xmit_hash_policy: layer3+4
        vlans: 100:vlan100,300:vlan300
      vlan300:
        dhcp: true
        vlans: 300:vlan300
      eth*:
        bond: bond1
      vlan100:
        dhcp: true
        match: vlan100
        vlans: 100:vlan100

Here's some of the surrounding output from
I have not tried the suggestion in [https://github.com//issues/2484#issuecomment-445088750] - will RancherOS function normally with that change? This is reproduced on 5 hardware systems currently with the same configuration. @niusmallnan Let me know if I can help with anything on this.
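In case it helps with debugging, this is how I've been sanity-checking the bond and VLAN state on these hosts; it's standard Linux tooling plus ros, nothing specific to this bug:

$ cat /proc/net/bonding/bond1      # 802.3ad (mode 4) negotiation state and slave status
$ ip -d link show vlan100          # VLAN ID and parent device for the 802.1Q interface
$ sudo ros config export           # the cloud-config RancherOS actually applied, including the network section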
What we have here: I'm starting to think we have a networking problem due to architecture and design flaws. I have observed this across 3 systems on 1.5.5 with similar hardware.
After further digging, I keep coming back to Alpine and musl. It seems to be related either to how musl handles DNS resolution or to some form of hardening.
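To poke at that theory, a few rough checks (the binary path and image availability are assumptions on my part):

$ file /usr/bin/system-dockerd     # shows whether the daemon is a static build (the images in question are Alpine/musl based)
$ ldd /usr/bin/system-dockerd      # "not a dynamic executable" confirms a static link
$ sudo system-docker run --rm alpine nslookup rancher.com   # exercises musl's resolver inside a container, if an alpine image is available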
RancherOS Version: (ros os version)
v1.4.0/v1.4.1
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)
All
Check the output of dmesg:
For now it seems to have no effect, but I hope to find the root cause.