Predictable Network Interface Names lost in 0.8-1 #155
Copying debug logs from #153, as they contain important information about the network manipulation:
One reason why the network interfaces are not renamed is that they might still be in use. I see numerous
With booster 0.8 release network udev events are handled concurrently. initializedIfnames slice might be modified by different threads. To prevent corruption we need to protect the access with a mutex. Issue #155
Your reply prompted me to remember that I'm using a standard network configuration:

    network:
      dhcp: on

Commenting out the

On the rest of the servers I use Clevis to unlock from a Tang instance. Perhaps there's code in there waiting for some use of the started network before shutting it down, so if it's never needed, it never gets shut down?
One possible explanation could be that
To check this theory I pushed a change to
I adapted your
@anatol can I suggest updating the
I just pushed
Using the latest
Another server, which does need Clevis for unlocking, failed to boot. This means I've now seen the same lock-up behaviour on two different servers. I recorded an IPMI video so you can see the Booster debug output.
The IPMI log is interesting. It shows that the NIC received a "link is up" event, but later attempts to use it hang the thread. So something strange happens with the NIC. Let's debug it. I pushed a change to
Hi @benalexau, have you had a chance to try the patch above? I am interested in debugging and fixing the problem you see.
Hi @anatol. Sorry for the delay; it has been a very busy week! I built
Sorry for the unfortunate delay. I would like to return to this issue. Is it something you still experience with the latest booster?
Arch Linux's latest
Please try
Also, record the full booster log and post it here. I am interested in the output of
I built

I found that rebooting produces different network messages on each occasion, although none completes the boot by unlocking via Clevis. Here are some screenshots:

We cannot access the log because the system never boots. In case it helps, here is the network after a normal boot:
The booster logs show

I checked the list of booster changes between 0.7 and 0.8, and there are a few that could potentially introduce the issue:
Let's start with the DHCP change. I reverted the DHCP library update and pushed it to
I built
Note the above MAC address for
In other words, there is no network cable in
This also didn't work. We still end up with different results on each boot. Two examples:

I note the first screenshot includes "interface eth1 is not in 'active' list, skipping it". The same message appears in the second screenshot, except that time it is "eth0".

For completeness I also amended the YAML to use

I thought it would be worthwhile to disable the on-board Ethernet interfaces as an experiment, but there was no BIOS option to do so (only to disable the option ROM for PXE etc.). It seems this is related to ordering: the first Ethernet interface is tried (either for DHCP or for the check of whether it is in the "active" list) and no others are considered.
Thank you for the analysis.
So there must be a bug in the interface filtering logic. I'll look at it. But the real issue is that not all the network interfaces are set up correctly. booster launches a new goroutine for each detected interface, and thus each of your cards

Looking at your logs I see a suspicious

In other words, the udev socket returns an error, and booster/uevent.go is unable to process it because the error happens inside the Go stdlib. One option to handle this situation is to switch to another udev reader implementation. Here is one candidate: https://github.com/pilebones/go-udev
So I went ahead and moved the udev listener to the https://github.com/pilebones/go-udev implementation. The tests pass, and the system boots fine for me (limited testing). Please pull

Post the logs from your machine in case of either failure or success.
It might be related to the bug. From that issue:
Let's check what values you have there. Here are my values:
I tried this:
Result: Appending
Anything else I should try?
Setting the host's sysctl like
Thanks. I adopted the following boot entry and booted successfully using
After booting, I reverted to standard Arch Linux
As an experiment I removed the
I reinstalled
So we do have a workaround using the above boot entry.
Hurray! Thanks for confirming it. So the problem is indeed a lack of memory for the NICs. That probably comes from the fact that we initialize two cards in parallel, so the peak memory requirement has increased. I guess if NIC filtering worked, you would be able to load only one card without changing the sysctl parameters. I need to look at the filtering issue. The old udev error message was useless; the new library provides a clearer error message. I'll test the new library, and if everything is OK, I'll switch to it permanently.
…udev/netlink github.com/s-urbaniak/uevent.go is unable to handle errors from the udev netlink listener. It just panics. Issue #155
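The commit message describes a uevent reader that panics when the listener hits an error. The sketch below shows the alternative of returning errors to the caller, applied to the message-parsing step. It assumes the kernel's own uevent datagram format: `ACTION@DEVPATH` followed by NUL-separated `KEY=VALUE` pairs. Messages relayed by udevd carry a different binary header and are not handled. `parseUevent` and the `uevent` struct are hypothetical and are not the API of s-urbaniak/uevent or pilebones/go-udev.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strings"
)

// uevent holds a parsed kernel uevent (hypothetical type).
type uevent struct {
	Action  string
	Devpath string
	Env     map[string]string
}

// parseUevent decodes a raw kernel uevent datagram of the form
// "ACTION@DEVPATH\0KEY=VALUE\0...". Malformed input is reported as an
// error instead of causing a panic.
func parseUevent(buf []byte) (*uevent, error) {
	fields := bytes.Split(bytes.TrimRight(buf, "\x00"), []byte{0})
	if len(fields) == 0 || len(fields[0]) == 0 {
		return nil, errors.New("uevent: empty message")
	}
	header := string(fields[0])
	action, devpath, ok := strings.Cut(header, "@")
	if !ok || action == "" || devpath == "" {
		return nil, fmt.Errorf("uevent: malformed header %q", header)
	}
	ev := &uevent{Action: action, Devpath: devpath, Env: make(map[string]string)}
	for _, f := range fields[1:] {
		key, value, ok := strings.Cut(string(f), "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("uevent: malformed field %q", f)
		}
		ev.Env[key] = value
	}
	return ev, nil
}

func main() {
	raw := []byte("add@/devices/virtual/net/eth0\x00ACTION=add\x00SUBSYSTEM=net\x00INTERFACE=eth0\x00")
	ev, err := parseUevent(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ev.Action, ev.Env["INTERFACE"]) // add eth0
	if _, err := parseUevent([]byte("garbage")); err != nil {
		fmt.Println("error:", err) // the caller decides how to react; no panic
	}
}
```

The key design point is that every failure path returns an `error`. The event loop can then log the problem and keep reading, instead of crashing the init process.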
It helps with debugging in case a host has multiple network cards. Issue #155
@benalexau, returning to the MAC address filtering mentioned in #155 (comment): I just added an extra debug statement that prints the NIC MAC address, which helps to understand whether the NIC is filtered out or not. I pushed the change to

One thing to keep in mind is that this memory problem happens in the kernel at driver load time, i.e. even before udev reads the "NIC is added" event. Thus filtering in booster does not help much here, although it will prevent DHCP initialization of that NIC. If you want to disable that NIC completely, the best option is to blacklist the NIC module, or maybe there is a boot parameter to disable a specific device.
@anatol
The changes mentioned above that make udev easier to debug have landed in the master branch.
I have a server which, after upgrading to booster 0.8-1, no longer provides Predictable Network Interface Names. When I downgrade to booster 0.7-3, the predictable network interface names are restored after reboot.

With booster 0.8-1, the following confirms that the legacy names are used even for /sys/class/net:

If we review, say, eth0:
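A small Go program can reproduce the /sys/class/net check. It assumes that a legacy kernel-assigned name matches `^eth[0-9]+$`, whereas predictable names use prefixes such as `en` (for example `enp3s0`). `legacyNames` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// legacyEth matches kernel-assigned names such as eth0, eth1, ...
var legacyEth = regexp.MustCompile(`^eth[0-9]+$`)

// legacyNames returns the names that still look like legacy kernel
// names, i.e. interfaces that were never renamed by udev.
func legacyNames(names []string) []string {
	var out []string
	for _, n := range names {
		if legacyEth.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	entries, err := os.ReadDir("/sys/class/net")
	if err != nil {
		fmt.Println("reading /sys/class/net:", err)
		return
	}
	var names []string
	for _, e := range entries {
		names = append(names, e.Name())
	}
	fmt.Println("legacy names:", legacyNames(names))
}
```

On a system where renaming worked, the list should be empty. With the regression described above, it would contain entries such as eth0.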