NanoPi R5S | eth0 not configured on reboot #6951
Can you check:
journalctl -u ifup@eth0
Good to know that the MAC addresses of the two LAN ports are randomised. But at least the interfaces do not seem to be switched.
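In case it helps to narrow things down, the query can be limited to the current boot; the -b flag is standard journalctl, nothing DietPi-specific:
journalctl -b -u ifup@eth0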
In case of a failed power up, the output is: (journal excerpt attached)
It just does not get a DHCP lease. It does immediately work when you run it manually.
Hmm, this actually looks like it was not able to send the DHCP request at boot.
It seems I am also having a similar issue with the ifup service being unreliable/hanging during boot. Most of the time I waited for more than 1 minute and had to power the device down and up again to try to get past that on a new boot. Happy to help with troubleshooting if you need anything.
I have the same problem and think I found a workaround for the moment. I have 2 installations, one on an SD card and one on the eMMC. I checked both installations for multiple hours and found nothing, they seem to be identical. My "dirty" workaround is to delay the interface bring-up with a sleep (see the sketch below and the test results further down).
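A minimal sketch of such a delay, assuming a systemd drop-in is used; the 10-second value and the drop-in path are assumptions, not the exact commands from this thread:
# Hypothetical drop-in: delay ifup@eth0 before the interface is raised
mkdir -p /etc/systemd/system/ifup@eth0.service.d
cat << 'EOF' > /etc/systemd/system/ifup@eth0.service.d/delay.conf
[Service]
ExecStartPre=/bin/sleep 10
EOF
systemctl daemon-reload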
Can you both check my last comment and test whether restarting the service and/or assigning a static IP helps?
Restarting the service (networking.service), when configured with DHCP, works most of the time (if I remember correctly). The main problem is that the boot dialog hangs at ifup@eth0.service for over 15 minutes. If you give me some test cases (DHCP, static, reboot/poweroff, etc.), I can run them on my device.
This should not have any effect. Hmm, it should have a default startup timeout of 90 seconds. Similarly, the DHCP client itself has a 60 seconds timeout before it gives up for a while (several minutes at least), which can be seen in Stephan's logs. If it hangs for more than 90 seconds in your case, then there must be a more low-level hang overriding the DHCP and systemd timeouts.
As said, it would be great to test whether a static IP avoids those issues, i.e. whether only the DHCP client has issues at this stage. Also, can you add your journalctl -u ifup@eth0 output?
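To verify which timeouts actually apply on the system, systemd can print the effective values; these are standard systemctl properties, nothing DietPi-specific:
# Show the effective start timeout of both units
systemctl show networking.service ifup@eth0.service -p TimeoutStartUSec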
So, I reverted all changes and tried multiple tests with DHCP and static IP settings. Before you read further: I think it has nothing to do with DHCP or static IP, but instead with the device not initialising correctly, or a wrong device state due to services not waiting.
test (dhcp)
test (static)
last test
Here is a screenshot of the boot process and the log.
Edit: as soon as I added the timeout (sleep command), both versions (DHCP and static IP) worked without a problem.
Now I see the problem: ifupdown-pre.service waits for udev to settle, and networking.service is ordered after it. But, since on stock Debian the two services are not related to each other, it is perfectly possible that an ifup@.service instance starts before ifupdown-pre.service has finished. What we hence need to assure is that udev settles, i.e. our LED udev rules have finished, before ifup@eth0.service starts. Please try this:
G_CONFIG_INJECT '\[Unit\]' '[Unit]' /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf
G_CONFIG_INJECT 'Before=' 'Before=network-pre.target' /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf '\[Unit\]'
G_CONFIG_INJECT 'Wants=' 'Wants=network-pre.target' /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf '\[Unit\]'
reboot
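For reference, the drop-in these G_CONFIG_INJECT calls produce should then contain (deduced from the injected keys above):
# /etc/systemd/system/ifupdown-pre.service.d/dietpi.conf
[Unit]
Before=network-pre.target
Wants=network-pre.target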
Thanks for the detailed information and quick fix. Unfortunately, it failed after the 13th attempt. With DHCP, it worked 4 times and the 5th failed with a wrong LED state (1 and 2 are on), and the service hung again. As soon as the system is accessible, I will attach the journal log.
Would be good to know how the journal looks in that case.
Here is the journal: journal.txt. Before I found this specific ticket/issue, I read some other NanoPi related ones which mentioned that the SoC is not always reset properly. It is a wild guess, but it would match with the missing CPU governors.
Not related to the issue, but your network configuration seems to have issues as well:
Jun 02 21:57:02 DietPi systemd[1]: Starting ifup@eth0.service - ifup for eth0...
Jun 02 21:57:02 DietPi ifup[464]: ifup: /etc/network/interfaces.d/if.tar:1: misplaced option
Jun 02 21:57:02 DietPi systemd[1]: Starting networking.service - Raise network interfaces...
Jun 02 21:57:02 DietPi ifup[466]: ifup: /etc/network/interfaces.d/if.tar:1: misplaced option
However, the bigger problem is that indeed the network adapters seem to be ready very late:
These are our LED udev rules, trying to bring up the interface to apply the LED trigger (else it would just be lit, regardless of whether a cable is attached or not). The adapters have been detected at that time:
There the missing CPU governors also show up. So yeah, something is broken since the latest Linux 6.6.32 update. This all worked perfectly fine some time ago, also with older Linux 6.6, IIRC. Can you try this:
apt install linux-{image,dtb}-current-rockchip64=24.2.1
Confirm the downgrade.
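If the downgrade helps, the packages can be pinned so APT does not immediately upgrade them again; apt-mark is standard APT, and the package names are taken from the command above:
# Prevent the downgraded kernel packages from being upgraded again
apt-mark hold linux-image-current-rockchip64 linux-dtb-current-rockchip64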
- Network | Assure that "ifupdown-pre.service" finishes before "[email protected]" instances, like it does for "networking.service", by ordering it before "network-pre.target". A case has been found where udev settles very late, so that hotplug interfaces could be brought up earlier, causing race conditions with udev rules: #6951 (comment)
Yeah, not a problem.
I will try it tomorrow, as soon as I get back from work.
Our fix (see the changelog entry above) is in place. I'll also run some tests now on my R5S.
With the SD card, it works fine here as well. I cannot test the eMMC currently, since it is still on an old 5.10.160 kernel, where I want to test the conversion to more recent Linux builds. I'll see whether I can finish this tomorrow evening. Just to rule it out, does it change anything for you when you remove our udev rules?
mv /etc/udev/rules.d/dietpi-eth-leds.rules{,_bak}
However, since you have issues even with the CPU governor, I think there is a more fundamental problem with the latest kernel when booting from eMMC.
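To restore the rule afterwards, the rename can simply be reversed; reloading the rules via a standard udevadm call avoids another reboot:
# Restore the LED udev rule and reload udev rules
mv /etc/udev/rules.d/dietpi-eth-leds.rules{_bak,}
udevadm control --reload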
Another report, which seems to be the same issue at first view, booting with NVMe in this case: https://dietpi.com/forum/t/20268
So, today it was way more difficult to reproduce the error. Disabling the LED udev rule didn't change anything, I guess (stopped after 10 reboots). Downgrading the kernel had little or no effect. The only thing I can confirm is that the CPU governor has nothing to do with the kernel versions or the failed state. Changing the systemd service dependencies was a big step forward.
So far so good then, so enforcing the systemd order helped against the case where our udev rules brought down an interface which is in the process of being brought up. If so, we could further investigate this way:
G_CONFIG_INJECT 'udev_log=' 'udev_log=debug' /etc/udev/udev.conf
reboot
This gives a mass of detailed udev logs: which device is detected when, which udev rules it triggered, what they did, and in case whether they took very long, etc.:
journalctl
In case, it makes sense to filter it a little, like
journalctl --grep eth0
to check the logs for eth0 in particular. Did we actually check for kernel errors already?
dmesg -l 0,1,2,3
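Once enough logs have been gathered, removing the injected line restores the default udev log level; plain sed is used here as an assumption, since the exact revert method used later in this thread is not shown:
# Revert udev debug logging to the default level
sed -i '/^udev_log=/d' /etc/udev/udev.conf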
I removed eth1 and eth2 from the configuration. If I remember correctly, there were no kernel errors during the boot process.
So, after 80 or more reboots over two days, I can't reproduce the issue. I also reverted the udev debug log, switched kernels and played with the power supply, because the last time it happened, I had switched the power supply to monitor the consumption via a FNIRSi FNB58 (after our changes). Either it is a weird power glitch/SoC reset problem or a timing problem between the services.
Update: Now, when I switch to a static IP address, it sometimes hangs at boot again.
Did the udev logs contain any hint why the Ethernet devices take so long to get ready, or some detach/attach loops?
I disabled the logs and switched to a static IP afterwards, because I wanted to set up my system for now. I will re-enable the debug log and gather enough information.
Update: Finally, I got the error again. Let's hope that the error log will be enough to find the cause. (~15 min. until I can access the login prompt)
Hey, a little bit late with my response due to overtime. I could not find any clue, but maybe you will. Here is my log: journal.log.gz
Definitely try to remove our udev rule. Or did you try this already and still ran into the same issue?
Does this mean you were not able to trigger the issue within 10 reboots when the udev rule was disabled, but you were able to trigger it after 20 reboots with the udev rules in place? If so, it has not yet been ruled out that those do cause the issue, for which we would need to trigger the issue with the rules disabled:
mv /etc/udev/rules.d/dietpi-eth-leds.rules{,_bak}
Hence, I am still not sure whether it is a symptom or the reason, but all seems to process fine until one of the LED rules triggers. So our ordering, which seems to have helped you, did not help in this particular case, since the timeouts made the boot proceed regardless.
Here is the example of the networking.service timeout chain:
Jun 04 23:23:28 DietPi systemd[1]: Starting networking.service - Raise network interfaces...
# ExecStart timeout causing SIGTERM
Jun 04 23:28:28 DietPi systemd[1]: networking.service: start operation timed out. Terminating.
# SIGTERM timeout causing SIGKILL
Jun 04 23:29:58 DietPi systemd[1]: networking.service: State 'stop-sigterm' timed out. Killing.
Jun 04 23:29:58 DietPi systemd[1]: networking.service: Killing process 481 (ifup) with signal SIGKILL.
# SIGKILL timeout
Jun 04 23:31:28 DietPi systemd[1]: networking.service: Processes still around after SIGKILL. Ignoring.
# ExecStopPost timeout causing SIGTERM
Jun 04 23:32:59 DietPi systemd[1]: networking.service: State 'stop-post' timed out. Terminating.
# SIGTERM timeout causing SIGKILL
Jun 04 23:34:29 DietPi systemd[1]: networking.service: State 'final-sigterm' timed out. Killing.
Jun 04 23:34:29 DietPi systemd[1]: networking.service: Killing process 481 (ifup) with signal SIGKILL.
# SIGKILL timeout
Jun 04 23:35:59 DietPi systemd[1]: networking.service: Processes still around after final SIGKILL. Entering failed mode.
Jun 04 23:35:59 DietPi systemd[1]: networking.service: Failed with result 'timeout'.
Jun 04 23:35:59 DietPi systemd[1]: networking.service: Unit process 481 (ifup) remains running after unit stopped.
Jun 04 23:35:59 DietPi systemd[1]: Failed to start networking.service - Raise network interfaces.
23:23:28 until 23:35:59 is 12.5 minutes. Totally insane, of course. But I could not find any hint in the logs why ifup hangs in the first place.
In any case, we really need to find a better way to enable the Ethernet LEDs, so that they light up/blink on link/activity, but keep them off until there is really a cable connected and the link is up. The whole reason for the interface bring-up in our udev rule is exactly this LED behaviour. An alternative would be to leave the LEDs untouched at boot, and instead configure them later, once the interfaces are actually raised.
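A minimal sketch of what that could look like, applying the kernel's netdev LED trigger directly via sysfs instead of bringing the interface up; the LED name green:lan-1 and the target interface eth1 are placeholders, as the actual LED names on the R5S are not shown in this thread:
# Hypothetical: apply the netdev trigger for one LED (ledtrig-netdev module)
led=/sys/class/leds/green:lan-1   # placeholder LED name
echo netdev > "$led/trigger"      # activate the netdev trigger
echo eth1   > "$led/device_name"  # bind the LED to this interface
echo 1      > "$led/link"         # LED on while the link is up, off otherwise
echo 1      > "$led/rx"           # blink on received traffic
echo 1      > "$led/tx"           # blink on transmitted traffic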
Could you (both of you) test whether this has been solved? We did some changes around network services.
I'm still on v9.5.1, do you want me to update and run some tests?
Uploading both logs for passed and failed scenarios, hope it helps.
@jysqice does this apply to the mainline kernel (which we use) as well?
@yjteng Armbian btw provides an overlay to apply the LED triggers via device tree: https://github.com/armbian/build/blob/main/patch/kernel/archive/rockchip64-6.6/overlay/rockchip-rk3568-nanopi-r5s-leds.dts
It should not have an effect on the underlying issue, but I recognise you use a different setup.
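Either way, the currently active trigger of each LED can be inspected from userspace, which helps to verify whether any of these mechanisms took effect; this is standard sysfs, nothing board-specific:
# Print available triggers per LED; the active one is shown in brackets
for led in /sys/class/leds/*; do echo "$led: $(cat "$led"/trigger)"; done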
Creating a bug report/issue
Required Information
DietPi version |
Distro version | bookworm
Kernel version | Linux r5s 6.6.16-current-rockchip64 #1 SMP PREEMPT Fri Feb 23 08:25:28 UTC 2024 aarch64 GNU/Linux
SBC model | NanoPi R5S/R5C (aarch64), R5S
Power supply used | USB Hub power supply
SD card used | Several disks used: Samsung, SanDisk, no name
Additional Information (if applicable)
Steps to reproduce
1. Reboot with the cable connected to LAN1.
2. The power-on display shows that a new connection needs to be established (e.g. via dietpi-config).
Expected behaviour
eth0 (WAN) is configured on reboot.
Actual behaviour
eth0 (WAN) is not configured on reboot.
Extra details
ip a in case of running system:
ip a in case of problem state:
ethtool eth0 in case of running system:
ethtool eth0 in case of problem state:
Contents of /etc/network/interfaces:
Strange behaviour of the MACs of eth0 and eth1: the values (given by ip a) change on reboot, see output above. The MAC of eth2 stays at c2:3b:04:cb:97:8b.