Raspberry Pi 3B+ freezes with a network share mounted and ethernet connected #2598
Comments
You wrote there is a Plex media server, which provides the CIFS share. Which OS / hardware architecture is running on this server? |
I'd like to leave an anecdote to give some idea of the complexities involved, and why knowledge of the entire system is required. I have a Zyxel NAS and a Humax PVR on my home network. I want to share the data on the PVR with other devices on the network, and it has an option just for that. BUT there is some incompatibility between the version of the server software on the PVR and that on the NAS, so if I turn sharing on on the PVR, at some point the PVR locks up solid and needs to be power cycled. Humax says it's Zyxel's fault, Zyxel says it's Humax's fault. In the meantime, nothing gets fixed. |
In this particular case, since the majority of people do NOT see the issue, it is even more important to determine the environment around the device, because that is surely what is causing the problem to occur. That is NOT to say there isn't an issue on the Pi, just that the particular environment is exposing that issue. |
@JamesH65 The Plex media server is running on the 3B+. The CIFS share is provided by the router. |
To clarify.
Cannot really speculate where the problem is at the moment. |
Sorry for the confusion. Your Netgear router exposes the CIFS share. Does the issue still occur if you change the cifs parameter from |
@lategoodbye
|
Could you please provide the output of the following commands from the Pi after ~ 1h uptime:
Did you use the apt files from https://dev2day.de/pms/ for the Plex media server installation? Edit: I tried to reproduce this issue at home with my TP-Link Archer C1200, but my RPi 3B+ is still reachable after 20 hours. I will try to set up a SAMBA server ... |
Here are the results of the commands after around ~ 1h uptime:
Yes, i use the apt files from there to install plex media server. |
Thanks for providing the outputs. There is one obvious difference from my results: I don't have EEE TX LPI transitions. Does the Pi still freeze after executing the following command (necessary after every reboot): |
I tried the command and the 3B+ survived the night for the first time after 3 months! That is really good. |
Great. Please keep the setup at least for 1 day, so we are sure that's not an exception. I'm not the Ethernet expert, but this command disabled a power saving feature. I guess LPI is required on all devices in the network and that's the reason i can't reproduce it. Wild theory: The combination of CIFS share and Plex server wake up the Pi too often. But it's also possible there is a bug in the LPI handling of the kernel. I hope the Raspberry Pi guys can take over. Btw if you want this change persistent add dtparam=eee=off to config.txt and reboot. |
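To confirm that the persistent setting mentioned above is really in place, a small check like the following could be used. This is only a sketch: the function name `eee_disabled` is hypothetical, the config.txt location (commonly /boot/config.txt) is an assumption, and conditional section filters in the file are ignored. It treats the last `eee=` assignment in an uncommented `dtparam=` line as the one that counts.

```python
def eee_disabled(config_text):
    """Return True if the last uncommented dtparam eee= setting is 'off'.

    Hypothetical helper for illustration; it ignores section filters
    and only looks at dtparam= lines.
    """
    disabled = False
    for raw in config_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("dtparam="):
            continue
        # dtparam accepts a comma-separated list of assignments.
        for param in line[len("dtparam="):].split(","):
            name, sep, value = param.strip().partition("=")
            if name == "eee" and sep:
                disabled = (value == "off")
    return disabled


# Example use (path is an assumption, adjust for your system):
# with open("/boot/config.txt") as f:
#     print(eee_disabled(f.read()))
```

Passing the file contents as a string keeps the check easy to try on a copy of the file before touching the real one.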
Yes, i will test that for a few days now and post the results here. But this is literally the first time a 3B+ survived the night for me :p This change should be included in the RPi firmware if it indeed fixes the issue! But the Plex server doesn't access the CIFS share, it does it only when i trigger it (as far as i know)... Anyway, the Ethernet chip on 3B+ must be a big piece of trash if we need to disable half of the features to make it work properly... |
Interesting. EEE is energy efficient ethernet. We've had one bug fix already from the supplier of the chip (long cables causing EEE problems), sounds like there needs to be another one. Or we just turn it off, which would be my preference. OOI, what length cables does your network have? |
EEE gives us a useful power saving, so I'm reluctant to just switch it off for everybody. I think the Plex media server, by default, runs a media scan in the early hours of the morning. That is likely to be the trigger for the failure. |
@pelwell maybe it gives useful power saving, but it doesn't work! @JamesH65 Right now i have an uptime of 38 hours, which i never saw before on a 3B+ connected via ethernet. Let's see if it survives the next night... |
@pelwell PMS opens several ports (incl. DLNA server) so i think there could be a lot of reasons for this issue. I think it should be possible to reproduce it with 2x RPI 3B+ (1 CIFS server + 1 PMS) and a LPI capable network. Maybe we could increase the value of tx_lpi_timer. |
I had a similar problem with my RPI 3B+. The only thing i am running on it is a haproxy instance inside a docker container. I have disabled the reboot cron job and disabled eee this morning. And it looks like it fixed the problem. (My cable is about 1 meter long.) |
Unfortunately the 3B+ was dead again this morning with EEE off... What a shame... I made one change compared to the first night, when it survived: I assigned a static IP address to the 3B+ ethernet MAC address in the router so it always has the same IP (normally i do this straight away, but the first night i didn't since i thought this would be just a quick test). So maybe DHCP is also a variable here which triggers the issue? Next night i will remove the static IP address from the 3B+ again and see if it dies. |
@tilosp As you run into the same issue, do you have an HDMI screen or serial connection attached to the Pi to see what happens when the issue appears? |
@lategoodbye no i am running it headless |
Let me rephrase my question: could you please connect an HDMI screen or serial connection to the Pi and make the issue occur? |
Yes i can connect a monitor and try to make the issue occur. But i can only try it next weekend because i am currently about 150km away from my pi. Are there some specific commands I should run if i get the issue to occur? |
In case you still can run commands, please run the following commands:
Otherwise try to make a screenshot of the possible kernel panic. |
This morning my 3B+ was dead again. I had removed the MAC address of the Pi from my router so it did not have a static IP, and EEE was on. I also moved the 3B+ next to my TV so i could connect it to HDMI after it dies, as suggested by @lategoodbye, but unfortunately the TV showed "no signal" after i plugged in the HDMI cable, so the Pi was completely dead. |
What was the state of Pi LEDs? |
So would it be correct to say that turning off EEE did make quite a difference, but didn't completely fix the issue, just made it less likely to happen? I wonder if we are seeing two issues here. |
@lategoodbye @JamesH65 |
Do you use a case for the Raspberry Pi and, if so, what kind? |
I use a case but i have also tried numerous times without a case and nothing changed. So this morning the 3B+ was dead again. I had set EEE to off, so it seems that EEE is not the issue. |
It's a shame that the useful bit - what CPU0 was doing - is off the top of the screen. You could try setting |
@JamesH65 @pelwell |
It might be - it depends on which context and how hard it crashes. It's going to be in |
Ok i will later check out if i can find the kernel panic there. |
The height is what matters. I found that it looks the same whether or not I set the width, but you may get different results. |
I did check
So it seems that after the drive was mounted the 3B+ instantly died. At 7:18 i restarted the Pi. Also a few seconds before that, there are the following lines in the log:
The Pi did try a few times to mount it but with failures... If you need any more specific info i can check the log... I have also set |
I know you have claimed repeatedly that Plex doesn't run an overnight scan on your system, but I humbly suggest that you may be mistaken. Notice that the automount request is running on behalf of Even if I'm right it wouldn't excuse the crashes and lockups, but it would help to understand why they are occurring. |
As you can see here periodical scanning is disabled. From the Plex documentation about the
So it still doesn't explain why it only crashes exactly at around 2:00 in the morning. It should crash every time i copy some new media to the drive since then the scanning starts, but this is not the case. Anyway like you say, it doesn't matter here. Plex might be the trigger but the issue is with the 3B+ which should slowly be resolved after almost 4 months now... |
Do you have anything relevant enabled in the "Scheduled Tasks" section? At what time do they run? |
Actually that was the first time i checked that section. Those are the default values in there, i never touched them. [Image: Scheduled Tasks]
It already crashed before 2:00... |
Here is the Kernel panic from today, there is a little more information. I hope that this can help... |
The panic looks similar to #2538. Any idea how to trigger the scheduled task on demand? Btw my RPi 3B+ now has an uptime of 1 day and 10 hours, so your Plex media server seems to have more "load" during scheduled tasks. |
That trace is very helpful. With a certain amount of guesswork I would say that I can see the kernel version, but if you know exactly which build it is that would help to remove some of the uncertainty. A serial cable to capture the crash log would be even better, but this is already massive help. |
@lategoodbye @pelwell |
@merdok If you get your kernels from rpi-update then the content of Is the txq_pend queue access in lan78xx_tx_bh safe? The skb list structure (sk_buff_head) includes a lock but the standard skb queue API methods don't call it themselves - it is up to the caller to do so depending on the use case. Some guidelines on the context of calls to the netdev_ops methods and their interactions with bottom halves would be useful. |
I think the netdev_ops methods are called in atomic context, interlocked against the bottom half handlers, so the txq_pend access is (at least at a basic level) safe, but something is clearly wrong - either a race condition or a buffer overflow/memory corruption. |
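The locking question discussed above can be sketched in toy form. This is only a Python analogue, not the lan78xx driver code, and the names `PendingQueue`, `append`, `_append_unlocked` and `run_producers` are hypothetical. It mirrors the kernel convention in which `skb_queue_tail()` takes the queue's own lock while the double-underscore `__skb_queue_tail()` leaves locking to the caller.

```python
import threading


class PendingQueue:
    """Toy analogue of a queue head that carries its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.items = []

    def _append_unlocked(self, item):
        # Caller must already hold self.lock, as with a "__" kernel helper.
        self.items.append(item)

    def append(self, item):
        # Self-locking variant: safe to call from any thread.
        with self.lock:
            self._append_unlocked(item)

    def drain(self):
        # Swap the list out under the lock so producers never see a
        # half-emptied queue.
        with self.lock:
            taken, self.items = self.items, []
        return taken


def run_producers(queue, producers=4, per_producer=1000):
    """Fill the queue from several threads, then return everything queued."""
    def work(pid):
        for i in range(per_producer):
            queue.append((pid, i))

    threads = [threading.Thread(target=work, args=(p,)) for p in range(producers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return queue.drain()
```

The point of the split is that code which already holds the lock (or which runs in a context that excludes every other user of the queue) can call the unlocked helper, while everything else must go through the locking variant. Whether the driver's contexts really exclude each other is exactly the open question in this thread.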
@pelwell |
@merdok At least you can avoid the manual restarts after the crash by adding |
I currently switched back to WiFi, since it works rock solid using that :) @pelwell any progress? Do you need me to do any more tests? |
Progress is being made in a parallel issue (#2608) which may turn out to be caused by the same bug. We've worked out that under some circumstances the pending queue can appear to have one more sk_buff on it than is actually the case. This causes a naive list walk to misinterpret the end marker or sentinel (actually the object that represents the list as a whole) as being an sk_buff. Since the sentinel doesn't have an end pointer, attempts to use it end up using some other data as a pointer, hence the crash. I'm going to put this issue on hold (but leave it open) until we have some more answers in #2608. |
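The failure mode described above can be illustrated with a toy model. This is a sketch in Python under stated assumptions, not the kernel's sk_buff code: `Node`, `Queue`, `walk_by_count` and `walk_to_sentinel` are hypothetical names, and `None` stands in for the unrelated memory a real walk would read from the sentinel.

```python
class Node:
    """List element; a fresh node points at itself in both directions."""

    def __init__(self, payload=None):
        self.payload = payload
        self.next = self
        self.prev = self


class Queue:
    """Circular list whose head node is a sentinel, not a packet."""

    def __init__(self):
        self.head = Node()  # sentinel: represents the list, carries no data
        self.qlen = 0       # element count, kept separately from the links

    def push_tail(self, payload):
        node = Node(payload)
        last = self.head.prev
        last.next = node
        node.prev = last
        node.next = self.head
        self.head.prev = node
        self.qlen += 1


def walk_by_count(q):
    """Naive walk: trusts qlen and never checks for the sentinel."""
    out = []
    node = q.head.next
    for _ in range(q.qlen):
        out.append(node.payload)
        node = node.next
    return out


def walk_to_sentinel(q):
    """Defensive walk: stops when the links lead back to the sentinel."""
    out = []
    node = q.head.next
    while node is not q.head:
        out.append(node.payload)
        node = node.next
    return out


q = Queue()
q.push_tail("pkt-a")
q.push_tail("pkt-b")
q.qlen += 1  # simulate the count being one higher than the real length

print(walk_by_count(q))     # ['pkt-a', 'pkt-b', None]
print(walk_to_sentinel(q))  # ['pkt-a', 'pkt-b']
```

In the toy model the extra entry is a harmless `None`. In C the same step would treat the list head as a packet and dereference fields it does not have, which matches the crash described above.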
@JamesH65 cool, thanks for the info. I will check that out and let you know if this is fixed. When will the fix be available through rpi-update ? |
The potential fix is now in rpi-update kernel. |
That's good news. Please close the issue when you feel the matter is resolved. |
I will try it for a few more days and if everything is fine, i will close the issue! |
Ok, I guess that this fix is definitely working. My 3B+ has now survived 4 nights and has an uptime of 4 days. Thanks! |
Since the release of the 3B+ i have been struggling with it ALWAYS freezing overnight when i mount my network share and connect the Pi to my router using ethernet.
I have had multiple 3B+ over the past months and each of them had the same issue. I always wake up to my Pi being dead and the network chip burning hot.
The facts are:
I use fstab to mount the share with the following command:
//ip/USB_Storage/ /home/pi/network_drive cifs vers=3.0,defaults,guest,uid=1000,iocharset=utf8,x-systemd.automount 0 0
I am using Raspbian with all latest updates (dist-upgrade & rpi-update).
Kernel currently at 4.14.50, but it was happening on all previous versions as well.
I am using Plex media server with default configuration.
There is only a LAN cable and power cable connected to my Pi, no other ports are used.
Using official Raspberry Pi power supply.
Router is a Netgear x4s and i use the built in readyshare server to expose the share in my network.
There is no network traffic overnight.
There is no activity on the Pi overnight.
I do not think this is a network or router related issue because my 3B runs fine with the same configuration, and no other devices are having problems.
My 3B had 100+ days of uptime without any problem, and with the same configuration the 3B+ can't get past 24 hours of uptime.
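For readers less familiar with the fstab entry in the list above, the following sketch splits it into its six fields and its individual mount options. The helper name `parse_fstab_line` is hypothetical, and the parser is deliberately minimal (no escape handling, no comment lines).

```python
def parse_fstab_line(line):
    """Split one fstab entry into its six fields and an options dict."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 whitespace-separated fields")
    device, mountpoint, fstype, opts, dump, passno = fields

    options = {}
    for opt in opts.split(","):
        # key=value options keep their value; bare flags map to True.
        key, sep, value = opt.partition("=")
        options[key] = value if sep else True

    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
        "dump": dump,
        "passno": passno,
    }


entry = parse_fstab_line(
    "//ip/USB_Storage/ /home/pi/network_drive cifs "
    "vers=3.0,defaults,guest,uid=1000,iocharset=utf8,x-systemd.automount 0 0"
)
print(entry["fstype"], entry["options"]["vers"])
```

The `x-systemd.automount` flag is worth noticing here: with it, the share is mounted on first access rather than at boot, so whichever process touches the mount point first is what triggers the actual CIFS mount.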
At first i thought it was a defective 3B+, but after getting replacements multiple times and buying new ones the issue is still here. All 6 RPi 3B+ which i got had the issue. I bought a new power supply and a new sd card, and i also reinstalled my Raspbian image from scratch multiple times, all without success. Then i slowly narrowed the issue down to the ethernet controller.