Raspberry Pi 3B+ transferring large amounts of data with the ethernet interface locks up system #2608
Any chance you could connect to the debug UART to see a possible kernel panic?
I will try the serial port and see what happens; after that I will try the JTAG debugging method, hopefully this weekend.
Here is a crash without a swap file on the external disk. My original tests were done both with and without a swap partition enabled; my next report will include the results with the swap partition enabled.
Thanks. Could you please reproduce it a few times and check whether the PC stays the same? Edit: by the way, I'm able to reproduce the issue.
@magore Thanks. That crash log fits with another, partial log in another thread. It's crashing on this line: https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/net/usb/lan78xx.c#L3296
One can see that it was processing a pending queue of 2 packets, the first of which was 74 bytes long, and that it had just moved on to the second packet when it crashed due to an invalid … The crash is likely to be because the second …
I'm very interested to know (especially now that @lategoodbye can reproduce it) whether the faulting addresses are always similar and how many other details of the crash match. It might be worth hacking the driver to put an explicit check for an …
Here is a short summary of my test results: Scenario: … Case 1: … Case 2: … Case 3: …
That code seems to be peeking at items in a linked list, while it is possible that additional items are being appended to the list simultaneously in another thread.
Is this always safe without locks?
I've not found any clear description of the locking rules, but it looks like the netdev_ops methods (e.g. the function that appends to this queue) run mutually exclusively with the bottom-half handlers (from which this is called). However, I'd be happy to be proved wrong on this occasion.
The way I understand the code: when an outbound packet needs to be sent, lan78xx_start_xmit() is called, which adds the packet to the linked list. The tasklet lan78xx_bh() -> lan78xx_tx_bh() first peeks at the list (without locking), then takes packets off the list (with locking) and sends them.
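The arrangement described above can be sketched as a small user-space model. This is an illustration, not driver code: `struct pkt`, `struct pkt_queue`, `queue_tail()` and `peek_total_len()` are hypothetical simplifications of `struct sk_buff` and `struct sk_buff_head`. It shows the shape of the pending queue (a circular list whose head node doubles as the sentinel, plus a `qlen` count) and the qlen-bounded peek loop used in `lan78xx_tx_bh()`. It is single-threaded, so the locking on the append side is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct sk_buff and
 * struct sk_buff_head; the real structures have many more fields. */
struct pkt {
	struct pkt *next;
	struct pkt *prev;
	unsigned int len;
};

struct pkt_queue {
	struct pkt head;     /* sentinel: head.next = first, head.prev = last */
	unsigned int qlen;   /* number of packets currently linked */
};

static void queue_init(struct pkt_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.len = 0;
	q->qlen = 0;
}

/* Producer side (the lan78xx_start_xmit() role): append at the tail.
 * In the driver this happens under the queue's spinlock. */
static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	struct pkt *last = q->head.prev;

	assert(p != NULL);
	p->next = &q->head;
	p->prev = last;
	last->next = p;
	q->head.prev = p;
	q->qlen++;
}

/* Consumer side (the lan78xx_tx_bh() role): peek at the pending packets
 * without a lock, bounded by qlen rather than by reaching the sentinel. */
static unsigned int peek_total_len(const struct pkt_queue *q)
{
	const struct pkt *p;
	unsigned int cnt = 0, total = 0;

	for (p = q->head.next; cnt < q->qlen; p = p->next) {
		total += p->len;
		cnt++;
	}
	return total;
}
```

In this model the consumer trusts `qlen` to say how many real nodes follow the head; that only holds as long as it never observes `qlen` and the `next` links out of step.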
I followed lan78xx_start_xmit back up the chain of callers and found that the caller had previously called (something like) rcu_lock_bh, which would seem to interlock against all bottom-half tasklets.
@pelwell During my tests the panic always occurred (3 times) at the same address:
Here is my second crash test and capture file, with 2G swap enabled. ASIDE: I noticed two undervoltage events in the logs that did not match the values measured at the GPIO 5V header; they occurred before the start of the test, but not during it. I also monitor both the GPIO and USB voltages while testing. See above.
Interesting: not only is the PC the same, so is the rogue pointer value.
@lategoodbye Could you try with this patch?
From 9b5ffd549bd8a978b3202aa0d049496aaa954c28 Mon Sep 17 00:00:00 2001
From: Phil Elwell <[email protected]>
Date: Sat, 7 Jul 2018 19:12:26 +0100
Subject: [PATCH] lan78xx: Dump pending tx queue if end pointer is invalid
---
drivers/net/usb/lan78xx.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 9eada92..356a9cb 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3276,6 +3276,23 @@ static void rx_complete(struct urb *urb)
 	netif_dbg(dev, rx_err, dev->net, "no read resubmitted\n");
 }
 
+static void lan78xx_dump_skb_queue(struct sk_buff_head *skbq)
+{
+	struct sk_buff *skb;
+	int pkt_cnt = 0;
+
+	for (skb = skbq->next; pkt_cnt < skbq->qlen; skb = skb->next) {
+		pr_err("skb %d@%p:\n", pkt_cnt, skb);
+		print_hex_dump(" ", ">", DUMP_PREFIX_OFFSET,
+			       16, 1, skb, sizeof(*skb), false);
+		pr_err(" data@%p:\n", skb->data);
+		print_hex_dump(" ", ">", DUMP_PREFIX_OFFSET,
+			       16, 1, skb->data, skb->len, false);
+		pkt_cnt++;
+	}
+
+}
+
 static void lan78xx_tx_bh(struct lan78xx_net *dev)
 {
 	int length;
@@ -3293,6 +3310,11 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev)
 	count = 0;
 	length = 0;
 	for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {
+		if ((u32)skb->end < 0x80000000) {
+			pr_err("Invalid end pointer in sk_buff %p\n", skb);
+			lan78xx_dump_skb_queue(tqp);
+			break;
+		}
 		if (skb_is_gso(skb)) {
 			if (pkt_cnt) {
 				/* handle previous packets first */
--
2.7.4
Was there a specific branch you would prefer for my test?
Let's stick to rpi-4.14.y.
It seems the list is being modified while it is being read.
Since it's always the same address (aee94558) in my case, I wonder whether it's the address of the sentinel node of the list. That brings me back to my theory that items are appended to the list concurrently, and that the reader ends up in a state where qlen (the total number of items) has already been incremented, but "next" does not yet point to the newest item; instead it still points to the start of the list (the sentinel), as the last item's next pointer should.
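To make the hypothesis concrete, the sketch below builds, single-threaded, the intermediate state described above and runs the qlen-bounded loop over it. The types and helpers are hypothetical stand-ins for the kernel's `sk_buff` structures, and `torn_append()` simply writes the state a lock-free reader might observe (whether through reordering or by sampling between two stores); it does not reproduce the real timing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct sk_buff / sk_buff_head. */
struct pkt {
	struct pkt *next;
	struct pkt *prev;
	unsigned int len;
};

struct pkt_queue {
	struct pkt head;     /* sentinel node */
	unsigned int qlen;
};

static void queue_init(struct pkt_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.len = 0;
	q->qlen = 0;
}

static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	struct pkt *last = q->head.prev;

	p->next = &q->head;
	p->prev = last;
	last->next = p;
	q->head.prev = p;
	q->qlen++;
}

/* Build the state an unlocked reader could observe in the middle of an
 * append: qlen already counts the new packet, but the previous tail's
 * next pointer still points back at the sentinel. */
static void torn_append(struct pkt_queue *q, struct pkt *p)
{
	p->next = &q->head;
	p->prev = q->head.prev;
	q->qlen++;
	/* "q->head.prev->next = p" and "q->head.prev = p" not yet visible */
}

/* The qlen-bounded walk; records every node it treats as a packet. */
static size_t walk_by_qlen(const struct pkt_queue *q,
			   const struct pkt **seen, size_t max)
{
	const struct pkt *p;
	size_t cnt = 0;

	assert(max > 0);
	for (p = q->head.next; cnt < q->qlen && cnt < max; p = p->next)
		seen[cnt++] = p;
	return cnt;
}
```

With one real 74-byte packet queued, the loop visits that packet and then the sentinel itself as its "second packet". In the kernel the sentinel is a `struct sk_buff_head`, which is much smaller than a `struct sk_buff`, so fields read from it as if it were an skb would be whatever follows the head in memory. That would be consistent with a constant fault address, though the model does not prove it.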
I am still waiting for my system to crash.
You need to run …
I altered the patch slightly so that it prints out the value of …
And that confirmed my suspicions: …
Possible fixes: …
or replace: …
with: …
That standard macro does seem to test whether skb->next points back to the sentinel, instead of relying on qlen.
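A sketch of the sentinel-terminated walk, again as a hypothetical user-space model rather than kernel code: `pkt_queue_walk()` mirrors the loop shape of the kernel's `skb_queue_walk()`, and `torn_append()` is an illustrative helper that leaves the queue with `qlen` incremented but the previous tail's `next` still pointing at the sentinel. In that state the walk sees only the fully linked packet and never treats the head as a packet. It does not make unlocked access safe in general: a reader can still race with a writer that is updating the pointers it follows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct sk_buff / sk_buff_head. */
struct pkt {
	struct pkt *next;
	struct pkt *prev;
	unsigned int len;
};

struct pkt_queue {
	struct pkt head;     /* sentinel node */
	unsigned int qlen;
};

/* Same loop shape as skb_queue_walk(): stop when the walk arrives back
 * at the sentinel, whatever qlen says. */
#define pkt_queue_walk(q, p) \
	for ((p) = (q)->head.next; (p) != &(q)->head; (p) = (p)->next)

static void queue_init(struct pkt_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.len = 0;
	q->qlen = 0;
}

static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = &q->head;
	p->prev = q->head.prev;
	q->head.prev->next = p;
	q->head.prev = p;
	q->qlen++;
}

/* State an unlocked reader might see mid-append: qlen bumped, link not. */
static void torn_append(struct pkt_queue *q, struct pkt *p)
{
	p->next = &q->head;
	p->prev = q->head.prev;
	q->qlen++;
}

static size_t walk_to_sentinel(const struct pkt_queue *q,
			       const struct pkt **seen, size_t max)
{
	const struct pkt *p;
	size_t cnt = 0;

	assert(max > 0);
	pkt_queue_walk(q, p) {
		if (cnt == max)
			break;
		seen[cnt++] = p;
	}
	return cnt;
}
```

The new packet is simply missed on this pass and would be picked up on a later one, once its link is visible.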
I used dmesg to get the results.
@maxnet Very good work. Have you tried the change to queue_walk? That does seem to be a more robust mechanism for walking the queue than the current for loop. I am slightly concerned, though, that there could still be concurrency issues even with that if the queue isn't locked, although I haven't looked at how items are added and removed to be sure.
Good call, @maxnet; you've saved us a lot of time.
The use of locking primitives usually includes a memory barrier, but I'm concerned that one may be missing (or being skipped) in this case. As an experiment, try adding a …
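As a general illustration of the ordering a barrier provides (not of the specific barrier suggested above, whose name is cut off), here is a hedged user-space sketch using C11 atomics. The writer links a node and only then publishes the new count with a release operation; the reader loads the count with acquire and follows exactly that many links, so it cannot see `qlen` ahead of the link. The kernel has its own primitives for this kind of pairing (for example `smp_store_release()`/`smp_load_acquire()`, or `smp_wmb()` with `smp_rmb()`). The types and helpers below are hypothetical, and the model assumes a single append-only producer and ignores removal entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct sk_buff / sk_buff_head. */
struct pkt {
	struct pkt *next;
	struct pkt *prev;
	unsigned int len;
};

struct pkt_queue {
	struct pkt head;     /* sentinel node */
	atomic_uint qlen;    /* count, published with release semantics */
};

static void queue_init(struct pkt_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.len = 0;
	atomic_init(&q->qlen, 0);
}

/* Single producer, append only. Link the node first, then publish the
 * new count with release semantics: a reader that acquire-loads that
 * count is guaranteed to see the link as well. */
static void append_publish(struct pkt_queue *q, struct pkt *p)
{
	struct pkt *last = q->head.prev;

	assert(p != NULL);
	p->next = &q->head;
	p->prev = last;
	last->next = p;
	q->head.prev = p;
	atomic_fetch_add_explicit(&q->qlen, 1, memory_order_release);
}

/* Reader: acquire-load the count, then follow exactly that many links.
 * Each link followed was written before the release that published it,
 * so the sentinel is never mistaken for a packet. */
static unsigned int total_len_acquire(struct pkt_queue *q)
{
	unsigned int n = atomic_load_explicit(&q->qlen, memory_order_acquire);
	const struct pkt *p = &q->head;
	unsigned int total = 0;

	for (unsigned int i = 0; i < n; i++) {
		p = p->next;
		total += p->len;
	}
	return total;
}
```

The reader never reads the `next` pointer of the last counted node, so it does not touch the link the writer may be updating for the following append.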
Yes, although I now have the problem that my kernel with the change won't boot if I have … I also managed to get a different kernel panic in the usb-storage process when I stress the Pi by starting …
But I have only seen this once, and it is probably unrelated to the one reported in this thread.
When adding or removing, there is a lock on the queue.
I assume we could use that lock when reading as well, and call __skb_dequeue() instead of skb_dequeue() in that case, since we would already hold the lock. Would that look like this? (untested) …
Or should the lock be held a little longer, as there is also other code using the list slightly further down? E.g.: (untested) …
But then the lock would also be held across a memory allocation (alloc_skb).
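The locking convention discussed above (a function that takes the queue lock itself, plus a `__`-prefixed variant that assumes the caller already holds it, as with `skb_dequeue()` and `__skb_dequeue()`) can be sketched in user space. Everything below is hypothetical: a C11 `atomic_bool` spinlock stands in for the spinlock inside `struct sk_buff_head`, and `dequeue_batch()` is an illustrative helper, not driver code. It shows one way to handle the `alloc_skb()` concern: decide and unlink under the lock, then do any allocation after releasing it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct pkt { struct pkt *next, *prev; unsigned int len; };

struct pkt_queue {
	struct pkt head;     /* sentinel */
	unsigned int qlen;
	atomic_bool locked;  /* hypothetical stand-in for the queue spinlock */
};

static void q_lock(struct pkt_queue *q)
{
	while (atomic_exchange_explicit(&q->locked, true, memory_order_acquire))
		; /* spin until the holder releases it */
}

static void q_unlock(struct pkt_queue *q)
{
	atomic_store_explicit(&q->locked, false, memory_order_release);
}

static void queue_init(struct pkt_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.len = 0;
	q->qlen = 0;
	atomic_init(&q->locked, false);
}

/* Takes the lock itself, like skb_queue_tail(). */
static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	q_lock(q);
	p->next = &q->head;
	p->prev = q->head.prev;
	q->head.prev->next = p;
	q->head.prev = p;
	q->qlen++;
	q_unlock(q);
}

/* Caller must already hold the lock, like __skb_dequeue(). */
static struct pkt *dequeue_nolock(struct pkt_queue *q)
{
	struct pkt *p = q->head.next;

	if (p == &q->head)
		return NULL;         /* empty */
	q->head.next = p->next;
	p->next->prev = &q->head;
	p->next = p->prev = NULL;
	q->qlen--;
	return p;
}

/* Takes the lock itself, like skb_dequeue(). */
static struct pkt *dequeue(struct pkt_queue *q)
{
	q_lock(q);
	struct pkt *p = dequeue_nolock(q);
	q_unlock(q);
	return p;
}

/* Hold the lock once while choosing and unlinking a batch whose total
 * length fits in budget; any allocation would happen after q_unlock(). */
static size_t dequeue_batch(struct pkt_queue *q, struct pkt **out,
			    size_t max, unsigned int budget)
{
	unsigned int used = 0;
	size_t n = 0;

	assert(out != NULL || max == 0);
	q_lock(q);
	while (n < max) {
		struct pkt *p = q->head.next;

		if (p == &q->head || used + p->len > budget)
			break;
		used += p->len;
		out[n++] = dequeue_nolock(q);
	}
	q_unlock(q);
	return n;
}
```

Holding the lock once for the whole batch avoids taking and dropping it per packet, and keeping the allocation outside the locked region keeps the critical section short.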
Personally I prefer the skb_queue_walk approach (currently testing), which should avoid a performance loss. In my results above I tested the upstream variants only once, so it may be possible to reproduce it there too.
Hmm, it seems skb_queue_walk does not fully solve it.
I am currently trying the following: …
Consider trying the barrier in lan78xx_start_xmit.
commit dea39ac upstream.

The skb size calculation in lan78xx_tx_bh is in race with the start_xmit, which could lead to rare kernel oopses. So protect the whole skb walk with a spin lock. As a benefit we can unlink the skb directly.

This patch was tested on Raspberry Pi 3B+

Link: raspberrypi/linux#2608
Fixes: 55d7de9 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
Cc: stable <[email protected]>
Signed-off-by: Floris Bos <[email protected]>
Signed-off-by: Stefan Wahren <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
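The shape of the fix described in this commit message (hold the queue's spinlock across the whole size-calculation walk, and unlink a packet directly while still holding it) can be sketched with a hypothetical user-space model. It is not the upstream patch: `plan_tx()`, `MAX_BATCH_BYTES`, the `gso` flag and the `atomic_bool` lock are illustrative assumptions, and the 4-byte padding of the running total is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-batch byte limit; the value is arbitrary. */
#define MAX_BATCH_BYTES 9000u

/* Hypothetical, simplified stand-ins for struct sk_buff / sk_buff_head. */
struct pkt {
	struct pkt *next;
	struct pkt *prev;
	unsigned int len;
	bool gso;            /* stand-in for skb_is_gso() */
};

struct pkt_queue {
	struct pkt head;     /* sentinel */
	unsigned int qlen;
	atomic_bool locked;  /* stand-in for the queue's spinlock */
};

static void q_lock(struct pkt_queue *q)
{
	while (atomic_exchange_explicit(&q->locked, true, memory_order_acquire))
		; /* spin */
}

static void q_unlock(struct pkt_queue *q)
{
	atomic_store_explicit(&q->locked, false, memory_order_release);
}

static void queue_init(struct pkt_queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->head.len = 0;
	q->head.gso = false;
	q->qlen = 0;
	atomic_init(&q->locked, false);
}

/* Producer (start_xmit role): append under the lock. */
static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	q_lock(q);
	p->next = &q->head;
	p->prev = q->head.prev;
	q->head.prev->next = p;
	q->head.prev = p;
	q->qlen++;
	q_unlock(q);
}

/* Caller holds the lock (cf. __skb_unlink()). */
static void unlink_nolock(struct pkt_queue *q, struct pkt *p)
{
	assert(q->qlen > 0);
	p->prev->next = p->next;
	p->next->prev = p->prev;
	p->next = p->prev = NULL;
	q->qlen--;
}

/* Consumer (tx_bh role): the whole size-calculation walk runs under the
 * lock, so the producer cannot change the list mid-walk. A GSO packet at
 * the head is unlinked directly and returned for separate handling. */
static struct pkt *plan_tx(struct pkt_queue *q, unsigned int *pkt_cnt,
			   unsigned int *total)
{
	struct pkt *p, *gso = NULL;

	*pkt_cnt = 0;
	*total = 0;
	q_lock(q);
	for (p = q->head.next; p != &q->head; p = p->next) {
		if (p->gso) {
			if (p == q->head.next) {
				gso = p;
				unlink_nolock(q, p);
			}
			break;   /* if not first, send the earlier packets first */
		}
		if (*total + p->len > MAX_BATCH_BYTES)
			break;
		/* assumed: pad the running total to 4 bytes before adding */
		*total = p->len + ((*total + 3u) & ~3u);
		(*pkt_cnt)++;
	}
	q_unlock(q);
	return gso;
}
```

Because the walk now terminates at the sentinel and runs entirely inside the lock, neither the qlen/next mismatch nor a concurrent append can be observed during the size calculation.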
By "locks up" I mean:
- no console response
- blank console
- no logs written
- the CPU, LAN chip and power supply chip stay at just over 50 °C, as if they are doing something
- red power light, no green light
Transfer tests via rsync of 500G of data.
I tried this on several RPi 3B+ boards bought over the last few months, starting in March; all have the problem.
It does not happen on the RPi 3B.
Note: using an external USB 1000T Ethernet adapter or the Wi-Fi interface fixes the lockups.
Used the latest Raspbian.
Steps to reproduce
If you also try rpi-update, the results will be the same regardless of options.
(I used a 5.1V 3A supply, tested with an HP Agilent 34401A meter at the GPIO header while the Pi was under very heavy load, and measured noise with a RIGOL 1054 scope.)
It does not matter whether you use a powered external drive or a powered hub with a USB drive; the results are the same (tested).
I used an Ubuntu 16.04 desktop and rsynced a copy of the complete system (about 500G) to the USB drive on the Pi "pi-desktop". Example: ionice -c 3 rsync --delete -a -H -x -S --numeric-ids --info=progress2 --exclude ".gvfs" / root@pi-desktop:/backup/
Wait for the crash; it typically happens after less than 60G of data.
There were only two conditions under which this worked.
The test was repeated many times over several days.
What always failed: using the internal interface.
Latest version of Raspbian as of 1 July 2018, with any update method (rpi-update NEXT or defaults).
Settings that had no effect on the crashing with the internal adapter:
Note: this really slows down the copy process, but it still dies.
dtparam=eee=off
sdram_freq=450
arm_freq=1200
Other tests I tried besides rsync:
dd if=/dev/zero bs=1M status=progress | ssh root@pi-desktop "cat >/dev/null"
I also tried various tuning parameters for fun, with no impact on the crashing,
but these do increase transfer speeds:
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.core.rmem_default=65536
sysctl -w net.core.wmem_default=65536
sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
sysctl -w net.ipv4.route.flush=1