Raspberry Pi 3B+ transferring large amounts of data with the ethernet interface locks up system #2608

Closed
magore opened this issue Jul 5, 2018 · 58 comments

Comments

@magore

magore commented Jul 5, 2018

By "locks up" I mean: no console response, blank console, no logs written. The CPU, LAN chip, and power supply chip stay at just over 50C as if they are doing something. Red power light on, no green light.
Transfer tests via rsync of 500G of data.
I tried this on several RPI 3B+ boards bought over the last few months, starting in March - all have the problem
Does not happen on RPI3B
Note: Using an external USB Ethernet 1000T adapter or the wifi interface fixes the lockups
Used latest RASPBIAN

Steps to reproduce

  1. Install the latest version of RASPBIAN and apply all updates
    If you also try rpi-update the results will be the same regardless of options
  2. install openssh server
  3. Attach a known working power supply
    (I used a 5.1V 3A supply, tested with an HP Agilent 34401A meter at the GPIO headers while the PI was under very heavy load; noise was measured with a RIGOL 1054 scope.)
  4. Attach a powered USB hard drive with an EXT4 file system
    It does not matter if you used a powered external drive or a power hub and USB drive as the results are the same - tested.
  5. mount the USB drive partition on a folder - say /backup
  6. Connect PI to 1000T network switch and computer - or directly to the computer with 1000T interface - the resulting crash will still be the same - tested
  7. Use rsync to copy a large amount of data
    I used an Ubuntu 16.04 desktop and rsynced a copy of the complete system to the USB drive on the PI "pi-desktop" - about 500G. Example: ionice -c 3 rsync --delete -a -H -x -S --numeric-ids --info=progress2 --exclude ".gvfs" / root@pi-desktop:/backup/
    Wait for the crash - typically less than 60G of data is transferred before this happens

There were only two conditions where this worked

  1. Not using the builtin Ethernet interface and instead using an external USB 1000T Ethernet adapter on the PI for the connection - this finished without errors at about 20 megabytes per second average
  2. Using the WIFI connection - this worked without errors at about 5.7 megabytes per second average
    Test was repeated many times over several days

What always failed - using the internal interface
Latest version of RASPBIAN as of 1 July 2018 - any update method, using rpi-update NEXT or defaults
Settings that did not impact the crashing with the internal adapter

  1. ethtool --offload eth0 rx off tx off
    Note - this really slows down the copy process but still dies
  2. ethtool -K eth0 tx-tcp-segmentation off
  3. /boot/config.txt
    dtparam=eee=off
    sdram_freq=450
    arm_freq=1200

Other tests I tried other than rsync
dd if=/dev/zero bs=1M status=progress | ssh root@pi-desktop "cat >/dev/null"

I also tried various tuning parameters for fun with no impact on crashing
But this does speed up transfer speeds...
sysctl -w net.core.rmem_max=8388608
sysctl -w net.core.wmem_max=8388608
sysctl -w net.core.rmem_default=65536
sysctl -w net.core.wmem_default=65536
sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'
sysctl -w net.ipv4.route.flush=1

@lategoodbye
Contributor

Any chance to connect on the Debug UART to see a possible kernel panic?

@magore
Author

magore commented Jul 6, 2018

I will try the serial port and see what happens - after that I will try the JTAG port debugging method - hopefully this weekend.

@magore
Author

magore commented Jul 7, 2018

Here is a crash without a swap file on the external disk - my original tests were done both with and without a swap partition enabled - my next report will include the results with the swap partition enabled
Here is a crash log - I started logging when the system booted until it crashed
The important error: Unable to handle kernel paging request at virtual address 40018684
Details in: minicom.txt
Interesting line in log: [ 1155.191116] PC is at lan78xx_bh+0x118/0x7a0

@lategoodbye
Contributor

lategoodbye commented Jul 7, 2018

Thanks, could you please reproduce it a few times and check whether the PC stays the same?

Edit: Btw, I'm able to reproduce the issue.

@pelwell
Contributor

pelwell commented Jul 7, 2018

@magore Thanks. That crash log fits with another, partial log in another thread. It's crashing on this line: https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/net/usb/lan78xx.c#L3296

One can see that it was processing a pending queue of 2 packets, the first of which was 74 bytes long, and that it had just moved onto the second packet when it crashed due to an invalid end (of data) pointer in the sk_buff structure that holds the second packet. The faulting address was 0x40018684, which is 0x40018680 (the bad end) + 4. 0x40018680 is an interesting address for the kernel because it is clearly outside the kernel space (which starts at 0x80000000 in a downstream kernel), so it is somewhere in user-space. We'd have to investigate how user code partitions its address space.
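
As a rough illustration of the address arithmetic above, here is a minimal user-space sketch. The 0x80000000 boundary is the figure quoted in this comment; the macro and function names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start of kernel space on the downstream 32-bit kernel, as quoted above. */
#define KERNEL_SPACE_START 0x80000000u

/* Hypothetical helper: true if a 32-bit virtual address lies in kernel space. */
bool is_kernel_address(uint32_t addr)
{
	return addr >= KERNEL_SPACE_START;
}

/*
 * Recover the bad pointer from the faulting address, assuming (as described
 * above) that the fault was an access at the bad end pointer plus 4.
 */
uint32_t bad_end_from_fault(uint32_t fault_addr)
{
	assert(fault_addr >= 4);
	return fault_addr - 4;
}
```

Calling bad_end_from_fault(0x40018684) yields 0x40018680, which is_kernel_address() rejects, matching the reasoning above.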

The crash is likely to be because the second sk_buff:

  1. has been overwritten by something else,
  2. is something which used to be an sk_buff but which has now been recycled and turned into something else, or
  3. was never an sk_buff but is actually something else being misinterpreted because the link in the previous entry in the chain was corrupted.

I'm very interested to know (especially now that @lategoodbye can reproduce it) whether the faulting addresses are always similar and how many other details of the crash match. It might be worth hacking the driver to put in an explicit check for an skb->end address less than 0x80000000, dumping key skb fields and some of the packet data before failing gracefully.

@lategoodbye
Contributor

Here is a short summary of my test results:

Scenario:
rsync the rootfs of my Notebook on a USB HDD connected to a Raspberry Pi 3B+

Case 1:
Kernel: Downstream 4.14.52 (bcm2709_defconfig)
tx-tcp-segmentation off
Result: usually crash after ~ 13 GB data

Case 2:
Kernel: Upstream 4.18rc3 (multi_v7_defconfig)
tx-tcp-segmentation on
Result: hang after 34 GB

Case 3:
Kernel: Upstream 4.18rc3 (multi_v7_defconfig)
tx-tcp-segmentation off
Result: rsync completed after 42 GB

@maxnet
Contributor

maxnet commented Jul 7, 2018

That code seems to be peeking at items in a linked list, while it is possible that additional items are being appended to the list simultaneously in another thread.

for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {

Is this always safe without locks?
E.g. is it guaranteed that when items are appended ->next is set properly before qlen is incremented?

@pelwell
Contributor

pelwell commented Jul 7, 2018

I've not found any clear description of the locking rules, but it looks like the netdev_ops methods (e.g. the function that appends to this queue) run mutually exclusively to the bottom half handlers (from which this is called). However, I'd be happy to be proved wrong on this occasion.

@maxnet
Contributor

maxnet commented Jul 7, 2018

The way I understand the code is that when an outbound packet needs to be sent, lan78xx_start_xmit() is called, adding the packet to the linked list.

Tasklet lan78xx_bh() -> lan78xx_tx_bh() peeks at the list first (without locking) and then takes packets from the list (with locking) and sends them.
Isn't it possible that the tasklet is executed on a different CPU core simultaneously with start_xmit?

@pelwell
Contributor

pelwell commented Jul 7, 2018

I followed lan78xx_start_xmit back up the chain of functions that call it and found that the caller had previously called (something like) rcu_lock_bh, which would seem to interlock against all bottom half tasklets.

@lategoodbye
Contributor

@pelwell During my tests the panic always occurred (3 times) at the same address:
[ 1340.510482] Unable to handle kernel paging request at virtual address 40018784

@magore
Author

magore commented Jul 7, 2018

Here is my second crash test and capture file - with 2G swap enabled.
FYI - voltage at the PI GPIO pins was 5.1V +/- 30mV under load while the test was running (I should have mentioned in my last report that I was monitoring it)
minicom2.txt
Interesting lines:
[ 8978.978622] Unable to handle kernel paging request at virtual address 40018684
[ 8979.036115] PC is at lan78xx_bh+0x118/0x7a0

ASIDE: I noticed two undervoltage events in the logs that did not align with the measured values at the GPIO 5V header. They occurred before the start of the test, not during it. I also monitor both the GPIO voltages and USB voltages while testing. See above

@pelwell
Contributor

pelwell commented Jul 7, 2018

Interesting - not only is the PC the same, so is the rogue pointer value.

@pelwell
Contributor

pelwell commented Jul 7, 2018

@lategoodbye Could you try with this patch?:

From 9b5ffd549bd8a978b3202aa0d049496aaa954c28 Mon Sep 17 00:00:00 2001
From: Phil Elwell <[email protected]>
Date: Sat, 7 Jul 2018 19:12:26 +0100
Subject: [PATCH] lan78xx: Dump pending tx queue if end pointer is invalid

---
 drivers/net/usb/lan78xx.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 9eada92..356a9cb 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3276,6 +3276,23 @@ static void rx_complete(struct urb *urb)
 	netif_dbg(dev, rx_err, dev->net, "no read resubmitted\n");
 }
 
+static void lan78xx_dump_skb_queue(struct sk_buff_head *skbq)
+{
+	struct sk_buff *skb;
+	int pkt_cnt = 0;
+
+	for (skb = skbq->next; pkt_cnt < skbq->qlen; skb = skb->next) {
+		pr_err("skb %d@%p:\n", pkt_cnt, skb);
+		print_hex_dump("  ", ">", DUMP_PREFIX_OFFSET,
+			       16, 1, skb, sizeof(*skb), false);
+		pr_err("  data@%p:\n", skb->data);
+		print_hex_dump("  ", ">", DUMP_PREFIX_OFFSET,
+			       16, 1, skb->data, skb->len, false);
+		pkt_cnt++;
+	}
+
+}
+
 static void lan78xx_tx_bh(struct lan78xx_net *dev)
 {
 	int length;
@@ -3293,6 +3310,11 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev)
 	count = 0;
 	length = 0;
 	for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {
+		if ((u32)skb->end < 0x80000000) {
+			pr_err("Invalid end pointer in sk_buff %p\n", skb);
+			lan78xx_dump_skb_queue(tqp);
+			break;
+		} 
 		if (skb_is_gso(skb)) {
 			if (pkt_cnt) {
 				/* handle previous packets first */
-- 
2.7.4

@magore
Author

magore commented Jul 7, 2018

Was there a specific branch you would prefer for my test?
I am cross compiling the current branch with your added patches now - I will report back
I am now running the rsync test on the rpi-4.14.y branch with your patch

@pelwell
Contributor

pelwell commented Jul 7, 2018

Let's stick to rpi-4.14.y.

@maxnet
Contributor

maxnet commented Jul 7, 2018

Seems the list is edited while it is being read.
It says "Invalid end pointer in sk_buff aee94558", but there is no aee94558 in the list by the time lan78xx_dump_skb_queue() prints the list.

[  753.439541] Invalid end pointer in sk_buff aee94558
[  753.439551] skb 0@a3f5c900:
[  753.439558]   >00000000: 40 7b df a3 58 45 e9 ae 00 00 00 00 00 00 00 00
[  753.439562]   >00000010: 80 0c 73 9d 00 40 e9 ae 42 00 00 00 00 00 00 00
[  753.439565]   >00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  753.439568]   >00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  753.439570]   >00000040: 00 00 00 00 00 00 00 00 00 00 00 00 30 10 7e 80
[  753.439573]   >00000050: 00 00 00 00 00 00 00 00 00 00 00 00 4a 00 00 00
[  753.439577]   >00000060: 00 00 00 00 00 00 00 00 00 00 00 00 c0 03 00 00
[  753.439580]   >00000070: 00 00 00 00 10 01 10 00 02 00 00 00 00 00 00 00
[  753.439583]   >00000080: 00 ac ac c5 00 00 00 00 01 00 00 00 00 00 00 00
[  753.439585]   >00000090: 00 00 00 00 00 00 00 00 08 00 10 01 fc 00 ee 00
[  753.439588]   >000000a0: 30 2d 2a b0 40 2d 2a b0 00 2c 2a b0 e6 2c 2a b0
[  753.439591]   >000000b0: 02 00 00 00 01 00 00 00
[  753.439593]   data@b02a2ce6:
[  753.439596]   >00000000: 42 00 40 06 00 00 00 00 30 5a 3a 45 ea 97 b8 27
[  753.439599]   >00000010: eb a2 9b 65 08 00 45 08 00 34 16 b1 40 00 40 06
[  753.439601]   >00000020: 3d 31 c0 a8 b2 ab c0 a8 b2 dd 00 16 a9 d8 c6 7e
[  753.439604]   >00000030: be e2 2c 29 f8 e2 80 10 0e e4 e7 00 00 00 01 01
[  753.439607]   >00000040: 08 0a c4 da a7 2d 00 86 11 f7
[  753.439610] skb 1@a3df7b40:
[  753.439612]   >00000000: 58 45 e9 ae 00 c9 f5 a3 00 00 00 00 00 00 00 00
[  753.439615]   >00000010: 80 0c 73 9d 00 40 e9 ae 42 00 00 00 00 00 00 00
[  753.439618]   >00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  753.439622]   >00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  753.439625]   >00000040: 00 00 00 00 00 00 00 00 00 00 00 00 30 10 7e 80
[  753.439627]   >00000050: 00 00 00 00 00 00 00 00 00 00 00 00 4a 00 00 00
[  753.439630]   >00000060: 00 00 00 00 00 00 00 00 00 00 00 00 c0 03 00 00
[  753.439633]   >00000070: 00 00 00 00 10 01 10 00 02 00 00 00 00 00 00 00
[  753.439636]   >00000080: 00 ac ac c5 00 00 00 00 02 00 00 00 00 00 00 00
[  753.439638]   >00000090: 00 00 00 00 00 00 00 00 08 00 10 01 fc 00 ee 00
[  753.439641]   >000000a0: 30 43 33 b2 40 43 33 b2 00 42 33 b2 e6 42 33 b2
[  753.439643]   >000000b0: 02 00 00 00 01 00 00 00
[  753.439646]   data@b23342e6:
[  753.439648]   >00000000: 42 00 40 06 00 00 00 00 30 5a 3a 45 ea 97 b8 27
[  753.439651]   >00000010: eb a2 9b 65 08 00 45 08 00 34 16 b2 40 00 40 06
[  753.439654]   >00000020: 3d 30 c0 a8 b2 ab c0 a8 b2 dd 00 16 a9 d8 c6 7e
[  753.439657]   >00000030: be e2 2c 2a 04 32 80 10 0e e4 e7 00 00 00 01 01
[  753.439659]   >00000040: 08 0a c4 da a7 2d 00 86 11 f7
[ 1137.832666] Invalid end pointer in sk_buff aee94558
[ 1137.832684] skb 0@9d57ff00:
[ 1137.832694]   >00000000: 00 e0 0c b2 58 45 e9 ae 00 00 00 00 00 00 00 00
[ 1137.832699]   >00000010: 00 00 73 9d 00 40 e9 ae 42 00 00 00 00 00 00 00
[ 1137.832710]   >00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1137.832716]   >00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1137.832721]   >00000040: 00 00 00 00 00 00 00 00 00 00 00 00 30 10 7e 80
[ 1137.832727]   >00000050: 00 00 00 00 00 00 00 00 00 00 00 00 4a 00 00 00
[ 1137.832734]   >00000060: 00 00 00 00 00 00 00 00 00 00 00 00 c0 03 00 00
[ 1137.832739]   >00000070: 00 00 00 00 10 01 10 00 02 00 00 00 00 00 00 00
[ 1137.832744]   >00000080: a5 84 1d eb 00 00 00 00 01 00 00 00 00 00 00 00
[ 1137.832750]   >00000090: 00 00 00 00 00 00 00 00 08 00 10 01 fc 00 ee 00
[ 1137.832759]   >000000a0: 30 41 65 83 40 41 65 83 00 40 65 83 e6 40 65 83
[ 1137.832765]   >000000b0: 02 00 00 00 01 00 00 00
[ 1137.832769]   data@836540e6:
[ 1137.832774]   >00000000: 42 00 40 06 00 00 00 00 30 5a 3a 45 ea 97 b8 27
[ 1137.832779]   >00000010: eb a2 9b 65 08 00 45 08 00 34 b6 1f 40 00 40 06
[ 1137.832784]   >00000020: 9d c2 c0 a8 b2 ab c0 a8 b2 dd 00 16 a9 cc e7 41
[ 1137.832788]   >00000030: e9 4c 0d 8e d8 2c 80 10 33 ef e7 00 00 00 01 01
[ 1137.832794]   >00000040: 08 0a c4 e0 84 b6 00 87 89 59
[ 1137.832796] skb 1@b20ce000:
[ 1137.832804]   >00000000: 58 45 e9 ae 00 ff 57 9d 00 00 00 00 00 00 00 00
[ 1137.832809]   >00000010: 00 00 73 9d 00 40 e9 ae 42 00 00 00 00 00 00 00
[ 1137.832813]   >00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1137.832818]   >00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1137.832824]   >00000040: 00 00 00 00 00 00 00 00 00 00 00 00 30 10 7e 80
[ 1137.832827]   >00000050: 00 00 00 00 00 00 00 00 00 00 00 00 4a 00 00 00
[ 1137.832833]   >00000060: 00 00 00 00 00 00 00 00 00 00 00 00 c0 03 00 00
[ 1137.832843]   >00000070: 00 00 00 00 10 01 10 00 02 00 00 00 00 00 00 00
[ 1137.832892]   >00000080: a5 84 1d eb 00 00 00 00 02 00 00 00 00 00 00 00
[ 1137.832898]   >00000090: 00 00 00 00 00 00 00 00 08 00 10 01 fc 00 ee 00
[ 1137.832904]   >000000a0: 30 6b d8 a3 40 6b d8 a3 00 6a d8 a3 e6 6a d8 a3
[ 1137.832909]   >000000b0: 02 00 00 00 01 00 00 00
[ 1137.832913]   data@a3d86ae6:
[ 1137.832919]   >00000000: 42 00 40 06 00 00 00 00 30 5a 3a 45 ea 97 b8 27
[ 1137.832925]   >00000010: eb a2 9b 65 08 00 45 08 00 34 b6 20 40 00 40 06
[ 1137.832930]   >00000020: 9d c1 c0 a8 b2 ab c0 a8 b2 dd 00 16 a9 cc e7 41
[ 1137.832936]   >00000030: e9 4c 0d 8e e3 7c 80 10 33 ef e7 00 00 00 01 01
[ 1137.832942]   >00000040: 08 0a c4 e0 84 b6 00 87 89 5a

@maxnet
Contributor

maxnet commented Jul 7, 2018

Since it's always the same address (aee94558) in my case, I wonder if it's the address of the sentinel node of the list.

Which brings me back to my theory that items are appended to the list concurrently, and it ends up in a situation in which "next" does not point to the newest item yet (but still to the start of the list/sentinel, as the last item of the list should), while the total number of items (qlen) has already been incremented.
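
The dump can be checked against this theory by hand. A minimal sketch, assuming that struct sk_buff starts with its next and prev pointers (as it appears to in the 4.14 headers) and that the kernel is little-endian; the helper name le32_at and the two arrays are made up for illustration, with the bytes copied from the first 8 bytes of each sk_buff in the first dump above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte dump. */
uint32_t le32_at(const uint8_t *bytes, size_t offset)
{
	assert(bytes != NULL);
	return (uint32_t)bytes[offset] |
	       (uint32_t)bytes[offset + 1] << 8 |
	       (uint32_t)bytes[offset + 2] << 16 |
	       (uint32_t)bytes[offset + 3] << 24;
}

/* First 8 bytes of "skb 0@a3f5c900" and "skb 1@a3df7b40" from the dump. */
const uint8_t skb0_head[8] = { 0x40, 0x7b, 0xdf, 0xa3, 0x58, 0x45, 0xe9, 0xae };
const uint8_t skb1_head[8] = { 0x58, 0x45, 0xe9, 0xae, 0x00, 0xc9, 0xf5, 0xa3 };

/* Assumed layout: word 0 is ->next, word 1 is ->prev. */
uint32_t skb_next(const uint8_t *head) { return le32_at(head, 0); }
uint32_t skb_prev(const uint8_t *head) { return le32_at(head, 4); }
```

Under those assumptions, skb 0 has next = a3df7b40 (skb 1) and prev = aee94558, while skb 1 has next = aee94558 and prev = a3f5c900 (skb 0). So aee94558 sits exactly where the list head would be in a two-entry circular list.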

@magore
Author

magore commented Jul 7, 2018

I am still waiting for my system to crash
Has anyone else been following this discussion on the linux-arm-kernel mailing list?
https://www.spinics.net/lists/kexec/msg20813.html
They are talking about interrupt processing in the lan75xx driver - perhaps this is related?

@maxnet
Contributor

maxnet commented Jul 7, 2018

> I am still waiting for my system to crash

You need to run dmesg yourself to see if you got the "Invalid end pointer in sk_buff" message.
It no longer panics (crashes) with the patch.

@maxnet
Contributor

maxnet commented Jul 7, 2018

I altered the patch slightly so that it prints out the value of tqp (=sentinel) as well.

pr_err("Invalid end pointer in sk_buff %p pkt_cnt %d tqp %p \n", skb, pkt_cnt, tqp);

And that confirmed my suspicions:

[  230.606762] Invalid end pointer in sk_buff aee94558 pkt_cnt 1 tqp aee94558 
[  230.606784] skb 0@a5768840:
[  230.606794]   >00000000: 80 3d 34 b2 58 45 e9 ae 00 00 00 00 00 00 00 00
[  230.606799]   >00000010: 40 06 cc b0 00 40 e9 ae 42 00 00 00 00 00 00 00
[  230.606809]   >00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  230.606815]   >00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  230.606822]   >00000040: 00 00 00 00 00 00 00 00 00 00 00 00 38 10 7e 80
[  230.606827]   >00000050: 00 00 00 00 00 00 00 00 00 00 00 00 4a 00 00 00
[  230.606833]   >00000060: 00 00 00 00 00 00 00 00 00 00 00 00 c0 03 00 00
[  230.606838]   >00000070: 00 00 00 00 10 01 10 00 02 00 00 00 00 00 00 00
[  230.606842]   >00000080: 3f 4a 32 6e 00 00 00 00 01 00 00 00 00 00 00 00
[  230.606848]   >00000090: 00 00 00 00 00 00 00 00 08 00 10 01 fc 00 ee 00
[  230.606855]   >000000a0: 30 b1 53 90 40 b1 53 90 00 b0 53 90 e6 b0 53 90
[  230.606862]   >000000b0: 02 00 00 00 01 00 00 00
[  230.606868]   data@9053b0e6:
[  230.606875]   >00000000: 42 00 40 06 00 00 00 00 30 5a 3a 45 ea 97 b8 27
[  230.606881]   >00000010: eb a2 9b 65 08 00 45 08 00 34 67 f1 40 00 40 06
[  230.606890]   >00000020: eb f0 c0 a8 b2 ab c0 a8 b2 dd 00 16 af ec e9 71
[  230.606896]   >00000030: 39 e3 bb f2 bf 2f 80 10 18 fe e7 00 00 00 01 01
[  230.606903]   >00000040: 08 0a 92 58 09 00 00 a7 7a 5c
[  230.606909] skb 1@b2343d80:
[  230.606915]   >00000000: 58 45 e9 ae 40 88 76 a5 00 00 00 00 00 00 00 00
[  230.606922]   >00000010: 40 06 cc b0 00 40 e9 ae 42 00 00 00 00 00 00 00
[  230.606928]   >00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  230.606934]   >00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  230.606940]   >00000040: 00 00 00 00 00 00 00 00 00 00 00 00 38 10 7e 80
[  230.606946]   >00000050: 00 00 00 00 00 00 00 00 00 00 00 00 4a 00 00 00
[  230.606952]   >00000060: 00 00 00 00 00 00 00 00 00 00 00 00 c0 03 00 00
[  230.606957]   >00000070: 00 00 00 00 10 01 10 00 02 00 00 00 00 00 00 00
[  230.606963]   >00000080: 3f 4a 32 6e 00 00 00 00 03 00 00 00 00 00 00 00
[  230.606969]   >00000090: 00 00 00 00 00 00 00 00 08 00 10 01 fc 00 ee 00
[  230.606975]   >000000a0: 30 9d 82 af 40 9d 82 af 00 9c 82 af e6 9c 82 af
[  230.606980]   >000000b0: 02 00 00 00 01 00 00 00
[  230.606985]   data@af829ce6:
[  230.606989]   >00000000: 42 00 40 06 00 00 00 00 30 5a 3a 45 ea 97 b8 27
[  230.606993]   >00000010: eb a2 9b 65 08 00 45 08 00 34 67 f2 40 00 40 06
[  230.606996]   >00000020: eb ef c0 a8 b2 ab c0 a8 b2 dd 00 16 af ec e9 71
[  230.607002]   >00000030: 39 e3 bb f2 ca 7f 80 10 18 fe e7 00 00 00 01 01
[  230.607022]   >00000040: 08 0a 92 58 09 00 00 a7 7a 5c

Possible fixes:

  • Lock the list while reading it, or
  • Replace:

for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {

With:

skb_queue_walk(tqp, skb) {

That standard macro seems to stop when skb reaches the sentinel, instead of relying on qlen.
https://github.com/raspberrypi/linux/blob/rpi-4.14.y/include/linux/skbuff.h#L3170
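
To make the difference between the two loops concrete, here is a simplified user-space model. It is a sketch, not driver code: struct node and struct queue are stand-ins for struct sk_buff and struct sk_buff_head, and simulate_stale_qlen() is a hypothetical helper that fakes the suspected state where the reader sees the incremented qlen before the new link.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct sk_buff: only the list links matter here. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Stand-in for struct sk_buff_head: a sentinel node plus a length counter. */
struct queue {
	struct node head;	/* sentinel, not a real packet */
	unsigned int qlen;
};

void queue_init(struct queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->qlen = 0;
}

void queue_add_tail(struct queue *q, struct node *n)
{
	assert(n != &q->head);
	n->next = &q->head;
	n->prev = q->head.prev;
	q->head.prev->next = n;
	q->head.prev = n;
	q->qlen++;
}

/* Fake the suspected race: qlen is bumped but no new node is linked yet. */
void simulate_stale_qlen(struct queue *q)
{
	q->qlen++;
}

/*
 * Walk in the style of the original loop, trusting qlen.
 * Returns how many times the sentinel was treated as a packet.
 */
unsigned int walk_by_count(struct queue *q)
{
	unsigned int seen = 0, hit_sentinel = 0;
	struct node *n;

	for (n = q->head.next; seen < q->qlen; n = n->next) {
		if (n == &q->head)
			hit_sentinel++;
		seen++;
	}
	return hit_sentinel;
}

/*
 * Walk in the style of skb_queue_walk(), stopping at the sentinel.
 * Returns the number of real nodes visited.
 */
unsigned int walk_to_sentinel(struct queue *q)
{
	unsigned int seen = 0;
	struct node *n;

	for (n = q->head.next; n != &q->head; n = n->next)
		seen++;
	return seen;
}
```

With one node queued and simulate_stale_qlen() applied, walk_by_count() steps onto the sentinel at index 1 (the same position the log above reports as pkt_cnt 1), while walk_to_sentinel() still visits exactly one node.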

@magore
Author

magore commented Jul 8, 2018

I used dmesg to get the results
patch-test.txt

@JamesH65
Contributor

JamesH65 commented Jul 8, 2018

@maxnet Very good work. Have you tried with the change to queue_walk? That does seem to be a more robust mechanism for walking the queue than the current for loop. Although I am slightly concerned that there could still be concurrency issues even with that if the queue isn't locked; I haven't looked at how items are added/removed to be sure.

@pelwell
Contributor

pelwell commented Jul 8, 2018

Good call, @maxnet - you've saved us a lot of time.

skb_queue_tail claims the list spin_lock, then calls (indirectly) __skb_insert to add the item to the list. The function does update the pointers before incrementing the length, but instruction reordering and buffered writes mean that there are no guarantees about the order in which they actually occur.
Even on a single CPU, without a memory barrier between the two it is possible for an interrupt (hard or soft) to see the incremented length but not the new next pointer.

The use of locking primitives usually includes a memory barrier, but I'm concerned that one may be missing (or being skipped) in this case.

As an experiment, try adding a barrier() after the call to skb_queue_tail in lan78xx_start_xmit and rebuilding. I doubt that would be the final solution but it may indicate if we're on the right track.
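
The ordering requirement can be illustrated in user space with C11 atomics. This is only an analogy under stated assumptions: struct packet_log and its functions are hypothetical, there is a single writer, and the kernel would use its own primitives rather than <stdatomic.h>. The writer stores the data first and then publishes the new count with release ordering; the reader loads the count with acquire ordering before touching the data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SLOTS 8

/* Hypothetical single-writer log: data slots plus a published count. */
struct packet_log {
	int slot[SLOTS];	/* plain data, written before the count */
	atomic_uint count;	/* number of slots readers may look at */
};

void packet_log_init(struct packet_log *l)
{
	atomic_init(&l->count, 0);
}

/* Writer: fill the slot, then publish it. Returns false if the log is full. */
bool publish(struct packet_log *l, int value)
{
	unsigned int n = atomic_load_explicit(&l->count, memory_order_relaxed);

	assert(n <= SLOTS);
	if (n == SLOTS)
		return false;
	l->slot[n] = value;
	/* Release: the slot store above cannot become visible after this. */
	atomic_store_explicit(&l->count, n + 1, memory_order_release);
	return true;
}

/* Reader: acquire the count, then read only the slots it covers. */
int sum_published(struct packet_log *l)
{
	unsigned int n = atomic_load_explicit(&l->count, memory_order_acquire);
	int sum = 0;

	for (unsigned int i = 0; i < n; i++)
		sum += l->slot[i];
	return sum;
}
```

With relaxed ordering on both sides, a reader on another core could in principle observe the new count before the slot contents, which is the kind of reordering suspected above.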

@maxnet
Contributor

maxnet commented Jul 8, 2018

> @maxnet Very good work. Have you tried with the change to queue_walk?

Yes.
It seems to solve the panic described in the thread.

Although I now have the problem that my kernel with the change won't boot if I have quiet in cmdline.txt for some unknown reason.
No output on serial console. Only the kernel boot logo (4x Raspberry), and a cursor that does not blink.
Probably unrelated to the change though, as I have seen such strange boot problems on some of my kernel builds before (since 4.14.x)
Without quiet in cmdline.txt it does boot fine.

And I also managed to get a different kernel panic in process usb-storage if I stress the Pi by starting memtester 10m while concurrently sending a lot of files to the Pi's hard drive through 2 rsync over ssh sessions.

[  594.356235] Unable to handle kernel paging request at virtual address 03056564
[  594.363590] pgd = 80004000
[  594.366332] [03056564] *pgd=00000000
[  594.369966] Internal error: Oops: 5 [#1] SMP ARM
[  594.374650] Modules linked in: rfcomm cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic fuse sg i2c_dev ip_tables x_tables ipv6 brcmfmac brcmutil cfg80211 snd_bcm2835(C) joydev snd_pcm rfkill snd_timer snd uio_pdrv_genirq uio fixed
[  594.395973] CPU: 2 PID: 84 Comm: usb-storage Tainted: G         C      4.14.52v7-aufs #1
[  594.404180] Hardware name: BCM2835
[  594.407627] task: aeeb0f00 task.stack: b3044000
[  594.412233] PC is at dequeue_task_fair+0x14c/0xc44
[  594.417095] LR is at __update_load_avg_cfs_rq+0x16c/0x258
[  594.422567] pc : [<80152818>]    lr : [<8014e408>]    psr: 20000093
[  594.428924] sp : b3045c10  ip : 0305651c  fp : b1045c64
[  594.434220] r10: 00000001  r9 : 00000000  r8 : 00000009
[  594.439515] r7 : b5031d78  r6 : aeeb0f80  r5 : 0000008a  r4 : 00000001
[  594.446134] r3 : 00000001  r2 : 00000000  r1 : 0000b770  r0 : 00000001
[  594.452754] Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  594.460076] Control: 10c5383d  Table: 13c1406a  DAC: 00000055
[  594.465902] Process usb-storage (pid: 84, stack limit = 0xb3044210)
[  594.472259] Stack: (0xb3045c10 to 0xb3046000)
[  594.476676] 5c00:                                     aeeb0f80 b1846400 b4acd000 b4acba00
[  594.484972] 5c20: b3045c64 b5031d78 00000009 80f94a3c b5031d40 80e92400 b3045c54 264aa53b
[  594.493264] 5c40: 00000003 264aa53b 00000003 aeeb0f00 b5031d40 00000009 b3045c94 b3045c68
[  594.501558] 5c60: 80148004 801526d8 00000000 00000000 80e9ad40 80f044c0 aeeb1364 b5031d40
[  594.509852] 5c80: aeeb0f00 80e9ad40 b3045cf4 b3045c98 80909c2c 80147f34 b22c4f88 aeee9430
[  594.518145] 5ca0: b3045d74 b3045cb0 806f97c8 80727284 b3045d1c 8090a198 34197000 aeeb1360
[  594.526439] 5cc0: aee2b108 00000000 b3045d9c b3044000 7fffffff 7fffffff 00000002 00000000
[  594.534735] 5ce0: b3044000 00000000 b3045d0c b3045cf8 8090a198 809097ec 00000000 aeee94dc
[  594.543031] 5d00: b3045d64 b3045d10 8090dcd4 8090a154 b3045d34 b3045d20 8090a198 809097ec
[  594.551327] 5d20: 00000000 b3045de8 aeee94e0 7fffffff aeee94e0 00000002 aeee94c4 aeee94dc
[  594.559619] 5d40: 7fffffff aeee94e0 00000002 00000000 b3044000 00000000 b3045dac b3045d68
[  594.567915] 5d60: 8090ae10 8090db14 b3045dbc 00000001 aeeb0f00 801493b0 aeee94e4 aeee94e4
[  594.576213] 5d80: aeee94c4 aeee94bc aeee94dc 00000000 00000013 aeee94c4 0000001f aeee9430
[  594.584508] 5da0: b3045dbc b3045db0 8090aed4 8090ad38 b3045dec b3045dc0 806fc978 8090aec0
[  594.592803] 5dc0: 0000001f aeee9430 aeee94bc aeee9430 aeee94bc 0001e000 c0010600 aeee9454
[  594.601096] 5de0: b3045e24 b3045df0 80734308 806fc850 9f95d200 00000013 00000000 01400000
[  594.609389] 5e00: c0010600 9f9586d0 aeee9430 9f9586d0 b591824f 0001e000 b3045e4c b3045e28
[  594.617683] 5e20: 807343c4 8073428c 0001e000 b3045e34 aeee9430 9f9586d0 b591824f b5918240
[  594.625975] 5e40: b3045e84 b3045e50 807344f0 80734374 00000000 b3045e60 8090a198 809097ec
[  594.634270] 5e60: 9f9586d0 00000000 aeee94f8 80f02d00 00000000 00000000 b3045f14 b3045e88
[  594.642565] 5e80: 80734dc0 807343d8 801a25f4 8010e8d4 aeee94fc 7fffffff aeee94fc 00000001
[  594.650860] 5ea0: 00000001 b3044000 b3045ecc aeee94f8 7fffffff aeee94fc 00000001 00000001
[  594.659155] 5ec0: b3045f14 b3045ed0 8090ae84 801edfa0 00000001 00000001 aeeb0f00 801493b0
[  594.667449] 5ee0: 00000100 00000200 aeee9018 aeee9430 00000000 aeee94f8 80f02d00 00000000
[  594.675744] 5f00: 00000000 9f9586d0 b3045f24 b3045f18 80733aac 80734d9c b3045f7c b3045f28
[  594.684038] 5f20: 80736324 80733aa0 34197000 aeeb1360 b336a09c 00000000 b3045f5c b3044000
[  594.692335] 5f40: 00000000 b336a040 b336a09c aeee9430 8073618c b336a080 00000000 b336a040
[  594.700627] 5f60: b336a09c aeee9430 8073618c b4a81a50 b3045fac b3045f80 8013e0e0 80736198
[  594.708923] 5f80: ffffffff b336a040 8013dfa4 00000000 00000000 00000000 00000000 00000000
[  594.717219] 5fa0: 00000000 b3045fb0 8010894c 8013dfb0 00000000 00000000 00000000 00000000
[  594.725513] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  594.733809] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[  594.742109] Code: e1a04000 f57ff05a e51bc040 e19aa000 (e1cc24d8) 
[  594.748297] ---[ end trace fa5488ba26256917 ]---
[  600.089709] lan78xx 1-1.1.1:1.0 eth0: Failed to read stat ret = 0xffffff92

But I have only seen this once, and it is probably unrelated to the one reported in this thread as well.

==

> Although I am slightly concerned that there could still be concurrency issues even with that, if the queue isn't locked although I haven't looked at how items are added/removed to be sure.

When adding/removing there is a lock on the queue.
E.g. the skb_dequeue() function called looks like this:

struct sk_buff *skb_dequeue(struct sk_buff_head *list)
{
	unsigned long flags;
	struct sk_buff *result;

	spin_lock_irqsave(&list->lock, flags);
	result = __skb_dequeue(list);
	spin_unlock_irqrestore(&list->lock, flags);
	return result;
}

I assume we could use that lock when reading as well, and call __skb_dequeue() instead of skb_dequeue() in that case as we already hold the lock.
However my kernel programming knowledge is too limited to know the best place to take such locks.

E.g. would that be like: (untested)

	unsigned long flags;
	spin_lock_irqsave(&tqp->lock, flags); /* Take lock */

	for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {
		if (skb_is_gso(skb)) {
			if (pkt_cnt) {
				/* handle previous packets first */
				break;
			}
			count = 1;
			length = skb->len - TX_OVERHEAD;
			skb2 = __skb_dequeue(tqp);
			spin_unlock_irqrestore(&tqp->lock, flags); /* Release lock */
			goto gso_skb;
		}

		if ((skb_totallen + skb->len) > MAX_SINGLE_PACKET_SIZE)
			break;
		skb_totallen = skb->len + roundup(skb_totallen, sizeof(u32));
		pkt_cnt++;
	}
	spin_unlock_irqrestore(&tqp->lock, flags); /* Release lock */

Or should the lock be held a little bit longer, as there is also other code using the list slightly below?

E.g.: (untested)

	unsigned long flags;
	spin_lock_irqsave(&tqp->lock, flags); /* Take lock */

	for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {
		if (skb_is_gso(skb)) {
			if (pkt_cnt) {
				/* handle previous packets first */
				break;
			}
			count = 1;
			length = skb->len - TX_OVERHEAD;
			skb2 = __skb_dequeue(tqp);
			goto gso_skb;
		}

		if ((skb_totallen + skb->len) > MAX_SINGLE_PACKET_SIZE)
			break;
		skb_totallen = skb->len + roundup(skb_totallen, sizeof(u32));
		pkt_cnt++;
	}

	/* copy to a single skb */
	skb = alloc_skb(skb_totallen, GFP_ATOMIC);
	if (!skb) {
		spin_unlock_irqrestore(&tqp->lock, flags); /* Release lock before bailing out */
		goto drop;
	}

	skb_put(skb, skb_totallen);

	for (count = pos = 0; count < pkt_cnt; count++) {
		skb2 = __skb_dequeue(tqp);
		if (skb2) {
			length += (skb2->len - TX_OVERHEAD);
			memcpy(skb->data + pos, skb2->data, skb2->len);
			pos += roundup(skb2->len, sizeof(u32));
			dev_kfree_skb(skb2);
		}
	}

gso_skb:
	spin_unlock_irqrestore(&tqp->lock, flags); /* Release lock */

But then the lock would also be held over a memory allocation function (alloc_skb).
I know from my experience of concurrent programming in userspace that one should avoid that, as memory allocation functions are expensive and can take a while, but I have no clue what the rules and best practices are in kernel programming in this regard.
So I leave it up to you which of the options is the best fix.

@lategoodbye
Contributor

lategoodbye commented Jul 8, 2018

Personally I prefer the skb_queue_walk approach (currently testing), which should avoid a performance loss.

In my results above I tested the upstream variants only once, so it may still be possible to reproduce it there.

@maxnet
Contributor

maxnet commented Jul 8, 2018

Hmm, seems skb_queue_walk does not fully solve it.
Looks like it hangs with a null pointer dereference error now.
But I forgot to configure the scrollback buffer of my serial terminal program large enough, so I do not have the full message.

ter dereference at virtual address 000000a4
[ 1666.632531] pgd = 80004000
[ 1666.635423] [000000a4] *pgd=00000000
[ 1666.639204] Internal error: Oops: 17 [#1] SMP ARM
[ 1666.643971] Modules linked in: rfcomm cmac bnep hci_uart btbcm serdev bluetooth ecdh_generic fuse sg i2c_dev ip_tables x_tables ipv6 brcmfmac brcmutil snd_bcm2835(C) cfg80211 rfkill snd_pcm joydev snd_timer snd uio_pdrv_genirq uio fixed
[ 1666.665273] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C      4.14.52v7-aufs #1
[ 1666.673209] Hardware name: BCM2835
[ 1666.676652] task: 80f06cc0 task.stack: 80f00000
[ 1666.681248] PC is at lan78xx_bh+0x114/0x85c
[ 1666.685487] LR is at _raw_spin_unlock_irqrestore+0x3c/0x70
[ 1666.691045] pc : [<806e6524>]    lr : [<8090f264>]    psr: a0000113
[ 1666.697395] sp : 80f01db8  ip : 80f04174  fp : 80f01e04
[ 1666.702688] r10: 80e69a30  r9 : 00000018  r8 : b433e500
[ 1666.707981] r7 : b433e558  r6 : 00000002  r5 : 00000096  r4 : 00000000
[ 1666.714596] r3 : 0000004a  r2 : 0000004c  r1 : 00000094  r0 : 00002328
[ 1666.721213] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[ 1666.728445] Control: 10c5383d  Table: 0825406a  DAC: 00000055
[ 1666.734266] Process swapper/0 (pid: 0, stack limit = 0x80f00210)
[ 1666.740353] Stack: (0x80f01db8 to 0x80f02000)
[ 1666.744765] 1da0:                                                       807febb4 807ea88c
[ 1666.753058] 1dc0: 80f01df4 b433e538 80f04a8c 80f99190 80f03d68 80f94d00 80f04a8c b433e568
[ 1666.761349] 1de0: b433e56c 00000000 80e92310 00000000 00000018 80e69a30 80f01e2c 80f01e08
[ 1666.769641] 1e00: 801245bc 806e641c 8012454c 00000006 80f02098 00000040 00000007 00000100
[ 1666.777933] 1e20: 80f01e94 80f01e30 80101674 80124558 80f01e54 80f01e40 80179edc 00000001
[ 1666.786224] 1e40: 00200102 80a0221c 000215d5 80f02d00 80fb7580 00000007 80f94e70 80f03d68
[ 1666.794516] 1e60: 80e92378 80f02080 80175508 80e9a92c 00000000 00000000 00000001 b4803180
[ 1666.802807] 1e80: 80f00000 80e69a30 80f01ea4 80f01e98 8012408c 80101514 80f01ecc 80f01ea8
[ 1666.811098] 1ea0: 80175b58 80123fb8 80f01ee8 b6800000 00000000 ffffffff 80f01f1c 00000001
[ 1666.819391] 1ec0: 80f01ee4 80f01ed0 80101504 80175af4 80109290 60000113 80f01f44 80f01ee8
[ 1666.827683] 1ee0: 8090f53c 80101468 00000000 05308bac 80f01f48 00000000 80f00000 80f03dcc
[ 1666.835974] 1f00: 80f03d68 80f94a4a 00000001 80fb6c00 80e69a30 80f01f44 80f04174 80f01f38
[ 1666.844266] 1f20: 8010928c 80109290 60000113 ffffffff 8010928c 00000000 80f01f54 80f01f48
[ 1666.852558] 1f40: 8090ecb0 80109268 80f01f7c 80f01f58 801617ac 8090ec88 80e922c4 000000be
[ 1666.860850] 1f60: 00000001 ffffffff 80f03d40 00000001 80f01f8c 80f01f80 80161ac0 801616e4
[ 1666.869142] 1f80: 80f01fa4 80f01f90 80908ab8 80161aa4 80f0f7f0 80fb6c50 80f01ff4 80f01fa8
[ 1666.877434] 1fa0: 80e00e2c 80908a04 ffffffff ffffffff 00000000 80e00754 00000000 b57ffb00
[ 1666.885727] 1fc0: 00000000 80e69a30 00000000 80fb6e94 80f03d58 80e69a2c 80f08a30 0000406a
[ 1666.894019] 1fe0: 410fd034 00000000 00000000 80f01ff8 0000807c 80e00a4c 00000000 00000000
[ 1666.902322] [<806e6524>] (lan78xx_bh) from [<801245bc>] (tasklet_action+0x70/0x108)
[ 1666.910090] [<801245bc>] (tasklet_action) from [<80101674>] (__do_softirq+0x16c/0x3f8)
[ 1666.918119] [<80101674>] (__do_softirq) from [<8012408c>] (irq_exit+0xe0/0x144)
[ 1666.925532] [<8012408c>] (irq_exit) from [<80175b58>] (__handle_domain_irq+0x70/0xc0)
[ 1666.933475] [<80175b58>] (__handle_domain_irq) from [<80101504>] (bcm2836_arm_irqchip_handle_irq+0xa8/0xac)
[ 1666.943355] [<80101504>] (bcm2836_arm_irqchip_handle_irq) from [<8090f53c>] (__irq_svc+0x5c/0x7c)
[ 1666.952349] Exception stack(0x80f01ee8 to 0x80f01f30)
[ 1666.957467] 1ee0:                   00000000 05308bac 80f01f48 00000000 80f00000 80f03dcc
[ 1666.965760] 1f00: 80f03d68 80f94a4a 00000001 80fb6c00 80e69a30 80f01f44 80f04174 80f01f38
[ 1666.974051] 1f20: 8010928c 80109290 60000113 ffffffff
[ 1666.979175] [<8090f53c>] (__irq_svc) from [<80109290>] (arch_cpu_idle+0x34/0x4c)
[ 1666.986676] [<80109290>] (arch_cpu_idle) from [<8090ecb0>] (default_idle_call+0x34/0x48)
[ 1666.994882] [<8090ecb0>] (default_idle_call) from [<801617ac>] (do_idle+0xd4/0x14c)
[ 1667.002647] [<801617ac>] (do_idle) from [<80161ac0>] (cpu_startup_entry+0x28/0x2c)
[ 1667.010323] [<80161ac0>] (cpu_startup_entry) from [<80908ab8>] (rest_init+0xc0/0xc4)
[ 1667.018180] [<80908ab8>] (rest_init) from [<80e00e2c>] (start_kernel+0x3ec/0x3f8)
[ 1667.025768] Code: e0825003 e2866001 e1570004 0a000007 (e59430a4) 
[ 1667.032123] ---[ end trace 487c639b775d5cb9 ]---
[ 1667.036957] Kernel panic - not syncing: Fatal exception in interrupt
[ 1667.043403] CPU3: stopping
[ 1667.046144] CPU: 3 PID: 24 Comm: ksoftirqd/3 Tainted: G      D  C      4.14.52v7-aufs #1
[ 1667.054344] Hardware name: BCM2835
[ 1667.057794] [<80110960>] (unwind_backtrace) from [<8010ca4c>] (show_stack+0x20/0x24)
[ 1667.065648] [<8010ca4c>] (show_stack) from [<808f3e3c>] (dump_stack+0xd4/0x118)
[ 1667.073061] [<808f3e3c>] (dump_stack) from [<8010ecd4>] (handle_IPI+0x31c/0x33c)
[ 1667.080561] [<8010ecd4>] (handle_IPI) from [<801014d8>] (bcm2836_arm_irqchip_handle_irq+0x7c/0xac)
[ 1667.089648] [<801014d8>] (bcm2836_arm_irqchip_handle_irq) from [<8090f53c>] (__irq_svc+0x5c/0x7c)
[ 1667.098643] Exception stack(0xb496be88 to 0xb496bed0)
[ 1667.103762] be80:                   80fb7580 ffffffff 00000003 00000000 00000006 80f0209c
[ 1667.112054] bea0: 00000040 00000000 00000100 00000018 00000000 b496bf3c 80f04174 b496bed8
[ 1667.120345] bec0: 80101610 80101614 60000013 ffffffff
[ 1667.125465] [<8090f53c>] (__irq_svc) from [<80101614>] (__do_softirq+0x10c/0x3f8)
[ 1667.133052] [<80101614>] (__do_softirq) from [<80123d28>] (run_ksoftirqd+0x44/0x6c)
[ 1667.140818] [<80123d28>] (run_ksoftirqd) from [<801425d8>] (smpboot_thread_fn+0x124/0x1a0)
[ 1667.149202] [<801425d8>] (smpboot_thread_fn) from [<8013e0e0>] (kthread+0x13c/0x16c)
[ 1667.157055] [<8013e0e0>] (kthread) from [<8010894c>] (ret_from_fork+0x14/0x28)
[ 1667.164375] CPU1: stopping
[ 1667.167116] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D  C      4.14.52v7-aufs #1
[ 1667.175052] Hardware name: BCM2835
[ 1667.178499] [<80110960>] (unwind_backtrace) from [<8010ca4c>] (show_stack+0x20/0x24)
[ 1667.186350] [<8010ca4c>] (show_stack) from [<808f3e3c>] (dump_stack+0xd4/0x118)
[ 1667.193763] [<808f3e3c>] (dump_stack) from [<8010ecd4>] (handle_IPI+0x31c/0x33c)
[ 1667.201263] [<8010ecd4>] (handle_IPI) from [<801014d8>] (bcm2836_arm_irqchip_handle_irq+0x7c/0xac)
[ 1667.210352] [<801014d8>] (bcm2836_arm_irqchip_handle_irq) from [<8090f53c>] (__irq_svc+0x5c/0x7c)
[ 1667.219347] Exception stack(0xb4927f38 to 0xb4927f80)
[ 1667.224464] 7f20:                                                       00000000 04e59040
[ 1667.232756] 7f40: b4927f98 00000000 b4926000 80f03dcc 80f03d68 80f94a4a 00000001 410fd034
[ 1667.241048] 7f60: 00000000 b4927f94 80f04174 b4927f88 8010928c 80109290 68000013 ffffffff
[ 1667.249342] [<8090f53c>] (__irq_svc) from [<80109290>] (arch_cpu_idle+0x34/0x4c)
[ 1667.256843] [<80109290>] (arch_cpu_idle) from [<8090ecb0>] (default_idle_call+0x34/0x48)
[ 1667.265048] [<8090ecb0>] (default_idle_call) from [<801617ac>] (do_idle+0xd4/0x14c)
[ 1667.272812] [<801617ac>] (do_idle) from [<80161ac0>] (cpu_startup_entry+0x28/0x2c)
[ 1667.280487] [<80161ac0>] (cpu_startup_entry) from [<8010e750>] (secondary_start_kernel+0x144/0x16c)
[ 1667.289662] [<8010e750>] (secondary_start_kernel) from [<0010198c>] (0x10198c)
[ 1667.296981] CPU2: stopping
[ 1667.299722] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D  C      4.14.52v7-aufs #1
[ 1667.307658] Hardware name: BCM2835
[ 1667.311105] [<80110960>] (unwind_backtrace) from [<8010ca4c>] (show_stack+0x20/0x24)
[ 1667.318957] [<8010ca4c>] (show_stack) from [<808f3e3c>] (dump_stack+0xd4/0x118)
[ 1667.326368] [<808f3e3c>] (dump_stack) from [<8010ecd4>] (handle_IPI+0x31c/0x33c)
[ 1667.333867] [<8010ecd4>] (handle_IPI) from [<801014d8>] (bcm2836_arm_irqchip_handle_irq+0x7c/0xac)
[ 1667.342954] [<801014d8>] (bcm2836_arm_irqchip_handle_irq) from [<8090f53c>] (__irq_svc+0x5c/0x7c)
[ 1667.351949] Exception stack(0xb4931f38 to 0xb4931f80)
[ 1667.357067] 1f20:                                                       00000000 0059813c
[ 1667.365358] 1f40: b4931f98 00000000 b4930000 80f03dcc 80f03d68 80f94a4a 00000001 410fd034
[ 1667.373650] 1f60: 00000000 b4931f94 80f04174 b4931f88 8010928c 80109290 60000013 ffffffff
[ 1667.381944] [<8090f53c>] (__irq_svc) from [<80109290>] (arch_cpu_idle+0x34/0x4c)
[ 1667.389445] [<80109290>] (arch_cpu_idle) from [<8090ecb0>] (default_idle_call+0x34/0x48)
[ 1667.397650] [<8090ecb0>] (default_idle_call) from [<801617ac>] (do_idle+0xd4/0x14c)
[ 1667.405413] [<801617ac>] (do_idle) from [<80161ac0>] (cpu_startup_entry+0x28/0x2c)
[ 1667.413090] [<80161ac0>] (cpu_startup_entry) from [<8010e750>] (secondary_start_kernel+0x144/0x16c)
[ 1667.422263] [<8010e750>] (secondary_start_kernel) from [<0010198c>] (0x10198c)
[ 1667.429591] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
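
The Code: line in the oops marks the faulting instruction in parentheses (e59430a4). As a cross-check, the sketch below decodes that word as an A32 load/store with a 12-bit immediate offset. The helper names are made up for this illustration and only that one encoding form is handled. Under that assumption the word decodes to ldr r3, [r4, #0xa4]; with r4 : 00000000 in the register dump this yields address 0xa4, which matches the reported fault address and is consistent with reading a structure member through a NULL pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of an A32 LDR/STR (immediate offset) instruction. */
struct ldst_imm {
	unsigned int rn;	/* base register number */
	unsigned int rd;	/* destination/source register number */
	unsigned int imm12;	/* unsigned 12-bit offset */
	int is_load;		/* L bit: 1 = LDR, 0 = STR */
	int add;		/* U bit: 1 = base + offset, 0 = base - offset */
	int pre_indexed;	/* P bit */
	int byte;		/* B bit: 1 = byte access, 0 = word access */
};

/*
 * Hypothetical helper: returns 0 and fills *out if insn is a conditional
 * load/store with immediate offset (bits 27:25 == 0b010), -1 otherwise.
 */
static int decode_ldst_imm(uint32_t insn, struct ldst_imm *out)
{
	if ((insn >> 28) == 0xfu)		/* unconditional space: not handled */
		return -1;
	if (((insn >> 25) & 0x7u) != 0x2u)	/* not the immediate-offset form */
		return -1;

	out->pre_indexed = (insn >> 24) & 1u;
	out->add = (insn >> 23) & 1u;
	out->byte = (insn >> 22) & 1u;
	out->is_load = (insn >> 20) & 1u;
	out->rn = (insn >> 16) & 0xfu;
	out->rd = (insn >> 12) & 0xfu;
	out->imm12 = insn & 0xfffu;
	return 0;
}

/* Address the instruction accesses, given the current value of the base register. */
static uint32_t effective_address(const struct ldst_imm *d, uint32_t rn_value)
{
	return d->add ? rn_value + d->imm12 : rn_value - d->imm12;
}
```

Calling decode_ldst_imm(0xe59430a4, &d) and then effective_address(&d, 0) reproduces the 0xa4 address from the first line of the log.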

@lategoodbye
Contributor

I am currently trying the following:

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 9eada92..c7f1a1e 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3283,7 +3283,7 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev)
 	struct skb_data *entry;
 	unsigned long flags;
 	struct sk_buff_head *tqp = &dev->txq_pend;
-	struct sk_buff *skb, *skb2;
+	struct sk_buff *skb;
 	int ret;
 	int count, pos;
 	int skb_totallen, pkt_cnt;
@@ -3292,15 +3292,15 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev)
 	pkt_cnt = 0;
 	count = 0;
 	length = 0;
-	for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {
+	skb_queue_walk(tqp, skb) {
 		if (skb_is_gso(skb)) {
 			if (pkt_cnt) {
 				/* handle previous packets first */
 				break;
 			}
 			count = 1;
+			skb = skb_dequeue(tqp);
 			length = skb->len - TX_OVERHEAD;
-			skb2 = skb_dequeue(tqp);
 			goto gso_skb;
 		}
 
@@ -3318,7 +3318,7 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev)
 	skb_put(skb, skb_totallen);
 
 	for (count = pos = 0; count < pkt_cnt; count++) {
-		skb2 = skb_dequeue(tqp);
+		struct sk_buff *skb2 = skb_dequeue(tqp);
 		if (skb2) {
 			length += (skb2->len - TX_OVERHEAD);
 			memcpy(skb->data + pos, skb2->data, skb2->len);

@pelwell
Contributor

pelwell commented Jul 8, 2018

Consider trying the barrier in lan78xx_start_xmit.

woodsts pushed a commit to woodsts/linux-stable that referenced this issue Jul 22, 2018
commit dea39ac upstream.

The skb size calculation in lan78xx_tx_bh is in race with the start_xmit,
which could lead to rare kernel oopses. So protect the whole skb walk with
a spin lock. As a benefit we can unlink the skb directly.

This patch was tested on Raspberry Pi 3B+

Link: raspberrypi/linux#2608
Fixes: 55d7de9 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
Cc: stable <[email protected]>
Signed-off-by: Floris Bos <[email protected]>
Signed-off-by: Stefan Wahren <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jpuhlman pushed a commit to MontaVista-OpenSourceTechnology/linux-mvista-2.4 that referenced this issue Jul 26, 2018
Source: linux-mvista-2.4
MR: 94992, 00000
Type: Integration
Disposition: Merged from linux-mvista-2.4
ChangeID: 2be27d444f61c5542df5c00892817124582428a4
Description:

commit dea39aca1d7aef1e2b95b07edeacf04cc8863a2e upstream.

Signed-off-by: Corey Minyard <[email protected]>
Signed-off-by: Jeremy Puhlman <[email protected]>
freak07 pushed a commit to freak07/Kirisakura_Imagine that referenced this issue Jul 27, 2018
(cherry picked from commit f6ed63b)
abun880007 pushed a commit to Team-UB/android_kernel_samsung_universal9810 that referenced this issue Jul 29, 2018
abun880007 pushed a commit to Team-UB/android_kernel_samsung_universal9810 that referenced this issue Jul 30, 2018
abun880007 pushed a commit to Team-UB/android_kernel_samsung_universal9810 that referenced this issue Aug 3, 2018
mvaisakh pushed a commit to mvaisakh/kernel-msm that referenced this issue Aug 31, 2018
mvaisakh pushed a commit to mvaisakh/kernel-msm that referenced this issue Sep 13, 2018
QuinAsura pushed a commit to QuinAsura/linux that referenced this issue Sep 14, 2018
jrziviani pushed a commit to jrziviani/linux-devel that referenced this issue Feb 13, 2019
BugLink: http://bugs.launchpad.net/bugs/1811877

Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
curtisy1 pushed a commit to curtisy1/android_kernel_nubia_nx606j that referenced this issue Mar 31, 2019
Ante0 pushed a commit to Ante0/CarbonKernel that referenced this issue Apr 24, 2019
Ante0 pushed a commit to Ante0/CarbonKernel that referenced this issue Apr 25, 2019
curtisy1 pushed a commit to curtisy1/android_kernel_nubia_nx606j that referenced this issue Jun 8, 2019
TheNotOnly pushed a commit to TheNotOnly/android_kernel_lge_sdm845-archived that referenced this issue Jun 24, 2019
TheNotOnly pushed a commit to TheNotOnly/android_kernel_lge_sdm845-archived that referenced this issue Jul 16, 2019
SyberHexen pushed a commit to SyberHexen/android_kernel_motorola_sdm632 that referenced this issue Sep 14, 2019
FraEgg pushed a commit to FraEgg/android_kernel_samsung_sdm670 that referenced this issue Mar 2, 2020
lzgmc pushed a commit to lzgmc/android_kernel_jd2019 that referenced this issue Jan 26, 2021
krazey pushed a commit to krazey/android_kernel_motorola_exynos9610 that referenced this issue Apr 22, 2022
Coconutat pushed a commit to Coconutat/android_kernel_huawei_vtr_emui9_KernelSU that referenced this issue Apr 22, 2023
RemuruSama pushed a commit to RemuruSama/android_kernel_realme_RMX1805 that referenced this issue Jul 9, 2023
LinuxGuy312 pushed a commit to LinuxGuy312/android_kernel_realme_RMX1805 that referenced this issue Mar 15, 2024
Huawei-Dev pushed a commit to Huawei-Dev/android_kernel_huawei_sydney that referenced this issue Dec 26, 2024