"eth0: kevent 2 may have been dropped" is still here #309

wrobelda · 2013-06-04T20:32:03Z

As per above, I am unfortunately suffering from this error for a long time now.
No fix can make it vanish:

disabling Turbo Mode does not help at all
increasing the vm.min_free_kilobytes only postpones the error in time but it does show up eventually after triggering some CPU and network intensive processes - like torrent for that matter - and giving it some minutes to hours tops. Please note that I have a 100 Mbps Internet over Ethernet available for the PI.
as soon as the kevent error shows up, the wifi card that is connected to Pi and set in AP mode also starts to slow down and fail with "ieee80211 phy0: failed to reallocate TX buffer" every ~10 times a minute.
if the IO and CPU load continues, at some point the device will hang. And when it hangs it hangs - I cannot even use the serial console to kdb and see what happened.
both errors stop appearing if USB-load goes down, e.g. I stop using wifi (I am connected but not browsing) or close torrent application. But lets say I start copying some files to USB pendrive connected to my hub and both errors are back.

I have postponed with reporting this until I had possibility to test another Raspberry board so to make sure this is not a hardware that was malfunctioning. I also tested with different power source to no avil.

Also tried two days ago with recent fiq_split branch - same behaviour + extra mmc0 timeouts, but that's a different story.

Tested on:
me@rpi ~ $ uname -a
Linux rpi 3.6.11+ #456 PREEMPT Mon May 20 17:42:15 BST 2013 armv6l GNU/Linux
me@rpi ~ $ /opt/vc/bin/vcgencmd version
May 26 2013 21:47:02
Copyright (c) 2012 Broadcom
version 53261d4ede3ba2b660e4201aca9bd4544565a3ce (clean) (release)

...and also on:
root@rpi:/home/# uname -a
Linux rpi 3.6.11+ #462 PREEMPT Mon Jun 3 22:15:00 BST 2013 armv6l GNU/Linux
root@rpi:/home/# /opt/vc/bin/vcgencmd version
Jun 3 2013 22:37:15
Copyright (c) 2012 Broadcom
version 2bac1bb890aa545e180e3f0766fb67989b590d26 (clean) (release)

I am going to test it with some other USB Hub. But I would be surprised if the error were to be hub's fault, as the one I own is the one that's reported to be probably most compatible - Plugable 7 Port with external power amp.

wrobelda · 2013-06-05T06:22:41Z

Something interesting has shown up when I woke up this morning and checked dmesg:

[Wed Jun 5 03:08:08 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:08:08 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:08:09 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:09:10 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:10:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:10:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:10:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:10:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:10:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:10:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:10:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:10:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:10:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:10:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:10:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:10:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:10:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:10:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:10:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:10:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:10:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:10:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:10:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:10:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:10:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:10:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:11:27 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:11:28 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 03:12:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:12:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:12:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:12:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:12:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:12:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:12:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:12:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:12:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:12:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:12:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:12:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:12:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:12:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:12:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:12:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:12:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:12:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:12:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:12:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:12:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:12:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:14:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:14:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:14:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:14:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:14:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:14:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:14:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:14:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:14:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:14:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:14:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:14:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:14:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:14:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:14:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:14:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:14:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:14:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:14:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:14:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:14:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:14:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:16:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:16:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:16:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:16:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:16:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:16:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:16:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:16:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:16:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:16:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:16:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:16:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:16:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:16:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:16:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:16:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:16:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:16:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:16:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:16:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:16:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:16:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:18:35 2013] INFO: task scsi_eh_0:351 blocked for more than 120 seconds. [Wed Jun 5 03:18:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:18:35 2013] scsi_eh_0 D c0398690 0 351 2 0x00000000 [Wed Jun 5 03:18:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:18:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:18:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c02a1090>] (command_abort+0xa0/0xe8) [Wed Jun 5 03:18:35 2013] [<c02a1090>] (command_abort+0xa0/0xe8) from [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) [Wed Jun 5 03:18:35 2013] [<c0255ca0>] (scsi_error_handler+0x3a0/0x4b4) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:18:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 03:18:35 2013] INFO: task usb-storage:352 blocked for more than 120 seconds. [Wed Jun 5 03:18:35 2013] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Wed Jun 5 03:18:35 2013] usb-storage D c0398690 0 352 2 0x00000000 [Wed Jun 5 03:18:35 2013] [<c0398690>] (__schedule+0x2c4/0x5b0) from [<c03973d0>] (schedule_timeout+0x158/0x1e0) [Wed Jun 5 03:18:35 2013] [<c03973d0>] (schedule_timeout+0x158/0x1e0) from [<c0398b3c>] (wait_for_common+0xcc/0x198) [Wed Jun 5 03:18:35 2013] [<c0398b3c>] (wait_for_common+0xcc/0x198) from [<c0275538>] (usb_sg_wait+0x144/0x198) [Wed Jun 5 03:18:35 2013] [<c0275538>] (usb_sg_wait+0x144/0x198) from [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) [Wed Jun 5 03:18:35 2013] [<c02a2040>] (usb_stor_bulk_transfer_sglist.part.4+0xa4/0xf8) from [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) [Wed Jun 5 03:18:35 2013] [<c02a22e8>] (usb_stor_bulk_srb+0x48/0x50) from [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) [Wed Jun 5 03:18:35 2013] [<c02a23fc>] (usb_stor_Bulk_transport+0x10c/0x2f8) from [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) [Wed Jun 5 03:18:35 2013] [<c02a291c>] (usb_stor_invoke_transport+0x2c/0x4fc) from [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) [Wed Jun 5 03:18:35 2013] [<c02a3d5c>] (usb_stor_control_thread+0x19c/0x28c) from [<c003a7b4>] (kthread+0x88/0x94) [Wed Jun 5 03:18:35 2013] [<c003a7b4>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8) [Wed Jun 5 06:25:52 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 06:25:55 2013] ieee80211 phy0: failed to reallocate TX buffer [Wed Jun 5 06:26:06 2013] ieee80211 phy0: failed to reallocate TX buffer

popcornmix · 2013-06-05T09:49:45Z

Can you run the same test with wired ethernet? or a different wifi dongle?
Looks a bit like a memory leak (or an inappropraitely sized buffer) in the network stack that eventually brings the system down.

If you limit the download speed of the torrent application, does it run for longer? Does it run forever?

wrobelda · 2013-06-05T10:02:55Z

The wired ethernet is a WAN gateway and is connected at all times and so is wifi, which works in AP mode driven by hostapd. I am going to try with another wifi dongle, but will have to find one that supports AP well which will take some days.

The torrent application actually almost never reaches more than 10Mbps, but for the sake of tests I will limit it to e.g. 5Mbps and see.

I can't back it up with any evidence, but being myself a programmer it seems to me by intuition that this is indeed some memory leak.

popcornmix · 2013-06-05T10:07:02Z

So the Pi is acting as an wireless access point for other machines.
And the torrent app is run on another machine - not the pi?

wrobelda · 2013-06-05T10:07:48Z

Correct, Pi is wireless access point BUT the torrent app runs ON the Pi at the same time.

wrobelda · 2013-06-14T07:05:06Z

I did lower the torrent throughput to roughly 8 mbps and it still hangs after at most one day. The only time I can enjoy long uptimes is when I disable torrent at all.

This error has been here pretty much ever since the beginning of The Life of PI. Is there anything you can do about it or is this non-fixable and so I should probably go and return my PIs claiming they are not as advertised?

I am sorry to say that but this is what it looks like to me. The USB 2.0 subsystem is not usable.

popcornmix · 2013-06-14T14:38:00Z

It seems to be a memory allocation error in network stack.
Have you got 256M or 512M Pi? Running the cutdown firmware (gpu_mem=16)?

I would have thought increasing vm.min_free_kilobytes is the most likely solution.

What torrent client? It may be worth trying to find a lighter one.
Is it better if torrent client is run on another machine on network (but still using Pi as AP).

licaon-kter · 2013-06-16T14:33:52Z

wait, this is a bug now?
I get this all the time, and this Pi only has minimal network activity ( ssh connection for just one user, and sometimes some git updates )
I already have:

smc95xx.turbo_mode=N in /boot/cmdline.txt
vm.min_free_kbytes = 8192 in /etc/sysctl.d/local.conf

wrobelda · 2013-06-17T19:16:13Z

@licaon-kter had you abused the eth it a bit more, it would most likely hang after a period of time

@popcornmix
Have 512.
Not sure if that's what you mean by the cutdown firmware, but my gpu_mem is set to 128MB, although nothing is really making use of it.
Have tried with min_free_kilobytes up to 128MB (sic!), yet still would hang after pretty much same period of time (hard to tell, always around a day)
Tried with both transmission and rtorrent.
I will see if it changes if I download on a machine inside the network.

Ferroin · 2013-06-19T02:15:27Z

I think that adjusting vm.min_free_kbytes is just putting an (ugly) bandage on the problem. If the cause is indeed a memory leak, (which sounds like the most probable cause to me), the system will evenutually reach an OOM condition (this may take a long time if you have a lot of swap), raising vm.min_free_kbytes will just make the system thrash more as it approaches vm exhaustion.

nickriordan · 2013-06-23T20:31:08Z

I may be seeing this too. Have a pi running afpd and am copying around several hundred GB of files over Ethernet from my Mac to a locally attached USB WD hard drive. Everything runs smoothly for the first 50-70GB and then the Pi crashes. Only messages that I can see in any of the logs is "eth0: kevent 2 may have been dropped". Also I have disabled swap (silly idea - very slow on SD card and wears the card out quickly). 512MB Pi. Video split set to 32MB leaving around 480MB free for the Pi. Running as a server.

wrobelda · 2013-06-25T21:34:36Z

I did some more experiments. I powered the Pi with external 1.5 amps USB power source instead of from the HUB and also connected the WiFi card to Pi directly. The second USB port is still used by USB hub that in turns runs HDD only.

It seems more stable now with the uptime of 12 hours while constantly downloading some linux ISO images off the torrent. After having set the min_free_kbytes to 32M, there are also no errors in dmesg.
I will do some more testing with this configuration and then see if it breaks after I switch the WIFi to HUB. Its either crappy WiFi or the HUB's unreliable power source that cause these hangups. However given how good opinion the HUB I own has, I would expect that it's the WiFi that does not like the HUB.

wrobelda · 2013-06-27T06:10:52Z

I am afraid this bug is still here. Raspberry still hangs after as little as an hour.

ZeBadger · 2013-08-20T13:09:12Z

I have this problem with 2013-07-26-wheezy-raspbian on a 256mb pi model b

If I run the command "git clone http://github.com/raspberrypi/linux.git kernel" it works until about 14% then it just exits with no error messages (exit code 128)

I get "kevent 2 may have been dropped" in dmesg

Changing the vm.min_free_kbytes to 32768 and disabling turbo mode helps a little, but I've still not managed to complete that command.. although last time I was running X/Windows and when I tried to to "ls -la" I got an out of memory error.

licaon-kter · 2013-08-20T13:45:40Z

try to update first, maybe to the next branch via sudo BRANCH=next rpi-update as I haven't seen this for a while
also, github has failed for me in the past as well Hexxeh/rpi-firmware#7

P33M · 2014-05-05T13:01:03Z

CONFIG_COMPACTION and CONFIG_SLUB have fixed this in all but pathological cases.

fedeaf · 2014-05-24T17:45:45Z

IMO, this is not fixed at all. I keep getting this error consistently on the latest firmware with smc95xx.turbo_mode=N and vm.min_free_kbytes = 32768. This is on a headless 512MB raspi with gpu_mem = 16.
@P33M Please, reopen the ticket until it's definitely fixed.

P33M · 2014-05-24T17:50:44Z

Please post complete logs.

fedeaf · 2014-05-24T18:00:09Z

I've just taken the dmesg output here:
https://gist.github.com/fedeaf/4f1ffef2c1f09aa422c3

ATM, deluge is downloading one torrent at ~400KBps and I was streaming through samba a video with an average of 350KB/s to my laptop.

P33M · 2014-05-24T18:03:12Z

Is there any indication that the functionality when these messages are being generated is impaired? I.e. do you see skipped video or corrupt fragment downloads?

fedeaf · 2014-05-24T18:07:05Z

Yes, both are true.
Video stutter is common and downloads are usually damaged.
$ free -h
total used free shared buffers cached
Mem: 485M 468M 16M 528K 140M 234M
-/+ buffers/cache: 93M 391M
Swap: 989M 0B 989M

P33M · 2014-05-24T18:09:14Z

Can you post the output of cat /proc/buddyinfo (over several times when the network is active)?

fedeaf · 2014-05-24T19:08:47Z

Here is buddyinfo every 3 secs, while streaming a video and deluge also running:
https://gist.github.com/fedeaf/84313d2a0a29d01df2b5

Just before that I drop_caches, that's why there was some free memory at the start.

P33M · 2014-05-27T12:10:55Z

Normal downloads may end up corrupt if there is a smsc95xx driver bug. If it doesn't correctly report broken transfers in the case of a memory allocation failure, then that's an upstream bug. As we are pretty much the only hardcore users of this hardware then it'll be up to us to diagnose and fix anyway.

Torrents use chunk hash checks to ensure the validity of data transferred. Are you seeing corrupt torrent downloads?

Is the problem worse if you use smsc95xx.turbo_mode=Y?

fedeaf · 2014-05-27T14:54:13Z

I've seen corrupted torrents but that hasn't happened for sometime now, might have been related to something else.
I should test more but smsc95xx.turbo_mode seems to have no effect on this (at least for me).

Without any torrents downloading, I can happily stream a video through samba to my laptop and have no stutter even if I see some kevent2 on the logs. While I'm downloading something (even at low speeds ~ 300KB/s) streaming becomes totally impossible after some time. It seems to be related to writing to the USB hdd (ntfs).

Could this cause corruption at the time dumping data from ram to usb-hdd?

P33M · 2014-05-29T08:38:35Z

You are running out of CPU time.

NTFS file access is CPU-heavy (userspace filesystem driver)
SAMBA access is CPU-heavy
Torrenting is CPU-heavy

At the least, reformat the external drive as ext4. This will reduce the CPU time on file access substantially.

If you get corrupt torrent downloads without streaming and it appears dependent on "kevent 2" messages, then it warrants further investigation.

fedeaf · 2014-05-29T10:33:58Z

No, my raspi is OCed and has plenty of CPU available while doing that.
I also have kevent 2s even if there are no torrents or writes to the disk and I'm just streaming through samba. I'm pretty sure that the issue is not related to CPU load.
Will try to prove it somehow and hopefully getting samba out of the way too.

JamesH65 · 2014-05-29T12:59:56Z

Whilst doing the above, run 'top' in a terminal to see the CPU usage. I think that even when overclocked those operations will use a LOT of CPU.

Tuinslak · 2016-02-07T19:53:36Z

Same is happening on an RPI2. I have 2 shares mounted as NFS (ext4) and two other servers running rdiff-backup to the Pi (stores the files via NFS).

I have plenty of CPU time left, according to htop. RPI slightly overclocked.

CRTX · 2016-03-28T08:20:44Z

Why is this closed? This is still happening in my Raspberry Pi 3 with latest firmware 4.1.20-v7+ except with "kevent 0 may have been dropped"

ZeBadger · 2016-03-28T08:46:09Z

I think the problem was resolved by switching to a different SD card.

MrColdbird · 2016-07-20T08:46:42Z

Problem is still there, but I fear this issue rests elsewhere and not with the Raspberry pi.

The same issue occurs on regular desktop linux distributions (x86 / x86-64) whenever a network interface is hammered with a huge amount of concurrent connections.

Not only is my Raspberry pi affected, but so is my x86-64 based homemade linux router.
This is most likely a kernel issue of some kind, but given the huge time that has passed since its first discovery, and the fact that there still isn't a fix out, makes me assume that this issue doesn't rank high in the todo list of the linux kernel developers (which makes me sad).

syzbot reported a division by 0 bug in the netfilter nat code: divide error: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: 0018:ffff8801b2466778 EFLAGS: 00010246 RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900 R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153 R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30 get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362 nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406 nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124 redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41 nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302 nf_nat_ipv6_local_fn+0x33/0x5d0 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407 ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142 dccp_connect+0x369/0x670 net/dccp/output.c:564 dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946 __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x441c69 RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69 RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590 R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b 45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48 c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01 RIP: nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778 The problem is that currently we don't have any check on the configured port range. A port range == -1 triggers the bug, while other negative values may require a very long time to complete the following loop. This commit addresses the issue swapping the two ends on negative ranges. The check is performed in nf_nat_l4proto_unique_tuple() since the nft nat loads the port values from nft registers at runtime. v1 -> v2: use the correct 'Fixes' tag v2 -> v3: update commit message, drop unneeded READ_ONCE() Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack") Reported-by: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>

commit db57ccf upstream. syzbot reported a division by 0 bug in the netfilter nat code: divide error: 0000 [raspberrypi#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ raspberrypi#309 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: 0018:ffff8801b2466778 EFLAGS: 00010246 RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900 R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153 R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30 get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362 nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406 nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124 redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41 nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302 nf_nat_ipv6_local_fn+0x33/0x5d0 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407 ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142 dccp_connect+0x369/0x670 net/dccp/output.c:564 dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946 __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x441c69 RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69 RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590 R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b 45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48 c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01 RIP: nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778 The problem is that currently we don't have any check on the configured port range. A port range == -1 triggers the bug, while other negative values may require a very long time to complete the following loop. This commit addresses the issue swapping the two ends on negative ranges. The check is performed in nf_nat_l4proto_unique_tuple() since the nft nat loads the port values from nft registers at runtime. v1 -> v2: use the correct 'Fixes' tag v2 -> v3: update commit message, drop unneeded READ_ONCE() Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack") Reported-by: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit db57ccf upstream. syzbot reported a division by 0 bug in the netfilter nat code: divide error: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: 0018:ffff8801b2466778 EFLAGS: 00010246 RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900 R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153 R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30 get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362 nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406 nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124 redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41 nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302 nf_nat_ipv6_local_fn+0x33/0x5d0 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407 ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142 dccp_connect+0x369/0x670 net/dccp/output.c:564 dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946 __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x441c69 RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69 RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590 R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b 45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48 c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01 RIP: nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778 The problem is that currently we don't have any check on the configured port range. A port range == -1 triggers the bug, while other negative values may require a very long time to complete the following loop. This commit addresses the issue swapping the two ends on negative ranges. The check is performed in nf_nat_l4proto_unique_tuple() since the nft nat loads the port values from nft registers at runtime. v1 -> v2: use the correct 'Fixes' tag v2 -> v3: update commit message, drop unneeded READ_ONCE() Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack") Reported-by: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit db57ccf upstream. syzbot reported a division by 0 bug in the netfilter nat code: divide error: 0000 [raspberrypi#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ raspberrypi#309 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: 0018:ffff8801b2466778 EFLAGS: 00010246 RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900 R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153 R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30 get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362 nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406 nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124 redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41 nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302 nf_nat_ipv6_local_fn+0x33/0x5d0 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407 ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142 dccp_connect+0x369/0x670 net/dccp/output.c:564 dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946 __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x441c69 RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69 RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590 R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b 45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48 c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01 RIP: nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778 The problem is that currently we don't have any check on the configured port range. A port range == -1 triggers the bug, while other negative values may require a very long time to complete the following loop. This commit addresses the issue swapping the two ends on negative ranges. The check is performed in nf_nat_l4proto_unique_tuple() since the nft nat loads the port values from nft registers at runtime. v1 -> v2: use the correct 'Fixes' tag v2 -> v3: update commit message, drop unneeded READ_ONCE() Fixes: 5b1158e ("[NETFILTER]: Add NAT support for nf_conntrack") Reported-by: [email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…format [ Upstream commit 06d213d ] For incoming SCO connection with transparent coding format, alt setting of CVSD is getting applied instead of Transparent. Before fix: < HCI Command: Accept Synchron.. (0x01|0x0029) plen 21 #2196 [hci0] 321.342548 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #2197 [hci0] 321.343585 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Comp.. (0x2c) plen 17 #2198 [hci0] 321.351666 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) ........ > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2336 [hci0] 321.383655 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2337 [hci0] 321.389558 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2338 [hci0] 321.393615 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2339 [hci0] 321.393618 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2340 [hci0] 321.393618 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2341 [hci0] 321.397070 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2342 [hci0] 321.403622 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2343 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2344 [hci0] 321.403625 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2345 [hci0] 321.403625 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2346 [hci0] 321.404569 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2347 [hci0] 321.412091 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2348 [hci0] 321.413626 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2349 [hci0] 321.413630 > SCO Data RX: Handle 257 flags 0x00 dlen 48 #2350 [hci0] 321.413630 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #2351 [hci0] 321.419674 After fix: < HCI Command: Accept Synchronou.. (0x01|0x0029) plen 21 #309 [hci0] 49.439693 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Transmit bandwidth: 8000 Receive bandwidth: 8000 Max latency: 13 Setting: 0x0003 Input Coding: Linear Input Data Format: 1's complement Input Sample Size: 8-bit # of bits padding at MSB: 0 Air Coding Format: Transparent Data Retransmission effort: Optimize for link quality (0x02) Packet type: 0x003f HV1 may be used HV2 may be used HV3 may be used EV3 may be used EV4 may be used EV5 may be used > HCI Event: Command Status (0x0f) plen 4 #310 [hci0] 49.440308 Accept Synchronous Connection Request (0x01|0x0029) ncmd 1 Status: Success (0x00) > HCI Event: Synchronous Connect Complete (0x2c) plen 17 #311 [hci0] 49.449308 Status: Success (0x00) Handle: 257 Address: 1C:CC:D6:E2:EA:80 (Xiaomi Communications Co Ltd) Link type: eSCO (0x02) Transmission interval: 0x0c Retransmission window: 0x04 RX packet length: 60 TX packet length: 60 Air mode: Transparent (0x03) < SCO Data TX: Handle 257 flags 0x00 dlen 60 #312 [hci0] 49.450421 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #313 [hci0] 49.457927 > HCI Event: Max Slots Change (0x1b) plen 3 #314 [hci0] 49.460345 Handle: 256 Max slots: 5 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #315 [hci0] 49.465453 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #316 [hci0] 49.470502 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #317 [hci0] 49.470519 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #318 [hci0] 49.472996 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #319 [hci0] 49.480412 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #320 [hci0] 49.480492 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #321 [hci0] 49.487989 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #322 [hci0] 49.490303 < SCO Data TX: Handle 257 flags 0x00 dlen 60 #323 [hci0] 49.495496 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #324 [hci0] 49.500304 > SCO Data RX: Handle 257 flags 0x00 dlen 60 #325 [hci0] 49.500311 Signed-off-by: Kiran K <[email protected]> Signed-off-by: Lokendra Singh <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

While playing with event probes (eprobes), I tried to see what would happen if I attempted to retrieve the instruction pointer (%rip) knowing that event probes do not use pt_regs. The result was: BUG: kernel NULL pointer dereference, address: 0000000000000024 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:get_event_field.isra.0+0x0/0x50 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8 50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24 8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff916c9ea40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0 Call Trace: <TASK> get_eprobe_size+0xb4/0x640 ? __mod_node_page_state+0x72/0xc0 __eprobe_trace_func+0x59/0x1a0 ? __mod_lruvec_page_state+0xaa/0x1b0 ? page_remove_file_rmap+0x14/0x230 ? page_remove_rmap+0xda/0x170 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x18f/0x240 trace_event_raw_event_sched_wakeup_template+0x7a/0xb0 try_to_wake_up+0x260/0x4c0 __wake_up_common+0x80/0x180 __wake_up_common_lock+0x7c/0xc0 do_notify_parent+0x1c9/0x2a0 exit_notify+0x1a9/0x220 do_exit+0x2ba/0x450 do_group_exit+0x2d/0x90 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Obviously this is not the desired result. Move the testing for TPARG_FL_TPOINT which is only used for event probes to the top of the "$" variable check, as all the other variables are not used for event probes. Also add a check in the register parsing "%" to fail if an event probe is used. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>

commit 2673c60 upstream. While playing with event probes (eprobes), I tried to see what would happen if I attempted to retrieve the instruction pointer (%rip) knowing that event probes do not use pt_regs. The result was: BUG: kernel NULL pointer dereference, address: 0000000000000024 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [raspberrypi#1] PREEMPT SMP PTI CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ raspberrypi#309 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:get_event_field.isra.0+0x0/0x50 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8 50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24 8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff916c9ea40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0 Call Trace: <TASK> get_eprobe_size+0xb4/0x640 ? __mod_node_page_state+0x72/0xc0 __eprobe_trace_func+0x59/0x1a0 ? __mod_lruvec_page_state+0xaa/0x1b0 ? page_remove_file_rmap+0x14/0x230 ? page_remove_rmap+0xda/0x170 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x18f/0x240 trace_event_raw_event_sched_wakeup_template+0x7a/0xb0 try_to_wake_up+0x260/0x4c0 __wake_up_common+0x80/0x180 __wake_up_common_lock+0x7c/0xc0 do_notify_parent+0x1c9/0x2a0 exit_notify+0x1a9/0x220 do_exit+0x2ba/0x450 do_group_exit+0x2d/0x90 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Obviously this is not the desired result. Move the testing for TPARG_FL_TPOINT which is only used for event probes to the top of the "$" variable check, as all the other variables are not used for event probes. Also add a check in the register parsing "%" to fail if an event probe is used. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

If we try to emit an icmp error in response to a nonliner skb, we get BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220 Read of size 4 at addr ffff88811c50db00 by task iperf3/1691 CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309 [..] kasan_report+0x105/0x140 ip_compute_csum+0x134/0x220 iptunnel_pmtud_build_icmp+0x554/0x1020 skb_tunnel_check_pmtu+0x513/0xb80 vxlan_xmit_one+0x139e/0x2ef0 vxlan_xmit+0x1867/0x2760 dev_hard_start_xmit+0x1ee/0x4f0 br_dev_queue_push_xmit+0x4d1/0x660 [..] ip_compute_csum() cannot deal with nonlinear skbs, so avoid it. After this change, splat is gone and iperf3 is no longer stuck. Fixes: 4cb47a8 ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

commit 6a7ac3d upstream. If we try to emit an icmp error in response to a nonliner skb, we get BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220 Read of size 4 at addr ffff88811c50db00 by task iperf3/1691 CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309 [..] kasan_report+0x105/0x140 ip_compute_csum+0x134/0x220 iptunnel_pmtud_build_icmp+0x554/0x1020 skb_tunnel_check_pmtu+0x513/0xb80 vxlan_xmit_one+0x139e/0x2ef0 vxlan_xmit+0x1867/0x2760 dev_hard_start_xmit+0x1ee/0x4f0 br_dev_queue_push_xmit+0x4d1/0x660 [..] ip_compute_csum() cannot deal with nonlinear skbs, so avoid it. After this change, splat is gone and iperf3 is no longer stuck. Fixes: 4cb47a8 ("tunnels: PMTU discovery support for directly bridged IP packets") Signed-off-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

P33M closed this as completed May 5, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"eth0: kevent 2 may have been dropped" is still here #309

"eth0: kevent 2 may have been dropped" is still here #309

wrobelda commented Jun 4, 2013

wrobelda commented Jun 5, 2013

popcornmix commented Jun 5, 2013

wrobelda commented Jun 5, 2013

popcornmix commented Jun 5, 2013

wrobelda commented Jun 5, 2013

wrobelda commented Jun 14, 2013

popcornmix commented Jun 14, 2013

licaon-kter commented Jun 16, 2013

wrobelda commented Jun 17, 2013

Ferroin commented Jun 19, 2013

nickriordan commented Jun 23, 2013

wrobelda commented Jun 25, 2013

wrobelda commented Jun 27, 2013

ZeBadger commented Aug 20, 2013

licaon-kter commented Aug 20, 2013

P33M commented May 5, 2014

fedeaf commented May 24, 2014

P33M commented May 24, 2014

fedeaf commented May 24, 2014

P33M commented May 24, 2014

fedeaf commented May 24, 2014

P33M commented May 24, 2014

fedeaf commented May 24, 2014

P33M commented May 27, 2014

fedeaf commented May 27, 2014

P33M commented May 29, 2014

fedeaf commented May 29, 2014

JamesH65 commented May 29, 2014

Tuinslak commented Feb 7, 2016

CRTX commented Mar 28, 2016

ZeBadger commented Mar 28, 2016

MrColdbird commented Jul 20, 2016

"eth0: kevent 2 may have been dropped" is still here #309

"eth0: kevent 2 may have been dropped" is still here #309

Comments

wrobelda commented Jun 4, 2013

wrobelda commented Jun 5, 2013

popcornmix commented Jun 5, 2013

wrobelda commented Jun 5, 2013

popcornmix commented Jun 5, 2013

wrobelda commented Jun 5, 2013

wrobelda commented Jun 14, 2013

popcornmix commented Jun 14, 2013

licaon-kter commented Jun 16, 2013

wrobelda commented Jun 17, 2013

Ferroin commented Jun 19, 2013

nickriordan commented Jun 23, 2013

wrobelda commented Jun 25, 2013

wrobelda commented Jun 27, 2013

ZeBadger commented Aug 20, 2013

licaon-kter commented Aug 20, 2013

P33M commented May 5, 2014

fedeaf commented May 24, 2014

P33M commented May 24, 2014

fedeaf commented May 24, 2014

P33M commented May 24, 2014

fedeaf commented May 24, 2014

P33M commented May 24, 2014

fedeaf commented May 24, 2014

P33M commented May 27, 2014

fedeaf commented May 27, 2014

P33M commented May 29, 2014

fedeaf commented May 29, 2014

JamesH65 commented May 29, 2014

Tuinslak commented Feb 7, 2016

CRTX commented Mar 28, 2016

ZeBadger commented Mar 28, 2016

MrColdbird commented Jul 20, 2016