-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pi drops "smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0" on USB access? #470
Comments
well , serial usb converter
Thus my question how you power the external device ? i explain my question here more : i use sutch usb <> rsrs232 myself at all of my pis, and without extra powersource this was NOT stable enoth to serve an heartbeat3 <>heartbeat3 realtime data transmision. |
currently the converter is powered directly by the pi. But your idea is interesting; i'll try it out using an additional active usb-hub. |
Stabilty are required ; - if you ran your PI in server mode ; especaly with external Hardware. The majorpoint that i try contribute still here are stabilty and required featgers for RPi SERVER mode. |
ok, i've tested it - i connected a 4-port, separately powered USB 2.0 hub and the issue still persist - the messages Damn... I've also found a very very similar issue here - http://www.raspberrypi.org/phpBB3/viewtopic.php?t=61244&p=455950 |
What is the interval between error messages? Values posted so far indicate >5,000 seconds Is there any negative effect on the system or network connections? |
Well, the messages appear only around the time when the temperature is gathered. So, the intervals are
pretty random at the beginning of the every minute. And - no, there are no any negative side effects (at least i do not see any) nor on the system, nor with the network i my constellation. |
Sutch happen on my PI when i have heayly SD card load + USB2 + network with smsc95xx . can allmost ignored as tcp just resend the tcp fragment(s) .. |
Hmm,, i my case there is almost no network activity. As to the "heavily loaded SD" / USB - in my case, at least, it's absolutely not the case. |
I'm having same issues with my RPi. I also have a FT232 USB-to-Serial adapter connected. I'm using 3.10.25 kernel |
Same problem here:
|
Also same problem I got after adding a FT232R USB to serial converter...
|
I had the same problem and solved it by reducing the USB speed to version 1.1. This is sufficient for most applications and my Raspi is running with WIFI USB and USB-RS232 adapter for 15 days without extra hub or power!! Add both commands below to your /boot/cmdline.txt file: smsc95xx.turbo_mode=N dwc_otg.speed=1 When this does not help sufficient try an old SD card. I also found problems with some |
Interesting, will try it out. Thx. |
👍 the only question now is whether the nic performance will be also affected.. According to http://www.raspberrypi.org/phpBB3/viewtopic.php?f=44&t=8010&start=75 - yes, it will decrease. I'll test and post... |
I tried adding But afterwards my USB webcam didn't work, and I got these error messages in kern.log: Mar 7 16:20:19 davepi kernel: [ 58.562682] INFO:: schedule_periodic: Insufficient periodic bandwidth for periodic transfer. so I don't think that's a workaround for everyone... |
It seems that your video stream is more than the USB1.1 speed Released in January 1996, USB 1.0 specified data rates of 1.5 Mbit/s ( Check your USB devices that they are all USB 2.0 compatible. Also check Try with only your USB webcam 2014-03-07 17:30 GMT+01:00 davebiffuk [email protected]:
|
Please retry (without dwc_otg.speed=1) with BRANCH=next firmware. See http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=70437 for information. |
Thanks P33M. I updated to BRANCH=next firmware last night and the "smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0" messages have completely ceased. The Pi has been up for 9 hours now. So far so good! For completeness, this is:
with this cmdline.txt:
Thanks again! Dave |
Marking as fixed in BRANCH=next |
[ Upstream commit a7ac783 ] Lockdep complains about rtw_free_assoc_resources() taking the sta_hash_lock followed by it calling rtw_free_stainfo() which takes xmitpriv->lock. While the rtl8723bs_xmit_thread takes the sta_hash_lock while already holding the xmitpriv->lock: [ 103.849756] ====================================================== [ 103.849761] WARNING: possible circular locking dependency detected [ 103.849767] 5.15.0-rc1+ #470 Tainted: G C E [ 103.849773] ------------------------------------------------------ [ 103.849776] wpa_supplicant/695 is trying to acquire lock: [ 103.849781] ffffa5d0c0562b00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_free_stainfo+0x8a/0x510 [r8723bs] [ 103.849840] but task is already holding lock: [ 103.849843] ffffa5d0c05636a8 (&pstapriv->sta_hash_lock){+.-.}-{2:2}, at: rtw_free_assoc_resources+0x48/0x110 [r8723bs] [ 103.849881] which lock already depends on the new lock. [ 103.849884] the existing dependency chain (in reverse order) is: [ 103.849887] -> #1 (&pstapriv->sta_hash_lock){+.-.}-{2:2}: [ 103.849898] _raw_spin_lock_bh+0x34/0x40 [ 103.849913] rtw_get_stainfo+0x93/0x110 [r8723bs] [ 103.849948] rtw_make_wlanhdr+0x14a/0x270 [r8723bs] [ 103.849983] rtw_xmitframe_coalesce+0x5c/0x6c0 [r8723bs] [ 103.850019] rtl8723bs_xmit_thread+0x4ac/0x620 [r8723bs] [ 103.850050] kthread+0x143/0x160 [ 103.850058] ret_from_fork+0x22/0x30 [ 103.850067] -> #0 (&pxmitpriv->lock){+.-.}-{2:2}: [ 103.850077] __lock_acquire+0x1158/0x1de0 [ 103.850084] lock_acquire+0xb5/0x2b0 [ 103.850090] _raw_spin_lock_bh+0x34/0x40 [ 103.850095] rtw_free_stainfo+0x8a/0x510 [r8723bs] [ 103.850130] rtw_free_assoc_resources+0x53/0x110 [r8723bs] [ 103.850159] PHY_IQCalibrate_8723B+0x122b/0x36a0 [r8723bs] [ 103.850189] cfg80211_disconnect+0x173/0x320 [cfg80211] [ 103.850331] nl80211_disconnect+0x6e/0xb0 [cfg80211] [ 103.850422] genl_family_rcv_msg_doit+0xcd/0x110 [ 103.850430] genl_rcv_msg+0xce/0x1c0 [ 103.850435] netlink_rcv_skb+0x50/0xf0 [ 103.850441] genl_rcv+0x24/0x40 [ 103.850446] netlink_unicast+0x16d/0x230 [ 103.850452] netlink_sendmsg+0x22b/0x450 [ 103.850457] sock_sendmsg+0x5e/0x60 [ 103.850465] ____sys_sendmsg+0x22f/0x270 [ 103.850472] ___sys_sendmsg+0x81/0xc0 [ 103.850479] __sys_sendmsg+0x49/0x80 [ 103.850485] do_syscall_64+0x3b/0x90 [ 103.850493] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 103.850500] other info that might help us debug this: [ 103.850504] Possible unsafe locking scenario: [ 103.850507] CPU0 CPU1 [ 103.850510] ---- ---- [ 103.850512] lock(&pstapriv->sta_hash_lock); [ 103.850518] lock(&pxmitpriv->lock); [ 103.850524] lock(&pstapriv->sta_hash_lock); [ 103.850530] lock(&pxmitpriv->lock); [ 103.850535] *** DEADLOCK *** Push the taking of sta_hash_lock down into rtw_free_stainfo(), where the critical section is, this allows taking the lock after rtw_free_stainfo() has released pxmitpriv->lock. This requires changing rtw_free_all_stainfo() so that it does its freeing in 2 steps, first moving all stainfo-s to free to a local list while holding the sta_hash_lock and then walking that list to call rtw_free_stainfo() on them without holding the sta_hash_lock. Pushing the taking of sta_hash_lock down into rtw_free_stainfo(), also fixes a whole bunch of callers of rtw_free_stainfo() which were not holding that lock even though they should. Note that this also fixes the deadlock from the "remove possible deadlock when disconnect" patch in a different way. But the changes from that patch offer a nice locking cleanup regardless. Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
[ Upstream commit bdc1bbd ] The assoc_timer takes the pmlmepriv->lock and various functions which take the pmlmepriv->scanned_queue.lock first take the pmlmepriv->lock, this means that we cannot have code which waits for the timer (timer_del_sync) while holding the pmlmepriv->scanned_queue.lock to avoid a triangle deadlock: [ 363.139361] ====================================================== [ 363.139377] WARNING: possible circular locking dependency detected [ 363.139396] 5.15.0-rc1+ #470 Tainted: G C E [ 363.139413] ------------------------------------------------------ [ 363.139424] RTW_CMD_THREAD/2466 is trying to acquire lock: [ 363.139441] ffffbacd00699038 (&pmlmepriv->lock){+.-.}-{2:2}, at: _rtw_join_timeout_handler+0x3c/0x160 [r8723bs] [ 363.139598] but task is already holding lock: [ 363.139610] ffffbacd00128ea0 ((&pmlmepriv->assoc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x260 [ 363.139673] which lock already depends on the new lock. [ 363.139684] the existing dependency chain (in reverse order) is: [ 363.139696] -> #2 ((&pmlmepriv->assoc_timer)){+.-.}-{0:0}: [ 363.139734] del_timer_sync+0x59/0x100 [ 363.139762] rtw_joinbss_event_prehandle+0x342/0x640 [r8723bs] [ 363.139870] report_join_res+0xdf/0x110 [r8723bs] [ 363.139980] OnAssocRsp+0x17a/0x200 [r8723bs] [ 363.140092] rtw_recv_entry+0x190/0x1120 [r8723bs] [ 363.140209] rtl8723b_process_phy_info+0x3f9/0x750 [r8723bs] [ 363.140318] tasklet_action_common.constprop.0+0xe8/0x110 [ 363.140345] __do_softirq+0xde/0x485 [ 363.140372] __irq_exit_rcu+0xd0/0x100 [ 363.140393] irq_exit_rcu+0xa/0x20 [ 363.140413] common_interrupt+0x83/0xa0 [ 363.140440] asm_common_interrupt+0x1e/0x40 [ 363.140463] finish_task_switch.isra.0+0x157/0x3d0 [ 363.140492] __schedule+0x447/0x1880 [ 363.140516] schedule+0x59/0xc0 [ 363.140537] smpboot_thread_fn+0x161/0x1c0 [ 363.140565] kthread+0x143/0x160 [ 363.140585] ret_from_fork+0x22/0x30 [ 363.140614] -> #1 (&pmlmepriv->scanned_queue.lock){+.-.}-{2:2}: [ 363.140653] _raw_spin_lock_bh+0x34/0x40 [ 363.140675] rtw_free_network_queue+0x31/0x80 [r8723bs] [ 363.140776] rtw_sitesurvey_cmd+0x79/0x1e0 [r8723bs] [ 363.140869] rtw_cfg80211_surveydone_event_callback+0x3cf/0x470 [r8723bs] [ 363.140973] rdev_scan+0x42/0x1a0 [cfg80211] [ 363.141307] nl80211_trigger_scan+0x566/0x660 [cfg80211] [ 363.141635] genl_family_rcv_msg_doit+0xcd/0x110 [ 363.141661] genl_rcv_msg+0xce/0x1c0 [ 363.141680] netlink_rcv_skb+0x50/0xf0 [ 363.141699] genl_rcv+0x24/0x40 [ 363.141717] netlink_unicast+0x16d/0x230 [ 363.141736] netlink_sendmsg+0x22b/0x450 [ 363.141755] sock_sendmsg+0x5e/0x60 [ 363.141781] ____sys_sendmsg+0x22f/0x270 [ 363.141803] ___sys_sendmsg+0x81/0xc0 [ 363.141828] __sys_sendmsg+0x49/0x80 [ 363.141851] do_syscall_64+0x3b/0x90 [ 363.141873] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 363.141895] -> #0 (&pmlmepriv->lock){+.-.}-{2:2}: [ 363.141930] __lock_acquire+0x1158/0x1de0 [ 363.141954] lock_acquire+0xb5/0x2b0 [ 363.141974] _raw_spin_lock_bh+0x34/0x40 [ 363.141993] _rtw_join_timeout_handler+0x3c/0x160 [r8723bs] [ 363.142097] call_timer_fn+0x94/0x260 [ 363.142122] __run_timers.part.0+0x1bf/0x290 [ 363.142147] run_timer_softirq+0x26/0x50 [ 363.142171] __do_softirq+0xde/0x485 [ 363.142193] __irq_exit_rcu+0xd0/0x100 [ 363.142215] irq_exit_rcu+0xa/0x20 [ 363.142235] sysvec_apic_timer_interrupt+0x72/0x90 [ 363.142260] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 363.142283] __module_address.part.0+0x0/0xd0 [ 363.142309] is_module_address+0x25/0x40 [ 363.142334] static_obj+0x4f/0x60 [ 363.142361] lockdep_init_map_type+0x47/0x220 [ 363.142382] __init_swait_queue_head+0x45/0x60 [ 363.142408] mmc_wait_for_req+0x4a/0xc0 [mmc_core] [ 363.142504] mmc_wait_for_cmd+0x55/0x70 [mmc_core] [ 363.142592] mmc_io_rw_direct+0x75/0xe0 [mmc_core] [ 363.142691] sdio_writeb+0x2e/0x50 [mmc_core] [ 363.142788] _sd_cmd52_write+0x62/0x80 [r8723bs] [ 363.142885] sd_cmd52_write+0x6c/0xb0 [r8723bs] [ 363.142981] rtl8723bs_set_hal_ops+0x982/0x9b0 [r8723bs] [ 363.143089] rtw_write16+0x1e/0x30 [r8723bs] [ 363.143184] SetHwReg8723B+0xcc9/0xd30 [r8723bs] [ 363.143294] mlmeext_joinbss_event_callback+0x17a/0x1a0 [r8723bs] [ 363.143405] rtw_joinbss_event_callback+0x11/0x20 [r8723bs] [ 363.143507] mlme_evt_hdl+0x4d/0x70 [r8723bs] [ 363.143620] rtw_cmd_thread+0x168/0x3c0 [r8723bs] [ 363.143712] kthread+0x143/0x160 [ 363.143732] ret_from_fork+0x22/0x30 [ 363.143757] other info that might help us debug this: [ 363.143768] Chain exists of: &pmlmepriv->lock --> &pmlmepriv->scanned_queue.lock --> (&pmlmepriv->assoc_timer) [ 363.143809] Possible unsafe locking scenario: [ 363.143819] CPU0 CPU1 [ 363.143831] ---- ---- [ 363.143841] lock((&pmlmepriv->assoc_timer)); [ 363.143862] lock(&pmlmepriv->scanned_queue.lock); [ 363.143882] lock((&pmlmepriv->assoc_timer)); [ 363.143902] lock(&pmlmepriv->lock); [ 363.143921] *** DEADLOCK *** Make rtw_joinbss_event_prehandle() release the scanned_queue.lock before it deletes the timer to avoid this (it is still holding pmlmepriv->lock protecting against racing the timer). Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Jordy says: " In the xsk_map_delete_elem function an unsigned integer (map->max_entries) is compared with a user-controlled signed integer (k). Due to implicit type conversion, a large unsigned value for map->max_entries can bypass the intended bounds check: if (k >= map->max_entries) return -EINVAL; This allows k to hold a negative value (between -2147483648 and -2), which is then used as an array index in m->xsk_map[k], which results in an out-of-bounds access. spin_lock_bh(&m->lock); map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry old_xs = unrcu_pointer(xchg(map_entry, NULL)); // Oob write if (old_xs) xsk_map_sock_delete(old_xs, map_entry); spin_unlock_bh(&m->lock); The xchg operation can then be used to cause an out-of-bounds write. Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead to further memory corruption. " It indeed results in following splat: [76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108 [76612.904330] #PF: supervisor write access in kernel mode [76612.909639] #PF: error_code(0x0002) - not-present page [76612.914855] PGD 0 P4D 0 [76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP [76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470 [76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60 [76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31 [76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246 [76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000 [76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000 [76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007 [76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8 [76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0 [76613.005303] FS: 00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000 [76613.013517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0 [76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [76613.041086] PKRU: 55555554 [76613.043842] Call Trace: [76613.046331] <TASK> [76613.048468] ? __die+0x20/0x60 [76613.051581] ? page_fault_oops+0x15a/0x450 [76613.055747] ? search_extable+0x22/0x30 [76613.059649] ? search_bpf_extables+0x5f/0x80 [76613.063988] ? exc_page_fault+0xa9/0x140 [76613.067975] ? asm_exc_page_fault+0x22/0x30 [76613.072229] ? xsk_map_delete_elem+0x2d/0x60 [76613.076573] ? xsk_map_delete_elem+0x23/0x60 [76613.080914] __sys_bpf+0x19b7/0x23c0 [76613.084555] __x64_sys_bpf+0x1a/0x20 [76613.088194] do_syscall_64+0x37/0xb0 [76613.091832] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [76613.096962] RIP: 0033:0x7f80b6d1e88d [76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48 [76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d [76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003 [76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000 [76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8 [76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040 [76613.188129] </TASK> Fix this by simply changing key type from int to u32. Fixes: fbfc504 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") CC: [email protected] Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
commit 32cd3db upstream. Jordy says: " In the xsk_map_delete_elem function an unsigned integer (map->max_entries) is compared with a user-controlled signed integer (k). Due to implicit type conversion, a large unsigned value for map->max_entries can bypass the intended bounds check: if (k >= map->max_entries) return -EINVAL; This allows k to hold a negative value (between -2147483648 and -2), which is then used as an array index in m->xsk_map[k], which results in an out-of-bounds access. spin_lock_bh(&m->lock); map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry old_xs = unrcu_pointer(xchg(map_entry, NULL)); // Oob write if (old_xs) xsk_map_sock_delete(old_xs, map_entry); spin_unlock_bh(&m->lock); The xchg operation can then be used to cause an out-of-bounds write. Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead to further memory corruption. " It indeed results in following splat: [76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108 [76612.904330] #PF: supervisor write access in kernel mode [76612.909639] #PF: error_code(0x0002) - not-present page [76612.914855] PGD 0 P4D 0 [76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP [76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470 [76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60 [76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31 [76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246 [76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000 [76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000 [76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007 [76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8 [76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0 [76613.005303] FS: 00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000 [76613.013517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0 [76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [76613.041086] PKRU: 55555554 [76613.043842] Call Trace: [76613.046331] <TASK> [76613.048468] ? __die+0x20/0x60 [76613.051581] ? page_fault_oops+0x15a/0x450 [76613.055747] ? search_extable+0x22/0x30 [76613.059649] ? search_bpf_extables+0x5f/0x80 [76613.063988] ? exc_page_fault+0xa9/0x140 [76613.067975] ? asm_exc_page_fault+0x22/0x30 [76613.072229] ? xsk_map_delete_elem+0x2d/0x60 [76613.076573] ? xsk_map_delete_elem+0x23/0x60 [76613.080914] __sys_bpf+0x19b7/0x23c0 [76613.084555] __x64_sys_bpf+0x1a/0x20 [76613.088194] do_syscall_64+0x37/0xb0 [76613.091832] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [76613.096962] RIP: 0033:0x7f80b6d1e88d [76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48 [76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d [76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003 [76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000 [76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8 [76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040 [76613.188129] </TASK> Fix this by simply changing key type from int to u32. Fixes: fbfc504 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") CC: [email protected] Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 32cd3db upstream. Jordy says: " In the xsk_map_delete_elem function an unsigned integer (map->max_entries) is compared with a user-controlled signed integer (k). Due to implicit type conversion, a large unsigned value for map->max_entries can bypass the intended bounds check: if (k >= map->max_entries) return -EINVAL; This allows k to hold a negative value (between -2147483648 and -2), which is then used as an array index in m->xsk_map[k], which results in an out-of-bounds access. spin_lock_bh(&m->lock); map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry old_xs = unrcu_pointer(xchg(map_entry, NULL)); // Oob write if (old_xs) xsk_map_sock_delete(old_xs, map_entry); spin_unlock_bh(&m->lock); The xchg operation can then be used to cause an out-of-bounds write. Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead to further memory corruption. " It indeed results in following splat: [76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108 [76612.904330] #PF: supervisor write access in kernel mode [76612.909639] #PF: error_code(0x0002) - not-present page [76612.914855] PGD 0 P4D 0 [76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP [76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470 [76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60 [76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31 [76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246 [76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000 [76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000 [76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007 [76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8 [76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0 [76613.005303] FS: 00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000 [76613.013517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0 [76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [76613.041086] PKRU: 55555554 [76613.043842] Call Trace: [76613.046331] <TASK> [76613.048468] ? __die+0x20/0x60 [76613.051581] ? page_fault_oops+0x15a/0x450 [76613.055747] ? search_extable+0x22/0x30 [76613.059649] ? search_bpf_extables+0x5f/0x80 [76613.063988] ? exc_page_fault+0xa9/0x140 [76613.067975] ? asm_exc_page_fault+0x22/0x30 [76613.072229] ? xsk_map_delete_elem+0x2d/0x60 [76613.076573] ? xsk_map_delete_elem+0x23/0x60 [76613.080914] __sys_bpf+0x19b7/0x23c0 [76613.084555] __x64_sys_bpf+0x1a/0x20 [76613.088194] do_syscall_64+0x37/0xb0 [76613.091832] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [76613.096962] RIP: 0033:0x7f80b6d1e88d [76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48 [76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d [76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003 [76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000 [76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8 [76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040 [76613.188129] </TASK> Fix this by simply changing key type from int to u32. Fixes: fbfc504 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") CC: [email protected] Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Hi,
following configuration: an USB-to-Serial converter is connected to the pi.
A 1-wire sensor network (currently 9 temp. sensors) is connected to the converter.
Every minute a cron-scheduled job launches a script that pull's every's sensors data, which takes about 10-15 seconds overall.
And then, when the usb-access happen, then following error occurs:
Usual only once or twice, but sometimes the message occurs up to 3 times in a row(with e.g. 1 second interval).
Beside that error, i see that - i know its strange but it's so - the amount of dropped errors on the eth0 increases also by some value of 1 to 3 on each run..
PI's data: Model A, a freshly installed && dist-upgraded install (rasbian image "2013-09-25-wheezy-raspbian.img"), kernel
The error is reproducible every time.
The USB-to-serial adapter is a
Many thanks!
The text was updated successfully, but these errors were encountered: