Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pi drops "smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0" on USB access? #470

Closed
v2b1n opened this issue Dec 22, 2013 · 19 comments
Closed
Assignees

Comments

@v2b1n
Copy link

v2b1n commented Dec 22, 2013

Hi,

following configuration: an USB-to-Serial converter is connected to the pi.
A 1-wire sensor network (currently 9 temp. sensors) is connected to the converter.

Every minute a cron-scheduled job launches a script that pull's every's sensors data, which takes about 10-15 seconds overall.

And then, when the usb-access happen, then following error occurs:

Dec 22 17:19:07 pi1 kernel: [10128.035877] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0

Usual only once or twice, but sometimes the message occurs up to 3 times in a row(with e.g. 1 second interval).

Beside that error, i see that - i know its strange but it's so - the amount of dropped errors on the eth0 increases also by some value of 1 to 3 on each run..

root@pi1:/usr/local/bin/rrd# ifconfig 
eth0      Link encap:Ethernet  HWaddr b8:27:eb:ba:c4:1a  
          inet addr:192.168.1.80  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23955 errors:0 dropped:1076 overruns:0 frame:0
          TX packets:12277 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:3312814 (3.1 MiB)  TX bytes:5088796 (4.8 MiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:800 (800.0 B)  TX bytes:800 (800.0 B)

root@pi1:/usr/local/bin/rrd# 

PI's data: Model A, a freshly installed && dist-upgraded install (rasbian image "2013-09-25-wheezy-raspbian.img"), kernel

root@pi1:/usr/local/bin/rrd# uname -a
Linux pi1 3.10.24+ #614 PREEMPT Thu Dec 19 20:38:42 GMT 2013 armv6l GNU/Linux
root@pi1:/usr/local/bin/rrd# 

The error is reproducible every time.

The USB-to-serial adapter is a

Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC

Many thanks!

@ghost ghost assigned P33M Dec 23, 2013
@hvenzke
Copy link

hvenzke commented Jan 4, 2014

well , serial usb converter

  • addional hardware may consume some energy

Thus my question how you power the external device ?
may you used an powerd usb2 hub ?

i explain my question here more : i use sutch usb <> rsrs232 myself at all of my pis, and without extra powersource this was NOT stable enoth to serve an heartbeat3 <>heartbeat3 realtime data transmision.
seen eth and sdcard failtuers as well due overcomsume what the PI´s powersource lines via uSB can handel without modification to the PI its self.

@v2b1n
Copy link
Author

v2b1n commented Jan 5, 2014

currently the converter is powered directly by the pi.

But your idea is interesting; i'll try it out using an additional active usb-hub.

@hvenzke
Copy link

hvenzke commented Jan 5, 2014

Stabilty are required ; - if you ran your PI in server mode ; especaly with external Hardware.

The majorpoint that i try contribute still here are stabilty and required featgers for RPi SERVER mode.
thanks to i.e popcornmix and some others where i interacted here.

@v2b1n
Copy link
Author

v2b1n commented Jan 22, 2014

ok, i've tested it - i connected a 4-port, separately powered USB 2.0 hub and the issue still persist - the messages
still come...

Damn...

I've also found a very very similar issue here - http://www.raspberrypi.org/phpBB3/viewtopic.php?t=61244&p=455950

@P33M
Copy link
Contributor

P33M commented Jan 22, 2014

What is the interval between error messages? Values posted so far indicate >5,000 seconds

Is there any negative effect on the system or network connections?

@v2b1n
Copy link
Author

v2b1n commented Jan 24, 2014

Well, the messages appear only around the time when the temperature is gathered.
That happens every minute scheduled by cron.

So, the intervals are

root@pi1:~# tail /var/log/syslog -n 50
Jan 24 08:07:05 pi1 kernel: [131652.577887] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:08:04 pi1 kernel: [131711.465887] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:08:09 pi1 kernel: [131715.882482] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:08:12 pi1 kernel: [131718.818884] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:09:02 pi1 kernel: [131769.345751] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:09:08 pi1 kernel: [131775.234554] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:10:06 pi1 kernel: [131832.674347] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:11:03 pi1 kernel: [131890.106151] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:11:09 pi1 kernel: [131895.986948] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:12:04 pi1 kernel: [131950.930416] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:12:05 pi1 kernel: [131952.386613] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:13:03 pi1 kernel: [132009.810414] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:13:06 pi1 kernel: [132012.746813] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:13:10 pi1 kernel: [132017.171418] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:14:05 pi1 kernel: [132072.122879] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:15:08 pi1 kernel: [132135.467490] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:16:07 pi1 kernel: [132194.315469] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:18:05 pi1 kernel: [132312.139492] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:18:09 pi1 kernel: [132316.556095] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:19:08 pi1 kernel: [132375.436089] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:20:06 pi1 kernel: [132432.851884] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:21:07 pi1 kernel: [132493.676159] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:21:11 pi1 kernel: [132498.092751] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:22:02 pi1 kernel: [132549.627756] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:23:03 pi1 kernel: [132609.979959] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:23:09 pi1 kernel: [132615.868752] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:24:08 pi1 kernel: [132675.212820] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:25:03 pi1 kernel: [132729.692216] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:25:04 pi1 kernel: [132731.164416] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:25:05 pi1 kernel: [132732.628619] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:25:10 pi1 kernel: [132737.037213] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:26:12 pi1 kernel: [132798.853618] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:27:02 pi1 kernel: [132849.396481] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:27:04 pi1 kernel: [132850.868685] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:28:07 pi1 kernel: [132914.141280] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:29:03 pi1 kernel: [132970.148894] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:29:04 pi1 kernel: [132971.629084] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:30:02 pi1 kernel: [133029.572956] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:30:05 pi1 kernel: [133032.509356] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:30:08 pi1 kernel: [133035.445756] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:31:04 pi1 kernel: [133090.933295] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:31:10 pi1 kernel: [133096.830101] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:33:02 pi1 kernel: [133209.205369] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:33:09 pi1 kernel: [133216.558360] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:33:11 pi1 kernel: [133218.022564] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:35:03 pi1 kernel: [133329.909765] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:36:07 pi1 kernel: [133393.710436] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:37:04 pi1 kernel: [133451.142235] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:37:10 pi1 kernel: [133457.023035] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
Jan 24 08:37:11 pi1 kernel: [133458.495227] smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0
root@pi1:~# 

pretty random at the beginning of the every minute.

And - no, there are no any negative side effects (at least i do not see any) nor on the system, nor with the network i my constellation.

@hvenzke
Copy link

hvenzke commented Jan 24, 2014

Sutch happen on my PI when i have heayly SD card load + USB2 + network with smsc95xx .
-> issue with interupt sheduler.
This issue exist since +1J - with any kernels .

can allmost ignored as tcp just resend the tcp fragment(s) ..

@v2b1n
Copy link
Author

v2b1n commented Jan 24, 2014

Hmm,, i my case there is almost no network activity.

As to the "heavily loaded SD" / USB - in my case, at least, it's absolutely not the case.
The values that are gathered are written into appropriate RRDs, but these are sequential and very small
writes...

@wouterbouvy
Copy link

I'm having same issues with my RPi. I also have a FT232 USB-to-Serial adapter connected.
I'm also having trouble to interpret correct data through the USB-to-Serial adapter sometimes. It seems to miss some bytes once in a while.

I'm using 3.10.25 kernel

@The-Drake
Copy link

Same problem here:

pi@raspberrypi ~ $ /opt/vc/bin/vcgencmd version
Feb 21 2014 17:44:46
Copyright (c) 2012 Broadcom
version 67f299c3a92a793fb8b8efcaf94f7f553153f89a (clean) (release)

pi@raspberrypi ~ $ uname -a
Linux raspberrypi 3.10.30+ #642 PREEMPT Fri Feb 21 17:33:42 GMT 2014 armv6l GNU/Linux

pi@raspberrypi ~ $ lsusb
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC

pi@raspberrypi ~ $ usb-devices

T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=480 MxCh= 1
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1d6b ProdID=0002 Rev=03.10
S:  Manufacturer=Linux 3.10.30+ dwc_otg_hcd
S:  Product=DWC OTG Controller
S:  SerialNumber=bcm2708_usb
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub

T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480 MxCh= 3
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
P:  Vendor=0424 ProdID=9512 Rev=02.00
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=2mA
I:  If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub

T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  3 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=ff(vend.) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0424 ProdID=ec00 Rev=02.00
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=2mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=ff Driver=smsc95xx

T:  Bus=01 Lev=02 Prnt=02 Port=02 Cnt=02 Dev#=  4 Spd=12  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0403 ProdID=6001 Rev=06.00
S:  Manufacturer=FTDI
S:  Product=FT232R USB UART
S:  SerialNumber=XXXXXXX
C:  #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=90mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=ftdi_sio

@cadkiwi
Copy link

cadkiwi commented Feb 27, 2014

Also same problem I got after adding a FT232R USB to serial converter...
After many of these error messages the RPi disconnects from the LAN and has to be power cycled to come back to life... Very bad situation.

root@raspi:~/dosfstools# /opt/vc/bin/vcgencmd version
Jan  6 2014 21:19:57
Copyright (c) 2012 Broadcom
version b00bb3ae73bd2799df0e938b7a5f17f45303fb53 (clean) (release)

root@raspi:~/dosfstools# uname -a
Linux raspi 3.10.25+ #622 PREEMPT Fri Jan 3 18:41:00 GMT 2014 armv6l GNU/Linux

root@raspi:~/dosfstools# lsusb
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC

root@raspi:~/dosfstools# usb-devices
T:  Bus=01 Lev=00 Prnt=00 Port=00 Cnt=00 Dev#=  1 Spd=480 MxCh= 1
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=1d6b ProdID=0002 Rev=03.10
S:  Manufacturer=Linux 3.10.25+ dwc_otg_hcd
S:  Product=DWC OTG Controller
S:  SerialNumber=bcm2708_usb
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=0mA
I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub

T:  Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=480 MxCh= 3
D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=02 MxPS=64 #Cfgs=  1
P:  Vendor=0424 ProdID=9512 Rev=02.00
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=2mA
I:  If#= 0 Alt= 1 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=02 Driver=hub

T:  Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#=  3 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=ff(vend.) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=0424 ProdID=ec00 Rev=02.00
C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=2mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=ff Driver=smsc95xx

T:  Bus=01 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#=  4 Spd=12  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs=  1
P:  Vendor=0403 ProdID=6001 Rev=06.00
S:  Manufacturer=FTDI
S:  Product=FT232R USB UART
S:  SerialNumber=A702J95H
C:  #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=90mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=ftdi_sio

@Menox10
Copy link

Menox10 commented Mar 1, 2014

I had the same problem and solved it by reducing the USB speed to version 1.1. This is sufficient for most applications and my Raspi is running with WIFI USB and USB-RS232 adapter for 15 days without extra hub or power!!

Add both commands below to your /boot/cmdline.txt file:

smsc95xx.turbo_mode=N dwc_otg.speed=1

When this does not help sufficient try an old SD card. I also found problems with some
modern (3.3V) SD cards.

@v2b1n
Copy link
Author

v2b1n commented Mar 7, 2014

Interesting, will try it out. Thx.

@v2b1n
Copy link
Author

v2b1n commented Mar 7, 2014

👍
Looks promising. At least no error messages :-)

the only question now is whether the nic performance will be also affected..

According to http://www.raspberrypi.org/phpBB3/viewtopic.php?f=44&t=8010&start=75 -

yes, it will decrease.

I'll test and post...

@v2b1n v2b1n closed this as completed Mar 7, 2014
@v2b1n v2b1n reopened this Mar 7, 2014
@davebiffuk
Copy link

I tried adding
smsc95xx.turbo_mode=N dwc_otg.speed=1
to /boot/cmdline.txt

But afterwards my USB webcam didn't work, and I got these error messages in kern.log:

Mar 7 16:20:19 davepi kernel: [ 58.562682] INFO:: schedule_periodic: Insufficient periodic bandwidth for periodic transfer.
Mar 7 16:20:19 davepi kernel: [ 58.562682]
Mar 7 16:20:19 davepi kernel: [ 58.562730] ERROR::dwc_otg_hcd_urb_enqueue:554: DWC OTG HCD URB Enqueue failed adding QTD. Error status -1
Mar 7 16:20:19 davepi kernel: [ 58.562730]
Mar 7 16:20:19 davepi kernel: [ 58.562753] uvcvideo: Failed to submit URB 0 (-1).

so I don't think that's a workaround for everyone...

@Menox10
Copy link

Menox10 commented Mar 7, 2014

It seems that your video stream is more than the USB1.1 speed

Released in January 1996, USB 1.0 specified data rates of 1.5 Mbit/s (
Low-Bandwidth) and 12 Mbit/s (Full-Bandwidth).

Check your USB devices that they are all USB 2.0 compatible. Also check
your adapter (cable). Some chinese
adapters have high impedance DC cable. With 1-2 Ampere the voltage drop
could be to high.

Try with only your USB webcam

2014-03-07 17:30 GMT+01:00 davebiffuk [email protected]:

I tried adding
smsc95xx.turbo_mode=N dwc_otg.speed=1
to /boot/cmdline.txt

But afterwards my USB webcam didn't work, and I got these error messages
in kern.log:

Mar 7 16:20:19 davepi kernel: [ 58.562682] INFO:: schedule_periodic:
Insufficient periodic bandwidth for periodic transfer.
Mar 7 16:20:19 davepi kernel: [ 58.562682]
Mar 7 16:20:19 davepi kernel: [ 58.562730]
ERROR::dwc_otg_hcd_urb_enqueue:554: DWC OTG HCD URB Enqueue failed adding
QTD. Error status -1
Mar 7 16:20:19 davepi kernel: [ 58.562730]
Mar 7 16:20:19 davepi kernel: [ 58.562753] uvcvideo: Failed to submit URB
0 (-1).

so I don't think that's a workaround for everyone...

Reply to this email directly or view it on GitHubhttps://github.com//issues/470#issuecomment-37040580
.

@P33M
Copy link
Contributor

P33M commented Mar 19, 2014

Please retry (without dwc_otg.speed=1) with BRANCH=next firmware.

See http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=70437 for information.

@davebiffuk
Copy link

Thanks P33M. I updated to BRANCH=next firmware last night and the "smsc95xx 1-1.1:1.0 eth0: unexpected urb length 0" messages have completely ceased. The Pi has been up for 9 hours now. So far so good!

For completeness, this is:

Mar 19 2014 16:52:53 
Copyright (c) 2012 Broadcom
version e071d425eca6ab200aefcf89056f3dcaf9510093 (clean) (release)

with this cmdline.txt:

dwc_otg.fiq_enable=1 dwc_otg.fiq_fsm_enable=1 dwc_otg.fiq_fsm_mask=0x3 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

Thanks again!

Dave

@P33M P33M closed this as completed Mar 25, 2014
@P33M
Copy link
Contributor

P33M commented Mar 25, 2014

Marking as fixed in BRANCH=next

popcornmix pushed a commit that referenced this issue Nov 25, 2021
[ Upstream commit a7ac783 ]

Lockdep complains about rtw_free_assoc_resources() taking the sta_hash_lock
followed by it calling rtw_free_stainfo() which takes xmitpriv->lock.
While the rtl8723bs_xmit_thread takes the sta_hash_lock while already
holding the xmitpriv->lock:

[  103.849756] ======================================================
[  103.849761] WARNING: possible circular locking dependency detected
[  103.849767] 5.15.0-rc1+ #470 Tainted: G         C  E
[  103.849773] ------------------------------------------------------
[  103.849776] wpa_supplicant/695 is trying to acquire lock:
[  103.849781] ffffa5d0c0562b00 (&pxmitpriv->lock){+.-.}-{2:2}, at: rtw_free_stainfo+0x8a/0x510 [r8723bs]
[  103.849840]
               but task is already holding lock:
[  103.849843] ffffa5d0c05636a8 (&pstapriv->sta_hash_lock){+.-.}-{2:2}, at: rtw_free_assoc_resources+0x48/0x110 [r8723bs]
[  103.849881]
               which lock already depends on the new lock.

[  103.849884]
               the existing dependency chain (in reverse order) is:
[  103.849887]
               -> #1 (&pstapriv->sta_hash_lock){+.-.}-{2:2}:
[  103.849898]        _raw_spin_lock_bh+0x34/0x40
[  103.849913]        rtw_get_stainfo+0x93/0x110 [r8723bs]
[  103.849948]        rtw_make_wlanhdr+0x14a/0x270 [r8723bs]
[  103.849983]        rtw_xmitframe_coalesce+0x5c/0x6c0 [r8723bs]
[  103.850019]        rtl8723bs_xmit_thread+0x4ac/0x620 [r8723bs]
[  103.850050]        kthread+0x143/0x160
[  103.850058]        ret_from_fork+0x22/0x30
[  103.850067]
               -> #0 (&pxmitpriv->lock){+.-.}-{2:2}:
[  103.850077]        __lock_acquire+0x1158/0x1de0
[  103.850084]        lock_acquire+0xb5/0x2b0
[  103.850090]        _raw_spin_lock_bh+0x34/0x40
[  103.850095]        rtw_free_stainfo+0x8a/0x510 [r8723bs]
[  103.850130]        rtw_free_assoc_resources+0x53/0x110 [r8723bs]
[  103.850159]        PHY_IQCalibrate_8723B+0x122b/0x36a0 [r8723bs]
[  103.850189]        cfg80211_disconnect+0x173/0x320 [cfg80211]
[  103.850331]        nl80211_disconnect+0x6e/0xb0 [cfg80211]
[  103.850422]        genl_family_rcv_msg_doit+0xcd/0x110
[  103.850430]        genl_rcv_msg+0xce/0x1c0
[  103.850435]        netlink_rcv_skb+0x50/0xf0
[  103.850441]        genl_rcv+0x24/0x40
[  103.850446]        netlink_unicast+0x16d/0x230
[  103.850452]        netlink_sendmsg+0x22b/0x450
[  103.850457]        sock_sendmsg+0x5e/0x60
[  103.850465]        ____sys_sendmsg+0x22f/0x270
[  103.850472]        ___sys_sendmsg+0x81/0xc0
[  103.850479]        __sys_sendmsg+0x49/0x80
[  103.850485]        do_syscall_64+0x3b/0x90
[  103.850493]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  103.850500]
               other info that might help us debug this:

[  103.850504]  Possible unsafe locking scenario:

[  103.850507]        CPU0                    CPU1
[  103.850510]        ----                    ----
[  103.850512]   lock(&pstapriv->sta_hash_lock);
[  103.850518]                                lock(&pxmitpriv->lock);
[  103.850524]                                lock(&pstapriv->sta_hash_lock);
[  103.850530]   lock(&pxmitpriv->lock);
[  103.850535]
                *** DEADLOCK ***

Push the taking of sta_hash_lock down into rtw_free_stainfo(),
where the critical section is, this allows taking the lock after
rtw_free_stainfo() has released pxmitpriv->lock.

This requires changing rtw_free_all_stainfo() so that it does its freeing
in 2 steps, first moving all stainfo-s to free to a local list while
holding the sta_hash_lock and then walking that list to call
rtw_free_stainfo() on them without holding the sta_hash_lock.

Pushing the taking of sta_hash_lock down into rtw_free_stainfo(),
also fixes a whole bunch of callers of rtw_free_stainfo() which
were not holding that lock even though they should.

Note that this also fixes the deadlock from the "remove possible
deadlock when disconnect" patch in a different way. But the
changes from that patch offer a nice locking cleanup regardless.

Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
popcornmix pushed a commit that referenced this issue Nov 25, 2021
[ Upstream commit bdc1bbd ]

The assoc_timer takes the pmlmepriv->lock and various functions which
take the pmlmepriv->scanned_queue.lock first take the pmlmepriv->lock,
this means that we cannot have code which waits for the timer
(timer_del_sync) while holding the pmlmepriv->scanned_queue.lock
to avoid a triangle deadlock:

[  363.139361] ======================================================
[  363.139377] WARNING: possible circular locking dependency detected
[  363.139396] 5.15.0-rc1+ #470 Tainted: G         C  E
[  363.139413] ------------------------------------------------------
[  363.139424] RTW_CMD_THREAD/2466 is trying to acquire lock:
[  363.139441] ffffbacd00699038 (&pmlmepriv->lock){+.-.}-{2:2}, at: _rtw_join_timeout_handler+0x3c/0x160 [r8723bs]
[  363.139598]
               but task is already holding lock:
[  363.139610] ffffbacd00128ea0 ((&pmlmepriv->assoc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x260
[  363.139673]
               which lock already depends on the new lock.

[  363.139684]
               the existing dependency chain (in reverse order) is:
[  363.139696]
               -> #2 ((&pmlmepriv->assoc_timer)){+.-.}-{0:0}:
[  363.139734]        del_timer_sync+0x59/0x100
[  363.139762]        rtw_joinbss_event_prehandle+0x342/0x640 [r8723bs]
[  363.139870]        report_join_res+0xdf/0x110 [r8723bs]
[  363.139980]        OnAssocRsp+0x17a/0x200 [r8723bs]
[  363.140092]        rtw_recv_entry+0x190/0x1120 [r8723bs]
[  363.140209]        rtl8723b_process_phy_info+0x3f9/0x750 [r8723bs]
[  363.140318]        tasklet_action_common.constprop.0+0xe8/0x110
[  363.140345]        __do_softirq+0xde/0x485
[  363.140372]        __irq_exit_rcu+0xd0/0x100
[  363.140393]        irq_exit_rcu+0xa/0x20
[  363.140413]        common_interrupt+0x83/0xa0
[  363.140440]        asm_common_interrupt+0x1e/0x40
[  363.140463]        finish_task_switch.isra.0+0x157/0x3d0
[  363.140492]        __schedule+0x447/0x1880
[  363.140516]        schedule+0x59/0xc0
[  363.140537]        smpboot_thread_fn+0x161/0x1c0
[  363.140565]        kthread+0x143/0x160
[  363.140585]        ret_from_fork+0x22/0x30
[  363.140614]
               -> #1 (&pmlmepriv->scanned_queue.lock){+.-.}-{2:2}:
[  363.140653]        _raw_spin_lock_bh+0x34/0x40
[  363.140675]        rtw_free_network_queue+0x31/0x80 [r8723bs]
[  363.140776]        rtw_sitesurvey_cmd+0x79/0x1e0 [r8723bs]
[  363.140869]        rtw_cfg80211_surveydone_event_callback+0x3cf/0x470 [r8723bs]
[  363.140973]        rdev_scan+0x42/0x1a0 [cfg80211]
[  363.141307]        nl80211_trigger_scan+0x566/0x660 [cfg80211]
[  363.141635]        genl_family_rcv_msg_doit+0xcd/0x110
[  363.141661]        genl_rcv_msg+0xce/0x1c0
[  363.141680]        netlink_rcv_skb+0x50/0xf0
[  363.141699]        genl_rcv+0x24/0x40
[  363.141717]        netlink_unicast+0x16d/0x230
[  363.141736]        netlink_sendmsg+0x22b/0x450
[  363.141755]        sock_sendmsg+0x5e/0x60
[  363.141781]        ____sys_sendmsg+0x22f/0x270
[  363.141803]        ___sys_sendmsg+0x81/0xc0
[  363.141828]        __sys_sendmsg+0x49/0x80
[  363.141851]        do_syscall_64+0x3b/0x90
[  363.141873]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[  363.141895]
               -> #0 (&pmlmepriv->lock){+.-.}-{2:2}:
[  363.141930]        __lock_acquire+0x1158/0x1de0
[  363.141954]        lock_acquire+0xb5/0x2b0
[  363.141974]        _raw_spin_lock_bh+0x34/0x40
[  363.141993]        _rtw_join_timeout_handler+0x3c/0x160 [r8723bs]
[  363.142097]        call_timer_fn+0x94/0x260
[  363.142122]        __run_timers.part.0+0x1bf/0x290
[  363.142147]        run_timer_softirq+0x26/0x50
[  363.142171]        __do_softirq+0xde/0x485
[  363.142193]        __irq_exit_rcu+0xd0/0x100
[  363.142215]        irq_exit_rcu+0xa/0x20
[  363.142235]        sysvec_apic_timer_interrupt+0x72/0x90
[  363.142260]        asm_sysvec_apic_timer_interrupt+0x12/0x20
[  363.142283]        __module_address.part.0+0x0/0xd0
[  363.142309]        is_module_address+0x25/0x40
[  363.142334]        static_obj+0x4f/0x60
[  363.142361]        lockdep_init_map_type+0x47/0x220
[  363.142382]        __init_swait_queue_head+0x45/0x60
[  363.142408]        mmc_wait_for_req+0x4a/0xc0 [mmc_core]
[  363.142504]        mmc_wait_for_cmd+0x55/0x70 [mmc_core]
[  363.142592]        mmc_io_rw_direct+0x75/0xe0 [mmc_core]
[  363.142691]        sdio_writeb+0x2e/0x50 [mmc_core]
[  363.142788]        _sd_cmd52_write+0x62/0x80 [r8723bs]
[  363.142885]        sd_cmd52_write+0x6c/0xb0 [r8723bs]
[  363.142981]        rtl8723bs_set_hal_ops+0x982/0x9b0 [r8723bs]
[  363.143089]        rtw_write16+0x1e/0x30 [r8723bs]
[  363.143184]        SetHwReg8723B+0xcc9/0xd30 [r8723bs]
[  363.143294]        mlmeext_joinbss_event_callback+0x17a/0x1a0 [r8723bs]
[  363.143405]        rtw_joinbss_event_callback+0x11/0x20 [r8723bs]
[  363.143507]        mlme_evt_hdl+0x4d/0x70 [r8723bs]
[  363.143620]        rtw_cmd_thread+0x168/0x3c0 [r8723bs]
[  363.143712]        kthread+0x143/0x160
[  363.143732]        ret_from_fork+0x22/0x30
[  363.143757]
               other info that might help us debug this:

[  363.143768] Chain exists of:
                 &pmlmepriv->lock --> &pmlmepriv->scanned_queue.lock --> (&pmlmepriv->assoc_timer)

[  363.143809]  Possible unsafe locking scenario:

[  363.143819]        CPU0                    CPU1
[  363.143831]        ----                    ----
[  363.143841]   lock((&pmlmepriv->assoc_timer));
[  363.143862]                                lock(&pmlmepriv->scanned_queue.lock);
[  363.143882]                                lock((&pmlmepriv->assoc_timer));
[  363.143902]   lock(&pmlmepriv->lock);
[  363.143921]
                *** DEADLOCK ***

Make rtw_joinbss_event_prehandle() release the scanned_queue.lock before
it deletes the timer to avoid this (it is still holding pmlmepriv->lock
protecting against racing the timer).

Signed-off-by: Hans de Goede <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
popcornmix pushed a commit that referenced this issue Dec 9, 2024
Jordy says:

"
In the xsk_map_delete_elem function an unsigned integer
(map->max_entries) is compared with a user-controlled signed integer
(k). Due to implicit type conversion, a large unsigned value for
map->max_entries can bypass the intended bounds check:

	if (k >= map->max_entries)
		return -EINVAL;

This allows k to hold a negative value (between -2147483648 and -2),
which is then used as an array index in m->xsk_map[k], which results
in an out-of-bounds access.

	spin_lock_bh(&m->lock);
	map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry
	old_xs = unrcu_pointer(xchg(map_entry, NULL));  // Oob write
	if (old_xs)
		xsk_map_sock_delete(old_xs, map_entry);
	spin_unlock_bh(&m->lock);

The xchg operation can then be used to cause an out-of-bounds write.
Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead
to further memory corruption.
"

It indeed results in following splat:

[76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108
[76612.904330] #PF: supervisor write access in kernel mode
[76612.909639] #PF: error_code(0x0002) - not-present page
[76612.914855] PGD 0 P4D 0
[76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP
[76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470
[76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60
[76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31
[76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246
[76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000
[76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000
[76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007
[76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8
[76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0
[76613.005303] FS:  00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000
[76613.013517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0
[76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76613.041086] PKRU: 55555554
[76613.043842] Call Trace:
[76613.046331]  <TASK>
[76613.048468]  ? __die+0x20/0x60
[76613.051581]  ? page_fault_oops+0x15a/0x450
[76613.055747]  ? search_extable+0x22/0x30
[76613.059649]  ? search_bpf_extables+0x5f/0x80
[76613.063988]  ? exc_page_fault+0xa9/0x140
[76613.067975]  ? asm_exc_page_fault+0x22/0x30
[76613.072229]  ? xsk_map_delete_elem+0x2d/0x60
[76613.076573]  ? xsk_map_delete_elem+0x23/0x60
[76613.080914]  __sys_bpf+0x19b7/0x23c0
[76613.084555]  __x64_sys_bpf+0x1a/0x20
[76613.088194]  do_syscall_64+0x37/0xb0
[76613.091832]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[76613.096962] RIP: 0033:0x7f80b6d1e88d
[76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d
[76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003
[76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000
[76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8
[76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040
[76613.188129]  </TASK>

Fix this by simply changing key type from int to u32.

Fixes: fbfc504 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
CC: [email protected]
Reported-by: Jordy Zomer <[email protected]>
Suggested-by: Jordy Zomer <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
popcornmix pushed a commit that referenced this issue Dec 16, 2024
commit 32cd3db upstream.

Jordy says:

"
In the xsk_map_delete_elem function an unsigned integer
(map->max_entries) is compared with a user-controlled signed integer
(k). Due to implicit type conversion, a large unsigned value for
map->max_entries can bypass the intended bounds check:

	if (k >= map->max_entries)
		return -EINVAL;

This allows k to hold a negative value (between -2147483648 and -2),
which is then used as an array index in m->xsk_map[k], which results
in an out-of-bounds access.

	spin_lock_bh(&m->lock);
	map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry
	old_xs = unrcu_pointer(xchg(map_entry, NULL));  // Oob write
	if (old_xs)
		xsk_map_sock_delete(old_xs, map_entry);
	spin_unlock_bh(&m->lock);

The xchg operation can then be used to cause an out-of-bounds write.
Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead
to further memory corruption.
"

It indeed results in following splat:

[76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108
[76612.904330] #PF: supervisor write access in kernel mode
[76612.909639] #PF: error_code(0x0002) - not-present page
[76612.914855] PGD 0 P4D 0
[76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP
[76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470
[76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60
[76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31
[76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246
[76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000
[76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000
[76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007
[76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8
[76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0
[76613.005303] FS:  00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000
[76613.013517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0
[76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76613.041086] PKRU: 55555554
[76613.043842] Call Trace:
[76613.046331]  <TASK>
[76613.048468]  ? __die+0x20/0x60
[76613.051581]  ? page_fault_oops+0x15a/0x450
[76613.055747]  ? search_extable+0x22/0x30
[76613.059649]  ? search_bpf_extables+0x5f/0x80
[76613.063988]  ? exc_page_fault+0xa9/0x140
[76613.067975]  ? asm_exc_page_fault+0x22/0x30
[76613.072229]  ? xsk_map_delete_elem+0x2d/0x60
[76613.076573]  ? xsk_map_delete_elem+0x23/0x60
[76613.080914]  __sys_bpf+0x19b7/0x23c0
[76613.084555]  __x64_sys_bpf+0x1a/0x20
[76613.088194]  do_syscall_64+0x37/0xb0
[76613.091832]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[76613.096962] RIP: 0033:0x7f80b6d1e88d
[76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d
[76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003
[76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000
[76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8
[76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040
[76613.188129]  </TASK>

Fix this by simply changing key type from int to u32.

Fixes: fbfc504 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
CC: [email protected]
Reported-by: Jordy Zomer <[email protected]>
Suggested-by: Jordy Zomer <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Dec 16, 2024
commit 32cd3db upstream.

Jordy says:

"
In the xsk_map_delete_elem function an unsigned integer
(map->max_entries) is compared with a user-controlled signed integer
(k). Due to implicit type conversion, a large unsigned value for
map->max_entries can bypass the intended bounds check:

	if (k >= map->max_entries)
		return -EINVAL;

This allows k to hold a negative value (between -2147483648 and -2),
which is then used as an array index in m->xsk_map[k], which results
in an out-of-bounds access.

	spin_lock_bh(&m->lock);
	map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry
	old_xs = unrcu_pointer(xchg(map_entry, NULL));  // Oob write
	if (old_xs)
		xsk_map_sock_delete(old_xs, map_entry);
	spin_unlock_bh(&m->lock);

The xchg operation can then be used to cause an out-of-bounds write.
Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead
to further memory corruption.
"

It indeed results in following splat:

[76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108
[76612.904330] #PF: supervisor write access in kernel mode
[76612.909639] #PF: error_code(0x0002) - not-present page
[76612.914855] PGD 0 P4D 0
[76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP
[76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ #470
[76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60
[76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31
[76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246
[76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000
[76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000
[76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007
[76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8
[76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0
[76613.005303] FS:  00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000
[76613.013517] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0
[76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[76613.041086] PKRU: 55555554
[76613.043842] Call Trace:
[76613.046331]  <TASK>
[76613.048468]  ? __die+0x20/0x60
[76613.051581]  ? page_fault_oops+0x15a/0x450
[76613.055747]  ? search_extable+0x22/0x30
[76613.059649]  ? search_bpf_extables+0x5f/0x80
[76613.063988]  ? exc_page_fault+0xa9/0x140
[76613.067975]  ? asm_exc_page_fault+0x22/0x30
[76613.072229]  ? xsk_map_delete_elem+0x2d/0x60
[76613.076573]  ? xsk_map_delete_elem+0x23/0x60
[76613.080914]  __sys_bpf+0x19b7/0x23c0
[76613.084555]  __x64_sys_bpf+0x1a/0x20
[76613.088194]  do_syscall_64+0x37/0xb0
[76613.091832]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[76613.096962] RIP: 0033:0x7f80b6d1e88d
[76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
[76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
[76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d
[76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003
[76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000
[76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8
[76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040
[76613.188129]  </TASK>

Fix this by simply changing key type from int to u32.

Fixes: fbfc504 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
CC: [email protected]
Reported-by: Jordy Zomer <[email protected]>
Suggested-by: Jordy Zomer <[email protected]>
Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Maciej Fijalkowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants