Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove LTE (and other) from combo-chip-folder #9

Closed
frank-w opened this issue Jan 7, 2018 · 1 comment
Closed

remove LTE (and other) from combo-chip-folder #9

frank-w opened this issue Jan 7, 2018 · 1 comment

Comments

@frank-w
Copy link
Owner

frank-w commented Jan 7, 2018

frank-w pushed a commit that referenced this issue Mar 1, 2018
commit 7ba7166 upstream.

It was reported by Sergey Senozhatsky that if THP (Transparent Huge
Page) and frontswap (via zswap) are both enabled, when memory goes low
so that swap is triggered, segfault and memory corruption will occur in
random user space applications as follow,

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
 #1  0x00007fc08889c2f3 malloc (libc.so.6)
 #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x0000560e6005e75c n/a (urxvt)
 #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x0000560e6005cb55 ev_run (urxvt)
 #9  0x0000560e6003b9b9 main (urxvt)
 #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
 #11 0x0000560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is bd4c82c ("mm,
THP, swap: delay splitting THP after swapped out").

The root cause is as follows:

When the pages are written to swap device during swapping out in
swap_writepage(), zswap (fontswap) is tried to compress the pages to
improve performance.  But zswap (frontswap) will treat THP as a normal
page, so only the head page is saved.  After swapping in, tail pages
will not be restored to their original contents, causing memory
corruption in the applications.

This is fixed by refusing to save page in the frontswap store functions
if the page is a THP.  So that the THP will be swapped out to swap
device.

Another choice is to split THP if frontswap is enabled.  But it is found
that the frontswap enabling isn't flexible.  For example, if
CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
zswap itself isn't enabled.

Frontswap has multiple backends, to make it easy for one backend to
enable THP support, the THP checking is put in backend frontswap store
functions instead of the general interfaces.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out")
Signed-off-by: "Huang, Ying" <[email protected]>
Reported-by: Sergey Senozhatsky <[email protected]>
Tested-by: Sergey Senozhatsky <[email protected]>
Suggested-by: Minchan Kim <[email protected]>	[put THP checking in backend]
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: <[email protected]>	[4.14]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Mar 15, 2018
commit 1514839 upstream.

This patch fixes NULL pointer crash due to active timer running for abort
IOCB.

From crash dump analysis it was discoverd that get_next_timer_interrupt()
encountered a corrupted entry on the timer list.

 #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
    [exception RIP: get_next_timer_interrupt+440]
    RIP: ffffffff90ea3088  RSP: ffff95e1f6f0fdf0  RFLAGS: 00010013
    RAX: ffff95e1f6451028  RBX: 000218e2389e5f40  RCX: 00000001232ad600
    RDX: 0000000000000001  RSI: ffff95e1f6f0fdf0  RDI: 0000000001232ad6
    RBP: ffff95e1f6f0fe40   R8: ffff95e1f6451188   R9: 0000000000000001
    R10: 0000000000000016  R11: 0000000000000016  R12: 00000001232ad5f6
    R13: ffff95e1f6450000  R14: ffff95e1f6f0fdf8  R15: ffff95e1f6f0fe10
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Looking at the assembly of get_next_timer_interrupt(), address came
from %r8 (ffff95e1f6451188) which is pointing to list_head with single
entry at ffff95e5ff621178.

 0xffffffff90ea307a <get_next_timer_interrupt+426>:      mov    (%r8),%rdx
 0xffffffff90ea307d <get_next_timer_interrupt+429>:      cmp    %r8,%rdx
 0xffffffff90ea3080 <get_next_timer_interrupt+432>:      je     0xffffffff90ea30a7 <get_next_timer_interrupt+471>
 0xffffffff90ea3082 <get_next_timer_interrupt+434>:      nopw   0x0(%rax,%rax,1)
 0xffffffff90ea3088 <get_next_timer_interrupt+440>:      testb  $0x1,0x18(%rdx)

 crash> rd ffff95e1f6451188 10
 ffff95e1f6451188:  ffff95e5ff621178 ffff95e5ff621178   x.b.....x.b.....
 ffff95e1f6451198:  ffff95e1f6451198 ffff95e1f6451198   ..E.......E.....
 ffff95e1f64511a8:  ffff95e1f64511a8 ffff95e1f64511a8   ..E.......E.....
 ffff95e1f64511b8:  ffff95e77cf509a0 ffff95e77cf509a0   ...|.......|....
 ffff95e1f64511c8:  ffff95e1f64511c8 ffff95e1f64511c8   ..E.......E.....

 crash> rd ffff95e5ff621178 10
 ffff95e5ff621178:  0000000000000001 ffff95e15936aa00   ..........6Y....
 ffff95e5ff621188:  0000000000000000 00000000ffffffff   ................
 ffff95e5ff621198:  00000000000000a0 0000000000000010   ................
 ffff95e5ff6211a8:  ffff95e5ff621198 000000000000000c   ..b.............
 ffff95e5ff6211b8:  00000f5800000000 ffff95e751f8d720   ....X... ..Q....

 ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.

 CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
 ffff95dc7fd74d00 mnt_cache                384      19785     24948    594    16k
   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
   ffffdc5dabfd8800  ffff95e5ff620000     1     42         29    13
   FREE / [ALLOCATED]
    ffff95e5ff621080  (cpu 6 cache)

Examining the contents of that memory reveals a pointer to a constant string
in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().

 crash> rd ffffffffc059277c 20
 ffffffffc059277c:  6e490074726f6261 0074707572726574   abort.Interrupt.
 ffffffffc059278c:  00676e696c6c6f50 6920726576697244   Polling.Driver i
 ffffffffc059279c:  646f6d207325206e 6974736554000a65   n %s mode..Testi
 ffffffffc05927ac:  636976656420676e 786c252074612065   ng device at %lx
 ffffffffc05927bc:  6b63656843000a2e 646f727020676e69   ...Checking prod
 ffffffffc05927cc:  6f20444920746375 0a2e706968632066   uct ID of chip..
 ffffffffc05927dc:  5120646e756f4600 204130303232414c   .Found QLA2200A
 ffffffffc05927ec:  43000a2e70696843 20676e696b636568   Chip...Checking
 ffffffffc05927fc:  65786f626c69616d 6c636e69000a2e73   mailboxes...incl
 ffffffffc059280c:  756e696c2f656475 616d2d616d642f78   ude/linux/dma-ma

 crash> struct -ox srb_iocb
 struct srb_iocb {
           union {
               struct {...} logio;
               struct {...} els_logo;
               struct {...} tmf;
               struct {...} fxiocb;
               struct {...} abt;
               struct ct_arg ctarg;
               struct {...} mbx;
               struct {...} nack;
    [0x0 ] } u;
    [0xb8] struct timer_list timer;
    [0x108] void (*timeout)(void *);
 }
 SIZE: 0x110

 crash> ! bc
 ibase=16
 obase=10
 B8+40
 F8

The object is a srb_t, and at offset 0xf8 within that structure
(i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.

Cc: <[email protected]> #4.4+
Fixes: 4440e46 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.")
Signed-off-by: Himanshu Madhani <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Apr 12, 2018
[ Upstream commit d754941 ]

If, for any reason, userland shuts down iscsi transport interfaces
before proper logouts - like when logging in to LUNs manually, without
logging out on server shutdown, or when automated scripts can't
umount/logout from logged LUNs - kernel will hang forever on its
sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
still existent paths.

PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
 #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
 #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
 #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
 #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c

This happens because iscsi_eh_cmd_timed_out(), the transport layer
timeout helper, would tell the queue timeout function (scsi_times_out)
to reset the request timer over and over, until the session state is
back to logged in state. Unfortunately, during server shutdown, this
might never happen again.

Other option would be "not to handle" the issue in the transport
layer. That would trigger the error handler logic, which would also need
the session state to be logged in again.

Best option, for such case, is to tell upper layers that the command was
handled during the transport layer error handler helper, marking it as
DID_NO_CONNECT, which will allow completion and inform about the
problem.

After the session was marked as ISCSI_STATE_FAILED, due to the first
timeout during the server shutdown phase, all subsequent cmds will fail
to be queued, allowing upper logic to fail faster.

Signed-off-by: Rafael David Tinoco <[email protected]>
Reviewed-by: Lee Duncan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Apr 26, 2018
[ Upstream commit 2c0aa08 ]

Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall

PID: 47039  TASK: ffff89887e2fe640  CPU: 47  COMMAND: "kworker/u:6"
 #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9
 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3
 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518
 #3 [ffff898e35f15b60] no_context at ffffffff8104854c
 #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675
 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3
 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8
 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff898e35f15dc8  RFLAGS: 00010282
    RAX: 00000000fffffffe  RBX: ffff889b77f6fc00  RCX:ffffffff81c99d88
    RDX: 0000000000000000  RSI: ffff896019ee08e8  RDI:ffff889b77f6fc00
    RBP: ffff898e35f15df0   R8: ffff896019ee08c8  R9:0000000000000000
    R10: 0000000000000400  R11: 0000000000000000  R12:ffff896019ee08c0
    R13: ffff889b77f6fe68  R14: ffffffff81c99d80  R15: ffffffffa022a1e0
    ORIG_RAX: ffffffffffffffff  CS: 0010 SS: 0018
 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm]
 #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6
 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0
 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6

PID: 45659  TASK: ffff880d313d2500  CPU: 31  COMMAND: "oracle_45659_ap"
 #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4
 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf
 #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7
 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb
 #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm]
 #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma]
 #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds]
 #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds]
 #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670

PID: 45659                          PID: 47039
rds_ib_laddr_check
  /* create id_priv with a null event_handler */
  rdma_create_id
  rdma_bind_addr
    cma_acquire_dev
      /* add id_priv to cma_dev->id_list */
      cma_attach_to_dev
                                    cma_ndev_work_handler
                                      /* event_hanlder is null */
                                      id_priv->id.event_handler

Signed-off-by: Guanglei Li <[email protected]>
Signed-off-by: Honglei Wang <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
Reviewed-by: Yanjun Zhu <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Acked-by: Doug Ledford <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 21, 2018
…s found

[ Upstream commit 72f17ba ]

If an OVS_ATTR_NESTED attribute type is found while walking
through netlink attributes, we call nlattr_set() recursively
passing the length table for the following nested attributes, if
different from the current one.

However, once we're done with those sub-nested attributes, we
should continue walking through attributes using the current
table, instead of using the one related to the sub-nested
attributes.

For example, given this sequence:

1  OVS_KEY_ATTR_PRIORITY
2  OVS_KEY_ATTR_TUNNEL
3	OVS_TUNNEL_KEY_ATTR_ID
4	OVS_TUNNEL_KEY_ATTR_IPV4_SRC
5	OVS_TUNNEL_KEY_ATTR_IPV4_DST
6	OVS_TUNNEL_KEY_ATTR_TTL
7	OVS_TUNNEL_KEY_ATTR_TP_SRC
8	OVS_TUNNEL_KEY_ATTR_TP_DST
9  OVS_KEY_ATTR_IN_PORT
10 OVS_KEY_ATTR_SKB_MARK
11 OVS_KEY_ATTR_MPLS

we switch to the 'ovs_tunnel_key_lens' table on attribute #3,
and we don't switch back to 'ovs_key_lens' while setting
attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS
evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is
15, we also get this kind of KASan splat while accessing the
wrong table:

[ 7654.586496] ==================================================================
[ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430
[ 7654.610983]
[ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1
[ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
[ 7654.631379] Call Trace:
[ 7654.634108]  [<ffffffffb65a7c50>] dump_stack+0x19/0x1b
[ 7654.639843]  [<ffffffffb53ff373>] print_address_description+0x33/0x290
[ 7654.647129]  [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.654607]  [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330
[ 7654.661406]  [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40
[ 7654.668789]  [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.676076]  [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch]
[ 7654.684234]  [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40
[ 7654.689968]  [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590
[ 7654.696574]  [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch]
[ 7654.705122]  [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0
[ 7654.712503]  [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21
[ 7654.719401]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
[ 7654.726298]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
[ 7654.733195]  [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50
[ 7654.740187]  [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0
[ 7654.746406]  [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20
[ 7654.752914]  [<ffffffffb53fe711>] ? memset+0x31/0x40
[ 7654.758456]  [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch]

[snip]

[ 7655.132484] The buggy address belongs to the variable:
[ 7655.138226]  ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch]
[ 7655.145507]
[ 7655.147166] Memory state around the buggy address:
[ 7655.152514]  ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
[ 7655.160585]  ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
[ 7655.176701]                                                              ^
[ 7655.184372]  ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05
[ 7655.192431]  ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
[ 7655.200490] ==================================================================

Reported-by: Hangbin Liu <[email protected]>
Fixes: 982b527 ("openvswitch: Fix mask generation for nested attributes.")
Signed-off-by: Stefano Brivio <[email protected]>
Reviewed-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 21, 2018
[ Upstream commit af50e4b ]

syzbot caught an infinite recursion in nsh_gso_segment().

Problem here is that we need to make sure the NSH header is of
reasonable length.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by syz-executor0/10189:
 #0:         (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517
 #1:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #1:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #2:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #2:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #3:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #3:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #4:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #4:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #5:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #5:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #6:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #6:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #7:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #7:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #8:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #8:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #9:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #9:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #10:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #10:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #11:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #11:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #12:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #12:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #13:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #13:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #14:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #14:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #15:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #15:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #16:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #16:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #17:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #17:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #18:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #18:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #19:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #19:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #20:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #20:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #21:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #21:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #22:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #22:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #23:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #23:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #24:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #24:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #25:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #25:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #26:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #26:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #27:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #27:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #28:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #28:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #29:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #29:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #30:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #30:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #31:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #31:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
dccp_close: ABORT with 65423 bytes unread
 #32:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #32:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #33:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #33:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #34:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #34:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #35:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #35:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #36:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #36:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #37:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #37:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #38:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #38:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #39:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #39:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #40:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #40:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #41:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #41:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #42:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #42:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #43:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #43:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #44:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #44:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #45:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #45:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #46:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #46:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #47:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #47:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
INFO: lockdep is turned off.
CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #26
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449
 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
 rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
 rcu_read_lock include/linux/rcupdate.h:632 [inline]
 skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4025 [inline]
 validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118
 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168
 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312
 qdisc_restart net/sched/sch_generic.c:399 [inline]
 __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410
 __dev_xmit_skb net/core/dev.c:3243 [inline]
 __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616
 packet_snd net/packet/af_packet.c:2951 [inline]
 packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 __sys_sendto+0x3d7/0x670 net/socket.c:1789
 __do_sys_sendto net/socket.c:1801 [inline]
 __se_sys_sendto net/socket.c:1797 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: c411ed8 ("nsh: add GSO support")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jiri Benc <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 21, 2018
…s found

[ Upstream commit 72f17ba ]

If an OVS_ATTR_NESTED attribute type is found while walking
through netlink attributes, we call nlattr_set() recursively
passing the length table for the following nested attributes, if
different from the current one.

However, once we're done with those sub-nested attributes, we
should continue walking through attributes using the current
table, instead of using the one related to the sub-nested
attributes.

For example, given this sequence:

1  OVS_KEY_ATTR_PRIORITY
2  OVS_KEY_ATTR_TUNNEL
3	OVS_TUNNEL_KEY_ATTR_ID
4	OVS_TUNNEL_KEY_ATTR_IPV4_SRC
5	OVS_TUNNEL_KEY_ATTR_IPV4_DST
6	OVS_TUNNEL_KEY_ATTR_TTL
7	OVS_TUNNEL_KEY_ATTR_TP_SRC
8	OVS_TUNNEL_KEY_ATTR_TP_DST
9  OVS_KEY_ATTR_IN_PORT
10 OVS_KEY_ATTR_SKB_MARK
11 OVS_KEY_ATTR_MPLS

we switch to the 'ovs_tunnel_key_lens' table on attribute #3,
and we don't switch back to 'ovs_key_lens' while setting
attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS
evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is
15, we also get this kind of KASan splat while accessing the
wrong table:

[ 7654.586496] ==================================================================
[ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430
[ 7654.610983]
[ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1
[ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
[ 7654.631379] Call Trace:
[ 7654.634108]  [<ffffffffb65a7c50>] dump_stack+0x19/0x1b
[ 7654.639843]  [<ffffffffb53ff373>] print_address_description+0x33/0x290
[ 7654.647129]  [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.654607]  [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330
[ 7654.661406]  [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40
[ 7654.668789]  [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch]
[ 7654.676076]  [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch]
[ 7654.684234]  [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40
[ 7654.689968]  [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590
[ 7654.696574]  [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch]
[ 7654.705122]  [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0
[ 7654.712503]  [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21
[ 7654.719401]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
[ 7654.726298]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
[ 7654.733195]  [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50
[ 7654.740187]  [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0
[ 7654.746406]  [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20
[ 7654.752914]  [<ffffffffb53fe711>] ? memset+0x31/0x40
[ 7654.758456]  [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch]

[snip]

[ 7655.132484] The buggy address belongs to the variable:
[ 7655.138226]  ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch]
[ 7655.145507]
[ 7655.147166] Memory state around the buggy address:
[ 7655.152514]  ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
[ 7655.160585]  ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
[ 7655.176701]                                                              ^
[ 7655.184372]  ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05
[ 7655.192431]  ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
[ 7655.200490] ==================================================================

Reported-by: Hangbin Liu <[email protected]>
Fixes: 982b527 ("openvswitch: Fix mask generation for nested attributes.")
Signed-off-by: Stefano Brivio <[email protected]>
Reviewed-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 21, 2018
[ Upstream commit af50e4b ]

syzbot caught an infinite recursion in nsh_gso_segment().

Problem here is that we need to make sure the NSH header is of
reasonable length.

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48  max: 48!
48 locks held by syz-executor0/10189:
 #0:         (ptrval) (rcu_read_lock_bh){....}, at: __dev_queue_xmit+0x30f/0x34c0 net/core/dev.c:3517
 #1:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #1:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #2:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #2:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #3:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #3:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #4:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #4:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #5:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #5:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #6:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #6:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #7:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #7:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #8:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #8:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #9:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #9:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #10:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #10:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #11:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #11:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #12:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #12:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #13:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #13:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #14:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #14:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #15:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #15:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #16:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #16:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #17:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #17:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #18:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #18:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #19:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #19:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #20:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #20:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #21:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #21:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #22:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #22:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #23:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #23:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #24:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #24:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #25:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #25:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #26:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #26:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #27:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #27:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #28:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #28:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #29:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #29:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #30:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #30:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #31:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #31:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
dccp_close: ABORT with 65423 bytes unread
 #32:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #32:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #33:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #33:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #34:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #34:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #35:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #35:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #36:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #36:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #37:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #37:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #38:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #38:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #39:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #39:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #40:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #40:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #41:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #41:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #42:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #42:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #43:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #43:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #44:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #44:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #45:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #45:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #46:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #46:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
 #47:         (ptrval) (rcu_read_lock){....}, at: __skb_pull include/linux/skbuff.h:2080 [inline]
 #47:         (ptrval) (rcu_read_lock){....}, at: skb_mac_gso_segment+0x221/0x720 net/core/dev.c:2787
INFO: lockdep is turned off.
CPU: 1 PID: 10189 Comm: syz-executor0 Not tainted 4.17.0-rc2+ #26
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 __lock_acquire+0x1788/0x5140 kernel/locking/lockdep.c:3449
 lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
 rcu_lock_acquire include/linux/rcupdate.h:246 [inline]
 rcu_read_lock include/linux/rcupdate.h:632 [inline]
 skb_mac_gso_segment+0x25b/0x720 net/core/dev.c:2789
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 nsh_gso_segment+0x405/0xb60 net/nsh/nsh.c:107
 skb_mac_gso_segment+0x3ad/0x720 net/core/dev.c:2792
 __skb_gso_segment+0x3bb/0x870 net/core/dev.c:2865
 skb_gso_segment include/linux/netdevice.h:4025 [inline]
 validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3118
 validate_xmit_skb_list+0xbf/0x120 net/core/dev.c:3168
 sch_direct_xmit+0x354/0x11e0 net/sched/sch_generic.c:312
 qdisc_restart net/sched/sch_generic.c:399 [inline]
 __qdisc_run+0x741/0x1af0 net/sched/sch_generic.c:410
 __dev_xmit_skb net/core/dev.c:3243 [inline]
 __dev_queue_xmit+0x28ea/0x34c0 net/core/dev.c:3551
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3616
 packet_snd net/packet/af_packet.c:2951 [inline]
 packet_sendmsg+0x40f8/0x6070 net/packet/af_packet.c:2976
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:639
 __sys_sendto+0x3d7/0x670 net/socket.c:1789
 __do_sys_sendto net/socket.c:1801 [inline]
 __se_sys_sendto net/socket.c:1797 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: c411ed8 ("nsh: add GSO support")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jiri Benc <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 31, 2018
[ Upstream commit fca3234 ]

Executing command 'perf stat -T -- ls' dumps core on x86 and s390.

Here is the call back chain (done on x86):

 # gdb ./perf
 ....
 (gdb) r stat -T -- ls
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
(gdb) where
 #0  0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
 #1  0x00007ffff56ae484 in asprintf () from /lib64/libc.so.6
 #2  0x00000000004f1982 in __parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930, auto_merge_stats=false) at util/parse-events.c:1233
 #3  0x00000000004f1c8e in parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930) at util/parse-events.c:1288
 #4  0x0000000000537ce3 in parse_events_parse (_parse_state=0x7fffffffd580,
    scanner=0xbf4210) at util/parse-events.y:234
 #5  0x00000000004f2c7a in parse_events__scanner (str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}",
    parse_state=0x7fffffffd580, start_token=258) at util/parse-events.c:1673
 #6  0x00000000004f2e23 in parse_events (evlist=0xbe9990, str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", err=0x0)
    at util/parse-events.c:1713
 #7  0x000000000044e137 in add_default_attributes () at builtin-stat.c:2281
 #8  0x000000000044f7b5 in cmd_stat (argc=1, argv=0x7fffffffe3b0) at
    builtin-stat.c:2828
 #9  0x00000000004c8b0f in run_builtin (p=0xab01a0 <commands+288>, argc=4,
    argv=0x7fffffffe3b0) at perf.c:297
 #10 0x00000000004c8d7c in handle_internal_command (argc=4,
    argv=0x7fffffffe3b0) at perf.c:349
 #11 0x00000000004c8ece in run_argv (argcp=0x7fffffffe20c,
   argv=0x7fffffffe200) at perf.c:393
 #12 0x00000000004c929c in main (argc=4, argv=0x7fffffffe3b0) at perf.c:537
(gdb)

It turns out that a NULL pointer is referenced. Here are the
function calls:

  ...
  cmd_stat()
  +---> add_default_attributes()
	+---> parse_events(evsel_list, transaction_attrs, NULL);
	             3rd parameter set to NULL

Function parse_events(xx, xx, struct parse_events_error *err) dives
into a bison generated scanner and creates
parser state information for it first:

   struct parse_events_state parse_state = {
                .list   = LIST_HEAD_INIT(parse_state.list),
                .idx    = evlist->nr_entries,
                .error  = err,   <--- NULL POINTER !!!
                .evlist = evlist,
        };

Now various functions inside the bison scanner are called to end up in
__parse_events_add_pmu(struct parse_events_state *parse_state, ..) with
first parameter being a pointer to above structure definition.

Now the PMU event name is not found (because being executed in a VM) and
this function tries to create an error message with

   asprintf(&parse_state->error.str, ....)

which references a NULL pointer and dumps core.

Fix this by providing a pointer to the necessary error information
instead of NULL. Technically only the else part is needed to avoid the
core dump, just lets be safe...

Signed-off-by: Thomas Richter <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hendrik Brueckner <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 31, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 31, 2018
[ Upstream commit fca3234 ]

Executing command 'perf stat -T -- ls' dumps core on x86 and s390.

Here is the call back chain (done on x86):

 # gdb ./perf
 ....
 (gdb) r stat -T -- ls
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
(gdb) where
 #0  0x00007ffff56d1963 in vasprintf () from /lib64/libc.so.6
 #1  0x00007ffff56ae484 in asprintf () from /lib64/libc.so.6
 #2  0x00000000004f1982 in __parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930, auto_merge_stats=false) at util/parse-events.c:1233
 #3  0x00000000004f1c8e in parse_events_add_pmu (parse_state=0x7fffffffd580,
    list=0xbfb970, name=0xbf3ef0 "cpu",
    head_config=0xbfb930) at util/parse-events.c:1288
 #4  0x0000000000537ce3 in parse_events_parse (_parse_state=0x7fffffffd580,
    scanner=0xbf4210) at util/parse-events.y:234
 #5  0x00000000004f2c7a in parse_events__scanner (str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}",
    parse_state=0x7fffffffd580, start_token=258) at util/parse-events.c:1673
 #6  0x00000000004f2e23 in parse_events (evlist=0xbe9990, str=0x6b66c0
    "task-clock,{instructions,cycles,cpu/cycles-t/,cpu/tx-start/}", err=0x0)
    at util/parse-events.c:1713
 #7  0x000000000044e137 in add_default_attributes () at builtin-stat.c:2281
 #8  0x000000000044f7b5 in cmd_stat (argc=1, argv=0x7fffffffe3b0) at
    builtin-stat.c:2828
 #9  0x00000000004c8b0f in run_builtin (p=0xab01a0 <commands+288>, argc=4,
    argv=0x7fffffffe3b0) at perf.c:297
 #10 0x00000000004c8d7c in handle_internal_command (argc=4,
    argv=0x7fffffffe3b0) at perf.c:349
 #11 0x00000000004c8ece in run_argv (argcp=0x7fffffffe20c,
   argv=0x7fffffffe200) at perf.c:393
 #12 0x00000000004c929c in main (argc=4, argv=0x7fffffffe3b0) at perf.c:537
(gdb)

It turns out that a NULL pointer is referenced. Here are the
function calls:

  ...
  cmd_stat()
  +---> add_default_attributes()
	+---> parse_events(evsel_list, transaction_attrs, NULL);
	             3rd parameter set to NULL

Function parse_events(xx, xx, struct parse_events_error *err) dives
into a bison generated scanner and creates
parser state information for it first:

   struct parse_events_state parse_state = {
                .list   = LIST_HEAD_INIT(parse_state.list),
                .idx    = evlist->nr_entries,
                .error  = err,   <--- NULL POINTER !!!
                .evlist = evlist,
        };

Now various functions inside the bison scanner are called to end up in
__parse_events_add_pmu(struct parse_events_state *parse_state, ..) with
first parameter being a pointer to above structure definition.

Now the PMU event name is not found (because being executed in a VM) and
this function tries to create an error message with

   asprintf(&parse_state->error.str, ....)

which references a NULL pointer and dumps core.

Fix this by providing a pointer to the necessary error information
instead of NULL. Technically only the else part is needed to avoid the
core dump, just lets be safe...

Signed-off-by: Thomas Richter <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Hendrik Brueckner <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue May 31, 2018
[ Upstream commit 2bbea6e ]

when mounting an ISO filesystem sometimes (very rarely)
the system hangs because of a race condition between two tasks.

PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
 #0 [ffff880078447ae0] __schedule at ffffffff8168d605
 #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
 #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
 #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
 #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
 #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
 #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
 #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
 #8 [ffff880078447da8] mount_bdev at ffffffff81202570
 #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
#10 [ffff880078447e28] mount_fs at ffffffff81202d09
#11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
#12 [ffff880078447ea8] do_mount at ffffffff81220fee
#13 [ffff880078447f28] sys_mount at ffffffff812218d6
#14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
    RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
    RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
    R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
    R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
    ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b

This task was trying to mount the cdrom.  It allocated and configured a
super_block struct and owned the write-lock for the super_block->s_umount
rwsem. While exclusively owning the s_umount lock, it called
sr_block_ioctl and waited to acquire the global sr_mutex lock.

PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
 #0 [ffff880078417898] __schedule at ffffffff8168d605
 #1 [ffff880078417900] schedule at ffffffff8168dc59
 #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
 #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
 #4 [ffff8800784179d0] down_read at ffffffff8168cde0
 #5 [ffff8800784179e8] get_super at ffffffff81201cc7
 #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
 #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
 #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
 #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
#10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
#11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
#12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
#13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
#14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
#15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
#16 [ffff880078417d00] do_last at ffffffff8120d53d
#17 [ffff880078417db0] path_openat at ffffffff8120e6b2
#18 [ffff880078417e48] do_filp_open at ffffffff8121082b
#19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
#20 [ffff880078417f70] sys_open at ffffffff811fde4e
#21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
    RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
    RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
    RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
    RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
    R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
    R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
    ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b

This task tried to open the cdrom device, the sr_block_open function
acquired the global sr_mutex lock. The call to check_disk_change()
then saw an event flag indicating a possible media change and tried
to flush any cached data for the device.
As part of the flush, it tried to acquire the super_block->s_umount
lock associated with the cdrom device.
This was the same super_block as created and locked by the previous task.

The first task acquires the s_umount lock and then the sr_mutex_lock;
the second task acquires the sr_mutex_lock and then the s_umount lock.

This patch fixes the issue by moving check_disk_change() out of
cdrom_open() and let the caller take care of it.

Signed-off-by: Maurizio Lombardi <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Jun 20, 2018
[ Upstream commit 9954b80 ]

platform_domain_notifier contains a variable sized array, which the
pm_clk_notify() notifier treats as a NULL terminated array:

     for (con_id = clknb->con_ids; *con_id; con_id++)
             pm_clk_add(dev, *con_id);

Omitting the initialiser for con_ids means that the array is zero
sized, and there is no NULL terminator.  This leads to pm_clk_notify()
overrunning into what ever structure follows, which may not be NULL.
This leads to an oops:

Unable to handle kernel NULL pointer dereference at virtual address 0000008c
pgd = c0003000
[0000008c] *pgd=80000800004003c, *pmd=00000000c
Internal error: Oops: 206 [#1] PREEMPT SMP ARM
Modules linked in:c
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #9
Hardware name: Keystone
PC is at strlen+0x0/0x34
LR is at kstrdup+0x18/0x54
pc : [<c0623340>]    lr : [<c0111d6c>]    psr: 20000013
sp : eec73dc0  ip : eed780c0  fp : 00000001
r10: 00000000  r9 : 00000000  r8 : eed71e10
r7 : 0000008c  r6 : 0000008c  r5 : 014000c0  r4 : c03a6ff4
r3 : c09445d0  r2 : 00000000  r1 : 014000c0  r0 : 0000008c
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: 00003000  DAC: fffffffd
Process swapper/0 (pid: 1, stack limit = 0xeec72210)
Stack: (0xeec73dc0 to 0xeec74000)
...
[<c0623340>] (strlen) from [<c0111d6c>] (kstrdup+0x18/0x54)
[<c0111d6c>] (kstrdup) from [<c03a6ff4>] (__pm_clk_add+0x58/0x120)
[<c03a6ff4>] (__pm_clk_add) from [<c03a731c>] (pm_clk_notify+0x64/0xa8)
[<c03a731c>] (pm_clk_notify) from [<c004614c>] (notifier_call_chain+0x44/0x84)
[<c004614c>] (notifier_call_chain) from [<c0046320>] (__blocking_notifier_call_chain+0x48/0x60)
[<c0046320>] (__blocking_notifier_call_chain) from [<c0046350>] (blocking_notifier_call_chain+0x18/0x20)
[<c0046350>] (blocking_notifier_call_chain) from [<c0390234>] (device_add+0x36c/0x534)
[<c0390234>] (device_add) from [<c047fc00>] (of_platform_device_create_pdata+0x70/0xa4)
[<c047fc00>] (of_platform_device_create_pdata) from [<c047fea0>] (of_platform_bus_create+0xf0/0x1ec)
[<c047fea0>] (of_platform_bus_create) from [<c047fff8>] (of_platform_populate+0x5c/0xac)
[<c047fff8>] (of_platform_populate) from [<c08b1f04>] (of_platform_default_populate_init+0x8c/0xa8)
[<c08b1f04>] (of_platform_default_populate_init) from [<c000a78c>] (do_one_initcall+0x3c/0x164)
[<c000a78c>] (do_one_initcall) from [<c087bd9c>] (kernel_init_freeable+0x10c/0x1d0)
[<c087bd9c>] (kernel_init_freeable) from [<c0628db0>] (kernel_init+0x8/0xf0)
[<c0628db0>] (kernel_init) from [<c00090d8>] (ret_from_fork+0x14/0x3c)
Exception stack(0xeec73fb0 to 0xeec73ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e3520000 1afffff7 e12fff1e c0801730 (e5d02000)
---[ end trace cafa8f148e262e80 ]---

Fix this by adding the necessary initialiser.

Fixes: fc20ffe ("ARM: keystone: add PM domain support for clock management")
Signed-off-by: Russell King <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Olof Johansson <[email protected]>

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Jun 20, 2018
[ Upstream commit 9954b80 ]

platform_domain_notifier contains a variable sized array, which the
pm_clk_notify() notifier treats as a NULL terminated array:

     for (con_id = clknb->con_ids; *con_id; con_id++)
             pm_clk_add(dev, *con_id);

Omitting the initialiser for con_ids means that the array is zero
sized, and there is no NULL terminator.  This leads to pm_clk_notify()
overrunning into what ever structure follows, which may not be NULL.
This leads to an oops:

Unable to handle kernel NULL pointer dereference at virtual address 0000008c
pgd = c0003000
[0000008c] *pgd=80000800004003c, *pmd=00000000c
Internal error: Oops: 206 [#1] PREEMPT SMP ARM
Modules linked in:c
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #9
Hardware name: Keystone
PC is at strlen+0x0/0x34
LR is at kstrdup+0x18/0x54
pc : [<c0623340>]    lr : [<c0111d6c>]    psr: 20000013
sp : eec73dc0  ip : eed780c0  fp : 00000001
r10: 00000000  r9 : 00000000  r8 : eed71e10
r7 : 0000008c  r6 : 0000008c  r5 : 014000c0  r4 : c03a6ff4
r3 : c09445d0  r2 : 00000000  r1 : 014000c0  r0 : 0000008c
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: 00003000  DAC: fffffffd
Process swapper/0 (pid: 1, stack limit = 0xeec72210)
Stack: (0xeec73dc0 to 0xeec74000)
...
[<c0623340>] (strlen) from [<c0111d6c>] (kstrdup+0x18/0x54)
[<c0111d6c>] (kstrdup) from [<c03a6ff4>] (__pm_clk_add+0x58/0x120)
[<c03a6ff4>] (__pm_clk_add) from [<c03a731c>] (pm_clk_notify+0x64/0xa8)
[<c03a731c>] (pm_clk_notify) from [<c004614c>] (notifier_call_chain+0x44/0x84)
[<c004614c>] (notifier_call_chain) from [<c0046320>] (__blocking_notifier_call_chain+0x48/0x60)
[<c0046320>] (__blocking_notifier_call_chain) from [<c0046350>] (blocking_notifier_call_chain+0x18/0x20)
[<c0046350>] (blocking_notifier_call_chain) from [<c0390234>] (device_add+0x36c/0x534)
[<c0390234>] (device_add) from [<c047fc00>] (of_platform_device_create_pdata+0x70/0xa4)
[<c047fc00>] (of_platform_device_create_pdata) from [<c047fea0>] (of_platform_bus_create+0xf0/0x1ec)
[<c047fea0>] (of_platform_bus_create) from [<c047fff8>] (of_platform_populate+0x5c/0xac)
[<c047fff8>] (of_platform_populate) from [<c08b1f04>] (of_platform_default_populate_init+0x8c/0xa8)
[<c08b1f04>] (of_platform_default_populate_init) from [<c000a78c>] (do_one_initcall+0x3c/0x164)
[<c000a78c>] (do_one_initcall) from [<c087bd9c>] (kernel_init_freeable+0x10c/0x1d0)
[<c087bd9c>] (kernel_init_freeable) from [<c0628db0>] (kernel_init+0x8/0xf0)
[<c0628db0>] (kernel_init) from [<c00090d8>] (ret_from_fork+0x14/0x3c)
Exception stack(0xeec73fb0 to 0xeec73ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e3520000 1afffff7 e12fff1e c0801730 (e5d02000)
---[ end trace cafa8f148e262e80 ]---

Fix this by adding the necessary initialiser.

Fixes: fc20ffe ("ARM: keystone: add PM domain support for clock management")
Signed-off-by: Russell King <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Olof Johansson <[email protected]>

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Jul 25, 2018
commit 36eb8ff upstream.

Crash dump shows following instructions

crash> bt
PID: 0      TASK: ffffffffbe412480  CPU: 0   COMMAND: "swapper/0"
 #0 [ffff891ee0003868] machine_kexec at ffffffffbd063ef1
 #1 [ffff891ee00038c8] __crash_kexec at ffffffffbd12b6f2
 #2 [ffff891ee0003998] crash_kexec at ffffffffbd12c84c
 #3 [ffff891ee00039b8] oops_end at ffffffffbd030f0a
 #4 [ffff891ee00039e0] no_context at ffffffffbd074643
 #5 [ffff891ee0003a40] __bad_area_nosemaphore at ffffffffbd07496e
 #6 [ffff891ee0003a90] bad_area_nosemaphore at ffffffffbd074a64
 #7 [ffff891ee0003aa0] __do_page_fault at ffffffffbd074b0a
 #8 [ffff891ee0003b18] do_page_fault at ffffffffbd074fc8
 #9 [ffff891ee0003b50] page_fault at ffffffffbda01925
    [exception RIP: qlt_schedule_sess_for_deletion+15]
    RIP: ffffffffc02e526f  RSP: ffff891ee0003c08  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffffffffc0307847
    RDX: 00000000000020e6  RSI: ffff891edbc377c8  RDI: 0000000000000000
    RBP: ffff891ee0003c18   R8: ffffffffc02f0b20   R9: 0000000000000250
    R10: 0000000000000258  R11: 000000000000b780  R12: ffff891ed9b43000
    R13: 00000000000000f0  R14: 0000000000000006  R15: ffff891edbc377c8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ffff891ee0003c20] qla2x00_fcport_event_handler at ffffffffc02853d3 [qla2xxx]
 #11 [ffff891ee0003cf0] __dta_qla24xx_async_gnl_sp_done_333 at ffffffffc0285a1d [qla2xxx]
 #12 [ffff891ee0003de8] qla24xx_process_response_queue at ffffffffc02a2eb5 [qla2xxx]
 #13 [ffff891ee0003e88] qla24xx_msix_rsp_q at ffffffffc02a5403 [qla2xxx]
 #14 [ffff891ee0003ec0] __handle_irq_event_percpu at ffffffffbd0f4c59
 #15 [ffff891ee0003f10] handle_irq_event_percpu at ffffffffbd0f4e02
 #16 [ffff891ee0003f40] handle_irq_event at ffffffffbd0f4e90
 #17 [ffff891ee0003f68] handle_edge_irq at ffffffffbd0f8984
 #18 [ffff891ee0003f88] handle_irq at ffffffffbd0305d5
 #19 [ffff891ee0003fb8] do_IRQ at ffffffffbda02a18
 --- <IRQ stack> ---
 #20 [ffffffffbe403d30] ret_from_intr at ffffffffbda0094e
    [exception RIP: unknown or invalid address]
    RIP: 000000000000001f  RSP: 0000000000000000  RFLAGS: fff3b8c2091ebb3f
    RAX: ffffbba5a0000200  RBX: 0000be8cdfa8f9fa  RCX: 0000000000000018
    RDX: 0000000000000101  RSI: 000000000000015d  RDI: 0000000000000193
    RBP: 0000000000000083   R8: ffffffffbe403e38   R9: 0000000000000002
    R10: 0000000000000000  R11: ffffffffbe56b820  R12: ffff891ee001cf00
    R13: ffffffffbd11c0a4  R14: ffffffffbe403d60  R15: 0000000000000001
    ORIG_RAX: ffff891ee0022ac0  CS: 0000  SS: ffffffffffffffb9
 bt: WARNING: possibly bogus exception frame
 #21 [ffffffffbe403dd8] cpuidle_enter_state at ffffffffbd67c6fd
 #22 [ffffffffbe403e40] cpuidle_enter at ffffffffbd67c907
 #23 [ffffffffbe403e50] call_cpuidle at ffffffffbd0d98f3
 #24 [ffffffffbe403e60] do_idle at ffffffffbd0d9b42
 #25 [ffffffffbe403e98] cpu_startup_entry at ffffffffbd0d9da3
 #26 [ffffffffbe403ec0] rest_init at ffffffffbd81d4aa
 #27 [ffffffffbe403ed0] start_kernel at ffffffffbe67d2ca
 #28 [ffffffffbe403f28] x86_64_start_reservations at ffffffffbe67c675
 #29 [ffffffffbe403f38] x86_64_start_kernel at ffffffffbe67c6eb
 #30 [ffffffffbe403f50] secondary_startup_64 at ffffffffbd0000d5

Fixes: 040036b ("scsi: qla2xxx: Delay loop id allocation at login")
Cc: <[email protected]> # v4.17+
Signed-off-by: Chuck Anderson <[email protected]>
Signed-off-by: Himanshu Madhani <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Jul 28, 2018
[ Upstream commit 2efd4fc ]

Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:

  BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
  CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
  Google 01/01/2011
  Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x5ef/0x860 net/core/scm.c:242
    ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
    ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
    rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
    [..]

This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.

With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.

Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.

Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: [email protected]
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Jul 28, 2018
[ Upstream commit 2efd4fc ]

Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:

  BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
  CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
  Google 01/01/2011
  Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x5ef/0x860 net/core/scm.c:242
    ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
    ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
    rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
    [..]

This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.

With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.

Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.

Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: [email protected]
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Aug 3, 2018
…ilure

commit b512719 upstream.

While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.

Commit c96f547 ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operated on @p->delays.  If
%current succeeded init while @p didn't, it leads to the following
crash.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
 IP: __delayacct_blkio_end+0xc/0x40
 PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
 RIP: 0010:__delayacct_blkio_end+0xc/0x40
 Call Trace:
  try_to_wake_up+0x2c0/0x600
  autoremove_wake_function+0xe/0x30
  __wake_up_common+0x74/0x120
  wake_up_page_bit+0x9c/0xe0
  mpage_end_io+0x27/0x70
  blk_update_request+0x78/0x2c0
  scsi_end_request+0x2c/0x1e0
  scsi_io_completion+0x20b/0x5f0
  blk_mq_complete_request+0xa2/0x100
  ata_scsi_qc_complete+0x79/0x400
  ata_qc_complete_multiple+0x86/0xd0
  ahci_handle_port_interrupt+0xc9/0x5c0
  ahci_handle_port_intr+0x54/0xb0
  ahci_single_level_irq_intr+0x3b/0x60
  __handle_irq_event_percpu+0x43/0x190
  handle_irq_event_percpu+0x20/0x50
  handle_irq_event+0x2a/0x50
  handle_edge_irq+0x80/0x1c0
  handle_irq+0xaf/0x120
  do_IRQ+0x41/0xc0
  common_interrupt+0xf/0xf

Fix it by updating delayacct_blkio_end() check @p->delays instead.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: c96f547 ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Dave Jones <[email protected]>
Debugged-by: Dave Jones <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Josh Snyder <[email protected]>
Cc: <[email protected]>	[4.15+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Aug 3, 2018
[ Upstream commit 4f4616c ]

Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted.  This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.

Fixes the crash:

PID: 106676  TASK: ffff9a436aa90000  CPU: 12  COMMAND: "multipathd"
 #0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
 #1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
 #2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
 #3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
 #4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
 #5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
 #6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
 #7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
 #8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
 #9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
    [exception RIP: qedf_init_task+61]
    RIP: ffffffffc0e13c2d  RSP: ffff9a43567d38d0  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffffbe920472c738  RCX: ffff9a434fa0e3e8
    RDX: ffff9a434f695280  RSI: ffffbe920472c738  RDI: ffff9a43aa359c80
    RBP: ffff9a43567d3950   R8: 0000000000000c15   R9: ffff9a3fb09b9880
    R10: ffff9a434fa0e3e8  R11: ffff9a43567d35ce  R12: 0000000000000000
    R13: ffff9a434f695280  R14: ffff9a43aa359c80  R15: ffff9a3fb9e005c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Chad Dupuis <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Aug 3, 2018
…ilure

commit b512719 upstream.

While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.

Commit c96f547 ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operated on @p->delays.  If
%current succeeded init while @p didn't, it leads to the following
crash.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
 IP: __delayacct_blkio_end+0xc/0x40
 PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
 RIP: 0010:__delayacct_blkio_end+0xc/0x40
 Call Trace:
  try_to_wake_up+0x2c0/0x600
  autoremove_wake_function+0xe/0x30
  __wake_up_common+0x74/0x120
  wake_up_page_bit+0x9c/0xe0
  mpage_end_io+0x27/0x70
  blk_update_request+0x78/0x2c0
  scsi_end_request+0x2c/0x1e0
  scsi_io_completion+0x20b/0x5f0
  blk_mq_complete_request+0xa2/0x100
  ata_scsi_qc_complete+0x79/0x400
  ata_qc_complete_multiple+0x86/0xd0
  ahci_handle_port_interrupt+0xc9/0x5c0
  ahci_handle_port_intr+0x54/0xb0
  ahci_single_level_irq_intr+0x3b/0x60
  __handle_irq_event_percpu+0x43/0x190
  handle_irq_event_percpu+0x20/0x50
  handle_irq_event+0x2a/0x50
  handle_edge_irq+0x80/0x1c0
  handle_irq+0xaf/0x120
  do_IRQ+0x41/0xc0
  common_interrupt+0xf/0xf

Fix it by updating delayacct_blkio_end() check @p->delays instead.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: c96f547 ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Dave Jones <[email protected]>
Debugged-by: Dave Jones <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Josh Snyder <[email protected]>
Cc: <[email protected]>	[4.15+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Aug 3, 2018
[ Upstream commit 4f4616c ]

Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted.  This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.

Fixes the crash:

PID: 106676  TASK: ffff9a436aa90000  CPU: 12  COMMAND: "multipathd"
 #0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
 #1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
 #2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
 #3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
 #4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
 #5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
 #6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
 #7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
 #8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
 #9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
    [exception RIP: qedf_init_task+61]
    RIP: ffffffffc0e13c2d  RSP: ffff9a43567d38d0  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffffbe920472c738  RCX: ffff9a434fa0e3e8
    RDX: ffff9a434f695280  RSI: ffffbe920472c738  RDI: ffff9a43aa359c80
    RBP: ffff9a43567d3950   R8: 0000000000000c15   R9: ffff9a3fb09b9880
    R10: ffff9a434fa0e3e8  R11: ffff9a43567d35ce  R12: 0000000000000000
    R13: ffff9a434f695280  R14: ffff9a43aa359c80  R15: ffff9a3fb9e005c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Chad Dupuis <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Aug 13, 2018
commit 89da619 upstream.

Kernel panic when with high memory pressure, calltrace looks like,

PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
 #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
 #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
 #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
 #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
    [exception RIP: _raw_spin_lock_irqsave+47]
    RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
    RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
    RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
    RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
    R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.

Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.

It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.

Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: [email protected]
Signed-off-by: Jiang Biao <[email protected]>
Signed-off-by: Huang Chong <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Aug 13, 2018
commit 89da619 upstream.

Kernel panic when with high memory pressure, calltrace looks like,

PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
 #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
 #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
 #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
 #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
    [exception RIP: _raw_spin_lock_irqsave+47]
    RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
    RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
    RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
    RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
    R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.

Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.

It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.

Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: [email protected]
Signed-off-by: Jiang Biao <[email protected]>
Signed-off-by: Huang Chong <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Sep 9, 2018
[ Upstream commit 934140a ]

cachefiles_read_waiter() has the right to access a 'monitor' object by
virtue of being called under the waitqueue lock for one of the pages in its
purview.  However, it has no ref on that monitor object or on the
associated operation.

What it is allowed to do is to move the monitor object to the operation's
to_do list, but once it drops the work_lock, it's actually no longer
permitted to access that object.  However, it is trying to enqueue the
retrieval operation for processing - but it can only do this via a pointer
in the monitor object, something it shouldn't be doing.

If it doesn't enqueue the operation, the operation may not get processed.
If the order is flipped so that the enqueue is first, then it's possible
for the work processor to look at the to_do list before the monitor is
enqueued upon it.

Fix this by getting a ref on the operation so that we can trust that it
will still be there once we've added the monitor to the to_do list and
dropped the work_lock.  The op can then be enqueued after the lock is
dropped.

The bug can manifest in one of a couple of ways.  The first manifestation
looks like:

 FS-Cache:
 FS-Cache: Assertion failed
 FS-Cache: 6 == 5 is false
 ------------[ cut here ]------------
 kernel BUG at fs/fscache/operation.c:494!
 RIP: 0010:fscache_put_operation+0x1e3/0x1f0
 ...
 fscache_op_work_func+0x26/0x50
 process_one_work+0x131/0x290
 worker_thread+0x45/0x360
 kthread+0xf8/0x130
 ? create_worker+0x190/0x190
 ? kthread_cancel_work_sync+0x10/0x10
 ret_from_fork+0x1f/0x30

This is due to the operation being in the DEAD state (6) rather than
INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
fscache_put_operation().

The bug can also manifest like the following:

 kernel BUG at fs/fscache/operation.c:69!
 ...
    [exception RIP: fscache_enqueue_operation+246]
 ...
 #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
 #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
 #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028

I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
entirely clear which assertion failed.

Fixes: 9ae326a ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Lei Xue <[email protected]>
Reported-by: Vegard Nossum <[email protected]>
Reported-by: Anthony DeRobertis <[email protected]>
Reported-by: NeilBrown <[email protected]>
Reported-by: Daniel Axtens <[email protected]>
Reported-by: Kiran Kumar Modukuri <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Daniel Axtens <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Sep 10, 2018
commit a5ba1d9 upstream.

We have reports of the following crash:

    PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
    #0 [ffff88085c6db710] machine_kexec at ffffffff81046239
    #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248
    #2 [ffff88085c6db830] oops_end at ffffffff81008ae7
    #3 [ffff88085c6db860] no_context at ffffffff81050b8f
    #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75
    #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83
    #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e
    #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c
    #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122
    [exception RIP: uart_put_char+149]
    RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006
    RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081
    RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120
    RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320
    R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000
    R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544
    #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c
    #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b
    #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2
    #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b
    #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a
    #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016
    #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194
    #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a
    #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2
    #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d
    #20 [ffff88085c6dbeb0] kthread at ffffffff81096384
    #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f​

after slogging through some dissasembly:

ffffffff814b6720 <uart_put_char>:
ffffffff814b6720:	55                   	push   %rbp
ffffffff814b6721:	48 89 e5             	mov    %rsp,%rbp
ffffffff814b6724:	48 83 ec 20          	sub    $0x20,%rsp
ffffffff814b6728:	48 89 1c 24          	mov    %rbx,(%rsp)
ffffffff814b672c:	4c 89 64 24 08       	mov    %r12,0x8(%rsp)
ffffffff814b6731:	4c 89 6c 24 10       	mov    %r13,0x10(%rsp)
ffffffff814b6736:	4c 89 74 24 18       	mov    %r14,0x18(%rsp)
ffffffff814b673b:	e8 b0 8e 58 00       	callq  ffffffff81a3f5f0 <mcount>
ffffffff814b6740:	4c 8b a7 88 02 00 00 	mov    0x288(%rdi),%r12
ffffffff814b6747:	45 31 ed             	xor    %r13d,%r13d
ffffffff814b674a:	41 89 f6             	mov    %esi,%r14d
ffffffff814b674d:	49 83 bc 24 70 01 00 	cmpq   $0x0,0x170(%r12)
ffffffff814b6754:	00 00
ffffffff814b6756:	49 8b 9c 24 80 01 00 	mov    0x180(%r12),%rbx
ffffffff814b675d:	00
ffffffff814b675e:	74 2f                	je     ffffffff814b678f <uart_put_char+0x6f>
ffffffff814b6760:	48 89 df             	mov    %rbx,%rdi
ffffffff814b6763:	e8 a8 67 58 00       	callq  ffffffff81a3cf10 <_raw_spin_lock_irqsave>
ffffffff814b6768:	41 8b 8c 24 78 01 00 	mov    0x178(%r12),%ecx
ffffffff814b676f:	00
ffffffff814b6770:	89 ca                	mov    %ecx,%edx
ffffffff814b6772:	f7 d2                	not    %edx
ffffffff814b6774:	41 03 94 24 7c 01 00 	add    0x17c(%r12),%edx
ffffffff814b677b:	00
ffffffff814b677c:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b6782:	75 23                	jne    ffffffff814b67a7 <uart_put_char+0x87>
ffffffff814b6784:	48 89 c6             	mov    %rax,%rsi
ffffffff814b6787:	48 89 df             	mov    %rbx,%rdi
ffffffff814b678a:	e8 e1 64 58 00       	callq  ffffffff81a3cc70 <_raw_spin_unlock_irqrestore>
ffffffff814b678f:	44 89 e8             	mov    %r13d,%eax
ffffffff814b6792:	48 8b 1c 24          	mov    (%rsp),%rbx
ffffffff814b6796:	4c 8b 64 24 08       	mov    0x8(%rsp),%r12
ffffffff814b679b:	4c 8b 6c 24 10       	mov    0x10(%rsp),%r13
ffffffff814b67a0:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
ffffffff814b67a5:	c9                   	leaveq
ffffffff814b67a6:	c3                   	retq
ffffffff814b67a7:	49 8b 94 24 70 01 00 	mov    0x170(%r12),%rdx
ffffffff814b67ae:	00
ffffffff814b67af:	48 63 c9             	movslq %ecx,%rcx
ffffffff814b67b2:	41 b5 01             	mov    $0x1,%r13b
ffffffff814b67b5:	44 88 34 0a          	mov    %r14b,(%rdx,%rcx,1)
ffffffff814b67b9:	41 8b 94 24 78 01 00 	mov    0x178(%r12),%edx
ffffffff814b67c0:	00
ffffffff814b67c1:	83 c2 01             	add    $0x1,%edx
ffffffff814b67c4:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b67ca:	41 89 94 24 78 01 00 	mov    %edx,0x178(%r12)
ffffffff814b67d1:	00
ffffffff814b67d2:	eb b0                	jmp    ffffffff814b6784 <uart_put_char+0x64>
ffffffff814b67d4:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff814b67db:	00 00 00 00 00

for our build, this is crashing at:

    circ->buf[circ->head] = c;

Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf)
protected by the "per-port mutex", which based on uart_port_check() is
state->port.mutex. Indeed, the lock acquired in uart_put_char() is
uport->lock, i.e. not the same lock.

Anyway, since the lock is not acquired, if uart_shutdown() is called, the
last chunk of that function may release state->xmit.buf before its assigned
to null, and cause the race above.

To fix it, let's lock uport->lock when allocating/deallocating
state->xmit.buf in addition to the per-port mutex.

v2: switch to locking uport->lock on allocation/deallocation instead of
    locking the per-port mutex in uart_put_char. Note that since
    uport->lock is a spin lock, we have to switch the allocation to
    GFP_ATOMIC.
v3: move the allocation outside the lock, so we can switch back to
    GFP_KERNEL

Signed-off-by: Tycho Andersen <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Sep 10, 2018
commit a5ba1d9 upstream.

We have reports of the following crash:

    PID: 7 TASK: ffff88085c6d61c0 CPU: 1 COMMAND: "kworker/u25:0"
    #0 [ffff88085c6db710] machine_kexec at ffffffff81046239
    #1 [ffff88085c6db760] crash_kexec at ffffffff810fc248
    #2 [ffff88085c6db830] oops_end at ffffffff81008ae7
    #3 [ffff88085c6db860] no_context at ffffffff81050b8f
    #4 [ffff88085c6db8b0] __bad_area_nosemaphore at ffffffff81050d75
    #5 [ffff88085c6db900] bad_area_nosemaphore at ffffffff81050e83
    #6 [ffff88085c6db910] __do_page_fault at ffffffff8105132e
    #7 [ffff88085c6db9b0] do_page_fault at ffffffff8105152c
    #8 [ffff88085c6db9c0] page_fault at ffffffff81a3f122
    [exception RIP: uart_put_char+149]
    RIP: ffffffff814b67b5 RSP: ffff88085c6dba78 RFLAGS: 00010006
    RAX: 0000000000000292 RBX: ffffffff827c5120 RCX: 0000000000000081
    RDX: 0000000000000000 RSI: 000000000000005f RDI: ffffffff827c5120
    RBP: ffff88085c6dba98 R8: 000000000000012c R9: ffffffff822ea320
    R10: ffff88085fe4db04 R11: 0000000000000001 R12: ffff881059f9c000
    R13: 0000000000000001 R14: 000000000000005f R15: 0000000000000fba
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #9 [ffff88085c6dbaa0] tty_put_char at ffffffff81497544
    #10 [ffff88085c6dbac0] do_output_char at ffffffff8149c91c
    #11 [ffff88085c6dbae0] __process_echoes at ffffffff8149cb8b
    #12 [ffff88085c6dbb30] commit_echoes at ffffffff8149cdc2
    #13 [ffff88085c6dbb60] n_tty_receive_buf_fast at ffffffff8149e49b
    #14 [ffff88085c6dbbc0] __receive_buf at ffffffff8149ef5a
    #15 [ffff88085c6dbc20] n_tty_receive_buf_common at ffffffff8149f016
    #16 [ffff88085c6dbca0] n_tty_receive_buf2 at ffffffff8149f194
    #17 [ffff88085c6dbcb0] flush_to_ldisc at ffffffff814a238a
    #18 [ffff88085c6dbd50] process_one_work at ffffffff81090be2
    #19 [ffff88085c6dbe20] worker_thread at ffffffff81091b4d
    #20 [ffff88085c6dbeb0] kthread at ffffffff81096384
    #21 [ffff88085c6dbf50] ret_from_fork at ffffffff81a3d69f​

after slogging through some dissasembly:

ffffffff814b6720 <uart_put_char>:
ffffffff814b6720:	55                   	push   %rbp
ffffffff814b6721:	48 89 e5             	mov    %rsp,%rbp
ffffffff814b6724:	48 83 ec 20          	sub    $0x20,%rsp
ffffffff814b6728:	48 89 1c 24          	mov    %rbx,(%rsp)
ffffffff814b672c:	4c 89 64 24 08       	mov    %r12,0x8(%rsp)
ffffffff814b6731:	4c 89 6c 24 10       	mov    %r13,0x10(%rsp)
ffffffff814b6736:	4c 89 74 24 18       	mov    %r14,0x18(%rsp)
ffffffff814b673b:	e8 b0 8e 58 00       	callq  ffffffff81a3f5f0 <mcount>
ffffffff814b6740:	4c 8b a7 88 02 00 00 	mov    0x288(%rdi),%r12
ffffffff814b6747:	45 31 ed             	xor    %r13d,%r13d
ffffffff814b674a:	41 89 f6             	mov    %esi,%r14d
ffffffff814b674d:	49 83 bc 24 70 01 00 	cmpq   $0x0,0x170(%r12)
ffffffff814b6754:	00 00
ffffffff814b6756:	49 8b 9c 24 80 01 00 	mov    0x180(%r12),%rbx
ffffffff814b675d:	00
ffffffff814b675e:	74 2f                	je     ffffffff814b678f <uart_put_char+0x6f>
ffffffff814b6760:	48 89 df             	mov    %rbx,%rdi
ffffffff814b6763:	e8 a8 67 58 00       	callq  ffffffff81a3cf10 <_raw_spin_lock_irqsave>
ffffffff814b6768:	41 8b 8c 24 78 01 00 	mov    0x178(%r12),%ecx
ffffffff814b676f:	00
ffffffff814b6770:	89 ca                	mov    %ecx,%edx
ffffffff814b6772:	f7 d2                	not    %edx
ffffffff814b6774:	41 03 94 24 7c 01 00 	add    0x17c(%r12),%edx
ffffffff814b677b:	00
ffffffff814b677c:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b6782:	75 23                	jne    ffffffff814b67a7 <uart_put_char+0x87>
ffffffff814b6784:	48 89 c6             	mov    %rax,%rsi
ffffffff814b6787:	48 89 df             	mov    %rbx,%rdi
ffffffff814b678a:	e8 e1 64 58 00       	callq  ffffffff81a3cc70 <_raw_spin_unlock_irqrestore>
ffffffff814b678f:	44 89 e8             	mov    %r13d,%eax
ffffffff814b6792:	48 8b 1c 24          	mov    (%rsp),%rbx
ffffffff814b6796:	4c 8b 64 24 08       	mov    0x8(%rsp),%r12
ffffffff814b679b:	4c 8b 6c 24 10       	mov    0x10(%rsp),%r13
ffffffff814b67a0:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
ffffffff814b67a5:	c9                   	leaveq
ffffffff814b67a6:	c3                   	retq
ffffffff814b67a7:	49 8b 94 24 70 01 00 	mov    0x170(%r12),%rdx
ffffffff814b67ae:	00
ffffffff814b67af:	48 63 c9             	movslq %ecx,%rcx
ffffffff814b67b2:	41 b5 01             	mov    $0x1,%r13b
ffffffff814b67b5:	44 88 34 0a          	mov    %r14b,(%rdx,%rcx,1)
ffffffff814b67b9:	41 8b 94 24 78 01 00 	mov    0x178(%r12),%edx
ffffffff814b67c0:	00
ffffffff814b67c1:	83 c2 01             	add    $0x1,%edx
ffffffff814b67c4:	81 e2 ff 0f 00 00    	and    $0xfff,%edx
ffffffff814b67ca:	41 89 94 24 78 01 00 	mov    %edx,0x178(%r12)
ffffffff814b67d1:	00
ffffffff814b67d2:	eb b0                	jmp    ffffffff814b6784 <uart_put_char+0x64>
ffffffff814b67d4:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff814b67db:	00 00 00 00 00

for our build, this is crashing at:

    circ->buf[circ->head] = c;

Looking in uart_port_startup(), it seems that circ->buf (state->xmit.buf)
protected by the "per-port mutex", which based on uart_port_check() is
state->port.mutex. Indeed, the lock acquired in uart_put_char() is
uport->lock, i.e. not the same lock.

Anyway, since the lock is not acquired, if uart_shutdown() is called, the
last chunk of that function may release state->xmit.buf before its assigned
to null, and cause the race above.

To fix it, let's lock uport->lock when allocating/deallocating
state->xmit.buf in addition to the per-port mutex.

v2: switch to locking uport->lock on allocation/deallocation instead of
    locking the per-port mutex in uart_put_char. Note that since
    uport->lock is a spin lock, we have to switch the allocation to
    GFP_ATOMIC.
v3: move the allocation outside the lock, so we can switch back to
    GFP_KERNEL

Signed-off-by: Tycho Andersen <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Oct 1, 2018
[ Upstream commit 46b3722 ]

We occasionaly hit following assert failure in 'perf top', when processing the
/proc info in multiple threads.

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
      at ...include/linux/refcount.h:109
  #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
      at util/comm.c:24
  #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:72
  #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:95
  #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
      timestamp=0, exec=false) at util/comm.c:111
  #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
  #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
  #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...

The problem is that we keep global comm_str_root list, which
is accessed by multiple threads during the 'perf top' startup
and following 2 paths can race:

  thread 1:
    ...
    thread__new
      comm__new
        comm_str__findnew
          down_write(&comm_str_lock);
          __comm_str__findnew
            comm_str__get

  thread 2:
    ...
    comm__override or comm__free
      comm_str__put
        refcount_dec_and_test
          down_write(&comm_str_lock);
          rb_erase(&cs->rb_node, &comm_str_root);

Because thread 2 first decrements the refcnt and only after then it removes the
struct comm_str from the list, the thread 1 can find this object on the list
with refcnt equls to 0 and hit the assert.

This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects
that already dropped the refcnt to 0. For the rest of the objects we take the
refcnt before comparing its name and release it afterwards with comm_str__put,
which can also release the object completely.

Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Lukasz Odzioba <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Oct 1, 2018
[ Upstream commit 46b3722 ]

We occasionaly hit following assert failure in 'perf top', when processing the
/proc info in multiple threads.

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  [Switching to Thread 0x7ffff11ba700 (LWP 13749)]
  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535373 in refcount_inc (r=0x7fffdc009be0)
      at ...include/linux/refcount.h:109
  #5  0x00000000005354f1 in comm_str__get (cs=0x7fffdc009bc0)
      at util/comm.c:24
  #6  0x00000000005356bd in __comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:72
  #7  0x000000000053579e in comm_str__findnew (str=0x7fffd000b260 ":2",
      root=0xbed5c0 <comm_str_root>) at util/comm.c:95
  #8  0x000000000053582e in comm__new (str=0x7fffd000b260 ":2",
      timestamp=0, exec=false) at util/comm.c:111
  #9  0x00000000005363bc in thread__new (pid=2, tid=2) at util/thread.c:57
  #10 0x0000000000523da0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:457
  #11 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...

The problem is that we keep global comm_str_root list, which
is accessed by multiple threads during the 'perf top' startup
and following 2 paths can race:

  thread 1:
    ...
    thread__new
      comm__new
        comm_str__findnew
          down_write(&comm_str_lock);
          __comm_str__findnew
            comm_str__get

  thread 2:
    ...
    comm__override or comm__free
      comm_str__put
        refcount_dec_and_test
          down_write(&comm_str_lock);
          rb_erase(&cs->rb_node, &comm_str_root);

Because thread 2 first decrements the refcnt and only after then it removes the
struct comm_str from the list, the thread 1 can find this object on the list
with refcnt equls to 0 and hit the assert.

This patch fixes the thread 1 __comm_str__findnew path, by ignoring objects
that already dropped the refcnt to 0. For the rest of the objects we take the
refcnt before comparing its name and release it afterwards with comm_str__put,
which can also release the object completely.

Signed-off-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Lukasz Odzioba <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/20180720101740.GA27176@krava
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Oct 1, 2018
commit 889c695 upstream.

Deferred struct page init is needed only on systems with large amount of
physical memory to improve boot performance.  32-bit systems do not
benefit from this feature.

Jiri reported a problem where deferred struct pages do not work well with
x86-32:

[    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.036269] Initializing CPU#0
[    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
[    0.038459] page:f6780000 is uninitialized and poisoned
[    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
[    0.040038] ------------[ cut here ]------------
[    0.040399] kernel BUG at include/linux/page-flags.h:293!
[    0.040823] invalid opcode: 0000 [#1] SMP PTI
[    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
[    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[    0.042496] EIP: free_highmem_page+0x64/0x80
[    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
[    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
[    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
[    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
[    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
[    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.046913] DR6: fffe0ff0 DR7: 00000400
[    0.047220] Call Trace:
[    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
[    0.047854]  set_highmem_pages_init+0x5b/0x71
[    0.048202]  mem_init+0x2b/0x1e8
[    0.048460]  start_kernel+0x1d2/0x425
[    0.048757]  i386_start_kernel+0x93/0x97
[    0.049073]  startup_32_smp+0x164/0x168
[    0.049379] Modules linked in:
[    0.049626] ---[ end trace 337949378db0abbb ]---

We free highmem pages before their struct pages are initialized:

mem_init()
 set_highmem_pages_init()
  add_highpages_with_active_regions()
   free_highmem_page()
    .. Access uninitialized struct page here..

Because there is no reason to have this feature on 32-bit systems, just
disable it.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2e3ca40 ("mm: relax deferred struct page requirements")
Signed-off-by: Pavel Tatashin <[email protected]>
Reported-by: Jiri Slaby <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Oct 12, 2018
Deferred struct page init is needed only on systems with large amount of
physical memory to improve boot performance.  32-bit systems do not
benefit from this feature.

Jiri reported a problem where deferred struct pages do not work well with
x86-32:

[    0.035162] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.035725] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.036269] Initializing CPU#0
[    0.036513] Initializing HighMem for node 0 (00036ffe:0007ffe0)
[    0.038459] page:f6780000 is uninitialized and poisoned
[    0.038460] raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
[    0.039509] page dumped because: VM_BUG_ON_PAGE(1 && PageCompound(page))
[    0.040038] ------------[ cut here ]------------
[    0.040399] kernel BUG at include/linux/page-flags.h:293!
[    0.040823] invalid opcode: 0000 [#1] SMP PTI
[    0.041166] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.0-rc1_pt_jiri #9
[    0.041694] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[    0.042496] EIP: free_highmem_page+0x64/0x80
[    0.042839] Code: 13 46 d8 c1 e8 18 5d 83 e0 03 8d 04 c0 c1 e0 06 ff 80 ec 5f 44 d8 c3 8d b4 26 00 00 00 00 ba 08 65 28 d8 89 d8 e8 fc 71 02 00 <0f> 0b 8d 76 00 8d bc 27 00 00 00 00 ba d0 b1 26 d8 89 d8 e8 e4 71
[    0.044338] EAX: 0000003c EBX: f6780000 ECX: 00000000 EDX: d856cbe8
[    0.044868] ESI: 0007ffe0 EDI: d838df20 EBP: d838df00 ESP: d838defc
[    0.045372] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210086
[    0.045913] CR0: 80050033 CR2: 00000000 CR3: 18556000 CR4: 00040690
[    0.046413] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.046913] DR6: fffe0ff0 DR7: 00000400
[    0.047220] Call Trace:
[    0.047419]  add_highpages_with_active_regions+0xbd/0x10d
[    0.047854]  set_highmem_pages_init+0x5b/0x71
[    0.048202]  mem_init+0x2b/0x1e8
[    0.048460]  start_kernel+0x1d2/0x425
[    0.048757]  i386_start_kernel+0x93/0x97
[    0.049073]  startup_32_smp+0x164/0x168
[    0.049379] Modules linked in:
[    0.049626] ---[ end trace 337949378db0abbb ]---

We free highmem pages before their struct pages are initialized:

mem_init()
 set_highmem_pages_init()
  add_highpages_with_active_regions()
   free_highmem_page()
    .. Access uninitialized struct page here..

Because there is no reason to have this feature on 32-bit systems, just
disable it.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 2e3ca40 ("mm: relax deferred struct page requirements")
Signed-off-by: Pavel Tatashin <[email protected]>
Reported-by: Jiri Slaby <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
BtbN pushed a commit to BtbN/BPI-Router-Linux that referenced this issue Sep 7, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  frank-w#8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  frank-w#9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 frank-w#10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 frank-w#11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 frank-w#12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 frank-w#13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 frank-w#14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 frank-w#15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 frank-w#16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 frank-w#17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
[ Upstream commit 86a41ea ]

When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.

This issue was previously discussed here:
https://lore.kernel.org/netdev/[email protected]/

The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.

  ============================================
  WARNING: possible recursive locking detected
  6.10.0+ #34 Not tainted
  --------------------------------------------
  iperf3/771 is trying to acquire lock:
  ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0

  but task is already holding lock:
  ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(slock-AF_INET/1);
    lock(slock-AF_INET/1);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  10 locks held by iperf3/771:
   #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
   #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
   #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
   #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
   #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
   #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450

  stack backtrace:
  CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x69/0xa0
   dump_stack+0xc/0x20
   __lock_acquire+0x135d/0x2600
   ? srso_alias_return_thunk+0x5/0xfbef5
   lock_acquire+0xc4/0x2a0
   ? l2tp_xmit_skb+0x243/0x9d0
   ? __skb_checksum+0xa3/0x540
   _raw_spin_lock_nested+0x35/0x50
   ? l2tp_xmit_skb+0x243/0x9d0
   l2tp_xmit_skb+0x243/0x9d0
   l2tp_eth_dev_xmit+0x3c/0xc0
   dev_hard_start_xmit+0x11e/0x420
   sch_direct_xmit+0xc3/0x640
   __dev_queue_xmit+0x61c/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   __tcp_send_ack+0x1b8/0x340
   tcp_send_ack+0x23/0x30
   __tcp_ack_snd_check+0xa8/0x530
   ? srso_alias_return_thunk+0x5/0xfbef5
   tcp_rcv_established+0x412/0xd70
   tcp_v4_do_rcv+0x299/0x420
   tcp_v4_rcv+0x1991/0x1e10
   ip_protocol_deliver_rcu+0x50/0x220
   ip_local_deliver_finish+0x158/0x260
   ip_local_deliver+0xc8/0xe0
   ip_rcv+0xe5/0x1d0
   ? __pfx_ip_rcv+0x10/0x10
   __netif_receive_skb_one_core+0xce/0xe0
   ? process_backlog+0x28b/0x9f0
   __netif_receive_skb+0x34/0xd0
   ? process_backlog+0x28b/0x9f0
   process_backlog+0x2cb/0x9f0
   __napi_poll.constprop.0+0x61/0x280
   net_rx_action+0x332/0x670
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? find_held_lock+0x2b/0x80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   handle_softirqs+0xda/0x480
   ? __dev_queue_xmit+0xa2c/0x1450
   do_softirq+0xa1/0xd0
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0xc8/0xe0
   ? __dev_queue_xmit+0xa2c/0x1450
   __dev_queue_xmit+0xa48/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   tcp_write_xmit+0x766/0x2fb0
   ? __entry_text_end+0x102ba9/0x102bad
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __might_fault+0x74/0xc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   __tcp_push_pending_frames+0x56/0x190
   tcp_push+0x117/0x310
   tcp_sendmsg_locked+0x14c1/0x1740
   tcp_sendmsg+0x28/0x40
   inet_sendmsg+0x5d/0x90
   sock_write_iter+0x242/0x2b0
   vfs_write+0x68d/0x800
   ? __pfx_sock_write_iter+0x10/0x10
   ksys_write+0xc8/0xf0
   __x64_sys_write+0x3d/0x50
   x64_sys_call+0xfaf/0x1f50
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f4d143af992
  Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0
  RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992
  RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005
  RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc
  R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0
   </TASK>

Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Suggested-by: Eric Dumazet <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4
CC: [email protected]
CC: [email protected]
Signed-off-by: James Chapman <[email protected]>
Signed-off-by: Tom Parkin <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
[ Upstream commit 86a41ea ]

When l2tp tunnels use a socket provided by userspace, we can hit
lockdep splats like the below when data is transmitted through another
(unrelated) userspace socket which then gets routed over l2tp.

This issue was previously discussed here:
https://lore.kernel.org/netdev/[email protected]/

The solution is to have lockdep treat socket locks of l2tp tunnel
sockets separately than those of standard INET sockets. To do so, use
a different lockdep subclass where lock nesting is possible.

  ============================================
  WARNING: possible recursive locking detected
  6.10.0+ #34 Not tainted
  --------------------------------------------
  iperf3/771 is trying to acquire lock:
  ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0

  but task is already holding lock:
  ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(slock-AF_INET/1);
    lock(slock-AF_INET/1);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  10 locks held by iperf3/771:
   #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40
   #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0
   #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260
   #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10
   #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0
   #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130
   #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450
   #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450

  stack backtrace:
  CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x69/0xa0
   dump_stack+0xc/0x20
   __lock_acquire+0x135d/0x2600
   ? srso_alias_return_thunk+0x5/0xfbef5
   lock_acquire+0xc4/0x2a0
   ? l2tp_xmit_skb+0x243/0x9d0
   ? __skb_checksum+0xa3/0x540
   _raw_spin_lock_nested+0x35/0x50
   ? l2tp_xmit_skb+0x243/0x9d0
   l2tp_xmit_skb+0x243/0x9d0
   l2tp_eth_dev_xmit+0x3c/0xc0
   dev_hard_start_xmit+0x11e/0x420
   sch_direct_xmit+0xc3/0x640
   __dev_queue_xmit+0x61c/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   __tcp_send_ack+0x1b8/0x340
   tcp_send_ack+0x23/0x30
   __tcp_ack_snd_check+0xa8/0x530
   ? srso_alias_return_thunk+0x5/0xfbef5
   tcp_rcv_established+0x412/0xd70
   tcp_v4_do_rcv+0x299/0x420
   tcp_v4_rcv+0x1991/0x1e10
   ip_protocol_deliver_rcu+0x50/0x220
   ip_local_deliver_finish+0x158/0x260
   ip_local_deliver+0xc8/0xe0
   ip_rcv+0xe5/0x1d0
   ? __pfx_ip_rcv+0x10/0x10
   __netif_receive_skb_one_core+0xce/0xe0
   ? process_backlog+0x28b/0x9f0
   __netif_receive_skb+0x34/0xd0
   ? process_backlog+0x28b/0x9f0
   process_backlog+0x2cb/0x9f0
   __napi_poll.constprop.0+0x61/0x280
   net_rx_action+0x332/0x670
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? find_held_lock+0x2b/0x80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   handle_softirqs+0xda/0x480
   ? __dev_queue_xmit+0xa2c/0x1450
   do_softirq+0xa1/0xd0
   </IRQ>
   <TASK>
   __local_bh_enable_ip+0xc8/0xe0
   ? __dev_queue_xmit+0xa2c/0x1450
   __dev_queue_xmit+0xa48/0x1450
   ? ip_finish_output2+0xf4c/0x1130
   ip_finish_output2+0x6b6/0x1130
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __ip_finish_output+0x217/0x380
   ? srso_alias_return_thunk+0x5/0xfbef5
   __ip_finish_output+0x217/0x380
   ip_output+0x99/0x120
   __ip_queue_xmit+0xae4/0xbc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? tcp_options_write.constprop.0+0xcb/0x3e0
   ip_queue_xmit+0x34/0x40
   __tcp_transmit_skb+0x1625/0x1890
   tcp_write_xmit+0x766/0x2fb0
   ? __entry_text_end+0x102ba9/0x102bad
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? __might_fault+0x74/0xc0
   ? srso_alias_return_thunk+0x5/0xfbef5
   __tcp_push_pending_frames+0x56/0x190
   tcp_push+0x117/0x310
   tcp_sendmsg_locked+0x14c1/0x1740
   tcp_sendmsg+0x28/0x40
   inet_sendmsg+0x5d/0x90
   sock_write_iter+0x242/0x2b0
   vfs_write+0x68d/0x800
   ? __pfx_sock_write_iter+0x10/0x10
   ksys_write+0xc8/0xf0
   __x64_sys_write+0x3d/0x50
   x64_sys_call+0xfaf/0x1f50
   do_syscall_64+0x6d/0x140
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7f4d143af992
  Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0
  RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992
  RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005
  RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc
  R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0
   </TASK>

Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()")
Suggested-by: Eric Dumazet <[email protected]>
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4
CC: [email protected]
CC: [email protected]
Signed-off-by: James Chapman <[email protected]>
Signed-off-by: Tom Parkin <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 14, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <[email protected]>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Sep 23, 2024
iter_finish_branch_entry() doesn't put the branch_info from/to map
elements creating memory leaks. This can be seen with:

```
$ perf record -e cycles -b perf test -w noploop
$ perf report -D
...
Direct leak of 984344 byte(s) in 123043 object(s) allocated from:
    #0 0x7fb2654f3bd7 in malloc libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x564d3400d10b in map__get util/map.h:186
    #2 0x564d3400d10b in ip__resolve_ams util/machine.c:1981
    #3 0x564d34014d81 in sample__resolve_bstack util/machine.c:2151
    #4 0x564d34094790 in iter_prepare_branch_entry util/hist.c:898
    #5 0x564d34098fa4 in hist_entry_iter__add util/hist.c:1238
    #6 0x564d33d1f0c7 in process_sample_event tools/perf/builtin-report.c:334
    #7 0x564d34031eb7 in perf_session__deliver_event util/session.c:1655
    #8 0x564d3403ba52 in do_flush util/ordered-events.c:245
    #9 0x564d3403ba52 in __ordered_events__flush util/ordered-events.c:324
    #10 0x564d3402d32e in perf_session__process_user_event util/session.c:1708
    #11 0x564d34032480 in perf_session__process_event util/session.c:1877
    #12 0x564d340336ad in reader__read_event util/session.c:2399
    #13 0x564d34033fdc in reader__process_events util/session.c:2448
    #14 0x564d34033fdc in __perf_session__process_events util/session.c:2495
    #15 0x564d34033fdc in perf_session__process_events util/session.c:2661
    #16 0x564d33d27113 in __cmd_report tools/perf/builtin-report.c:1065
    #17 0x564d33d27113 in cmd_report tools/perf/builtin-report.c:1805
    #18 0x564d33e0ccb7 in run_builtin tools/perf/perf.c:350
    #19 0x564d33e0d45e in handle_internal_command tools/perf/perf.c:403
    #20 0x564d33cdd827 in run_argv tools/perf/perf.c:447
    #21 0x564d33cdd827 in main tools/perf/perf.c:561
...
```

Clearing up the map_symbols properly creates maps reference count
issues so resolve those. Resolving this issue doesn't improve peak
heap consumption for the test above.

Committer testing:

  $ sudo dnf install libasan
  $ make -k CORESIGHT=1 EXTRA_CFLAGS="-fsanitize=address" CC=clang O=/tmp/build/$(basename $PWD)/ -C tools/perf install-bin

Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Ian Rogers <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sun Haiyong <[email protected]>
Cc: Yanteng Si <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
frank-w pushed a commit that referenced this issue Sep 23, 2024
AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.

  ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
  READ of size 1 at 0x60b00c48844b thread T193
      #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
      #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
      #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
      #3 0x5650d804568f in __hists__add_entry util/hist.c:754
      #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
      #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
      #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
      #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
      #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
      #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
      #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
      #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
      #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
      #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().

While this bug was probably introduced with 5c24b67 ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e94 ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.

Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <[email protected]>
Signed-off-by: Matt Fleming (Cloudflare) <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: [email protected]
Cc: Namhyung Kim <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: [email protected] # v5.13+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
frank-w pushed a commit that referenced this issue Sep 23, 2024
The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
frank-w pushed a commit that referenced this issue Nov 3, 2024
On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
frank-w pushed a commit that referenced this issue Nov 3, 2024
There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.

Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:

kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ #9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20

This patch proposes a fixed that's based on adding 2 new additional
stid's sc_status values that help coordinate between the laundromat
and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()).

First to make sure, that once the stid is marked revoked, it is not
removed by the nfsd4_free_stateid(), the laundromat take a reference
on the stateid. Then, coordinating whether the stid has been put
on the cl_revoked list or we are processing FREE_STATEID and need to
make sure to remove it from the list, each check that state and act
accordingly. If laundromat has added to the cl_revoke list before
the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove
it from the list. If nfsd4_free_stateid() finds that operations arrived
before laundromat has placed it on cl_revoke list, it marks the state
freed and then laundromat will no longer add it to the list.

Also, for nfsd4_delegreturn() when looking for the specified stid,
we need to access stid that are marked removed or freeable, it means
the laundromat has started processing it but hasn't finished and this
delegreturn needs to return nfserr_deleg_revoked and not
nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the
lack of it will leave this stid on the cl_revoked list indefinitely.

Fixes: 2d4a532 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
CC: [email protected]
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
BtbN pushed a commit to BtbN/BPI-Router-Linux that referenced this issue Nov 9, 2024
commit ac01c8c upstream.

AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.

  ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
  READ of size 1 at 0x60b00c48844b thread T193
      #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
      frank-w#1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
      frank-w#2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
      frank-w#3 0x5650d804568f in __hists__add_entry util/hist.c:754
      frank-w#4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
      frank-w#5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
      frank-w#6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
      frank-w#7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
      frank-w#8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
      frank-w#9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
      frank-w#10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
      frank-w#11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
      frank-w#12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
      frank-w#13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().

While this bug was probably introduced with 5c24b67 ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e94 ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.

Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <[email protected]>
Signed-off-by: Matt Fleming (Cloudflare) <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: [email protected]
Cc: Namhyung Kim <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: [email protected] # v5.13+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
BtbN pushed a commit to BtbN/BPI-Router-Linux that referenced this issue Nov 9, 2024
commit 9af2efe upstream.

The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  frank-w#1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  frank-w#2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  frank-w#3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  frank-w#4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  frank-w#5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  frank-w#6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  frank-w#7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  frank-w#8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  frank-w#9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  frank-w#10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  frank-w#11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  frank-w#12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  frank-w#13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  frank-w#14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  frank-w#15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  frank-w#16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  frank-w#17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
BtbN pushed a commit to BtbN/BPI-Router-Linux that referenced this issue Nov 9, 2024
[ Upstream commit a848c29 ]

On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [frank-w#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 frank-w#1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 frank-w#1 [ffff8001116ef760] __schedule at ffff800008dd0650
 frank-w#2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 frank-w#3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 frank-w#4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 frank-w#5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 frank-w#6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 frank-w#7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 frank-w#8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 frank-w#9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
BtbN pushed a commit to BtbN/BPI-Router-Linux that referenced this issue Nov 9, 2024
commit 8dd91e8 upstream.

There is a race between laundromat handling of revoked delegations
and a client sending free_stateid operation. Laundromat thread
finds that delegation has expired and needs to be revoked so it
marks the delegation stid revoked and it puts it on a reaper list
but then it unlock the state lock and the actual delegation revocation
happens without the lock. Once the stid is marked revoked a racing
free_stateid processing thread does the following (1) it calls
list_del_init() which removes it from the reaper list and (2) frees
the delegation stid structure. The laundromat thread ends up not
calling the revoke_delegation() function for this particular delegation
but that means it will no release the lock lease that exists on
the file.

Now, a new open for this file comes in and ends up finding that
lease list isn't empty and calls nfsd_breaker_owns_lease() which ends
up trying to derefence a freed delegation stateid. Leading to the
followint use-after-free KASAN warning:

kernel: ==================================================================
kernel: BUG: KASAN: slab-use-after-free in nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: Read of size 8 at addr ffff0000e73cd0c8 by task nfsd/6205
kernel:
kernel: CPU: 2 UID: 0 PID: 6205 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc7+ frank-w#9
kernel: Hardware name: Apple Inc. Apple Virtualization Generic Platform, BIOS 2069.0.0.0.0 08/03/2024
kernel: Call trace:
kernel: dump_backtrace+0x98/0x120
kernel: show_stack+0x1c/0x30
kernel: dump_stack_lvl+0x80/0xe8
kernel: print_address_description.constprop.0+0x84/0x390
kernel: print_report+0xa4/0x268
kernel: kasan_report+0xb4/0xf8
kernel: __asan_report_load8_noabort+0x1c/0x28
kernel: nfsd_breaker_owns_lease+0x140/0x160 [nfsd]
kernel: nfsd_file_do_acquire+0xb3c/0x11d0 [nfsd]
kernel: nfsd_file_acquire_opened+0x84/0x110 [nfsd]
kernel: nfs4_get_vfs_file+0x634/0x958 [nfsd]
kernel: nfsd4_process_open2+0xa40/0x1a40 [nfsd]
kernel: nfsd4_open+0xa08/0xe80 [nfsd]
kernel: nfsd4_proc_compound+0xb8c/0x2130 [nfsd]
kernel: nfsd_dispatch+0x22c/0x718 [nfsd]
kernel: svc_process_common+0x8e8/0x1960 [sunrpc]
kernel: svc_process+0x3d4/0x7e0 [sunrpc]
kernel: svc_handle_xprt+0x828/0xe10 [sunrpc]
kernel: svc_recv+0x2cc/0x6a8 [sunrpc]
kernel: nfsd+0x270/0x400 [nfsd]
kernel: kthread+0x288/0x310
kernel: ret_from_fork+0x10/0x20

This patch proposes a fixed that's based on adding 2 new additional
stid's sc_status values that help coordinate between the laundromat
and other operations (nfsd4_free_stateid() and nfsd4_delegreturn()).

First to make sure, that once the stid is marked revoked, it is not
removed by the nfsd4_free_stateid(), the laundromat take a reference
on the stateid. Then, coordinating whether the stid has been put
on the cl_revoked list or we are processing FREE_STATEID and need to
make sure to remove it from the list, each check that state and act
accordingly. If laundromat has added to the cl_revoke list before
the arrival of FREE_STATEID, then nfsd4_free_stateid() knows to remove
it from the list. If nfsd4_free_stateid() finds that operations arrived
before laundromat has placed it on cl_revoke list, it marks the state
freed and then laundromat will no longer add it to the list.

Also, for nfsd4_delegreturn() when looking for the specified stid,
we need to access stid that are marked removed or freeable, it means
the laundromat has started processing it but hasn't finished and this
delegreturn needs to return nfserr_deleg_revoked and not
nfserr_bad_stateid. The latter will not trigger a FREE_STATEID and the
lack of it will leave this stid on the cl_revoked list indefinitely.

Fixes: 2d4a532 ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock")
CC: [email protected]
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
commit ac01c8c upstream.

AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.

  ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
  READ of size 1 at 0x60b00c48844b thread T193
      #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
      #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
      #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
      #3 0x5650d804568f in __hists__add_entry util/hist.c:754
      #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
      #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
      #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
      #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
      #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
      #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
      #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
      #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
      #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
      #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().

While this bug was probably introduced with 5c24b67 ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e94 ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.

Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <[email protected]>
Signed-off-by: Matt Fleming (Cloudflare) <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: [email protected]
Cc: Namhyung Kim <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: [email protected] # v5.13+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
commit 9af2efe upstream.

The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
[ Upstream commit a848c29 ]

On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
commit ac01c8c upstream.

AddressSanitizer found a use-after-free bug in the symbol code which
manifested as 'perf top' segfaulting.

  ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80
  READ of size 1 at 0x60b00c48844b thread T193
      #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310
      #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286
      #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614
      #3 0x5650d804568f in __hists__add_entry util/hist.c:754
      #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772
      #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997
      #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242
      #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845
      #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208
      #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245
      #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324
      #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120
      #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442
      #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

When updating hist maps it's also necessary to update the hist symbol
reference because the old one gets freed in map__put().

While this bug was probably introduced with 5c24b67 ("perf
tools: Replace map->referenced & maps->removed_maps with map->refcnt"),
the symbol objects were leaked until c087e94 ("perf machine:
Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so
the bug was masked.

Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL")
Reported-by: Yunzhao Li <[email protected]>
Signed-off-by: Matt Fleming (Cloudflare) <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: [email protected]
Cc: Namhyung Kim <[email protected]>
Cc: Riccardo Mancini <[email protected]>
Cc: [email protected] # v5.13+
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
commit 9af2efe upstream.

The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
…tion to perf_sched__replay()

[ Upstream commit c690786 ]

The start_work_mutex and work_done_wait_mutex are used only for the
'perf sched replay'. Put their initialization in perf_sched__replay () to
reduce unnecessary actions in other commands.

Simple functional testing:

  # perf sched record perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.197 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ]

  # perf sched replay
  run measurement overhead: 108 nsecs
  sleep measurement overhead: 65658 nsecs
  the run test took 999991 nsecs
  the sleep test took 1079324 nsecs
  nr_run_events:        42378
  nr_sleep_events:      43102
  nr_wakeup_events:     31852
  target-less wakeups:  17
  multi-target wakeups: 712
  task      0 (             swapper:         0), nr_events: 10451
  task      1 (             swapper:         1), nr_events: 3
  task      2 (             swapper:         2), nr_events: 1
  <SNIP>
  task    717 (     sched-messaging:     74483), nr_events: 152
  task    718 (     sched-messaging:     74484), nr_events: 1944
  task    719 (     sched-messaging:     74485), nr_events: 73
  task    720 (     sched-messaging:     74486), nr_events: 163
  task    721 (     sched-messaging:     74487), nr_events: 942
  task    722 (     sched-messaging:     74488), nr_events: 78
  task    723 (     sched-messaging:     74489), nr_events: 1090
  ------------------------------------------------------------
  #1  : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70
  #2  : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82
  #3  : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96
  #4  : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64
  #5  : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11
  #6  : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88
  #7  : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01
  #8  : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40
  #9  : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15
  #10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69
  # echo $?
  0

Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string")
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
…f_sched__{lat|map|replay}()

[ Upstream commit bd2cdf2 ]

The curr_pid and cpu_last_switched are used only for the
'perf sched replay/latency/map'. Put their initialization in
perf_sched__{lat|map|replay () to reduce unnecessary actions in other
commands.

Simple functional testing:

  # perf sched record perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.209 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]

  # perf sched lat

   -------------------------------------------------------------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Max delay start           | Max delay end          |
   -------------------------------------------------------------------------------------------------------------------------------------------
    sched-messaging:(401) |   2990.699 ms |    38705 | avg:   0.661 ms | max:  67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
    qemu-system-x86:(7)   |    179.764 ms |     2191 | avg:   0.152 ms | max:  21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
    sshd:48125            |      0.522 ms |        2 | avg:   0.037 ms | max:   0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
  <SNIP>
    ksoftirqd/11:82       |      0.063 ms |        1 | avg:   0.005 ms | max:   0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
    kworker/9:0-mm_:34624 |      0.233 ms |       20 | avg:   0.004 ms | max:   0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
    migration/13:93       |      0.000 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
   -----------------------------------------------------------------------------------------------------------------
    TOTAL:                |   3180.750 ms |    41368 |
   ---------------------------------------------------

  # echo $?
  0

  # perf sched map
    *A0                                                               456532.510141 secs A0 => migration/0:15
    *.                                                                456532.510171 secs .  => swapper:0
     .  *B0                                                           456532.510261 secs B0 => migration/1:21
     .  *.                                                            456532.510279 secs
  <SNIP>
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .   .    456532.785979 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .    456532.786054 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .    456532.786127 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .    456532.786197 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7   456532.786270 secs
  # echo $?
  0

  # perf sched replay
  run measurement overhead: 108 nsecs
  sleep measurement overhead: 66473 nsecs
  the run test took 1000002 nsecs
  the sleep test took 1082686 nsecs
  nr_run_events:        49334
  nr_sleep_events:      50054
  nr_wakeup_events:     34701
  target-less wakeups:  165
  multi-target wakeups: 766
  task      0 (             swapper:         0), nr_events: 15419
  task      1 (             swapper:         1), nr_events: 1
  task      2 (             swapper:         2), nr_events: 1
  <SNIP>
  task    715 (     sched-messaging:    110248), nr_events: 1438
  task    716 (     sched-messaging:    110249), nr_events: 512
  task    717 (     sched-messaging:    110250), nr_events: 500
  task    718 (     sched-messaging:    110251), nr_events: 537
  task    719 (     sched-messaging:    110252), nr_events: 823
  ------------------------------------------------------------
  #1  : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35
  #2  : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56
  #3  : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39
  #4  : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86
  #5  : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34
  #6  : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39
  #7  : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09
  #8  : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57
  #9  : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90
  #10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07
  # echo $?
  0

Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string")
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
[ Upstream commit a848c29 ]

On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
commit 9af2efe upstream.

The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Matt Fleming <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
…tion to perf_sched__replay()

[ Upstream commit c690786 ]

The start_work_mutex and work_done_wait_mutex are used only for the
'perf sched replay'. Put their initialization in perf_sched__replay () to
reduce unnecessary actions in other commands.

Simple functional testing:

  # perf sched record perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.197 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ]

  # perf sched replay
  run measurement overhead: 108 nsecs
  sleep measurement overhead: 65658 nsecs
  the run test took 999991 nsecs
  the sleep test took 1079324 nsecs
  nr_run_events:        42378
  nr_sleep_events:      43102
  nr_wakeup_events:     31852
  target-less wakeups:  17
  multi-target wakeups: 712
  task      0 (             swapper:         0), nr_events: 10451
  task      1 (             swapper:         1), nr_events: 3
  task      2 (             swapper:         2), nr_events: 1
  <SNIP>
  task    717 (     sched-messaging:     74483), nr_events: 152
  task    718 (     sched-messaging:     74484), nr_events: 1944
  task    719 (     sched-messaging:     74485), nr_events: 73
  task    720 (     sched-messaging:     74486), nr_events: 163
  task    721 (     sched-messaging:     74487), nr_events: 942
  task    722 (     sched-messaging:     74488), nr_events: 78
  task    723 (     sched-messaging:     74489), nr_events: 1090
  ------------------------------------------------------------
  #1  : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70
  #2  : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82
  #3  : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96
  #4  : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64
  #5  : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11
  #6  : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88
  #7  : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01
  #8  : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40
  #9  : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15
  #10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69
  # echo $?
  0

Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string")
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
…f_sched__{lat|map|replay}()

[ Upstream commit bd2cdf2 ]

The curr_pid and cpu_last_switched are used only for the
'perf sched replay/latency/map'. Put their initialization in
perf_sched__{lat|map|replay () to reduce unnecessary actions in other
commands.

Simple functional testing:

  # perf sched record perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.209 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ]

  # perf sched lat

   -------------------------------------------------------------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Max delay start           | Max delay end          |
   -------------------------------------------------------------------------------------------------------------------------------------------
    sched-messaging:(401) |   2990.699 ms |    38705 | avg:   0.661 ms | max:  67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s
    qemu-system-x86:(7)   |    179.764 ms |     2191 | avg:   0.152 ms | max:  21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s
    sshd:48125            |      0.522 ms |        2 | avg:   0.037 ms | max:   0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s
  <SNIP>
    ksoftirqd/11:82       |      0.063 ms |        1 | avg:   0.005 ms | max:   0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s
    kworker/9:0-mm_:34624 |      0.233 ms |       20 | avg:   0.004 ms | max:   0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s
    migration/13:93       |      0.000 ms |        1 | avg:   0.004 ms | max:   0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s
   -----------------------------------------------------------------------------------------------------------------
    TOTAL:                |   3180.750 ms |    41368 |
   ---------------------------------------------------

  # echo $?
  0

  # perf sched map
    *A0                                                               456532.510141 secs A0 => migration/0:15
    *.                                                                456532.510171 secs .  => swapper:0
     .  *B0                                                           456532.510261 secs B0 => migration/1:21
     .  *.                                                            456532.510279 secs
  <SNIP>
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .   .    456532.785979 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .   .    456532.786054 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .   .    456532.786127 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7  .    456532.786197 secs
     L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7  L7 *L7   456532.786270 secs
  # echo $?
  0

  # perf sched replay
  run measurement overhead: 108 nsecs
  sleep measurement overhead: 66473 nsecs
  the run test took 1000002 nsecs
  the sleep test took 1082686 nsecs
  nr_run_events:        49334
  nr_sleep_events:      50054
  nr_wakeup_events:     34701
  target-less wakeups:  165
  multi-target wakeups: 766
  task      0 (             swapper:         0), nr_events: 15419
  task      1 (             swapper:         1), nr_events: 1
  task      2 (             swapper:         2), nr_events: 1
  <SNIP>
  task    715 (     sched-messaging:    110248), nr_events: 1438
  task    716 (     sched-messaging:    110249), nr_events: 512
  task    717 (     sched-messaging:    110250), nr_events: 500
  task    718 (     sched-messaging:    110251), nr_events: 537
  task    719 (     sched-messaging:    110252), nr_events: 823
  ------------------------------------------------------------
  #1  : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35
  #2  : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56
  #3  : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39
  #4  : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86
  #5  : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34
  #6  : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39
  #7  : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09
  #8  : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57
  #9  : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90
  #10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07
  # echo $?
  0

Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string")
Signed-off-by: Sasha Levin <[email protected]>
frank-w pushed a commit that referenced this issue Dec 6, 2024
[ Upstream commit a848c29 ]

On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32 ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <[email protected]>
Reviewed-by: Trond Myklebust <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant