up (#2) #487

addstone · 2017-11-12T18:17:37Z

objtool: Fix memory leak in decode_instructions()

When an error occurs before adding an allocated insn to the list, free
it before returning.

Signed-off-by: Kamalesh Babulal [email protected]
Signed-off-by: Josh Poimboeuf [email protected]
Cc: Linus Torvalds [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Thomas Gleixner [email protected]
Link: http://lkml.kernel.org/r/336da800bf6070eae11f4e0a3b9ca64c27658114.1508430423.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar [email protected]

x86/mm: Limit mmap() of /dev/mem to valid physical addresses

Currently, it is possible to mmap() any offset from /dev/mem. If a
program mmaps() /dev/mem offsets outside of the addressable limits
of a system, the page table can be corrupted by setting reserved bits.

For example if you mmap() offset 0x0001000000000000 of /dev/mem on an
x86_64 system with a 48-bit bus, the page fault handler will be called
with error_code set to RSVD. The kernel then crashes with a page table
corruption error.

This change prevents this page table corruption on x86 by refusing
to mmap offsets higher than the highest valid address in the system.

ALSA: hda/realtek - Add support for ALC236/ALC3204

Add support for ALC236/ALC3204.
Add headset mode support for ALC236/ALC3204.

Signed-off-by: Kailang Yang [email protected]
Cc: [email protected]
Signed-off-by: Takashi Iwai [email protected]

mmc: tmio: fix swiotlb buffer is full

Since the commit de3ee99 ("mmc: Delete bounce buffer handling")
deletes the bounce buffer handling, a request data size will be referred
to max_{req,seg}_size instead of MMC_QUEUE_BOUNCESZ (64k bytes).

In other hand, renesas_sdhi_internal_dmac.c will set very big value of
max_{req,seg}_size because the max_blk_count is set to 0xffffffff.
And then, "swiotlb buffer is full" happens because swiotlb can handle
a memory size up to 256k bytes only (IO_TLB_SEGSIZE = 128 and
IO_TLB_SHIFT = 11).

So, as a workaround, this patch avoids the issue by setting
the max_{req,seg}_size up to 256k bytes if swiotlb is running.

Reported-by: Dirk Behme [email protected]
Signed-off-by: Yoshihiro Shimoda [email protected]
Acked-by: Wolfram Sang [email protected]
Reviewed-by: Geert Uytterhoeven [email protected]
Signed-off-by: Ulf Hansson [email protected]

mmc: renesas_sdhi: fix kernel panic in _internal_dmac.c

Since this driver checks if the return value of dma_map_sg() is minus
or not and keeps to enable the DMAC, it may cause kernel panic when
the dma_map_sg() returns 0. So, this patch fixes the issue.

Reported-by: Dirk Behme [email protected]
Fixes: 2a68ea7 ("mmc: renesas-sdhi: add support for R-Car Gen3 SDHI DMAC")
Signed-off-by: Yoshihiro Shimoda [email protected]
Reviewed-by: Geert Uytterhoeven [email protected]
Acked-by: Wolfram Sang [email protected]
Signed-off-by: Ulf Hansson [email protected]

binder: call poll_wait() unconditionally.

Because we're not guaranteed that subsequent calls
to poll() will have a poll_table_struct parameter
with _qproc set. When _qproc is not set, poll_wait()
is a noop, and we won't be woken up correctly.

Signed-off-by: Martijn Coenen [email protected]
Signed-off-by: Greg Kroah-Hartman [email protected]

clockevents/drivers/cs5535: Improve resilience to spurious interrupts

The interrupt handler mfgpt_tick() is not robust versus spurious interrupts
which happen before the clock event device is registered and fully
initialized.

The reason is that the safe guard against spurious interrupts solely checks
for the clockevents shutdown state, but lacks a check for detached
state. If the interrupt hits while the device is in detached state it
passes the safe guard and dereferences the event handler call back which is
NULL.

Add the missing state check.

Fixes: 8f9327c ("clockevents/drivers/cs5535: Migrate to new 'set-state' interface")
Suggested-by: Thomas Gleixner [email protected]
Signed-off-by: David Kozub [email protected]
Signed-off-by: Thomas Gleixner [email protected]
Cc: Daniel Lezcano [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect

Now sctp processes icmp redirect packet in sctp_icmp_redirect where
it calls sctp_transport_dst_check in which tp->dst can be released.

The problem is before calling sctp_transport_dst_check, it doesn't
check sock_owned_by_user, which means tp->dst could be freed while
a process is accessing it with owning the socket.

An use-after-free issue could be triggered by this.

This patch is to fix it by checking sock_owned_by_user before calling
sctp_transport_dst_check in sctp_icmp_redirect, so that it would not
release tp->dst if users still hold sock lock.

Besides, the same issue fixed in commit 45caeaa ("dccp/tcp: fix
routing redirect race") on sctp also needs this check.

Fixes: 55be7a9 ("ipv4: Add redirect support to all protocol icmp error handlers")
Reported-by: Eric Dumazet [email protected]
Signed-off-by: Xin Long [email protected]
Acked-by: Marcelo Ricardo Leitner [email protected]
Acked-by: Neil Horman [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: enforce TCP only support for sockmap

Only TCP sockets have been tested and at the moment the state change
callback only handles TCP sockets. This adds a check to ensure that
sockets actually being added are TCP sockets.

For net-next we can consider UDP support.

Signed-off-by: John Fastabend [email protected]
Acked-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region

SK_SKB BPF programs are run from the socket/tcp context but early in
the stack before much of the TCP metadata is needed in tcp_skb_cb. So
we can use some unused fields to place BPF metadata needed for SK_SKB
programs when implementing the redirect function.

This allows us to drop the preempt disable logic. It does however
require an API change so sk_redirect_map() has been updated to
additionally provide ctx_ptr to skb. Note, we do however continue to
disable/enable preemption around actual BPF program running to account
for map updates.

Signed-off-by: John Fastabend [email protected]
Acked-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: remove mark access for SK_SKB program types

The skb->mark field is a union with reserved_tailroom which is used
in the TCP code paths from stream memory allocation. Allowing SK_SKB
programs to set this field creates a conflict with future code
optimizations, such as "gifting" the skb to the egress path instead
of creating a new skb and doing a memcpy.

Because we do not have a released version of SK_SKB yet lets just
remove it for now. A more appropriate scratch pad to use at the
socket layer is dev_scratch, but lets add that in future kernels
when needed.

Signed-off-by: John Fastabend [email protected]
Acked-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: require CAP_NET_ADMIN when using sockmap maps

Restrict sockmap to CAP_NET_ADMIN.

Signed-off-by: John Fastabend [email protected]
Acked-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: require CAP_NET_ADMIN when using devmap

Devmap is used with XDP which requires CAP_NET_ADMIN so lets also
make CAP_NET_ADMIN required to use the map.

Signed-off-by: John Fastabend [email protected]
Acked-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Signed-off-by: David S. Miller [email protected]

vmbus: hvsock: add proper sync for vmbus_hvsock_device_unregister()

Without the patch, vmbus_hvsock_device_unregister() can destroy the device
prematurely when close() is called, and can cause NULl dereferencing or
potential data loss (the last portion of the data stream may be dropped
prematurely).

Signed-off-by: Dexuan Cui [email protected]
Cc: Haiyang Zhang [email protected]
Cc: Stephen Hemminger [email protected]
Signed-off-by: K. Y. Srinivasan [email protected]
Signed-off-by: Greg Kroah-Hartman [email protected]

waitid(): Avoid unbalanced user_access_end() on access_ok() error

As pointed out by Linus and David, the earlier waitid() fix resulted in
a (currently harmless) unbalanced user_access_end() call. This fixes it
to just directly return EFAULT on access_ok() failure.

Fixes: 96ca579 ("waitid(): Add missing access_ok() checks")
Acked-by: David Daney [email protected]
Cc: Al Viro [email protected]
Signed-off-by: Kees Cook [email protected]
Signed-off-by: Linus Torvalds [email protected]

tcp/dccp: fix ireq->opt races

syzkaller found another bug in DCCP/TCP stacks [1]

For the reasons explained in commit ce10500 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.

Note the opt field is renamed to ireq_opt to ease grep games.

[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295

CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
__tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000

Allocated by task 3295:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
__do_kmalloc mm/slab.c:3725 [inline]
__kmalloc+0x162/0x760 mm/slab.c:3734
kmalloc include/linux/slab.h:498 [inline]
tcp_v4_save_options include/net/tcp.h:1962 [inline]
tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3306:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xca/0x250 mm/slab.c:3820
inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
__sk_destruct+0xfd/0x910 net/core/sock.c:1560
sk_destruct+0x47/0x80 net/core/sock.c:1595
__sk_free+0x57/0x230 net/core/sock.c:1603
sk_free+0x2a/0x40 net/core/sock.c:1614
sock_put include/net/sock.h:1652 [inline]
inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe

Fixes: e994b2f ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet [email protected]
Signed-off-by: David S. Miller [email protected]

packet: avoid panic in packet_getsockopt()

syzkaller got crashes in packet_getsockopt() processing
PACKET_ROLLOVER_STATS command while another thread was managing
to change po->rollover

Using RCU will fix this bug. We might later add proper RCU annotations
for sparse sake.

In v2: I replaced kfree(rollover) in fanout_add() to kfree_rcu()
variant, as spotted by John.

Fixes: a9b6391 ("packet: rollover statistics")
Signed-off-by: Eric Dumazet [email protected]
Cc: Willem de Bruijn [email protected]
Cc: John Sperbeck [email protected]
Signed-off-by: David S. Miller [email protected]

net/ncsi: Fix AEN HNCDSC packet length

Correct the value of the HNCDSC AEN packet.
Fixes: 7a82ecf "net/ncsi: NCSI AEN packet handler"

Signed-off-by: Samuel Mendoza-Jonas [email protected]
Signed-off-by: David S. Miller [email protected]

net/ncsi: Stop monitor if channel times out or is inactive

ncsi_channel_monitor() misses stopping the channel monitor in several
places that it should, causing a WARN_ON_ONCE() to trigger when the
monitor is re-started later, eg:

[ 459.040000] WARNING: CPU: 0 PID: 1093 at net/ncsi/ncsi-manage.c:269 ncsi_start_channel_monitor+0x7c/0x90
[ 459.040000] CPU: 0 PID: 1093 Comm: kworker/0:3 Not tainted 4.10.17-gaca2fdd #140
[ 459.040000] Hardware name: ASpeed SoC
[ 459.040000] Workqueue: events ncsi_dev_work
[ 459.040000] [<80010094>] (unwind_backtrace) from [<8000d950>] (show_stack+0x20/0x24)
[ 459.040000] [<8000d950>] (show_stack) from [<801dbf70>] (dump_stack+0x20/0x28)
[ 459.040000] [<801dbf70>] (dump_stack) from [<80018d7c>] (__warn+0xe0/0x108)
[ 459.040000] [<80018d7c>] (__warn) from [<80018e70>] (warn_slowpath_null+0x30/0x38)
[ 459.040000] [<80018e70>] (warn_slowpath_null) from [<803f6a08>] (ncsi_start_channel_monitor+0x7c/0x90)
[ 459.040000] [<803f6a08>] (ncsi_start_channel_monitor) from [<803f7664>] (ncsi_configure_channel+0xdc/0x5fc)
[ 459.040000] [<803f7664>] (ncsi_configure_channel) from [<803f8160>] (ncsi_dev_work+0xac/0x474)
[ 459.040000] [<803f8160>] (ncsi_dev_work) from [<8002d244>] (process_one_work+0x1e0/0x450)
[ 459.040000] [<8002d244>] (process_one_work) from [<8002d510>] (worker_thread+0x5c/0x570)
[ 459.040000] [<8002d510>] (worker_thread) from [<80033614>] (kthread+0x124/0x164)
[ 459.040000] [<80033614>] (kthread) from [<8000a5e8>] (ret_from_fork+0x14/0x2c)

This also updates the monitor instead of just returning if
ncsi_xmit_cmd() fails to send the get-link-status command so that the
monitor properly times out.

Fixes: e6f44ed "net/ncsi: Package and channel management"

Signed-off-by: Samuel Mendoza-Jonas [email protected]
Signed-off-by: David S. Miller [email protected]

net/ncsi: Disable HWA mode when no channels are found

When there are no NCSI channels probed, HWA (Hardware Arbitration)
mode is enabled. It's not correct because HWA depends on the fact:
NCSI channels exist and all of them support HWA mode. This disables
HWA when no channels are probed.

Signed-off-by: Gavin Shan [email protected]
Signed-off-by: Samuel Mendoza-Jonas [email protected]
Signed-off-by: David S. Miller [email protected]

net/ncsi: Enforce failover on link monitor timeout

The NCSI channel has been configured to provide service if its link
monitor timer is enabled, regardless of its state (inactive or active).
So the timeout event on the link monitor indicates the out-of-service
on that channel, for which a failover is needed.

This sets NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor
timeout, regardless the channel's original state (inactive or active).
Also, the link is put into "down" state to give the failing channel
lowest priority when selecting for the active channel. The state of
failing channel should be set to active in order for deinitialization
and failover to be done.

Signed-off-by: Gavin Shan [email protected]
Signed-off-by: Samuel Mendoza-Jonas [email protected]
Signed-off-by: David S. Miller [email protected]

net/ncsi: Fix length of GVI response packet

The length of GVI (GetVersionInfo) response packet should be 40 instead
of 36. This issue was found from /sys/kernel/debug/ncsi/eth0/stats.

ethtool --ncsi eth0 swstats

RESPONSE OK TIMEOUT ERROR

GVI 0 0 2

With this applied, no error reported on GVI response packets:

ethtool --ncsi eth0 swstats

RESPONSE OK TIMEOUT ERROR

GVI 2 0 0

Signed-off-by: Gavin Shan [email protected]
Signed-off-by: Samuel Mendoza-Jonas [email protected]
Signed-off-by: David S. Miller [email protected]

hv_sock: add locking in the open/close/release code paths

Without the patch, when hvs_open_connection() hasn't completely established
a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't
inserted the sock into the connected queue), vsock_stream_connect() may see
the sk_state change and return the connection to the userspace, and next
when the userspace closes the connection quickly, hvs_release() may not see
the connection in the connected queue; finally hvs_open_connection()
inserts the connection into the queue, but we won't be able to purge the
connection for ever.

Signed-off-by: Dexuan Cui [email protected]
Cc: K. Y. Srinivasan [email protected]
Cc: Haiyang Zhang [email protected]
Cc: Stephen Hemminger [email protected]
Cc: Vitaly Kuznetsov [email protected]
Cc: Cathy Avery [email protected]
Cc: Rolf Neugebauer [email protected]
Cc: Marcelo Cerri [email protected]
Signed-off-by: David S. Miller [email protected]

geneve: Fix function matching VNI and tunnel ID on big-endian

On big-endian machines, functions converting between tunnel ID
and VNI use the three LSBs of tunnel ID storage to map VNI.

The comparison function eq_tun_id_and_vni(), on the other hand,
attempted to map the VNI from the three MSBs. Fix it by using
the same check implemented on LE, which maps VNI from the three
LSBs of tunnel ID.

Fixes: 2e0b26e ("geneve: Optimize geneve device lookup.")
Signed-off-by: Stefano Brivio [email protected]
Reviewed-by: Jakub Sitnicki [email protected]
Signed-off-by: David S. Miller [email protected]

udp: make some messages more descriptive

In the UDP code there are two leftover error messages with very few meaning.
Replace them with a more descriptive error message as some users
reported them as "strange network error".

Signed-off-by: Matteo Croce [email protected]
Signed-off-by: David S. Miller [email protected]

android: binder: Don't get mm from task

Use binder_alloc struct's mm_struct rather than getting
a reference to the mm struct through get_task_mm to
avoid a potential deadlock between lru lock, task lock and
dentry lock, since a thread can be holding the task lock
and the dentry lock while trying to acquire the lru lock.

Acked-by: Arve Hjønnevåg [email protected]
Signed-off-by: Sherry Yang [email protected]
Signed-off-by: Greg Kroah-Hartman [email protected]

android: binder: Fix null ptr dereference in debug msg

Don't access next->data in kernel debug message when the
next buffer is null.

Acked-by: Arve Hjønnevåg [email protected]
Signed-off-by: Sherry Yang [email protected]
Signed-off-by: Greg Kroah-Hartman [email protected]

net: aquantia: Reset nic statistics on interface up/down

Internal statistics system on chip never gets reset until hardware
reboot. This is quite inconvenient in terms of ethtool statistics usage.

This patch implements incremental statistics update inside of
service callback.

Upon nic initialization, first request is done to fetch
initial stat data, current collected stat data gets cleared.
Internal statistics mailbox readout is improved to save space and
increase readability

Signed-off-by: Pavel Belous [email protected]
Signed-off-by: Igor Russkikh [email protected]
Signed-off-by: David S. Miller [email protected]

net: aquantia: Add queue restarts stats counter

Queue stat strings are cleaned up, duplicate stat name strings removed,
queue restarts counter added

Signed-off-by: Pavel Belous [email protected]
Signed-off-by: Igor Russkikh [email protected]
Signed-off-by: David S. Miller [email protected]

net: aquantia: Fixed transient link up/down/up notification

When doing ifconfig down/up, driver did not reported carrier_off neither
in nic_stop nor in nic_start. That caused link to be visible as "up"
during couple of seconds immediately after "ifconfig up".

Signed-off-by: Pavel Belous [email protected]
Signed-off-by: Igor Russkikh [email protected]
Signed-off-by: David S. Miller [email protected]

net: aquantia: Limit number of MSIX irqs to the number of cpus

There is no much practical use from having MSIX vectors more that number
of cpus, thus cap this first with preconfigured limit, then with number
of cpus online.

Signed-off-by: Pavel Belous [email protected]
Signed-off-by: Igor Russkikh [email protected]
Signed-off-by: David S. Miller [email protected]

net: aquantia: mmio unmap was not performed on driver removal

That may lead to mmio resource leakage.

Signed-off-by: Pavel Belous [email protected]
Signed-off-by: Igor Russkikh [email protected]
Signed-off-by: David S. Miller [email protected]

net: aquantia: Enable coalescing management via ethtool interface

Aquantia NIC allows both TX and RX interrupt throttle rate (ITR)
management, but this was used in a very limited way via predefined
values. This patch allows to setup ITR default values via module
command line arguments and via standard ethtool coalescing settings.

Signed-off-by: Pavel Belous [email protected]
Signed-off-by: Igor Russkikh [email protected]
Signed-off-by: David S. Miller [email protected]

net: aquantia: Bad udp rate on default interrupt coalescing

Default Tx rates cause very long ISR delays on Tx.
0xff is 510us delay, giving only ~ 2000 interrupts per seconds for
Tx rings cleanup. With these settings udp tx rate was never higher than
~800Mbps on a single stream. Changing min delay to 0xF makes it
way better with ~6Gbps

TCP stream performance is almost unaffected by this change, since LSO
optimizations play important role.

CPU load is affected insignificantly by this change.

Signed-off-by: Pavel Belous [email protected]
Signed-off-by: Igor Russkikh [email protected]
Signed-off-by: David S. Miller [email protected]

cpu/hotplug: Reset node state after operation

The recent rework of the cpu hotplug internals changed the usage of the per
cpu state->node field, but missed to clean it up after usage.

So subsequent hotplug operations use the stale pointer from a previous
operation and hand it into the callback functions. The callbacks then
dereference a pointer which either belongs to a different facility or
points to freed and potentially reused memory. In either case data
corruption and crashes are the obvious consequence.

Reset the node and the last pointers in the per cpu state to NULL after the
operation which set them has completed.

Fixes: 96abb96 ("smp/hotplug: Allow external multi-instance rollback")
Reported-by: Tvrtko Ursulin [email protected]
Signed-off-by: Thomas Gleixner [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Sebastian Andrzej Siewior [email protected]
Cc: Boris Ostrovsky [email protected]
Cc: "Paul E. McKenney" [email protected]
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710211606130.3213@nanos

hwmon: (da9052) Increase sample rate when using TSI

The TSI channel, which is usually used for touchscreen support, but can
be used as 4 general purpose ADCs. When used as a touchscreen interface
the touchscreen driver switches the device into 1ms sampling mode (rather
than the default 10ms economy mode) as recommended by the manufacturer.
When using the TSI channels as a general purpose ADC we are currently not
doing this and testing suggests that this can result in ADC timeouts:

[ 5827.198289] da9052 spi2.0: timeout waiting for ADC conversion interrupt
[ 5827.728293] da9052 spi2.0: timeout waiting for ADC conversion interrupt
[ 5993.808335] da9052 spi2.0: timeout waiting for ADC conversion interrupt
[ 5994.328441] da9052 spi2.0: timeout waiting for ADC conversion interrupt
[ 5994.848291] da9052 spi2.0: timeout waiting for ADC conversion interrupt

Switching to the 1ms timing resolves this issue.

Fixes: 4f16cab ("hwmon: da9052: Add support for TSI channel")
Signed-off-by: Martyn Welch [email protected]
Acked-by: Steve Twiss [email protected]
Signed-off-by: Guenter Roeck [email protected]

drm/amd/powerplay: fix uninitialized variable

refresh_rate was not initialized when program
display gap.
this patch can fix vce ring test failed
when do S3 on Polaris10.

bug: https://bugs.freedesktop.org/show_bug.cgi?id=103102
bug: https://bugzilla.kernel.org/show_bug.cgi?id=196615
Reviewed-by: Alex Deucher [email protected]
Signed-off-by: Rex Zhu [email protected]
Signed-off-by: Alex Deucher [email protected]
Cc: [email protected]

bpf: devmap fix arithmetic overflow in bitmap_size calculation

An integer overflow is possible in dev_map_bitmap_size() when
calculating the BITS_TO_LONG logic which becomes, after macro
replacement,

(((n) + (d) - 1)/ (d))

where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid
overflow cast to u64 before arithmetic.

Reported-by: Richard Weinberger [email protected]
Acked-by: Daniel Borkmann [email protected]
Signed-off-by: John Fastabend [email protected]
Acked-by: Alexei Starovoitov [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: fix off by one for range markings with L{T, E} patterns

During review I noticed that the current logic for direct packet
access marking in check_cond_jmp_op() has an off by one for the
upper right range border when marking in find_good_pkt_pointers()
with BPF_JLT and BPF_JLE. It's not really harmful given access
up to pkt_end is always safe, but we should nevertheless correct
the range marking before it becomes ABI. If pkt_data' denotes a
pkt_data derived pointer (pkt_data + X), then for pkt_data' < pkt_end
in the true branch as well as for pkt_end <= pkt_data' in the false
branch we mark the range with X although it should really be X - 1
in these cases. For example, X could be pkt_end - pkt_data, then
when testing for pkt_data' < pkt_end the verifier simulation cannot
deduce that a byte load of pkt_data' - 1 would succeed in this
branch.

Fixes: b4e432f ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
Signed-off-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Acked-by: John Fastabend [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: fix pattern matches for direct packet access

Alexander had a test program with direct packet access, where
the access test was in the form of data + X > data_end. In an
unrelated change to the program LLVM decided to swap the branches
and emitted code for the test in form of data + X <= data_end.
We hadn't seen these being generated previously, thus verifier
would reject the program. Therefore, fix up the verifier to
detect all test cases, so we don't run into such issues in the
future.

Fixes: b4e432f ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier")
Reported-by: Alexander Alemayhu [email protected]
Signed-off-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Acked-by: John Fastabend [email protected]
Signed-off-by: David S. Miller [email protected]

bpf: add test cases to bpf selftests to cover all access tests

Lets add test cases to cover really all possible direct packet
access tests for good/bad access cases so we keep tracking them.

Signed-off-by: Daniel Borkmann [email protected]
Acked-by: Alexei Starovoitov [email protected]
Acked-by: John Fastabend [email protected]
Signed-off-by: David S. Miller [email protected]

sock: correct sk_wmem_queued accounting on efault in tcp zerocopy

Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues
after triggering an EFAULT in __zerocopy_sg_from_iter.

On this error, skb_zerocopy_stream_iter resets the skb to its state
before the operation with __pskb_trim. It cannot kfree_skb like
datagram callers, as the skb may have data from a previous send call.

__pskb_trim calls skb_condense for unowned skbs, which adjusts their
truesize. These tcp skbuffs are owned and their truesize must add up
to sk_wmem_queued. But they match because their skb->sk is NULL until
tcp_transmit_skb.

Temporarily set skb->sk when calling __pskb_trim to signal that the
skbuffs are owned and avoid the skb_condense path.

Fixes: 5226779 ("sock: add MSG_ZEROCOPY")
Signed-off-by: Willem de Bruijn [email protected]
Reviewed-by: Eric Dumazet [email protected]
Signed-off-by: David S. Miller [email protected]

net: bridge: fix returning of vlan range op errors

When vlan tunnels were introduced, vlan range errors got silently
dropped and instead 0 was returned always. Restore the previous
behaviour and return errors to user-space.

Fixes: efa5356 ("bridge: per vlan dst_metadata netlink support")
Signed-off-by: Nikolay Aleksandrov [email protected]
Acked-by: Roopa Prabhu [email protected]
Signed-off-by: David S. Miller [email protected]

soreuseport: fix initialization race

Syzkaller stumbled upon a way to trigger
WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39

There are two initialization paths for the sock_reuseport structure in a
socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
SO_ATTACH_REUSEPORT_[CE]BPF before bind. The existing implementation
assumedthat the socket lock protected both of these paths when it actually
only protects the SO_ATTACH_REUSEPORT path. Syzkaller triggered this
double allocation by running these paths concurrently.

This patch moves the check for double allocation into the reuseport_alloc
function which is protected by a global spin lock.

Fixes: e32ea7e ("soreuseport: fast reuseport UDP socket selection")
Fixes: c125e80 ("soreuseport: fast reuseport TCP socket selection")
Signed-off-by: Craig Gallek [email protected]
Signed-off-by: David S. Miller [email protected]

net: ethtool: remove error check for legacy setting transceiver type

Commit 9cab88726929605 ("net: ethtool: Add back transceiver type")
restores the transceiver type to struct ethtool_link_settings and
convert_link_ksettings_to_legacy_settings() but forgets to remove the
error check for the same in convert_legacy_settings_to_link_ksettings().
This prevents older versions of ethtool to change link settings.

# ethtool --version
ethtool version 3.16

# ethtool -s eth0 autoneg on speed 100 duplex full
Cannot set new settings: Invalid argument
  not setting speed
  not setting duplex
  not setting autoneg

While newer versions of ethtool works.

# ethtool --version
ethtool version 4.10

# ethtool -s eth0 autoneg on speed 100 duplex full
[   57.703268] sh-eth ee700000.ethernet eth0: Link is Down
[   59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx

Fixes: 19cab88 ("net: ethtool: Add back transceiver type")
Signed-off-by: Niklas Söderlund [email protected]
Reported-by: Renjith R V [email protected]
Tested-by: Geert Uytterhoeven [email protected]
Signed-off-by: David S. Miller [email protected]

mlxsw: reg: Add Tunneling IPinIP General Configuration Register

The TIGCR register is used for setting up the IPinIP Tunnel
configuration.

Fixes: ee954d1 ("mlxsw: spectrum_router: Support GRE tunnels")
Signed-off-by: Petr Machata [email protected]
Reviewed-by: Ido Schimmel [email protected]
Signed-off-by: Jiri Pirko [email protected]
Signed-off-by: David S. Miller [email protected]

mlxsw: spectrum_router: Configure TIGCR on init

Spectrum tunnels do not default to ttl of "inherit" like the Linux ones
do. Configure TIGCR on router init so that the TTL of tunnel packets is
copied from the overlay packets.

Fixes: ee954d1 ("mlxsw: spectrum_router: Support GRE tunnels")
Signed-off-by: Petr Machata [email protected]
Reviewed-by: Ido Schimmel [email protected]
Signed-off-by: Jiri Pirko [email protected]
Signed-off-by: David S. Miller [email protected]

net: stmmac: Add missing call to dev_kfree_skb()

When RX HW timestamp is enabled and a frame is discarded we are
not freeing the skb but instead only setting to NULL the entry.

Add a call to dev_kfree_skb_any() so that skb entry is correctly
freed.

Signed-off-by: Jose Abreu [email protected]
Cc: David S. Miller [email protected]
Cc: Joao Pinto [email protected]
Cc: Giuseppe Cavallaro [email protected]
Cc: Alexandre Torgue [email protected]
Signed-off-by: David S. Miller [email protected]

net: stmmac: Fix stmmac_get_rx_hwtstamp()

When using GMAC4 the valid timestamp is from CTX next desc but
we are passing the previous desc to get_rx_timestamp_status()
callback.

Fix this and while at it rework a little bit the function logic.

Signed-off-by: Jose Abreu [email protected]
Cc: David S. Miller [email protected]
Cc: Joao Pinto [email protected]
Cc: Giuseppe Cavallaro [email protected]
Cc: Alexandre Torgue [email protected]
Signed-off-by: David S. Miller [email protected]

net: stmmac: Prevent infinite loop in get_rx_timestamp_status()

Prevent infinite loop by correctly setting the loop condition to
break when i == 10.

Signed-off-by: Jose Abreu [email protected]
Cc: David S. Miller [email protected]
Cc: Joao Pinto [email protected]
Cc: Giuseppe Cavallaro [email protected]
Cc: Alexandre Torgue [email protected]
Signed-off-by: David S. Miller [email protected]

rxrpc: Don't release call mutex on error pointer

Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
call pointer actually holds an error value.

Fixes: 540b1c4 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
Reported-by: Marc Dionne [email protected]
Signed-off-by: David Howells [email protected]
Signed-off-by: David S. Miller [email protected]

textsearch: fix typos in library helpers

Fix spellos (typos) in textsearch library helpers.

Signed-off-by: Randy Dunlap [email protected]
Signed-off-by: David S. Miller [email protected]

of_mdio: Fix broken PHY IRQ in case of probe deferral

If an Ethernet PHY is initialized before the interrupt controller it is
connected to, a message like the following is printed:

irq: no irq domain found for /interrupt-controller@e61c0000 !

However, the actual error is ignored, leading to a non-functional (POLL)
PHY interrupt later:

Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)

Depending on whether the PHY driver will fall back to polling, Ethernet
may or may not work.

To fix this:

Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to
of_irq_get().
Unlike the former, the latter returns -EPROBE_DEFER if the
interrupt controller is not yet available, so this condition can be
detected.
Other errors are handled the same as before, i.e. use the passed
mdio->irq[addr] as interrupt.
Propagate and handle errors from of_mdiobus_register_phy() and
of_mdiobus_register_device().

Signed-off-by: Geert Uytterhoeven [email protected]
Signed-off-by: David S. Miller [email protected]

ipv6: flowlabel: do not leave opt->tot_len with garbage

When syzkaller team brought us a C repro for the crash [1] that
had been reported many times in the past, I finally could find
the root cause.

If FlowLabel info is merged by fl6_merge_options(), we leave
part of the opt_space storage provided by udp/raw/l2tp with random value
in opt_space.tot_len, unless a control message was provided at sendmsg()
time.

Then ip6_setup_cork() would use this random value to perform a kzalloc()
call. Undefined behavior and crashes.

Fix is to properly set tot_len in fl6_merge_options()

At the same time, we can also avoid consuming memory and cpu cycles
to clear it, if every option is copied via a kmemdup(). This is the
change in ip6_setup_cork().

[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cb64a100 task.stack: ffff8801cc350000
RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
FS: 00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x358/0x5a0 net/socket.c:1750
SyS_sendto+0x40/0x50 net/socket.c:1718
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4520a9
RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550

Signed-off-by: Eric Dumazet [email protected]
Reported-by: Dmitry Vyukov [email protected]
Signed-off-by: David S. Miller [email protected]

stmmac: Don't access tx_q->dirty_tx before netif_tx_lock

This is the possible reason for different hard to reproduce
problems on my ARMv7-SMP test system.

The symptoms are in recent kernels imprecise external aborts,
and in older kernels various kinds of network stalls and
unexpected page allocation failures.

My testing indicates that the trouble started between v4.5 and v4.6
and prevails up to v4.14.

Using the dirty_tx before acquiring the spin lock is clearly
wrong and was first introduced with v4.6.

Fixes: e3ad57c ("stmmac: review RX/TX ring management")

Signed-off-by: Bernd Edlinger [email protected]
Signed-off-by: David S. Miller [email protected]

x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't

Some F14h machines have an erratum which, "under a highly specific
and detailed set of internal timing conditions" can lead to skipping
instructions and RIP corruption.

Add the fix for those machines when their BIOS doesn't apply it or
there simply isn't BIOS update for them.

Tested-by: [email protected]
Signed-off-by: Borislav Petkov [email protected]
Cc: [email protected]
Cc: Linus Torvalds [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Sherry Hurwitz [email protected]
Cc: Thomas Gleixner [email protected]
Cc: Yazen Ghannam [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197285
[ Added pr_info() that we activated the workaround. ]
Signed-off-by: Ingo Molnar [email protected]

Input: do not use property bits when generating module alias

The commit 8724ecb ("Input: allow matching device IDs on property
bits") started using property bits when generating module aliases for input
handlers, but did not adjust the generation of MODALIAS attribute on input
device uevents, breaking automatic module loading. Given that no handler
currently uses property bits in their module tables, let's revert this part
of the commit for now.

Reported-by: Damien Wyart [email protected]
Tested-by: Damien Wyart [email protected]
Fixes: 8724ecb ("Input: allow matching device IDs on property bits")
Signed-off-by: Dmitry Torokhov [email protected]

tcp: do tcp_mstamp_refresh before retransmits on TSQ handler

When retransmission on TSQ handler was introduced in the commit
f9616c3 ("tcp: implement TSQ for retransmits"), the retransmitted
skbs' timestamps were updated on the actual transmission. In the later
commit 385e207 ("tcp: use tp->tcp_mstamp in output path"), it stops
being done so. In the commit, the comment says "We try to refresh
tp->tcp_mstamp only when necessary", and at present tcp_tsq_handler and
tcp_v4_mtu_reduced applies to this. About the latter, it's okay since
it's rare enough.

About the former, even though possible retransmissions on the tasklet
comes just after the destructor run in NET_RX softirq handling, the time
between them could be nonnegligibly large to the extent that
tcp_rack_advance or rto rearming be affected if other (remaining) RX,
BLOCK and (preceding) TASKLET sofirq handlings are unexpectedly heavy.

So in the same way as tcp_write_timer_handler does, doing tcp_mstamp_refresh
ensures the accuracy of algorithms relying on it.

Fixes: 385e207 ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Koichiro Den [email protected]
Reviewed-by: Eric Dumazet [email protected]
Signed-off-by: David S. Miller [email protected]

tcp/dccp: fix lockdep splat in inet_csk_route_req()

This patch fixes the following lockdep splat in inet_csk_route_req()

lockdep_rcu_suspicious
inet_csk_route_req
tcp_v4_send_synack
tcp_rtx_synack
inet_rtx_syn_ack
tcp_fastopen_synack_time
tcp_retransmit_timer
tcp_write_timer_handler
tcp_write_timer
call_timer_fn

Thread running inet_csk_route_req() owns a reference on the request
socket, so we have the guarantee ireq->ireq_opt wont be changed or
freed.

lockdep can enforce this invariant for us.

Fixes: c92e8c0 ("tcp/dccp: fix ireq->opt races")
Signed-off-by: Eric Dumazet [email protected]
Signed-off-by: David S. Miller [email protected]

ipsec: Fix aborted xfrm policy dump crash

An independent security researcher, Mohamed Ghannam, has reported
this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
program.

The xfrm_dump_policy_done function expects xfrm_dump_policy to
have been called at least once or it will crash. This can be
triggered if a dump fails because the target socket's receive
buffer is full.

This patch fixes it by using the cb->start mechanism to ensure that
the initialisation is always done regardless of the buffer situation.

Fixes: 12a169e ("ipsec: Put dumpers on the dump list")
Signed-off-by: Herbert Xu [email protected]
Signed-off-by: Steffen Klassert [email protected]

scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFER

The legacy block layer handles requests as follows:

If the prep function returns BLKPREP_OK, let blk_peek_request()
return the pointer to that request.
If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED
flag and retry calling the prep function later.
If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end
the request.

In none of these cases it is correct to clear the SCMD_INITIALIZED
flag from inside scsi_prep_fn(). Since scsi_prep_fn() already
guarantees that scsi_init_command() will be called once even if
scsi_prep_fn() is called multiple times, remove the code that clears
SCMD_INITIALIZED from scsi_prep_fn().

The scsi-mq code handles requests as follows:

If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag
and submit the request to the SCSI LLD.
If scsi_mq_prep_fn() returns BLKPREP_DEFER, call
blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE.
If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call
scsi_mq_uninit_cmd() and let the blk-mq core end the request.

In none of these cases scsi_mq_prep_fn() should clear the
SCMD_INITIALIZED flag. Hence remove the code from scsi_mq_prep_fn()
function that clears that flag.

This patch avoids that the following warning is triggered when using
the legacy block layer:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220
CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1
task: ffff91c147a4b800 task.stack: ffffb282c37b8000
RIP: 0010:scsi_end_request+0x1de/0x220
Call Trace:

scsi_io_completion+0x204/0x5e0
scsi_finish_command+0xce/0xe0
scsi_softirq_done+0x126/0x130
blk_done_softirq+0x6e/0x80
__do_softirq+0xcf/0x2a8
irq_exit+0xab/0xb0
do_IRQ+0x7b/0xc0
common_interrupt+0x90/0x90

RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10
__test_set_page_writeback+0xc7/0x2c0
__block_write_full_page+0x158/0x3b0
block_write_full_page+0xc4/0xd0
blkdev_writepage+0x13/0x20
__writepage+0x12/0x40
write_cache_pages+0x204/0x500
generic_writepages+0x48/0x70
blkdev_writepages+0x9/0x10
do_writepages+0x34/0xc0
__filemap_fdatawrite_range+0x6c/0x90
file_write_and_wait_range+0x31/0x90
blkdev_fsync+0x16/0x40
vfs_fsync_range+0x44/0xa0
do_fsync+0x38/0x60
SyS_fsync+0xb/0x10
entry_SYSCALL_64_fastpath+0x13/0x94
---[ end trace 86e8ef85a4a6c1d1 ]---

Fixes: commit 64104f7 ("scsi: Call scsi_initialize_rq() for filesystem requests")
Signed-off-by: Bart Van Assche [email protected]
Cc: Damien Le Moal [email protected]
Cc: Christoph Hellwig [email protected]
Cc: Hannes Reinecke [email protected]
Cc: Johannes Thumshirn [email protected]
Reviewed-by: Damien Le Moal [email protected]
Reviewed-by: Johannes Thumshirn [email protected]
Signed-off-by: Martin K. Petersen [email protected]

Linux 4.14-rc6
x86/entry: Fix idtentry unwind hint

This fixes the following ORC warning in the 'int3' entry code:

WARNING: can't dereference iret registers at ffff8801c5f17fe0 for ip ffffffff95f0d94b

The ORC metadata had the wrong stack offset for the iret registers.

Their location on the stack is dependent on whether the exception has an
error code.

Reported-and-tested-by: Andrei Vagin [email protected]
Signed-off-by: Josh Poimboeuf [email protected]
Cc: Andy Lutomirski [email protected]
Cc: Linus Torvalds [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Thomas Gleixner [email protected]
Fixes: 8c1f755 ("x86/entry/64: Add unwind hint annotations")
Link: http://lkml.kernel.org/r/931d57f0551ed7979d5e7e05370d445c8e5137f8.1508516398.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar [email protected]

x86/unwind: Show function name+offset in ORC error messages

Improve the warning messages to show the relevant function name+offset.
This makes it much easier to diagnose problems with the ORC metadata.

Before:

WARNING: can't dereference iret registers at ffff8801c5f17fe0 for ip ffffffff95f0d94b

After:

WARNING: can't dereference iret registers at ffff880178f5ffe0 for ip int3+0x5b/0x60

Reported-by: Andrei Vagin [email protected]
Signed-off-by: Josh Poimboeuf [email protected]
Cc: Andy Lutomirski [email protected]
Cc: Linus Torvalds [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Thomas Gleixner [email protected]
Fixes: ee9f8fc ("x86/unwind: Add the ORC unwinder")
Link: http://lkml.kernel.org/r/6bada6b9eac86017e16bd79e1e77877935cb50bb.1508516398.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar [email protected]

sched/swait: Document it clearly that the swait facilities are special and shouldn't be used

We currently welcome using swait over wait whenever possible because
it is a slimmer data structure. However, Linus has made it very clear
that he does not want this used, unless under very specific RT scenarios
(such as current users).

Update the comments before kernel hipsters start thinking swait is the
cool thing to do.

Signed-off-by: Davidlohr Bueso [email protected]
Acked-by: Luis R. Rodriguez [email protected]
Cc: Andrew Morton [email protected]
Cc: Linus Torvalds [email protected]
Cc: Paul E. McKenney [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Thomas Gleixner [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar [email protected]

platform/x86: intel_pmc_ipc: Use devm_* calls in driver probe function

This patch cleans up unnecessary free/alloc calls in ipc_plat_probe(),
ipc_pci_probe() and ipc_plat_get_res() functions by using devm_*
calls.

This patch also adds proper error handling for failure cases in
ipc_pci_probe() function.

Signed-off-by: Kuppuswamy Sathyanarayanan [email protected]
[andy: fixed style issues, missed devm_free_irq(), removed unnecessary log message]
Signed-off-by: Andy Shevchenko [email protected]

platform/x86: intel_pmc_ipc: Use spin_lock to protect GCR updates

Currently, update_no_reboot_bit() function implemented in this driver
uses mutex_lock() to protect its register updates. But this function is
called with in atomic context in iTCO_wdt_start() and iTCO_wdt_stop()
functions in iTCO_wdt.c driver, which in turn causes "sleeping into
atomic context" issue. This patch fixes this issue by replacing the
mutex_lock() with spin_lock() to protect the GCR read/write/update APIs.

Fixes: 9d855d4 ("platform/x86: intel_pmc_ipc: Fix iTCO_wdt GCS memory mapping failure")
Signed-off-by: Kuppuswamy Sathyanarayanan [email protected]
Signed-off-by: Andy Shevchenko [email protected]

kbuild doc: a bundle of fixes on makefiles.txt

It does several fixes:

move the displaced ld example to its reasonable place.
add new example for command gzip.
fix 2 number errors.
fix format of chapter 7.x, make it looks the same as other chapters.

Signed-off-by: Cao jin [email protected]
Signed-off-by: Masahiro Yamada [email protected]

kbuild: clang: fix build failures with sparse check

We should avoid using the space character when passing arguments to
clang, because static code analysis check tool such as sparse may
misinterpret the arguments followed by spaces as build targets hence
cause the build to fail.

Signed-off-by: David Lin [email protected]
Signed-off-by: Masahiro Yamada [email protected]

xfs: fix AIM7 regression

Apparently our current rwsem code doesn't like doing the trylock, then
lock for real scheme. So change our read/write methods to just do the
trylock for the RWF_NOWAIT case. This fixes a ~25% regression in
AIM7.

Fixes: 91f9943 ("fs: support RWF_NOWAIT for buffered reads")
Reported-by: kernel test robot [email protected]
Signed-off-by: Christoph Hellwig [email protected]
Reviewed-by: Darrick J. Wong [email protected]
Signed-off-by: Darrick J. Wong [email protected]

drivers/net/usb: add device id for TP-LINK UE300 USB 3.0 Ethernet

This product is named 'TP-LINK USB 3.0 Gigabit Ethernet Network
Adapter (Model No.is UE300)'. It uses chip RTL8153 and works with
driver drivers/net/usb/r8152.c

Signed-off-by: Ran Wang [email protected]
Signed-off-by: David S. Miller [email protected]

cdc_ether: flag the Huawei ME906/ME909 as WWAN

The Huawei ME906 (12d1:15c1) comes with a standard ECM interface that
requires management via AT commands sent over one of the control TTYs
(e.g. connected with AT^NDISDUP).

Signed-off-by: Aleksander Morgado [email protected]
Signed-off-by: David S. Miller [email protected]

net: mvpp2: fix TSO headers allocation and management

TSO headers are managed with txq index and therefore should be aligned
with the txq size, not with the aggregated txq size.

Fixes: 186cd4d ("net: mvpp2: software tso support")
Reported-by: Marc Zyngier [email protected]
Signed-off-by: Yan Markman [email protected]
Signed-off-by: Antoine Tenart [email protected]
Signed-off-by: David S. Miller [email protected]

net: mvpp2: do not unmap TSO headers buffers

The TSO header buffers are coming from a per cpu pool and should not
be unmapped as they are reused. The PPv2 driver was unmapping all
descriptors buffers unconditionally. This patch fixes this by checking
the buffers dma addresses before unmapping them, and by not unmapping
those who are located in the TSO header pool.

Fixes: 186cd4d ("net: mvpp2: software tso support")
Signed-off-by: Antoine Tenart [email protected]
Signed-off-by: David S. Miller [email protected]

net: mvpp2: do not call txq_done from the Tx path when Tx irqs are used

When Tx IRQs are used, txq_bufs_free() can be called from both the Tx
path and from NAPI poll(). This led to CPU stalls as if these two tasks
(Tx and Poll) are scheduled on two CPUs at the same time, DMA unmapping
operations are done on the same txq buffers.

This patch adds a check not to call txq_done() from the Tx path if Tx
interrupts are used as it does not make sense to do so.

Fixes: edc660f ("net: mvpp2: replace TX coalescing interrupts with hrtimer")
Signed-off-by: Antoine Tenart [email protected]
Signed-off-by: David S. Miller [email protected]

sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND

Commit 9b97420 ("sctp: support ipv6 nonlocal bind")
introduced support for the above options as v4 sctp did,
so patched sctp_v6_available().

In the v4 implementation it's enough, because
sctp_inet_bind_verify() just returns with sctp_v4_available().
However sctp_inet6_bind_verify() has an extra check before that
for link-local scope_id, which won't respect the above options.

Added the checks before calling ipv6_chk_addr(), but
not before the validation of scope_id.

before (w/ both options):
./v6test fe80::10 sctp
bind failed, errno: 99 (Cannot assign requested address)
./v6test fe80::10 tcp
bind success, errno: 0 (Success)

after (w/ both options):
./v6test fe80::10 sctp
bind success, errno: 0 (Success)

Signed-off-by: Laszlo Toth [email protected]
Reviewed-by: Xin Long [email protected]
Signed-off-by: David S. Miller [email protected]

can: sun4i: fix loopback mode

Fix loopback mode by setting the right flag and remove presume mode.

Signed-off-by: Gerhard Bertelsmann [email protected]
Cc: linux-stable [email protected]
Signed-off-by: Marc Kleine-Budde [email protected]

can: kvaser_usb: Correct return value in printout

If the return value from kvaser_usb_send_simple_msg() was non-zero, the
return value from kvaser_usb_flush_queue() was printed in the kernel
warning.

Signed-off-by: Jimmy Assarsson [email protected]
Cc: linux-stable [email protected]
Signed-off-by: Marc Kleine-Budde [email protected]

can: kvaser_usb: Ignore CMD_FLUSH_QUEUE_REPLY messages

To avoid kernel warning "Unhandled message (68)", ignore the
CMD_FLUSH_QUEUE_REPLY message for now.

As of Leaf v2 firmware version v4.1.844 (2017-02-15), flush tx queue is
synchronous. There is a capability bit indicating whether flushing tx
queue is synchronous or asynchronous.

A proper solution would be to query the device for capabilities. If the
synchronous tx flush capability bit is set, we should wait for
CMD_FLUSH_QUEUE_REPLY message, while flushing the tx queue.

Signed-off-by: Jimmy Assarsson [email protected]
Cc: linux-stable [email protected]
Signed-off-by: Marc Kleine-Budde [email protected]

perf/x86/intel/bts: Fix exclusive event reference leak

Commit:

d2878d6 ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")

... adds a privilege check in the exactly wrong place in the event init path:
after the 'LBR exclusive' reference has been taken, and doesn't release it
in the case of insufficient privileges. After this, nobody in the system
gets to use PT or LBR afterwards.

This patch moves the privilege check to where it should have been in the
first place.

Signed-off-by: Alexander Shishkin [email protected]
Cc: Linus Torvalds [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Peter Zijlstra [email protected]
Cc: Thomas Gleixner [email protected]
Fixes: d2878d6 ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar [email protected]

ALSA: hda - fix headset mic problem for Dell machines with alc236

We have several Dell laptops which use the codec alc236, the headset
mic can't work on these machines. Following the commit 736f20a, we
add the pin cfg table to make the headset mic work.

Cc: [email protected]
Signed-off-by: Hui Wang [email protected]
Signed-off-by: Takashi Iwai [email protected]

hwmon: (tmp102) Fix first temperature reading

Commit 3d8f7a8 ("hwmon: (tmp102) Improve handling of initial read
delay") reduced the initial temperature read delay and made it dependent
on the chip's shutdown mode. If the chip was not in shutdown mode at probe,
the read delay no longer applies.

This ignores the fact that the chip initialization changes the temperature
sensor resolution, and that the temperature register values change when
the resolution is changed. As a result, the reported temperature is twice
as high as the real temperature until the first temperature conversion
after the configuration change is complete. This can result in unexpected
behavior and, worst case, in a system shutdown. To fix the problem,
let's just always wait for a conversion to complete before reporting
a temperatu…

* objtool: Fix memory leak in decode_instructions() When an error occurs before adding an allocated insn to the list, free it before returning. Signed-off-by: Kamalesh Babulal <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/336da800bf6070eae11f4e0a3b9ca64c27658114.1508430423.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]> * x86/mm: Limit mmap() of /dev/mem to valid physical addresses Currently, it is possible to mmap() any offset from /dev/mem. If a program mmaps() /dev/mem offsets outside of the addressable limits of a system, the page table can be corrupted by setting reserved bits. For example if you mmap() offset 0x0001000000000000 of /dev/mem on an x86_64 system with a 48-bit bus, the page fault handler will be called with error_code set to RSVD. The kernel then crashes with a page table corruption error. This change prevents this page table corruption on x86 by refusing to mmap offsets higher than the highest valid address in the system. Signed-off-by: Craig Bergstrom <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luis R. Rodriguez <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Toshi Kani <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> * ALSA: hda/realtek - Add support for ALC236/ALC3204 Add support for ALC236/ALC3204. Add headset mode support for ALC236/ALC3204. Signed-off-by: Kailang Yang <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> * mmc: tmio: fix swiotlb buffer is full Since the commit de3ee99b097d ("mmc: Delete bounce buffer handling") deletes the bounce buffer handling, a request data size will be referred to max_{req,seg}_size instead of MMC_QUEUE_BOUNCESZ (64k bytes). In other hand, renesas_sdhi_internal_dmac.c will set very big value of max_{req,seg}_size because the max_blk_count is set to 0xffffffff. And then, "swiotlb buffer is full" happens because swiotlb can handle a memory size up to 256k bytes only (IO_TLB_SEGSIZE = 128 and IO_TLB_SHIFT = 11). So, as a workaround, this patch avoids the issue by setting the max_{req,seg}_size up to 256k bytes if swiotlb is running. Reported-by: Dirk Behme <[email protected]> Signed-off-by: Yoshihiro Shimoda <[email protected]> Acked-by: Wolfram Sang <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Ulf Hansson <[email protected]> * mmc: renesas_sdhi: fix kernel panic in _internal_dmac.c Since this driver checks if the return value of dma_map_sg() is minus or not and keeps to enable the DMAC, it may cause kernel panic when the dma_map_sg() returns 0. So, this patch fixes the issue. Reported-by: Dirk Behme <[email protected]> Fixes: 2a68ea7896e3 ("mmc: renesas-sdhi: add support for R-Car Gen3 SDHI DMAC") Signed-off-by: Yoshihiro Shimoda <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Acked-by: Wolfram Sang <[email protected]> Signed-off-by: Ulf Hansson <[email protected]> * binder: call poll_wait() unconditionally. Because we're not guaranteed that subsequent calls to poll() will have a poll_table_struct parameter with _qproc set. When _qproc is not set, poll_wait() is a noop, and we won't be woken up correctly. Signed-off-by: Martijn Coenen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * clockevents/drivers/cs5535: Improve resilience to spurious interrupts The interrupt handler mfgpt_tick() is not robust versus spurious interrupts which happen before the clock event device is registered and fully initialized. The reason is that the safe guard against spurious interrupts solely checks for the clockevents shutdown state, but lacks a check for detached state. If the interrupt hits while the device is in detached state it passes the safe guard and dereferences the event handler call back which is NULL. Add the missing state check. Fixes: 8f9327cbb6e8 ("clockevents/drivers/cs5535: Migrate to new 'set-state' interface") Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: David Kozub <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] * sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect Now sctp processes icmp redirect packet in sctp_icmp_redirect where it calls sctp_transport_dst_check in which tp->dst can be released. The problem is before calling sctp_transport_dst_check, it doesn't check sock_owned_by_user, which means tp->dst could be freed while a process is accessing it with owning the socket. An use-after-free issue could be triggered by this. This patch is to fix it by checking sock_owned_by_user before calling sctp_transport_dst_check in sctp_icmp_redirect, so that it would not release tp->dst if users still hold sock lock. Besides, the same issue fixed in commit 45caeaa5ac0b ("dccp/tcp: fix routing redirect race") on sctp also needs this check. Fixes: 55be7a9c6074 ("ipv4: Add redirect support to all protocol icmp error handlers") Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: enforce TCP only support for sockmap Only TCP sockets have been tested and at the moment the state change callback only handles TCP sockets. This adds a check to ensure that sockets actually being added are TCP sockets. For net-next we can consider UDP support. Signed-off-by: John Fastabend <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: avoid preempt enable/disable in sockmap using tcp_skb_cb region SK_SKB BPF programs are run from the socket/tcp context but early in the stack before much of the TCP metadata is needed in tcp_skb_cb. So we can use some unused fields to place BPF metadata needed for SK_SKB programs when implementing the redirect function. This allows us to drop the preempt disable logic. It does however require an API change so sk_redirect_map() has been updated to additionally provide ctx_ptr to skb. Note, we do however continue to disable/enable preemption around actual BPF program running to account for map updates. Signed-off-by: John Fastabend <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: remove mark access for SK_SKB program types The skb->mark field is a union with reserved_tailroom which is used in the TCP code paths from stream memory allocation. Allowing SK_SKB programs to set this field creates a conflict with future code optimizations, such as "gifting" the skb to the egress path instead of creating a new skb and doing a memcpy. Because we do not have a released version of SK_SKB yet lets just remove it for now. A more appropriate scratch pad to use at the socket layer is dev_scratch, but lets add that in future kernels when needed. Signed-off-by: John Fastabend <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: require CAP_NET_ADMIN when using sockmap maps Restrict sockmap to CAP_NET_ADMIN. Signed-off-by: John Fastabend <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: require CAP_NET_ADMIN when using devmap Devmap is used with XDP which requires CAP_NET_ADMIN so lets also make CAP_NET_ADMIN required to use the map. Signed-off-by: John Fastabend <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * vmbus: hvsock: add proper sync for vmbus_hvsock_device_unregister() Without the patch, vmbus_hvsock_device_unregister() can destroy the device prematurely when close() is called, and can cause NULl dereferencing or potential data loss (the last portion of the data stream may be dropped prematurely). Signed-off-by: Dexuan Cui <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Stephen Hemminger <[email protected]> Signed-off-by: K. Y. Srinivasan <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * waitid(): Avoid unbalanced user_access_end() on access_ok() error As pointed out by Linus and David, the earlier waitid() fix resulted in a (currently harmless) unbalanced user_access_end() call. This fixes it to just directly return EFAULT on access_ok() failure. Fixes: 96ca579a1ecc ("waitid(): Add missing access_ok() checks") Acked-by: David Daney <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> * tcp/dccp: fix ireq->opt races syzkaller found another bug in DCCP/TCP stacks [1] For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix ireq->pktopts race"), we need to make sure we do not access ireq->opt unless we own the request sock. Note the opt field is renamed to ireq_opt to ease grep games. [1] BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295 CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427 ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474 tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135 tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587 tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557 __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072 tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline] tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071 tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816 tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x40c341 RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341 RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1 R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000 Allocated by task 3295: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 __do_kmalloc mm/slab.c:3725 [inline] __kmalloc+0x162/0x760 mm/slab.c:3734 kmalloc include/linux/slab.h:498 [inline] tcp_v4_save_options include/net/tcp.h:1962 [inline] tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271 tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283 tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313 tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857 tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482 tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3306: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157 __sk_destruct+0xfd/0x910 net/core/sock.c:1560 sk_destruct+0x47/0x80 net/core/sock.c:1595 __sk_free+0x57/0x230 net/core/sock.c:1603 sk_free+0x2a/0x40 net/core/sock.c:1614 sock_put include/net/sock.h:1652 [inline] inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959 tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765 tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675 ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216 NF_HOOK include/linux/netfilter.h:249 [inline] ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257 dst_input include/net/dst.h:464 [inline] ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397 NF_HOOK include/linux/netfilter.h:249 [inline] ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493 __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476 __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514 netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587 netif_receive_skb+0xae/0x390 net/core/dev.c:4611 tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372 tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766 tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792 call_write_iter include/linux/fs.h:1770 [inline] new_sync_write fs/read_write.c:468 [inline] __vfs_write+0x68a/0x970 fs/read_write.c:481 vfs_write+0x18f/0x510 fs/read_write.c:543 SYSC_write fs/read_write.c:588 [inline] SyS_write+0xef/0x220 fs/read_write.c:580 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> * packet: avoid panic in packet_getsockopt() syzkaller got crashes in packet_getsockopt() processing PACKET_ROLLOVER_STATS command while another thread was managing to change po->rollover Using RCU will fix this bug. We might later add proper RCU annotations for sparse sake. In v2: I replaced kfree(rollover) in fanout_add() to kfree_rcu() variant, as spotted by John. Fixes: a9b6391814d5 ("packet: rollover statistics") Signed-off-by: Eric Dumazet <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: John Sperbeck <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/ncsi: Fix AEN HNCDSC packet length Correct the value of the HNCDSC AEN packet. Fixes: 7a82ecf4cfb85 "net/ncsi: NCSI AEN packet handler" Signed-off-by: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/ncsi: Stop monitor if channel times out or is inactive ncsi_channel_monitor() misses stopping the channel monitor in several places that it should, causing a WARN_ON_ONCE() to trigger when the monitor is re-started later, eg: [ 459.040000] WARNING: CPU: 0 PID: 1093 at net/ncsi/ncsi-manage.c:269 ncsi_start_channel_monitor+0x7c/0x90 [ 459.040000] CPU: 0 PID: 1093 Comm: kworker/0:3 Not tainted 4.10.17-gaca2fdd #140 [ 459.040000] Hardware name: ASpeed SoC [ 459.040000] Workqueue: events ncsi_dev_work [ 459.040000] [<80010094>] (unwind_backtrace) from [<8000d950>] (show_stack+0x20/0x24) [ 459.040000] [<8000d950>] (show_stack) from [<801dbf70>] (dump_stack+0x20/0x28) [ 459.040000] [<801dbf70>] (dump_stack) from [<80018d7c>] (__warn+0xe0/0x108) [ 459.040000] [<80018d7c>] (__warn) from [<80018e70>] (warn_slowpath_null+0x30/0x38) [ 459.040000] [<80018e70>] (warn_slowpath_null) from [<803f6a08>] (ncsi_start_channel_monitor+0x7c/0x90) [ 459.040000] [<803f6a08>] (ncsi_start_channel_monitor) from [<803f7664>] (ncsi_configure_channel+0xdc/0x5fc) [ 459.040000] [<803f7664>] (ncsi_configure_channel) from [<803f8160>] (ncsi_dev_work+0xac/0x474) [ 459.040000] [<803f8160>] (ncsi_dev_work) from [<8002d244>] (process_one_work+0x1e0/0x450) [ 459.040000] [<8002d244>] (process_one_work) from [<8002d510>] (worker_thread+0x5c/0x570) [ 459.040000] [<8002d510>] (worker_thread) from [<80033614>] (kthread+0x124/0x164) [ 459.040000] [<80033614>] (kthread) from [<8000a5e8>] (ret_from_fork+0x14/0x2c) This also updates the monitor instead of just returning if ncsi_xmit_cmd() fails to send the get-link-status command so that the monitor properly times out. Fixes: e6f44ed6d04d3 "net/ncsi: Package and channel management" Signed-off-by: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/ncsi: Disable HWA mode when no channels are found When there are no NCSI channels probed, HWA (Hardware Arbitration) mode is enabled. It's not correct because HWA depends on the fact: NCSI channels exist and all of them support HWA mode. This disables HWA when no channels are probed. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/ncsi: Enforce failover on link monitor timeout The NCSI channel has been configured to provide service if its link monitor timer is enabled, regardless of its state (inactive or active). So the timeout event on the link monitor indicates the out-of-service on that channel, for which a failover is needed. This sets NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor timeout, regardless the channel's original state (inactive or active). Also, the link is put into "down" state to give the failing channel lowest priority when selecting for the active channel. The state of failing channel should be set to active in order for deinitialization and failover to be done. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net/ncsi: Fix length of GVI response packet The length of GVI (GetVersionInfo) response packet should be 40 instead of 36. This issue was found from /sys/kernel/debug/ncsi/eth0/stats. # ethtool --ncsi eth0 swstats : RESPONSE OK TIMEOUT ERROR ======================================= GVI 0 0 2 With this applied, no error reported on GVI response packets: # ethtool --ncsi eth0 swstats : RESPONSE OK TIMEOUT ERROR ======================================= GVI 2 0 0 Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: David S. Miller <[email protected]> * hv_sock: add locking in the open/close/release code paths Without the patch, when hvs_open_connection() hasn't completely established a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't inserted the sock into the connected queue), vsock_stream_connect() may see the sk_state change and return the connection to the userspace, and next when the userspace closes the connection quickly, hvs_release() may not see the connection in the connected queue; finally hvs_open_connection() inserts the connection into the queue, but we won't be able to purge the connection for ever. Signed-off-by: Dexuan Cui <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Cathy Avery <[email protected]> Cc: Rolf Neugebauer <[email protected]> Cc: Marcelo Cerri <[email protected]> Signed-off-by: David S. Miller <[email protected]> * geneve: Fix function matching VNI and tunnel ID on big-endian On big-endian machines, functions converting between tunnel ID and VNI use the three LSBs of tunnel ID storage to map VNI. The comparison function eq_tun_id_and_vni(), on the other hand, attempted to map the VNI from the three MSBs. Fix it by using the same check implemented on LE, which maps VNI from the three LSBs of tunnel ID. Fixes: 2e0b26e10352 ("geneve: Optimize geneve device lookup.") Signed-off-by: Stefano Brivio <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Signed-off-by: David S. Miller <[email protected]> * udp: make some messages more descriptive In the UDP code there are two leftover error messages with very few meaning. Replace them with a more descriptive error message as some users reported them as "strange network error". Signed-off-by: Matteo Croce <[email protected]> Signed-off-by: David S. Miller <[email protected]> * android: binder: Don't get mm from task Use binder_alloc struct's mm_struct rather than getting a reference to the mm struct through get_task_mm to avoid a potential deadlock between lru lock, task lock and dentry lock, since a thread can be holding the task lock and the dentry lock while trying to acquire the lru lock. Acked-by: Arve Hjønnevåg <[email protected]> Signed-off-by: Sherry Yang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * android: binder: Fix null ptr dereference in debug msg Don't access next->data in kernel debug message when the next buffer is null. Acked-by: Arve Hjønnevåg <[email protected]> Signed-off-by: Sherry Yang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * net: aquantia: Reset nic statistics on interface up/down Internal statistics system on chip never gets reset until hardware reboot. This is quite inconvenient in terms of ethtool statistics usage. This patch implements incremental statistics update inside of service callback. Upon nic initialization, first request is done to fetch initial stat data, current collected stat data gets cleared. Internal statistics mailbox readout is improved to save space and increase readability Signed-off-by: Pavel Belous <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: aquantia: Add queue restarts stats counter Queue stat strings are cleaned up, duplicate stat name strings removed, queue restarts counter added Signed-off-by: Pavel Belous <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: aquantia: Fixed transient link up/down/up notification When doing ifconfig down/up, driver did not reported carrier_off neither in nic_stop nor in nic_start. That caused link to be visible as "up" during couple of seconds immediately after "ifconfig up". Signed-off-by: Pavel Belous <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: aquantia: Limit number of MSIX irqs to the number of cpus There is no much practical use from having MSIX vectors more that number of cpus, thus cap this first with preconfigured limit, then with number of cpus online. Signed-off-by: Pavel Belous <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: aquantia: mmio unmap was not performed on driver removal That may lead to mmio resource leakage. Signed-off-by: Pavel Belous <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: aquantia: Enable coalescing management via ethtool interface Aquantia NIC allows both TX and RX interrupt throttle rate (ITR) management, but this was used in a very limited way via predefined values. This patch allows to setup ITR default values via module command line arguments and via standard ethtool coalescing settings. Signed-off-by: Pavel Belous <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: aquantia: Bad udp rate on default interrupt coalescing Default Tx rates cause very long ISR delays on Tx. 0xff is 510us delay, giving only ~ 2000 interrupts per seconds for Tx rings cleanup. With these settings udp tx rate was never higher than ~800Mbps on a single stream. Changing min delay to 0xF makes it way better with ~6Gbps TCP stream performance is almost unaffected by this change, since LSO optimizations play important role. CPU load is affected insignificantly by this change. Signed-off-by: Pavel Belous <[email protected]> Signed-off-by: Igor Russkikh <[email protected]> Signed-off-by: David S. Miller <[email protected]> * cpu/hotplug: Reset node state after operation The recent rework of the cpu hotplug internals changed the usage of the per cpu state->node field, but missed to clean it up after usage. So subsequent hotplug operations use the stale pointer from a previous operation and hand it into the callback functions. The callbacks then dereference a pointer which either belongs to a different facility or points to freed and potentially reused memory. In either case data corruption and crashes are the obvious consequence. Reset the node and the last pointers in the per cpu state to NULL after the operation which set them has completed. Fixes: 96abb968549c ("smp/hotplug: Allow external multi-instance rollback") Reported-by: Tvrtko Ursulin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710211606130.3213@nanos * hwmon: (da9052) Increase sample rate when using TSI The TSI channel, which is usually used for touchscreen support, but can be used as 4 general purpose ADCs. When used as a touchscreen interface the touchscreen driver switches the device into 1ms sampling mode (rather than the default 10ms economy mode) as recommended by the manufacturer. When using the TSI channels as a general purpose ADC we are currently not doing this and testing suggests that this can result in ADC timeouts: [ 5827.198289] da9052 spi2.0: timeout waiting for ADC conversion interrupt [ 5827.728293] da9052 spi2.0: timeout waiting for ADC conversion interrupt [ 5993.808335] da9052 spi2.0: timeout waiting for ADC conversion interrupt [ 5994.328441] da9052 spi2.0: timeout waiting for ADC conversion interrupt [ 5994.848291] da9052 spi2.0: timeout waiting for ADC conversion interrupt Switching to the 1ms timing resolves this issue. Fixes: 4f16cab19a3d5 ("hwmon: da9052: Add support for TSI channel") Signed-off-by: Martyn Welch <[email protected]> Acked-by: Steve Twiss <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> * drm/amd/powerplay: fix uninitialized variable refresh_rate was not initialized when program display gap. this patch can fix vce ring test failed when do S3 on Polaris10. bug: https://bugs.freedesktop.org/show_bug.cgi?id=103102 bug: https://bugzilla.kernel.org/show_bug.cgi?id=196615 Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Rex Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] * bpf: devmap fix arithmetic overflow in bitmap_size calculation An integer overflow is possible in dev_map_bitmap_size() when calculating the BITS_TO_LONG logic which becomes, after macro replacement, (((n) + (d) - 1)/ (d)) where 'n' is a __u32 and 'd' is (8 * sizeof(long)). To avoid overflow cast to u64 before arithmetic. Reported-by: Richard Weinberger <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: John Fastabend <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: fix off by one for range markings with L{T, E} patterns During review I noticed that the current logic for direct packet access marking in check_cond_jmp_op() has an off by one for the upper right range border when marking in find_good_pkt_pointers() with BPF_JLT and BPF_JLE. It's not really harmful given access up to pkt_end is always safe, but we should nevertheless correct the range marking before it becomes ABI. If pkt_data' denotes a pkt_data derived pointer (pkt_data + X), then for pkt_data' < pkt_end in the true branch as well as for pkt_end <= pkt_data' in the false branch we mark the range with X although it should really be X - 1 in these cases. For example, X could be pkt_end - pkt_data, then when testing for pkt_data' < pkt_end the verifier simulation cannot deduce that a byte load of pkt_data' - 1 would succeed in this branch. Fixes: b4e432f1000a ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: fix pattern matches for direct packet access Alexander had a test program with direct packet access, where the access test was in the form of data + X > data_end. In an unrelated change to the program LLVM decided to swap the branches and emitted code for the test in form of data + X <= data_end. We hadn't seen these being generated previously, thus verifier would reject the program. Therefore, fix up the verifier to detect all test cases, so we don't run into such issues in the future. Fixes: b4e432f1000a ("bpf: enable BPF_J{LT, LE, SLT, SLE} opcodes in verifier") Reported-by: Alexander Alemayhu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]> * bpf: add test cases to bpf selftests to cover all access tests Lets add test cases to cover really all possible direct packet access tests for good/bad access cases so we keep tracking them. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]> * sock: correct sk_wmem_queued accounting on efault in tcp zerocopy Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues after triggering an EFAULT in __zerocopy_sg_from_iter. On this error, skb_zerocopy_stream_iter resets the skb to its state before the operation with __pskb_trim. It cannot kfree_skb like datagram callers, as the skb may have data from a previous send call. __pskb_trim calls skb_condense for unowned skbs, which adjusts their truesize. These tcp skbuffs are owned and their truesize must add up to sk_wmem_queued. But they match because their skb->sk is NULL until tcp_transmit_skb. Temporarily set skb->sk when calling __pskb_trim to signal that the skbuffs are owned and avoid the skb_condense path. Fixes: 52267790ef52 ("sock: add MSG_ZEROCOPY") Signed-off-by: Willem de Bruijn <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: bridge: fix returning of vlan range op errors When vlan tunnels were introduced, vlan range errors got silently dropped and instead 0 was returned always. Restore the previous behaviour and return errors to user-space. Fixes: efa5356b0d97 ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Nikolay Aleksandrov <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]> * soreuseport: fix initialization race Syzkaller stumbled upon a way to trigger WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41 reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39 There are two initialization paths for the sock_reuseport structure in a socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through SO_ATTACH_REUSEPORT_[CE]BPF before bind. The existing implementation assumedthat the socket lock protected both of these paths when it actually only protects the SO_ATTACH_REUSEPORT path. Syzkaller triggered this double allocation by running these paths concurrently. This patch moves the check for double allocation into the reuseport_alloc function which is protected by a global spin lock. Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection") Fixes: c125e80b8868 ("soreuseport: fast reuseport TCP socket selection") Signed-off-by: Craig Gallek <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: ethtool: remove error check for legacy setting transceiver type Commit 9cab88726929605 ("net: ethtool: Add back transceiver type") restores the transceiver type to struct ethtool_link_settings and convert_link_ksettings_to_legacy_settings() but forgets to remove the error check for the same in convert_legacy_settings_to_link_ksettings(). This prevents older versions of ethtool to change link settings. # ethtool --version ethtool version 3.16 # ethtool -s eth0 autoneg on speed 100 duplex full Cannot set new settings: Invalid argument not setting speed not setting duplex not setting autoneg While newer versions of ethtool works. # ethtool --version ethtool version 4.10 # ethtool -s eth0 autoneg on speed 100 duplex full [ 57.703268] sh-eth ee700000.ethernet eth0: Link is Down [ 59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx Fixes: 19cab88726929605 ("net: ethtool: Add back transceiver type") Signed-off-by: Niklas Söderlund <[email protected]> Reported-by: Renjith R V <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Signed-off-by: David S. Miller <[email protected]> * mlxsw: reg: Add Tunneling IPinIP General Configuration Register The TIGCR register is used for setting up the IPinIP Tunnel configuration. Fixes: ee954d1a91b2 ("mlxsw: spectrum_router: Support GRE tunnels") Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> * mlxsw: spectrum_router: Configure TIGCR on init Spectrum tunnels do not default to ttl of "inherit" like the Linux ones do. Configure TIGCR on router init so that the TTL of tunnel packets is copied from the overlay packets. Fixes: ee954d1a91b2 ("mlxsw: spectrum_router: Support GRE tunnels") Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: stmmac: Add missing call to dev_kfree_skb() When RX HW timestamp is enabled and a frame is discarded we are not freeing the skb but instead only setting to NULL the entry. Add a call to dev_kfree_skb_any() so that skb entry is correctly freed. Signed-off-by: Jose Abreu <[email protected]> Cc: David S. Miller <[email protected]> Cc: Joao Pinto <[email protected]> Cc: Giuseppe Cavallaro <[email protected]> Cc: Alexandre Torgue <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: stmmac: Fix stmmac_get_rx_hwtstamp() When using GMAC4 the valid timestamp is from CTX next desc but we are passing the previous desc to get_rx_timestamp_status() callback. Fix this and while at it rework a little bit the function logic. Signed-off-by: Jose Abreu <[email protected]> Cc: David S. Miller <[email protected]> Cc: Joao Pinto <[email protected]> Cc: Giuseppe Cavallaro <[email protected]> Cc: Alexandre Torgue <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: stmmac: Prevent infinite loop in get_rx_timestamp_status() Prevent infinite loop by correctly setting the loop condition to break when i == 10. Signed-off-by: Jose Abreu <[email protected]> Cc: David S. Miller <[email protected]> Cc: Joao Pinto <[email protected]> Cc: Giuseppe Cavallaro <[email protected]> Cc: Alexandre Torgue <[email protected]> Signed-off-by: David S. Miller <[email protected]> * rxrpc: Don't release call mutex on error pointer Don't release call mutex at the end of rxrpc_kernel_begin_call() if the call pointer actually holds an error value. Fixes: 540b1c48c37a ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg") Reported-by: Marc Dionne <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]> * textsearch: fix typos in library helpers Fix spellos (typos) in textsearch library helpers. Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: David S. Miller <[email protected]> * of_mdio: Fix broken PHY IRQ in case of probe deferral If an Ethernet PHY is initialized before the interrupt controller it is connected to, a message like the following is printed: irq: no irq domain found for /interrupt-controller@e61c0000 ! However, the actual error is ignored, leading to a non-functional (POLL) PHY interrupt later: Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL) Depending on whether the PHY driver will fall back to polling, Ethernet may or may not work. To fix this: 1. Switch of_mdiobus_register_phy() from irq_of_parse_and_map() to of_irq_get(). Unlike the former, the latter returns -EPROBE_DEFER if the interrupt controller is not yet available, so this condition can be detected. Other errors are handled the same as before, i.e. use the passed mdio->irq[addr] as interrupt. 2. Propagate and handle errors from of_mdiobus_register_phy() and of_mdiobus_register_device(). Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: David S. Miller <[email protected]> * ipv6: flowlabel: do not leave opt->tot_len with garbage When syzkaller team brought us a C repro for the crash [1] that had been reported many times in the past, I finally could find the root cause. If FlowLabel info is merged by fl6_merge_options(), we leave part of the opt_space storage provided by udp/raw/l2tp with random value in opt_space.tot_len, unless a control message was provided at sendmsg() time. Then ip6_setup_cork() would use this random value to perform a kzalloc() call. Undefined behavior and crashes. Fix is to properly set tot_len in fl6_merge_options() At the same time, we can also avoid consuming memory and cpu cycles to clear it, if every option is copied via a kmemdup(). This is the change in ip6_setup_cork(). [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801cb64a100 task.stack: ffff8801cc350000 RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: 0018:ffff8801cc357550 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010 RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014 RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10 R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0 R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0 FS: 00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0 DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729 udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x358/0x5a0 net/socket.c:1750 SyS_sendto+0x40/0x50 net/socket.c:1718 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4520a9 RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9 RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016 RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029 Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550 Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: David S. Miller <[email protected]> * stmmac: Don't access tx_q->dirty_tx before netif_tx_lock This is the possible reason for different hard to reproduce problems on my ARMv7-SMP test system. The symptoms are in recent kernels imprecise external aborts, and in older kernels various kinds of network stalls and unexpected page allocation failures. My testing indicates that the trouble started between v4.5 and v4.6 and prevails up to v4.14. Using the dirty_tx before acquiring the spin lock is clearly wrong and was first introduced with v4.6. Fixes: e3ad57c96715 ("stmmac: review RX/TX ring management") Signed-off-by: Bernd Edlinger <[email protected]> Signed-off-by: David S. Miller <[email protected]> * x86/cpu/AMD: Apply the Erratum 688 fix when the BIOS doesn't Some F14h machines have an erratum which, "under a highly specific and detailed set of internal timing conditions" can lead to skipping instructions and RIP corruption. Add the fix for those machines when their BIOS doesn't apply it or there simply isn't BIOS update for them. Tested-by: <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sherry Hurwitz <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yazen Ghannam <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: https://bugzilla.kernel.org/show_bug.cgi?id=197285 [ Added pr_info() that we activated the workaround. ] Signed-off-by: Ingo Molnar <[email protected]> * Input: do not use property bits when generating module alias The commit 8724ecb07229 ("Input: allow matching device IDs on property bits") started using property bits when generating module aliases for input handlers, but did not adjust the generation of MODALIAS attribute on input device uevents, breaking automatic module loading. Given that no handler currently uses property bits in their module tables, let's revert this part of the commit for now. Reported-by: Damien Wyart <[email protected]> Tested-by: Damien Wyart <[email protected]> Fixes: 8724ecb07229 ("Input: allow matching device IDs on property bits") Signed-off-by: Dmitry Torokhov <[email protected]> * tcp: do tcp_mstamp_refresh before retransmits on TSQ handler When retransmission on TSQ handler was introduced in the commit f9616c35a0d7 ("tcp: implement TSQ for retransmits"), the retransmitted skbs' timestamps were updated on the actual transmission. In the later commit 385e20706fac ("tcp: use tp->tcp_mstamp in output path"), it stops being done so. In the commit, the comment says "We try to refresh tp->tcp_mstamp only when necessary", and at present tcp_tsq_handler and tcp_v4_mtu_reduced applies to this. About the latter, it's okay since it's rare enough. About the former, even though possible retransmissions on the tasklet comes just after the destructor run in NET_RX softirq handling, the time between them could be nonnegligibly large to the extent that tcp_rack_advance or rto rearming be affected if other (remaining) RX, BLOCK and (preceding) TASKLET sofirq handlings are unexpectedly heavy. So in the same way as tcp_write_timer_handler does, doing tcp_mstamp_refresh ensures the accuracy of algorithms relying on it. Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Koichiro Den <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> * tcp/dccp: fix lockdep splat in inet_csk_route_req() This patch fixes the following lockdep splat in inet_csk_route_req() lockdep_rcu_suspicious inet_csk_route_req tcp_v4_send_synack tcp_rtx_synack inet_rtx_syn_ack tcp_fastopen_synack_time tcp_retransmit_timer tcp_write_timer_handler tcp_write_timer call_timer_fn Thread running inet_csk_route_req() owns a reference on the request socket, so we have the guarantee ireq->ireq_opt wont be changed or freed. lockdep can enforce this invariant for us. Fixes: c92e8c02fe66 ("tcp/dccp: fix ireq->opt races") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]> * ipsec: Fix aborted xfrm policy dump crash An independent security researcher, Mohamed Ghannam, has reported this vulnerability to Beyond Security's SecuriTeam Secure Disclosure program. The xfrm_dump_policy_done function expects xfrm_dump_policy to have been called at least once or it will crash. This can be triggered if a dump fails because the target socket's receive buffer is full. This patch fixes it by using the cb->start mechanism to ensure that the initialisation is always done regardless of the buffer situation. Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list") Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Steffen Klassert <[email protected]> * scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFER The legacy block layer handles requests as follows: - If the prep function returns BLKPREP_OK, let blk_peek_request() return the pointer to that request. - If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED flag and retry calling the prep function later. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end the request. In none of these cases it is correct to clear the SCMD_INITIALIZED flag from inside scsi_prep_fn(). Since scsi_prep_fn() already guarantees that scsi_init_command() will be called once even if scsi_prep_fn() is called multiple times, remove the code that clears SCMD_INITIALIZED from scsi_prep_fn(). The scsi-mq code handles requests as follows: - If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag and submit the request to the SCSI LLD. - If scsi_mq_prep_fn() returns BLKPREP_DEFER, call blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE. - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call scsi_mq_uninit_cmd() and let the blk-mq core end the request. In none of these cases scsi_mq_prep_fn() should clear the SCMD_INITIALIZED flag. Hence remove the code from scsi_mq_prep_fn() function that clears that flag. This patch avoids that the following warning is triggered when using the legacy block layer: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220 CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1 task: ffff91c147a4b800 task.stack: ffffb282c37b8000 RIP: 0010:scsi_end_request+0x1de/0x220 Call Trace: <IRQ> scsi_io_completion+0x204/0x5e0 scsi_finish_command+0xce/0xe0 scsi_softirq_done+0x126/0x130 blk_done_softirq+0x6e/0x80 __do_softirq+0xcf/0x2a8 irq_exit+0xab/0xb0 do_IRQ+0x7b/0xc0 common_interrupt+0x90/0x90 </IRQ> RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10 __test_set_page_writeback+0xc7/0x2c0 __block_write_full_page+0x158/0x3b0 block_write_full_page+0xc4/0xd0 blkdev_writepage+0x13/0x20 __writepage+0x12/0x40 write_cache_pages+0x204/0x500 generic_writepages+0x48/0x70 blkdev_writepages+0x9/0x10 do_writepages+0x34/0xc0 __filemap_fdatawrite_range+0x6c/0x90 file_write_and_wait_range+0x31/0x90 blkdev_fsync+0x16/0x40 vfs_fsync_range+0x44/0xa0 do_fsync+0x38/0x60 SyS_fsync+0xb/0x10 entry_SYSCALL_64_fastpath+0x13/0x94 ---[ end trace 86e8ef85a4a6c1d1 ]--- Fixes: commit 64104f703212 ("scsi: Call scsi_initialize_rq() for filesystem requests") Signed-off-by: Bart Van Assche <[email protected]> Cc: Damien Le Moal <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Johannes Thumshirn <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> * Linux 4.14-rc6 * x86/entry: Fix idtentry unwind hint This fixes the following ORC warning in the 'int3' entry code: WARNING: can't dereference iret registers at ffff8801c5f17fe0 for ip ffffffff95f0d94b The ORC metadata had the wrong stack offset for the iret registers. Their location on the stack is dependent on whether the exception has an error code. Reported-and-tested-by: Andrei Vagin <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations") Link: http://lkml.kernel.org/r/931d57f0551ed7979d5e7e05370d445c8e5137f8.1508516398.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]> * x86/unwind: Show function name+offset in ORC error messages Improve the warning messages to show the relevant function name+offset. This makes it much easier to diagnose problems with the ORC metadata. Before: WARNING: can't dereference iret registers at ffff8801c5f17fe0 for ip ffffffff95f0d94b After: WARNING: can't dereference iret registers at ffff880178f5ffe0 for ip int3+0x5b/0x60 Reported-by: Andrei Vagin <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Link: http://lkml.kernel.org/r/6bada6b9eac86017e16bd79e1e77877935cb50bb.1508516398.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]> * sched/swait: Document it clearly that the swait facilities are special and shouldn't be used We currently welcome using swait over wait whenever possible because it is a slimmer data structure. However, Linus has made it very clear that he does not want this used, unless under very specific RT scenarios (such as current users). Update the comments before kernel hipsters start thinking swait is the cool thing to do. Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Luis R. Rodriguez <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> * platform/x86: intel_pmc_ipc: Use devm_* calls in driver probe function This patch cleans up unnecessary free/alloc calls in ipc_plat_probe(), ipc_pci_probe() and ipc_plat_get_res() functions by using devm_* calls. This patch also adds proper error handling for failure cases in ipc_pci_probe() function. Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> [andy: fixed style issues, missed devm_free_irq(), removed unnecessary log message] Signed-off-by: Andy Shevchenko <[email protected]> * platform/x86: intel_pmc_ipc: Use spin_lock to protect GCR updates Currently, update_no_reboot_bit() function implemented in this driver uses mutex_lock() to protect its register updates. But this function is called with in atomic context in iTCO_wdt_start() and iTCO_wdt_stop() functions in iTCO_wdt.c driver, which in turn causes "sleeping into atomic context" issue. This patch fixes this issue by replacing the mutex_lock() with spin_lock() to protect the GCR read/write/update APIs. Fixes: 9d855d4 ("platform/x86: intel_pmc_ipc: Fix iTCO_wdt GCS memory mapping failure") Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> * kbuild doc: a bundle of fixes on makefiles.txt It does several fixes: 1. move the displaced ld example to its reasonable place. 2. add new example for command gzip. 3. fix 2 number errors. 4. fix format of chapter 7.x, make it looks the same as other chapters. Signed-off-by: Cao jin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> * kbuild: clang: fix build failures with sparse check We should avoid using the space character when passing arguments to clang, because static code analysis check tool such as sparse may misinterpret the arguments followed by spaces as build targets hence cause the build to fail. Signed-off-by: David Lin <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> * xfs: fix AIM7 regression Apparently our current rwsem code doesn't like doing the trylock, then lock for real scheme. So change our read/write methods to just do the trylock for the RWF_NOWAIT case. This fixes a ~25% regression in AIM7. Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]> * drivers/net/usb: add device id for TP-LINK UE300 USB 3.0 Ethernet This product is named 'TP-LINK USB 3.0 Gigabit Ethernet Network Adapter (Model No.is UE300)'. It uses chip RTL8153 and works with driver drivers/net/usb/r8152.c Signed-off-by: Ran Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]> * cdc_ether: flag the Huawei ME906/ME909 as WWAN The Huawei ME906 (12d1:15c1) comes with a standard ECM interface that requires management via AT commands sent over one of the control TTYs (e.g. connected with AT^NDISDUP). Signed-off-by: Aleksander Morgado <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: mvpp2: fix TSO headers allocation and management TSO headers are managed with txq index and therefore should be aligned with the txq size, not with the aggregated txq size. Fixes: 186cd4d4e414 ("net: mvpp2: software tso support") Reported-by: Marc Zyngier <[email protected]> Signed-off-by: Yan Markman <[email protected]> Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: mvpp2: do not unmap TSO headers buffers The TSO header buffers are coming from a per cpu pool and should not be unmapped as they are reused. The PPv2 driver was unmapping all descriptors buffers unconditionally. This patch fixes this by checking the buffers dma addresses before unmapping them, and by not unmapping those who are located in the TSO header pool. Fixes: 186cd4d4e414 ("net: mvpp2: software tso support") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]> * net: mvpp2: do not call txq_done from the Tx path when Tx irqs are used When Tx IRQs are used, txq_bufs_free() can be called from both the Tx path and from NAPI poll(). This led to CPU stalls as if these two tasks (Tx and Poll) are scheduled on two CPUs at the same time, DMA unmapping operations are done on the same txq buffers. This patch adds a check not to call txq_done() from the Tx path if Tx interrupts are used as it does not make sense to do so. Fixes: edc660fa09e2 ("net: mvpp2: replace TX coalescing interrupts with hrtimer") Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]> * sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND Commit 9b9742022888 ("sctp: support ipv6 nonlocal bind") introduced support for the above options as v4 sctp did, so patched sctp_v6_available(). In the v4 implementation it's enough, because sctp_inet_bind_verify() just returns with sctp_v4_available(). However sctp_inet6_bind_verify() has an extra check before that for link-local scope_id, which won't respect the above options. Added the checks before calling ipv6_chk_addr(), but not before the validation of scope_id. before (w/ both options): ./v6test fe80::10 sctp bind failed, errno: 99 (Cannot assign requested address) ./v6test fe80::10 tcp bind success, errno: 0 (Success) after (w/ both options): ./v6test fe80::10 sctp bind success, errno: 0 (Success) Signed-off-by: Laszlo Toth <[email protected]> Reviewed-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]> * can: sun4i: fix loopback mode Fix loopback mode by setting the right flag and remove presume mode. Signed-off-by: Gerhard Bertelsmann <[email protected]> Cc: linux-stable <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]> * can: kvaser_usb: Correct return value in printout If the return value from kvaser_usb_send_simple_msg() was non-zero, the return value from kvaser_usb_flush_queue() was printed in the kernel warning. Signed-off-by: Jimmy Assarsson <[email protected]> Cc: linux-stable <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]> * can: kvaser_usb: Ignore CMD_FLUSH_QUEUE_REPLY messages To avoid kernel warning "Unhandled message (68)", ignore the CMD_FLUSH_QUEUE_REPLY message for now. As of Leaf v2 firmware version v4.1.844 (2017-02-15), flush tx queue is synchronous. There is a capability bit indicating whether flushing tx queue is synchronous or asynchronous. A proper solution would be to query the device for capabilities. If the synchronous tx flush capability bit is set, we should wait for CMD_FLUSH_QUEUE_REPLY message, while flushing the tx queue. Signed-off-by: Jimmy Assarsson <[email protected]> Cc: linux-stable <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]> * perf/x86/intel/bts: Fix exclusive event reference leak Commit: d2878d642a4ed ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems") ... adds a privilege check in the exactly wrong place in the event init path: after the 'LBR exclusive' reference has been taken, and doesn't release it in the case of insufficient privileges. After this, nobody in the system gets to use PT or LBR afterwards. This patch moves the privilege check to where it should have been in the first place. Signed-off-by: Alexander Shishkin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: d2878d642a4ed ("perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> * ALSA: hda - fix headset mic problem for Dell machines with alc236 We have several Dell laptops which use the codec alc236, the headset mic can't work on these machines. Following the commit 736f20a70, we add the pin cfg table to make the headset mic work. Cc: <[email protected]> Signed-off-by: Hui Wang <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> * hwmon: (tmp102) Fix first temperature reading Commit 3d8f7a89a197 ("hwmon: (tmp102) Improve handling of initial read delay") reduced the initial temperature read delay and made it dependent on the chip's shutdown mode. If the chip was not in shutdown mode at probe, the read delay no longer applies. This ignores the fact that the chip initialization changes the temperature sensor resolution, and that the temperature register values change when the resolution is changed. As a result, the reported temperature is twice as high as the real temperature until the first temperature conversion after the configuration change is complete. This can result in unexpected behavior and, worst case, in a system shutdown. To fix the problem, let's just always wait for a conversion to complete before reporting a temperatu…

KernelPRBot · 2017-11-12T18:20:06Z

Hi @andrewlucianlu!

Thanks for your contribution to the Linux kernel!

Linux kernel development happens on mailing lists, rather than on GitHub - this GitHub repository is a read-only mirror that isn't used for accepting contributions. So that your change can become part of Linux, please email it to us as a patch.

Sending patches isn't quite as simple as sending a pull request, but fortunately it is a well documented process.

Here's what to do:

Format your contribution according to kernel requirements
Decide who to send your contribution to
Set up your system to send your contribution as an email
Send your contribution and wait for feedback

How do I format my contribution?

The Linux kernel community is notoriously picky about how contributions are formatted and sent. Fortunately, they have documented their expectations.

Firstly, all contributions need to be formatted as patches. A patch is a plain text document showing the change you want to make to the code, and documenting why it is a good idea.

You can create patches with git format-patch.

Secondly, patches need 'commit messages', which is the human-friendly documentation explaining what the change is and why it's necessary.

Thirdly, changes have some technical requirements. There is a Linux kernel coding style, and there are licensing requirements you need to comply with.

Both of these are documented in the Submitting Patches documentation that is part of the kernel.

Note that you will almost certainly have to modify your existing git commits to satisfy these requirements. Don't worry: there are many guides on the internet for doing this.

Who do I send my contribution to?

The Linux kernel is composed of a number of subsystems. These subsystems are maintained by different people, and have different mailing lists where they discuss proposed changes.

If you don't already know what subsystem your change belongs to, the get_maintainer.pl script in the kernel source can help you.

get_maintainer.pl will take the patch or patches you created in the previous step, and tell you who is responsible for them, and what mailing lists are used. You can also take a look at the MAINTAINERS file by hand.

Make sure that your list of recipients includes a mailing list. If you can't find a more specific mailing list, then LKML - the Linux Kernel Mailing List - is the place to send your patches.

It's not usually necessary to subscribe to the mailing list before you send the patches, but if you're interested in kernel development, subscribing to a subsystem mailing list is a good idea. (At this point, you probably don't need to subscribe to LKML - it is a very high traffic list with about a thousand messages per day, which is often not useful for beginners.)

How do I send my contribution?

Use git send-email, which will ensure that your patches are formatted in the standard manner. In order to use git send-email, you'll need to configure git to use your SMTP email server.

For more information about using git send-email, look at the Git documentation or type git help send-email. There are a number of useful guides and tutorials about git send-email that can be found on the internet.

How do I get help if I'm stuck?

Firstly, don't get discouraged! There are an enormous number of resources on the internet, and many kernel developers who would like to see you succeed.

Many issues - especially about how to use certain tools - can be resolved by using your favourite internet search engine.

If you can't find an answer, there are a few places you can turn:

Kernel Newbies - this website contains a lot of useful resources for new kernel developers.
The kernel documentation - see also the Documentation directory in the kernel tree.

If you get really, really stuck, you could try the owners of this bot, @daxtens and @ajdlinux. Please be aware that we do have full-time jobs, so we are almost certainly the slowest way to get answers!

I sent my patch - now what?

You wait.

You can check that your email has been received by checking the mailing list archives for the mailing list you sent your patch to. Messages may not be received instantly, so be patient. Kernel developers are generally very busy people, so it may take a few weeks before your patch is looked at.

Then, you keep waiting. Three things may happen:

You might get a response to your email. Often these will be comments, which may require you to make changes to your patch, or explain why your way is the best way. You should respond to these comments, and you may need to submit another revision of your patch to address the issues raised.
Your patch might be merged into the subsystem tree. Code that becomes part of Linux isn't merged into the main repository straight away - it first goes into the subsystem tree, which is managed by the subsystem maintainer. It is then batched up with a number of other changes sent to Linus for inclusion. (This process is described in some detail in the kernel development process guide).
Your patch might be ignored completely. This happens sometimes - don't take it personally. Here's what to do:
- Wait a bit more - patches often take several weeks to get a response; more if they were sent at a busy time.
- Kernel developers often silently ignore patches that break the rules. Check for obvious violations of the the Submitting Patches guidelines, the style guidelines, and any other documentation you can find about your subsystem. Check that you're sending your patch to the right place.
- Try again later. When you resend it, don't add angry commentary, as that will get your patch ignored. It might also get you silently blacklisted.

Further information

Working with the kernel development community - the official documentation for new kernel contributors

Happy hacking!

This message was posted by a bot - if you have any questions or suggestions, please talk to my owners, @ajdlinux and @daxtens, or raise an issue at https://github.com/ajdlinux/KernelPRBot.

Xaleth · 2017-11-12T18:32:44Z

This'll take forever to review. Wake up as an old person only to see this closed or merged.

Pro tip, fix the file conflicts.

When using a AIO read() operation on the function FS gadget driver a URB is submitted asynchronously and on URB completion the received data is copied to the userspace buffer associated with the read operation. This is done from a kernel worker thread invoking copy_to_user() (through copy_to_iter()). And while the user space process memory is made available to the kernel thread using use_mm(), some architecture require in addition to this that the operation runs with USER_DS set. Otherwise the userspace memory access will fail. For example on ARM64 with Privileged Access Never (PAN) and User Access Override (UAO) enabled the following crash occurs. Internal error: Accessing user space memory with fs=KERNEL_DS: 9600004f [#1] SMP Modules linked in: CPU: 2 PID: 1636 Comm: kworker/2:1 Not tainted 4.9.0-04081-g8ab2dfb-dirty torvalds#487 Hardware name: ZynqMP ZCU102 Rev1.0 (DT) Workqueue: events ffs_user_copy_worker task: ffffffc87afc8080 task.stack: ffffffc87a00c000 PC is at __arch_copy_to_user+0x190/0x220 LR is at copy_to_iter+0x78/0x3c8 [...] [<ffffff800847b790>] __arch_copy_to_user+0x190/0x220 [<ffffff80086f25d8>] ffs_user_copy_worker+0x70/0x130 [<ffffff80080b8c64>] process_one_work+0x1dc/0x460 [<ffffff80080b8f38>] worker_thread+0x50/0x4b0 [<ffffff80080bf5a0>] kthread+0xd8/0xf0 [<ffffff8008083680>] ret_from_fork+0x10/0x50 Address this by placing a set_fs(USER_DS) before of the copy operation and revert it again once the copy operation has finished. This patch is analogous to commit d7ffde3 ("vhost: use USER_DS in vhost_worker thread") which addresses the same underlying issue. Signed-off-by: Lars-Peter Clausen <[email protected]>

When using a AIO read() operation on the function FS gadget driver a URB is submitted asynchronously and on URB completion the received data is copied to the userspace buffer associated with the read operation. This is done from a kernel worker thread invoking copy_to_user() (through copy_to_iter()). And while the user space process memory is made available to the kernel thread using use_mm(), some architecture require in addition to this that the operation runs with USER_DS set. Otherwise the userspace memory access will fail. For example on ARM64 with Privileged Access Never (PAN) and User Access Override (UAO) enabled the following crash occurs. Internal error: Accessing user space memory with fs=KERNEL_DS: 9600004f [#1] SMP Modules linked in: CPU: 2 PID: 1636 Comm: kworker/2:1 Not tainted 4.9.0-04081-g8ab2dfb-dirty torvalds#487 Hardware name: ZynqMP ZCU102 Rev1.0 (DT) Workqueue: events ffs_user_copy_worker task: ffffffc87afc8080 task.stack: ffffffc87a00c000 PC is at __arch_copy_to_user+0x190/0x220 LR is at copy_to_iter+0x78/0x3c8 [...] [<ffffff800847b790>] __arch_copy_to_user+0x190/0x220 [<ffffff80086f25d8>] ffs_user_copy_worker+0x70/0x130 [<ffffff80080b8c64>] process_one_work+0x1dc/0x460 [<ffffff80080b8f38>] worker_thread+0x50/0x4b0 [<ffffff80080bf5a0>] kthread+0xd8/0xf0 [<ffffff8008083680>] ret_from_fork+0x10/0x50 Address this by placing a set_fs(USER_DS) before of the copy operation and revert it again once the copy operation has finished. This patch is analogous to commit d7ffde3 ("vhost: use USER_DS in vhost_worker thread") which addresses the same underlying issue. Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]>

[ Upstream commit 4058ebf ] When using a AIO read() operation on the function FS gadget driver a URB is submitted asynchronously and on URB completion the received data is copied to the userspace buffer associated with the read operation. This is done from a kernel worker thread invoking copy_to_user() (through copy_to_iter()). And while the user space process memory is made available to the kernel thread using use_mm(), some architecture require in addition to this that the operation runs with USER_DS set. Otherwise the userspace memory access will fail. For example on ARM64 with Privileged Access Never (PAN) and User Access Override (UAO) enabled the following crash occurs. Internal error: Accessing user space memory with fs=KERNEL_DS: 9600004f [#1] SMP Modules linked in: CPU: 2 PID: 1636 Comm: kworker/2:1 Not tainted 4.9.0-04081-g8ab2dfb-dirty torvalds#487 Hardware name: ZynqMP ZCU102 Rev1.0 (DT) Workqueue: events ffs_user_copy_worker task: ffffffc87afc8080 task.stack: ffffffc87a00c000 PC is at __arch_copy_to_user+0x190/0x220 LR is at copy_to_iter+0x78/0x3c8 [...] [<ffffff800847b790>] __arch_copy_to_user+0x190/0x220 [<ffffff80086f25d8>] ffs_user_copy_worker+0x70/0x130 [<ffffff80080b8c64>] process_one_work+0x1dc/0x460 [<ffffff80080b8f38>] worker_thread+0x50/0x4b0 [<ffffff80080bf5a0>] kthread+0xd8/0xf0 [<ffffff8008083680>] ret_from_fork+0x10/0x50 Address this by placing a set_fs(USER_DS) before of the copy operation and revert it again once the copy operation has finished. This patch is analogous to commit d7ffde3 ("vhost: use USER_DS in vhost_worker thread") which addresses the same underlying issue. Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 4058ebf ] When using a AIO read() operation on the function FS gadget driver a URB is submitted asynchronously and on URB completion the received data is copied to the userspace buffer associated with the read operation. This is done from a kernel worker thread invoking copy_to_user() (through copy_to_iter()). And while the user space process memory is made available to the kernel thread using use_mm(), some architecture require in addition to this that the operation runs with USER_DS set. Otherwise the userspace memory access will fail. For example on ARM64 with Privileged Access Never (PAN) and User Access Override (UAO) enabled the following crash occurs. Internal error: Accessing user space memory with fs=KERNEL_DS: 9600004f [jwrdegoede#1] SMP Modules linked in: CPU: 2 PID: 1636 Comm: kworker/2:1 Not tainted 4.9.0-04081-g8ab2dfb-dirty torvalds#487 Hardware name: ZynqMP ZCU102 Rev1.0 (DT) Workqueue: events ffs_user_copy_worker task: ffffffc87afc8080 task.stack: ffffffc87a00c000 PC is at __arch_copy_to_user+0x190/0x220 LR is at copy_to_iter+0x78/0x3c8 [...] [<ffffff800847b790>] __arch_copy_to_user+0x190/0x220 [<ffffff80086f25d8>] ffs_user_copy_worker+0x70/0x130 [<ffffff80080b8c64>] process_one_work+0x1dc/0x460 [<ffffff80080b8f38>] worker_thread+0x50/0x4b0 [<ffffff80080bf5a0>] kthread+0xd8/0xf0 [<ffffff8008083680>] ret_from_fork+0x10/0x50 Address this by placing a set_fs(USER_DS) before of the copy operation and revert it again once the copy operation has finished. This patch is analogous to commit d7ffde3 ("vhost: use USER_DS in vhost_worker thread") which addresses the same underlying issue. Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 4058ebf ] When using a AIO read() operation on the function FS gadget driver a URB is submitted asynchronously and on URB completion the received data is copied to the userspace buffer associated with the read operation. This is done from a kernel worker thread invoking copy_to_user() (through copy_to_iter()). And while the user space process memory is made available to the kernel thread using use_mm(), some architecture require in addition to this that the operation runs with USER_DS set. Otherwise the userspace memory access will fail. For example on ARM64 with Privileged Access Never (PAN) and User Access Override (UAO) enabled the following crash occurs. Internal error: Accessing user space memory with fs=KERNEL_DS: 9600004f [#1] SMP Modules linked in: CPU: 2 PID: 1636 Comm: kworker/2:1 Not tainted 4.9.0-04081-g8ab2dfb-dirty torvalds#487 Hardware name: ZynqMP ZCU102 Rev1.0 (DT) Workqueue: events ffs_user_copy_worker task: ffffffc87afc8080 task.stack: ffffffc87a00c000 PC is at __arch_copy_to_user+0x190/0x220 LR is at copy_to_iter+0x78/0x3c8 [...] [<ffffff800847b790>] __arch_copy_to_user+0x190/0x220 [<ffffff80086f25d8>] ffs_user_copy_worker+0x70/0x130 [<ffffff80080b8c64>] process_one_work+0x1dc/0x460 [<ffffff80080b8f38>] worker_thread+0x50/0x4b0 [<ffffff80080bf5a0>] kthread+0xd8/0xf0 [<ffffff8008083680>] ret_from_fork+0x10/0x50 Address this by placing a set_fs(USER_DS) before of the copy operation and revert it again once the copy operation has finished. This patch is analogous to commit d7ffde3 ("vhost: use USER_DS in vhost_worker thread") which addresses the same underlying issue. Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…ckpatch-fixes WARNING: 'lenght' may be misspelled - perhaps 'length'? torvalds#7: things. It was discussed in lenght here, WARNING: line over 80 characters torvalds#252: FILE: drivers/md/dm-crypt.c:2161: + unsigned long pages = (totalram_pages() - totalhigh_pages()) * DM_CRYPT_MEMORY_PERCENT / 100; WARNING: line over 80 characters torvalds#263: FILE: drivers/md/dm-integrity.c:2846: + if (journal_pages >= totalram_pages() - totalhigh_pages() || journal_desc_size > ULONG_MAX) { WARNING: line over 80 characters torvalds#307: FILE: drivers/parisc/ccio-dma.c:1254: + iova_space_size = (u32) (totalram_pages() / count_parisc_driver(&ccio_driver)); WARNING: please, no spaces at the start of a line torvalds#472: FILE: include/linux/highmem.h:47: + atomic_long_inc(&_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#477: FILE: include/linux/highmem.h:52: + atomic_long_dec(&_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#482: FILE: include/linux/highmem.h:57: + atomic_long_add(count, &_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#487: FILE: include/linux/highmem.h:62: + atomic_long_set(&_totalhigh_pages, val);$ WARNING: please, no spaces at the start of a line torvalds#511: FILE: include/linux/mm.h:54: + return (unsigned long)atomic_long_read(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#516: FILE: include/linux/mm.h:59: + atomic_long_inc(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#521: FILE: include/linux/mm.h:64: + atomic_long_dec(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#526: FILE: include/linux/mm.h:69: + atomic_long_add(count, &_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#531: FILE: include/linux/mm.h:74: + atomic_long_set(&_totalram_pages, val);$ WARNING: line over 80 characters torvalds#722: FILE: mm/page_alloc.c:7288: + (physpages - totalram_pages() - totalcma_pages) << (PAGE_SHIFT - 10), WARNING: Missing a blank line after declarations torvalds#745: FILE: mm/shmem.c:118: + unsigned long nr_pages = totalram_pages(); + return min(nr_pages - totalhigh_pages(), nr_pages / 2); total: 0 errors, 15 warnings, 656 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Arun KS <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>

…ckpatch-fixes WARNING: 'lenght' may be misspelled - perhaps 'length'? torvalds#7: things. It was discussed in lenght here, WARNING: line over 80 characters torvalds#252: FILE: drivers/md/dm-crypt.c:2161: + unsigned long pages = (totalram_pages() - totalhigh_pages()) * DM_CRYPT_MEMORY_PERCENT / 100; WARNING: line over 80 characters torvalds#263: FILE: drivers/md/dm-integrity.c:2846: + if (journal_pages >= totalram_pages() - totalhigh_pages() || journal_desc_size > ULONG_MAX) { WARNING: line over 80 characters torvalds#307: FILE: drivers/parisc/ccio-dma.c:1254: + iova_space_size = (u32) (totalram_pages() / count_parisc_driver(&ccio_driver)); WARNING: please, no spaces at the start of a line torvalds#472: FILE: include/linux/highmem.h:47: + atomic_long_inc(&_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#477: FILE: include/linux/highmem.h:52: + atomic_long_dec(&_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#482: FILE: include/linux/highmem.h:57: + atomic_long_add(count, &_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#487: FILE: include/linux/highmem.h:62: + atomic_long_set(&_totalhigh_pages, val);$ WARNING: please, no spaces at the start of a line torvalds#511: FILE: include/linux/mm.h:54: + return (unsigned long)atomic_long_read(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#516: FILE: include/linux/mm.h:59: + atomic_long_inc(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#521: FILE: include/linux/mm.h:64: + atomic_long_dec(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#526: FILE: include/linux/mm.h:69: + atomic_long_add(count, &_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#531: FILE: include/linux/mm.h:74: + atomic_long_set(&_totalram_pages, val);$ WARNING: line over 80 characters torvalds#722: FILE: mm/page_alloc.c:7288: + (physpages - totalram_pages() - totalcma_pages) << (PAGE_SHIFT - 10), WARNING: Missing a blank line after declarations torvalds#745: FILE: mm/shmem.c:118: + unsigned long nr_pages = totalram_pages(); + return min(nr_pages - totalhigh_pages(), nr_pages / 2); total: 0 errors, 15 warnings, 656 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Arun KS <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

…ckpatch-fixes WARNING: 'lenght' may be misspelled - perhaps 'length'? torvalds#7: things. It was discussed in lenght here, WARNING: line over 80 characters torvalds#252: FILE: drivers/md/dm-crypt.c:2161: + unsigned long pages = (totalram_pages() - totalhigh_pages()) * DM_CRYPT_MEMORY_PERCENT / 100; WARNING: line over 80 characters torvalds#263: FILE: drivers/md/dm-integrity.c:2846: + if (journal_pages >= totalram_pages() - totalhigh_pages() || journal_desc_size > ULONG_MAX) { WARNING: line over 80 characters torvalds#307: FILE: drivers/parisc/ccio-dma.c:1254: + iova_space_size = (u32) (totalram_pages() / count_parisc_driver(&ccio_driver)); WARNING: please, no spaces at the start of a line torvalds#472: FILE: include/linux/highmem.h:47: + atomic_long_inc(&_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#477: FILE: include/linux/highmem.h:52: + atomic_long_dec(&_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#482: FILE: include/linux/highmem.h:57: + atomic_long_add(count, &_totalhigh_pages);$ WARNING: please, no spaces at the start of a line torvalds#487: FILE: include/linux/highmem.h:62: + atomic_long_set(&_totalhigh_pages, val);$ WARNING: please, no spaces at the start of a line torvalds#511: FILE: include/linux/mm.h:54: + return (unsigned long)atomic_long_read(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#516: FILE: include/linux/mm.h:59: + atomic_long_inc(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#521: FILE: include/linux/mm.h:64: + atomic_long_dec(&_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#526: FILE: include/linux/mm.h:69: + atomic_long_add(count, &_totalram_pages);$ WARNING: please, no spaces at the start of a line torvalds#531: FILE: include/linux/mm.h:74: + atomic_long_set(&_totalram_pages, val);$ WARNING: line over 80 characters torvalds#722: FILE: mm/page_alloc.c:7288: + (physpages - totalram_pages() - totalcma_pages) << (PAGE_SHIFT - 10), WARNING: Missing a blank line after declarations torvalds#745: FILE: mm/shmem.c:118: + unsigned long nr_pages = totalram_pages(); + return min(nr_pages - totalhigh_pages(), nr_pages / 2); total: 0 errors, 15 warnings, 656 lines checked NOTE: For some of the reported defects, checkpatch may be able to mechanically convert to the typical style using --fix or --fix-inplace. ./patches/mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch has style problems, please review. NOTE: If any of the errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Arun KS <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]>

add PCI device interface and a vfio backend driver

This commit fixes the following checkpatch.pl warnings: WARNING: do not add new typedefs torvalds#47: FILE: hal/HalBtcOutSrc.h:47: +typedef enum _BTC_POWERSAVE_TYPE { WARNING: do not add new typedefs torvalds#54: FILE: hal/HalBtcOutSrc.h:54: +typedef enum _BTC_BT_REG_TYPE { WARNING: do not add new typedefs torvalds#63: FILE: hal/HalBtcOutSrc.h:63: +typedef enum _BTC_CHIP_INTERFACE { WARNING: do not add new typedefs torvalds#71: FILE: hal/HalBtcOutSrc.h:71: +typedef enum _BTC_CHIP_TYPE { WARNING: do not add new typedefs torvalds#81: FILE: hal/HalBtcOutSrc.h:81: +typedef enum _BTC_MSG_TYPE { WARNING: do not add new typedefs torvalds#167: FILE: hal/HalBtcOutSrc.h:167: +typedef struct _BTC_BOARD_INFO { WARNING: do not add new typedefs torvalds#177: FILE: hal/HalBtcOutSrc.h:177: +typedef enum _BTC_DBG_OPCODE { WARNING: do not add new typedefs torvalds#187: FILE: hal/HalBtcOutSrc.h:187: +typedef enum _BTC_RSSI_STATE { WARNING: do not add new typedefs torvalds#200: FILE: hal/HalBtcOutSrc.h:200: +typedef enum _BTC_WIFI_ROLE { WARNING: do not add new typedefs torvalds#208: FILE: hal/HalBtcOutSrc.h:208: +typedef enum _BTC_WIFI_BW_MODE { WARNING: do not add new typedefs torvalds#215: FILE: hal/HalBtcOutSrc.h:215: +typedef enum _BTC_WIFI_TRAFFIC_DIR { WARNING: do not add new typedefs torvalds#221: FILE: hal/HalBtcOutSrc.h:221: +typedef enum _BTC_WIFI_PNP { WARNING: do not add new typedefs torvalds#228: FILE: hal/HalBtcOutSrc.h:228: +typedef enum _BT_WIFI_COEX_STATE { WARNING: do not add new typedefs torvalds#239: FILE: hal/HalBtcOutSrc.h:239: +typedef enum _BTC_GET_TYPE { WARNING: do not add new typedefs torvalds#281: FILE: hal/HalBtcOutSrc.h:281: +typedef enum _BTC_SET_TYPE { WARNING: do not add new typedefs torvalds#321: FILE: hal/HalBtcOutSrc.h:321: +typedef enum _BTC_DBG_DISP_TYPE { WARNING: do not add new typedefs torvalds#328: FILE: hal/HalBtcOutSrc.h:328: +typedef enum _BTC_NOTIFY_TYPE_IPS { WARNING: do not add new typedefs torvalds#334: FILE: hal/HalBtcOutSrc.h:334: +typedef enum _BTC_NOTIFY_TYPE_LPS { WARNING: do not add new typedefs torvalds#340: FILE: hal/HalBtcOutSrc.h:340: +typedef enum _BTC_NOTIFY_TYPE_SCAN { WARNING: do not add new typedefs torvalds#346: FILE: hal/HalBtcOutSrc.h:346: +typedef enum _BTC_NOTIFY_TYPE_ASSOCIATE { WARNING: do not add new typedefs torvalds#352: FILE: hal/HalBtcOutSrc.h:352: +typedef enum _BTC_NOTIFY_TYPE_MEDIA_STATUS { WARNING: do not add new typedefs torvalds#358: FILE: hal/HalBtcOutSrc.h:358: +typedef enum _BTC_NOTIFY_TYPE_SPECIAL_PACKET { WARNING: do not add new typedefs torvalds#366: FILE: hal/HalBtcOutSrc.h:366: +typedef enum _BTC_NOTIFY_TYPE_STACK_OPERATION { WARNING: do not add new typedefs torvalds#374: FILE: hal/HalBtcOutSrc.h:374: +typedef enum _BTC_ANTENNA_POS { WARNING: do not add new typedefs torvalds#412: FILE: hal/HalBtcOutSrc.h:412: +typedef struct _BTC_BT_INFO { WARNING: do not add new typedefs torvalds#440: FILE: hal/HalBtcOutSrc.h:440: +typedef struct _BTC_STACK_INFO { WARNING: do not add new typedefs torvalds#455: FILE: hal/HalBtcOutSrc.h:455: +typedef struct _BTC_BT_LINK_INFO { WARNING: do not add new typedefs torvalds#468: FILE: hal/HalBtcOutSrc.h:468: +typedef struct _BTC_STATISTICS { WARNING: do not add new typedefs torvalds#487: FILE: hal/HalBtcOutSrc.h:487: +typedef struct _BTC_COEXIST { Signed-off-by: Marco Cesati <[email protected]>

This commit fixes the following checkpatch.pl warnings: WARNING: do not add new typedefs torvalds#47: FILE: hal/HalBtcOutSrc.h:47: +typedef enum _BTC_POWERSAVE_TYPE { WARNING: do not add new typedefs torvalds#54: FILE: hal/HalBtcOutSrc.h:54: +typedef enum _BTC_BT_REG_TYPE { WARNING: do not add new typedefs torvalds#63: FILE: hal/HalBtcOutSrc.h:63: +typedef enum _BTC_CHIP_INTERFACE { WARNING: do not add new typedefs torvalds#71: FILE: hal/HalBtcOutSrc.h:71: +typedef enum _BTC_CHIP_TYPE { WARNING: do not add new typedefs torvalds#81: FILE: hal/HalBtcOutSrc.h:81: +typedef enum _BTC_MSG_TYPE { WARNING: do not add new typedefs torvalds#167: FILE: hal/HalBtcOutSrc.h:167: +typedef struct _BTC_BOARD_INFO { WARNING: do not add new typedefs torvalds#177: FILE: hal/HalBtcOutSrc.h:177: +typedef enum _BTC_DBG_OPCODE { WARNING: do not add new typedefs torvalds#187: FILE: hal/HalBtcOutSrc.h:187: +typedef enum _BTC_RSSI_STATE { WARNING: do not add new typedefs torvalds#200: FILE: hal/HalBtcOutSrc.h:200: +typedef enum _BTC_WIFI_ROLE { WARNING: do not add new typedefs torvalds#208: FILE: hal/HalBtcOutSrc.h:208: +typedef enum _BTC_WIFI_BW_MODE { WARNING: do not add new typedefs torvalds#215: FILE: hal/HalBtcOutSrc.h:215: +typedef enum _BTC_WIFI_TRAFFIC_DIR { WARNING: do not add new typedefs torvalds#221: FILE: hal/HalBtcOutSrc.h:221: +typedef enum _BTC_WIFI_PNP { WARNING: do not add new typedefs torvalds#228: FILE: hal/HalBtcOutSrc.h:228: +typedef enum _BT_WIFI_COEX_STATE { WARNING: do not add new typedefs torvalds#239: FILE: hal/HalBtcOutSrc.h:239: +typedef enum _BTC_GET_TYPE { WARNING: do not add new typedefs torvalds#281: FILE: hal/HalBtcOutSrc.h:281: +typedef enum _BTC_SET_TYPE { WARNING: do not add new typedefs torvalds#321: FILE: hal/HalBtcOutSrc.h:321: +typedef enum _BTC_DBG_DISP_TYPE { WARNING: do not add new typedefs torvalds#328: FILE: hal/HalBtcOutSrc.h:328: +typedef enum _BTC_NOTIFY_TYPE_IPS { WARNING: do not add new typedefs torvalds#334: FILE: hal/HalBtcOutSrc.h:334: +typedef enum _BTC_NOTIFY_TYPE_LPS { WARNING: do not add new typedefs torvalds#340: FILE: hal/HalBtcOutSrc.h:340: +typedef enum _BTC_NOTIFY_TYPE_SCAN { WARNING: do not add new typedefs torvalds#346: FILE: hal/HalBtcOutSrc.h:346: +typedef enum _BTC_NOTIFY_TYPE_ASSOCIATE { WARNING: do not add new typedefs torvalds#352: FILE: hal/HalBtcOutSrc.h:352: +typedef enum _BTC_NOTIFY_TYPE_MEDIA_STATUS { WARNING: do not add new typedefs torvalds#358: FILE: hal/HalBtcOutSrc.h:358: +typedef enum _BTC_NOTIFY_TYPE_SPECIAL_PACKET { WARNING: do not add new typedefs torvalds#366: FILE: hal/HalBtcOutSrc.h:366: +typedef enum _BTC_NOTIFY_TYPE_STACK_OPERATION { WARNING: do not add new typedefs torvalds#374: FILE: hal/HalBtcOutSrc.h:374: +typedef enum _BTC_ANTENNA_POS { WARNING: do not add new typedefs torvalds#412: FILE: hal/HalBtcOutSrc.h:412: +typedef struct _BTC_BT_INFO { WARNING: do not add new typedefs torvalds#440: FILE: hal/HalBtcOutSrc.h:440: +typedef struct _BTC_STACK_INFO { WARNING: do not add new typedefs torvalds#455: FILE: hal/HalBtcOutSrc.h:455: +typedef struct _BTC_BT_LINK_INFO { WARNING: do not add new typedefs torvalds#468: FILE: hal/HalBtcOutSrc.h:468: +typedef struct _BTC_STATISTICS { WARNING: do not add new typedefs torvalds#487: FILE: hal/HalBtcOutSrc.h:487: +typedef struct _BTC_COEXIST { Signed-off-by: Marco Cesati <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

rust: kernel: ioctl returns ENOTTY if not implemented

I got a null-ptr-deref report when doing fault injection test: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP KASAN PTI CPU: 2 PID: 925 Comm: 29 Not tainted 5.15.0-rc3-00111-gf5dad42ed4fe-dirty torvalds#487 5b4d17fc3275713934c1a9cb26349fbabf82adbf Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 RIP: 0010:strcmp+0xc/0x20 Code: 17 48 83 c6 01 44 0f b6 46 ff 48 83 c1 01 44 88 41 ff 45 84 c0 75 e5 c3 c6 01 00 c3 66 90 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 85 RSP: 0018:ffffc900025af368 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff920004b5e6f RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8ebcf680 RDI: 0000000000000000 RBP: ffff888014746000 R08: ffffed102097e3fa R09: ffffed102097e3fa R10: ffff888104bf1fcb R11: ffffed102097e3f9 R12: ffff888014746040 R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880147468c0 FS: 00007f783e6d5500(0000) GS:ffff888104a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000008cee002 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: __devm_rtc_register_device.cold.7+0x16a/0x2df ? rtc_suspend+0x330/0x330 ? irqentry_exit+0x32/0x80 ? __sanitizer_cov_trace_pc+0x1d/0x50 ? irqentry_exit+0x32/0x80 ? trace_hardirqs_on+0x63/0x2d0 ? rtc_ktime_to_tm+0x120/0x120 ? tracer_hardirqs_on+0x36/0x530 ? _raw_spin_unlock_irqrestore+0x4b/0x5d ? _raw_spin_unlock_irqrestore+0x54/0x5d ? __sanitizer_cov_trace_pc+0x1d/0x50 ? write_comp_data+0x2a/0x90 ? __sanitizer_cov_trace_pc+0x1d/0x50 rv3029_probe+0x4b1/0x770 [rtc_rv3029c2] ? rv3029_hwmon_show_update_interval+0x160/0x160 [rtc_rv3029c2] ? write_comp_data+0x2a/0x90 ? _raw_spin_unlock_irqrestore+0x4b/0x5d ? tracer_hardirqs_on+0x36/0x530 ? rv3029_nvram_write+0x40/0x40 [rtc_rv3029c2] ? rv3029_set_time+0x350/0x350 [rtc_rv3029c2] ? __sanitizer_cov_trace_pc+0x1d/0x50 rv3029_i2c_probe+0x141/0x180 [rtc_rv3029c2] ? rv3029_probe+0x770/0x770 [rtc_rv3029c2] i2c_device_probe+0xa07/0xbb0 ? i2c_device_match+0x110/0x110 really_probe+0x285/0xc30 If dev_set_name() fails, dev_name() is null, it causes null-ptr-deref, we need check the return value of dev_set_name(). Reported-by: Hulk Robot <[email protected]> Signed-off-by: Yang Yingliang <[email protected]>

Jordy reported issue against XSKMAP which also applies to DEVMAP - the index used for accessing map entry, due to being a signed integer, causes the OOB writes. Fix is simple as changing the type from int to u32, however, when compared to XSKMAP case, one more thing needs to be addressed. When map is released from system via dev_map_free(), we iterate through all of the entries and an iterator variable is also an int, which implies OOB accesses. Again, change it to be u32. Example splat below: [ 160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000 [ 160.731662] #PF: supervisor read access in kernel mode [ 160.736876] #PF: error_code(0x0000) - not-present page [ 160.742095] PGD 0 P4D 0 [ 160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP [ 160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ torvalds#487 [ 160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 160.767642] Workqueue: events_unbound bpf_map_free_deferred [ 160.773308] RIP: 0010:dev_map_free+0x77/0x170 [ 160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 <48> 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff [ 160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202 [ 160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024 [ 160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000 [ 160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001 [ 160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122 [ 160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000 [ 160.838310] FS: 0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000 [ 160.846528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0 [ 160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 160.874092] PKRU: 55555554 [ 160.876847] Call Trace: [ 160.879338] <TASK> [ 160.881477] ? __die+0x20/0x60 [ 160.884586] ? page_fault_oops+0x15a/0x450 [ 160.888746] ? search_extable+0x22/0x30 [ 160.892647] ? search_bpf_extables+0x5f/0x80 [ 160.896988] ? exc_page_fault+0xa9/0x140 [ 160.900973] ? asm_exc_page_fault+0x22/0x30 [ 160.905232] ? dev_map_free+0x77/0x170 [ 160.909043] ? dev_map_free+0x58/0x170 [ 160.912857] bpf_map_free_deferred+0x51/0x90 [ 160.917196] process_one_work+0x142/0x370 [ 160.921272] worker_thread+0x29e/0x3b0 [ 160.925082] ? rescuer_thread+0x4b0/0x4b0 [ 160.929157] kthread+0xd4/0x110 [ 160.932355] ? kthread_park+0x80/0x80 [ 160.936079] ret_from_fork+0x2d/0x50 [ 160.943396] ? kthread_park+0x80/0x80 [ 160.950803] ret_from_fork_asm+0x11/0x20 [ 160.958482] </TASK> Fixes: 546ac1f ("bpf: add devmap, a map for storing net device references") Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]>

Jordy reported issue against XSKMAP which also applies to DEVMAP - the index used for accessing map entry, due to being a signed integer, causes the OOB writes. Fix is simple as changing the type from int to u32, however, when compared to XSKMAP case, one more thing needs to be addressed. When map is released from system via dev_map_free(), we iterate through all of the entries and an iterator variable is also an int, which implies OOB accesses. Again, change it to be u32. Example splat below: [ 160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000 [ 160.731662] #PF: supervisor read access in kernel mode [ 160.736876] #PF: error_code(0x0000) - not-present page [ 160.742095] PGD 0 P4D 0 [ 160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP [ 160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ torvalds#487 [ 160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 160.767642] Workqueue: events_unbound bpf_map_free_deferred [ 160.773308] RIP: 0010:dev_map_free+0x77/0x170 [ 160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 <48> 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff [ 160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202 [ 160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024 [ 160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000 [ 160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001 [ 160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122 [ 160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000 [ 160.838310] FS: 0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000 [ 160.846528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0 [ 160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 160.874092] PKRU: 55555554 [ 160.876847] Call Trace: [ 160.879338] <TASK> [ 160.881477] ? __die+0x20/0x60 [ 160.884586] ? page_fault_oops+0x15a/0x450 [ 160.888746] ? search_extable+0x22/0x30 [ 160.892647] ? search_bpf_extables+0x5f/0x80 [ 160.896988] ? exc_page_fault+0xa9/0x140 [ 160.900973] ? asm_exc_page_fault+0x22/0x30 [ 160.905232] ? dev_map_free+0x77/0x170 [ 160.909043] ? dev_map_free+0x58/0x170 [ 160.912857] bpf_map_free_deferred+0x51/0x90 [ 160.917196] process_one_work+0x142/0x370 [ 160.921272] worker_thread+0x29e/0x3b0 [ 160.925082] ? rescuer_thread+0x4b0/0x4b0 [ 160.929157] kthread+0xd4/0x110 [ 160.932355] ? kthread_park+0x80/0x80 [ 160.936079] ret_from_fork+0x2d/0x50 [ 160.943396] ? kthread_park+0x80/0x80 [ 160.950803] ret_from_fork_asm+0x11/0x20 [ 160.958482] </TASK> Fixes: 546ac1f ("bpf: add devmap, a map for storing net device references") CC: [email protected] Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]>

Jordy reported issue against XSKMAP which also applies to DEVMAP - the index used for accessing map entry, due to being a signed integer, causes the OOB writes. Fix is simple as changing the type from int to u32, however, when compared to XSKMAP case, one more thing needs to be addressed. When map is released from system via dev_map_free(), we iterate through all of the entries and an iterator variable is also an int, which implies OOB accesses. Again, change it to be u32. Example splat below: [ 160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000 [ 160.731662] #PF: supervisor read access in kernel mode [ 160.736876] #PF: error_code(0x0000) - not-present page [ 160.742095] PGD 0 P4D 0 [ 160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP [ 160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ torvalds#487 [ 160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 160.767642] Workqueue: events_unbound bpf_map_free_deferred [ 160.773308] RIP: 0010:dev_map_free+0x77/0x170 [ 160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 <48> 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff [ 160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202 [ 160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024 [ 160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000 [ 160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001 [ 160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122 [ 160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000 [ 160.838310] FS: 0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000 [ 160.846528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0 [ 160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 160.874092] PKRU: 55555554 [ 160.876847] Call Trace: [ 160.879338] <TASK> [ 160.881477] ? __die+0x20/0x60 [ 160.884586] ? page_fault_oops+0x15a/0x450 [ 160.888746] ? search_extable+0x22/0x30 [ 160.892647] ? search_bpf_extables+0x5f/0x80 [ 160.896988] ? exc_page_fault+0xa9/0x140 [ 160.900973] ? asm_exc_page_fault+0x22/0x30 [ 160.905232] ? dev_map_free+0x77/0x170 [ 160.909043] ? dev_map_free+0x58/0x170 [ 160.912857] bpf_map_free_deferred+0x51/0x90 [ 160.917196] process_one_work+0x142/0x370 [ 160.921272] worker_thread+0x29e/0x3b0 [ 160.925082] ? rescuer_thread+0x4b0/0x4b0 [ 160.929157] kthread+0xd4/0x110 [ 160.932355] ? kthread_park+0x80/0x80 [ 160.936079] ret_from_fork+0x2d/0x50 [ 160.943396] ? kthread_park+0x80/0x80 [ 160.950803] ret_from_fork_asm+0x11/0x20 [ 160.958482] </TASK> Fixes: 546ac1f ("bpf: add devmap, a map for storing net device references") CC: [email protected] Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

d9j approved these changes Jan 22, 2018

View reviewed changes

upa pushed a commit to upa/linux that referenced this pull request Nov 9, 2020

Merge pull request torvalds#487 from liva/x86

1844b34

add PCI device interface and a vfio backend driver

nbdd0121 pushed a commit to nbdd0121/linux that referenced this pull request Aug 24, 2021

Merge pull request torvalds#487 from ojeda/enotty

c4d17fc

rust: kernel: ioctl returns ENOTTY if not implemented

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

up (#2) #487

up (#2) #487

addstone commented Nov 12, 2017

KernelPRBot commented Nov 12, 2017

Xaleth commented Nov 12, 2017

up (#2) #487

Are you sure you want to change the base?

up (#2) #487

Conversation

addstone commented Nov 12, 2017

ethtool --ncsi eth0 swstats

RESPONSE OK TIMEOUT ERROR

ethtool --ncsi eth0 swstats

RESPONSE OK TIMEOUT ERROR

KernelPRBot commented Nov 12, 2017

How do I format my contribution?

Who do I send my contribution to?

How do I send my contribution?

How do I get help if I'm stuck?

I sent my patch - now what?

Further information

Xaleth commented Nov 12, 2017