Optimized kernel config for better general kernel support and saner defaults #115

WSLUser · 2020-06-01T20:14:20Z

This configuration was originally based on Clear Linux's kernel config options, toned down for general kernel usage, and enables a number of scenarios, some of which are addressed in other, simpler PRs. This should make it easier to troubleshoot the Hyper-V side when bringing support to some features. USBIP has been enabled, as this is a known working scenario when extra software is installed on both Windows and WSL2. Adds exFAT, NTFS, and IPv6 support (still needs work on the VM side). This is part of the resolution to https://github.com/microsoft/WSL2-Linux-Kernel/issues/114. attn: @craigloewen-msft and @sashalevin

Update: Fixed OP to reflect current config changes.

This brings in the optimizations from the Clear Linux kernel config file. It also requires switching the kernel itself to https://github.com/clearlinux-pkgs/linux-hyperv, since that kernel has patches applied during compilation that provide optimizations beyond the config. I tried to stay as close as possible to the original config file by ensuring that all options that were previously enabled are still enabled. This also brings better USB, IPv6, and other support that the kernel previously lacked, which should make it easier to troubleshoot the Hyper-V side when bringing support for these features.

tycho

First of all, I do like some of the proposed changes (in particular, enabling NO_HZ, VIRT_CPU_ACCOUNTING, NUMA, the BBR TCP congestion control, etc. are all very good things). But there are a few problems in here.

First, the WSL kernel is monolithic. It supports loadable modules in theory (CONFIG_MODULES=y), but in practice nothing is built as a module in the current WSL kernel:

$ zgrep =m /proc/config.gz
$
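The same check as the `zgrep` above can be scripted when comparing several candidate configs. The following is a minimal Python sketch, not part of any tooling in this repo; the helper names (`parse_config`, `load_config`, `module_options`) are hypothetical. It assumes the standard `.config` format, where set options appear as `CONFIG_NAME=value` and disabled ones as `# CONFIG_NAME is not set`.

```python
import gzip
import re

# Matches "CONFIG_FOO=value" lines; "# CONFIG_FOO is not set" lines never match.
_OPTION = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_config(text: str) -> dict[str, str]:
    """Map each CONFIG_* symbol that has a value to that value ('y', 'm', a number, a string)."""
    options = {}
    for line in text.splitlines():
        match = _OPTION.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def load_config(path: str = "/proc/config.gz") -> dict[str, str]:
    """Read a kernel config, handling both the gzipped /proc/config.gz and a plain .config."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_config(handle.read())


def module_options(options: dict[str, str]) -> list[str]:
    """Symbols built as loadable modules (=m), which a WSL guest currently cannot load."""
    return sorted(name for name, value in options.items() if value == "m")
```

On the stock WSL kernel described above, `module_options(load_config())` should come back empty, matching the empty `zgrep` output.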

This is mainly because Microsoft doesn't currently provide any way to expose loadable kernel module objects to the guests. Eventually they could introduce loadable module support by mounting a vhdx with modules in it at /lib/modules/$(uname -r) (probably a read-only disk image, or snapshotted and copy-on-write from a read-only source to allow for external modules to be added).

But for now, anything defined as =m would not be loadable in a WSL guest. And you need to be very conservative with what you enable with =y because that module will effectively be loaded permanently and consume more resources even when not in use. In some cases, that's probably harmless, but there's no point in keeping code loaded that won't ever be run.

That brings me to my second point: there's a lot of physical hardware being enabled in this PR which shouldn't be. I comment on some of these individually, but there are some others I didn't call out. Some of it just won't ever appear on a Hyper-V guest, and even less of it is likely to ever be implemented for WSL.

WSL2 is basically a Hyper-V "Generation 2" guest, so it does everything through the Hyper-V bus (as opposed to USB/ISA/PCI/PCIe). The Hyper-V device bus consists entirely of paravirtual devices, including drivers for storage, keyboard/mouse, framebuffer, ballooning, networking, etc. So enabling a bunch of physical hardware drivers doesn't enable anything new. This is different from other hypervisors like KVM or Xen HVM, which both use an emulated PCI bus for compatibility plus emulated or paravirtual PCI devices for the critical paths (e.g. see the virtio or xen frontend drivers).

In your PR you explicitly call out "better USB [...] support", but as configured this won't give you anything useful yet. The guest doesn't have an emulated USB host controller, and given how Microsoft has implemented other hardware in Hyper-V, it's most likely that any USB host controller they implement would be a new device attached to the Hyper-V device bus (i.e. not EHCI or xHCI, but a new Hyper-V specific HCI) -- but such a driver doesn't exist yet. There's no harm in enabling CONFIG_USB but drivers like CONFIG_USB_XHCI_HCD and CONFIG_USB_EHCI_HCD wouldn't do anything.

One option to attach a USB device to a Hyper-V guest right now would be a virtual USB host controller using USB/IP, for which you'd want to enable CONFIG_USBIP_CORE and CONFIG_USBIP_VHCI_HCD. For those to be useful, you would need either a USB/IP host driver on the Windows host with devices assigned to it, or a hardware USB/IP hub on your network. Another option to add USB support would be a userspace libusb proxy that talks to the Windows host, but AFAIK nobody's working on that right now -- I could be wrong.
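Both problems raised here (USB/IP pieces built as modules that can't be loaded, and physical host-controller drivers that do nothing on the Hyper-V bus) can be caught in a candidate config with a small audit. This is an illustrative Python sketch; the two option lists only restate the symbols named in this review and are not an exhaustive or official list, and `audit_usb` is a hypothetical helper.

```python
import re

# USB/IP options named in the review. They must be built in (=y),
# because =m modules cannot currently be loaded in a WSL2 guest.
REQUIRED_BUILTIN = ["CONFIG_USBIP_CORE", "CONFIG_USBIP_VHCI_HCD"]
# Physical host-controller drivers the review says do nothing on the Hyper-V bus.
NO_EFFECT_ON_HYPERV = ["CONFIG_USB_XHCI_HCD", "CONFIG_USB_EHCI_HCD"]


def parse_config(text: str) -> dict[str, str]:
    """Collect CONFIG_* symbols that have values; '# ... is not set' lines are skipped."""
    return dict(re.findall(r"^(CONFIG_\w+)=(.*)$", text, flags=re.MULTILINE))


def audit_usb(text: str) -> dict[str, list[str]]:
    options = parse_config(text)
    return {
        # Required options that are missing entirely or only built as modules.
        "not_builtin": [o for o in REQUIRED_BUILTIN if options.get(o) != "y"],
        # Enabled drivers that add code to the guest but no functionality.
        "no_effect": [o for o in NO_EFFECT_ON_HYPERV if options.get(o) in ("y", "m")],
    }
```

An empty result for both keys means the USB/IP path is built in and no dead host-controller drivers are enabled.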

Microsoft/config-wsl

WSLUser · 2020-06-24T00:02:52Z

Thanks for the review. I actually plan on toning this down a bit so it can be used with a normal kernel. Your feedback is especially helpful. I assume the Clear Linux kernel devs have reasons for things being as they are, but as you said, some of it won't be applicable to WSL2, though I do prefer to future-proof against changes made on the Hyper-V side where possible. Also, I thought the plan was to move to stable. If you check the Releases page, there are archives for the 5.6 kernel, which is going EOL in favor of 5.7. I'm not sure what the holdup is, but I'm assuming it's because this kernel is used by other teams at MS for things other than WSL2 (Azure Sphere could be using it, for all we know).

tycho · 2020-06-24T00:09:19Z

The v5.6 stuff you see on the Releases page is just automatically generated from the upstream Linux kernel tags. It looks like the last time they pushed tags from Linus' tree was around v5.6-rc3, so that doesn't necessarily reflect an intent to move to the v5.6 series. The ones to pay attention to are the ones containing "microsoft" or "msft", but they probably won't push any new tags like those until a kernel is already released in production (they're already behind on pushing release tags for WSL kernels as it is).

WSLUser · 2020-06-24T00:36:09Z

They're not pushing from master unfortunately. I'm betting that's what's used by other teams. Maybe once the DirectX code is fleshed out some more, they'll submit upstream and bump versions then.

miketheitguy · 2020-06-24T05:39:45Z

I'd strongly prefer to see actual performance data for making the changes you're recommending here beyond "it's used by Clear Linux". But that's just my personal preference.

Addressed PR feedback.

WSLUser · 2020-07-13T15:01:32Z

@tycho config is updated. Can you give a fresh look (and dismiss stale review)?

Got removed for some reason

WSLUser · 2020-07-13T18:27:06Z

So I can confirm there's less RAM usage and more CPU usage instead. I can also confirm that building this on FedoraRemixforWSL behaves like regular Fedora for kernel compilation and strips out options that are necessary for networking. Using this config on Clear Linux allows updates to occur. I did have to manually re-add CONFIG_HYPERV_TSCPAGE=y after running the make prepare scripts on Clear Linux. I then ran them again with the option set and did a fresh compilation, but when I checked whether the option was kept, it had been removed. I can only deduce that it was either superseded by another config option, possibly CONFIG_HYPERV_TIMER=y (a new addition to the config), or became obsolete.
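Rather than checking options by hand after each prepare step, the configs from before and after can be diffed to see exactly which options the tooling dropped. The kernel tree ships `scripts/diffconfig` for this; the Python below is only a minimal sketch of the same idea, with hypothetical helper names, assuming plain `CONFIG_NAME=value` lines. An option typically disappears this way when its symbol no longer exists in the tree or its dependencies are unmet.

```python
import re


def parse_config(text: str) -> dict[str, str]:
    """Collect CONFIG_* symbols that have values; '# ... is not set' lines are skipped."""
    return dict(re.findall(r"^(CONFIG_\w+)=(.*)$", text, flags=re.MULTILINE))


def dropped_options(before: str, after: str) -> dict[str, str]:
    """Options set in `before` that no longer appear at all in `after`."""
    old, new = parse_config(before), parse_config(after)
    return {name: value for name, value in old.items() if name not in new}


def changed_options(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Options present in both configs whose value changed, as (old, new) pairs."""
    old, new = parse_config(before), parse_config(after)
    return {n: (old[n], new[n]) for n in old.keys() & new.keys() if old[n] != new[n]}
```

Given the config from before and after the prepare step, CONFIG_HYPERV_TSCPAGE would show up under `dropped_options` if the tooling removed it.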

tycho · 2020-07-13T19:10:42Z

You can run make menuconfig and figure out the dependencies for a config option (in menuconfig, hit /, type HYPERV_TSCPAGE and hit enter, and it'll show you what it needs).

WSLUser · 2020-07-13T19:33:58Z

So the option is indeed gone from that kernel.
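Besides menuconfig's search, one scripted way to confirm that a symbol is gone from a given source tree is to look for its `config` definition in the Kconfig files. A rough Python sketch, assuming Kconfig files are named `Kconfig` or `Kconfig.*`; `find_kconfig_symbol` is a hypothetical helper, and unlike menuconfig it does not evaluate dependencies.

```python
import os
import re


def find_kconfig_symbol(source_root: str, symbol: str) -> list[str]:
    """Return the Kconfig files under source_root that define `symbol`.

    The CONFIG_ prefix is optional, since Kconfig files spell symbols without it.
    An empty result suggests the option does not exist in this tree.
    """
    symbol = symbol.removeprefix("CONFIG_")
    # "config FOO" or "menuconfig FOO" on a line of its own (FOO_BAR does not match).
    pattern = re.compile(rf"^\s*(?:menu)?config\s+{re.escape(symbol)}\s*$", re.MULTILINE)
    hits = []
    for directory, _subdirs, files in os.walk(source_root):
        for name in files:
            if not name.startswith("Kconfig"):
                continue
            path = os.path.join(directory, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if pattern.search(handle.read()):
                    hits.append(path)
    return sorted(hits)
```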

microhobby · 2020-07-14T15:22:59Z

I tried to use a build with your config and Linux does not even load:

The RPC server is unavailable.

[process exited with code 4294967295]

Could you confirm which version of Windows you are using?

WSLUser · 2020-07-14T16:23:23Z

19041. This currently requires the 5.7.8 kernel, as specified in the config file. When this repo switches to 5.4, unsupported options will be removed. For now this is mostly a PoC, and more changes may be needed.

Allows full performance analysis and network traffic control with bcc: https://github.com/iovisor/bcc

microhobby · 2020-07-19T03:42:38Z

These were the guys I was waiting for: WSLUser@ddedd2f

Now I can use your .config

microhobby · 2020-07-19T03:44:24Z

@WSLUser what tools are you using to benchmark these changes?

microhobby · 2020-07-19T04:23:03Z

Regarding WSLUser@ddedd2f: CONFIG_DM_MULTIPATH is not needed to solve the 9P issue.
Are you using it for any other purpose?

WSLUser · 2020-07-19T16:59:27Z

Nothing yet, as I have a few more things to enable, but fortunately Clear Linux provides bundles that run many of the same tests used by Phoronix. It does make a noticeable difference, though, if you watch Task Manager, even at "bootup". I set my RAM to 4 GB and allowed 3 CPU cores, and it does decently well. Some config options, io_uring in particular, were created specifically to boost performance. As far as DM_MULTIPATH goes, I was informed by @nunix that it was needed for Ubuntu. I'm not sure whether other distros are affected, but if he says it's needed when he conducts his demos, I'm going to take his word for it.
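Watching Task Manager on the host is hard to compare between runs; sampling memory from inside the guest gives a number that can be recorded for each kernel. A minimal Python sketch, assuming /proc/meminfo's usual `Field:   value kB` layout and treating `MemTotal - MemAvailable` as "used", which is an approximation chosen here rather than a standard metric; the helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo into {field: first numeric value}.

    Most fields are in kB; HugePages_* fields are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def used_mib(fields: dict[str, int]) -> float:
    """Memory in use, approximated as MemTotal - MemAvailable, in MiB."""
    return (fields["MemTotal"] - fields["MemAvailable"]) / 1024


def read_used_mib(path: str = "/proc/meminfo") -> float:
    with open(path, encoding="ascii") as handle:
        return used_mib(parse_meminfo(handle.read()))
```

Sampling `read_used_mib()` at the same point (e.g. right after startup, then after a fixed workload) under each kernel gives figures that can be compared directly.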

miketheitguy · 2020-07-19T17:44:21Z

WSLUser,

No offense, but I’m not entirely sure you know what you’re doing or getting into. Could you just fork this project into your own repo, make your own changes, test them, and then do another PR based on empirical data that explains why you think these options should be enabled? From what I’m gathering based on your PR and comments, this all effectively amounts to “Clear Linux does it so it’s good and should be done.”

I’m not knocking on the value of making changes for specific types of workloads; but I really do recommend significantly more actual testing and data with every change you make and its impact on the WSL2 Guest and Host.

WSLUser · 2020-07-19T18:35:53Z

I simply want to enable a better WSL2 experience. I am by no means a kernel expert. As far as testing goes, I have done some manual testing to prevent issues from occurring. My main focus is simply optimizing everything and preventing new issues from arising. I have done some research in order to do this and will continue to research as things move along. I already have a fork, and that's where this PR stems from. Ideally, I'll create an optimized kernel to match. As far as I'm aware, this config is as good as it can be and simply needs review from kernel experts. Otherwise, it's just a matter of updating it for every new kernel release. I only plan on doing that when I update my kernel, which isn't always going to be when a new release is out.

miketheitguy · 2020-07-19T19:33:45Z

Yes. But I’m saying you really should quantify “a better experience.” Run some compiles, do some python benchmarking, do some network tests. Before tweaking changes like this you should understand what you’re doing.

Testing ought to be objective and reproducible. You can’t really go poking around making changes you don’t understand because some random Linux distribution does it.

I’d strongly recommend putting together a testing suite for the changes you are making in your fork.
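One starting point for the kind of objective, reproducible measurement suggested here is to time the same command several times under each kernel and compare summary statistics. This Python sketch is only an assumption about how such a harness could look, not an existing suite; `time_command` is a hypothetical name, and the workload (a compile, a Python benchmark, a network test) is left to the caller.

```python
import statistics
import subprocess
import time


def time_command(argv: list[str], runs: int = 5, warmup: int = 1) -> dict[str, float]:
    """Run `argv` repeatedly and summarize wall-clock time in seconds.

    Warm-up runs are discarded so caches are in a comparable state.
    A non-zero exit status raises CalledProcessError instead of being timed.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        subprocess.run(argv, check=True, capture_output=True)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
    }
```

Running the same command with the same VM memory and CPU limits on both the stock and the modified kernel, and comparing medians rather than single runs, makes differences easier to judge. For a compile benchmark, the command should start from a clean tree each time; otherwise later runs are incremental builds.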

* f2fs: fix NULL pointer dereference in f2fs_write_begin() [ Upstream commit 62f63eea291b50a5677ae7503ac128803174698a ] BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:f2fs_write_begin+0x823/0xb90 [f2fs] Call Trace: f2fs_quota_write+0x139/0x1d0 [f2fs] write_blk+0x36/0x80 [quota_tree] get_free_dqblk+0x42/0xa0 [quota_tree] do_insert_tree+0x235/0x4a0 [quota_tree] do_insert_tree+0x26e/0x4a0 [quota_tree] do_insert_tree+0x26e/0x4a0 [quota_tree] do_insert_tree+0x26e/0x4a0 [quota_tree] qtree_write_dquot+0x70/0x190 [quota_tree] v2_write_dquot+0x43/0x90 [quota_v2] dquot_acquire+0x77/0x100 f2fs_dquot_acquire+0x2f/0x60 [f2fs] dqget+0x310/0x450 dquot_transfer+0x7e/0x120 f2fs_setattr+0x11a/0x4a0 [f2fs] notify_change+0x349/0x480 chown_common+0x168/0x1c0 do_fchownat+0xbc/0xf0 __x64_sys_fchownat+0x20/0x30 do_syscall_64+0x5f/0x220 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Passing fsdata parameter to .write_{begin,end} in f2fs_quota_write(), so that if quota file is compressed one, we can avoid above NULL pointer dereference when updating quota content. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * drm/vc4: Fix HDMI mode validation [ Upstream commit b1e7396a1d0e6af6806337fdaaa44098d6b3343c ] Current mode validation impedes setting up some video modes which should be supported otherwise. Namely 1920x1200@60Hz. Fix this by lowering the minimum HDMI state machine clock to pixel clock ratio allowed. 
Fixes: 32e823c63e90 ("drm/vc4: Reject HDMI modes with too high of clocks.") Reported-by: Stefan Wahren <[email protected]> Suggested-by: Dave Stevenson <[email protected]> Signed-off-by: Nicolas Saenz Julienne <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Sasha Levin <[email protected]> * iommu/vt-d: Fix mm reference leak [ Upstream commit 902baf61adf6b187f0a6b789e70d788ea71ff5bc ] Move canonical address check before mmget_not_zero() to avoid mm reference leak. Fixes: 9d8c3af31607 ("iommu/vt-d: IOMMU Page Request needs to check if address is canonical.") Signed-off-by: Jacob Pan <[email protected]> Acked-by: Lu Baolu <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * ext2: fix empty body warnings when -Wextra is used [ Upstream commit 44a52022e7f15cbaab957df1c14f7a4f527ef7cf ] When EXT2_ATTR_DEBUG is not defined, modify the 2 debug macros to use the no_printk() macro instead of <nothing>. This fixes gcc warnings when -Wextra is used: ../fs/ext2/xattr.c:252:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:258:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:330:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] ../fs/ext2/xattr.c:872:45: warning: suggest braces around empty body in an ‘else’ statement [-Wempty-body] I have verified that the only object code change (with gcc 7.5.0) is the reversal of some instructions from 'cmp a,b' to 'cmp b,a'. 
Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Jan Kara <[email protected]> Cc: [email protected] Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * ext2: fix debug reference to ext2_xattr_cache [ Upstream commit 32302085a8d90859c40cf1a5e8313f575d06ec75 ] Fix a debug-only build error in ext2/xattr.c: When building without extra debugging, (and with another patch that uses no_printk() instead of <empty> for the ext2-xattr debug-print macros, this build error happens: ../fs/ext2/xattr.c: In function ‘ext2_xattr_cache_insert’: ../fs/ext2/xattr.c:869:18: error: ‘ext2_xattr_cache’ undeclared (first use in this function); did you mean ‘ext2_xattr_list’? atomic_read(&ext2_xattr_cache->c_entry_count)); Fix the problem by removing cached entry count from the debug message since otherwise we'd have to export the mbcache structure just for that. Fixes: be0726d33cb8 ("ext2: convert to mbcache2") Reported-by: Randy Dunlap <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * power: supply: axp288_fuel_gauge: Broaden vendor check for Intel Compute Sticks. [ Upstream commit e42fe5b29ac07210297e75f36deefe54edbdbf80 ] The Intel Compute Stick `STK1A32SC` can have a system vendor of "Intel(R) Client Systems". Broaden the Intel Compute Stick DMI checks so that they match "Intel Corporation" as well as "Intel(R) Client Systems". This fixes an issue where the STK1A32SC compute sticks were still exposing a battery with the existing blacklist entry. Signed-off-by: Jeffery Miller <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * libnvdimm: Out of bounds read in __nd_ioctl() [ Upstream commit f84afbdd3a9e5e10633695677b95422572f920dc ] The "cmd" comes from the user and it can be up to 255. 
It it's more than the number of bits in long, it results out of bounds read when we check test_bit(cmd, &cmd_mask). The highest valid value for "cmd" is ND_CMD_CALL (10) so I added a compare against that. Fixes: 62232e45f4a2 ("libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices") Signed-off-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * iommu/amd: Fix the configuration of GCR3 table root pointer [ Upstream commit c20f36534666e37858a14e591114d93cc1be0d34 ] The SPA of the GCR3 table root pointer[51:31] masks 20 bits. However, this requires 21 bits (Please see the AMD IOMMU specification). This leads to the potential failure when the bit 51 of SPA of the GCR3 table root pointer is 1'. Signed-off-by: Adrian Huang <[email protected]> Fixes: 52815b75682e2 ("iommu/amd: Add support for IOMMUv2 domain mode") Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * f2fs: fix to wait all node page writeback [ Upstream commit dc5a941223edd803f476a153abd950cc3a83c3e1 ] There is a race condition that we may miss to wait for all node pages writeback, fix it. - fsync() - shrink - f2fs_do_sync_file - __write_node_page - set_page_writeback(page#0) : remove DIRTY/TOWRITE flag - f2fs_fsync_node_pages : won't find page #0 as TOWRITE flag was removeD - f2fs_wait_on_node_pages_writeback : wont' wait page #0 writeback as it was not in fsync_node_list list. - f2fs_add_fsync_node_entry Fixes: 50fa53eccf9f ("f2fs: fix to avoid broken of dnode block list") Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * net: dsa: bcm_sf2: Fix overflow checks commit d0802dc411f469569a537283b6f3833af47aece9 upstream. 
Commit f949a12fd697 ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc") tried to fix the some user controlled buffer overflows in bcm_sf2_cfp_rule_set() and bcm_sf2_cfp_rule_del() but the fix was using CFP_NUM_RULES, which while it is correct not to overflow the bitmaps, is not representative of what the device actually supports. Correct that by using bcm_sf2_cfp_rule_size() instead. The latter subtracts the number of rules by 1, so change the checks from greater than or equal to greater than accordingly. Fixes: f949a12fd697 ("net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * fbdev: potential information leak in do_fb_ioctl() commit d3d19d6fc5736a798b118971935ce274f7deaa82 upstream. The "fix" struct has a 2 byte hole after ->ywrapstep and the "fix = info->fix;" assignment doesn't necessarily clear it. It depends on the compiler. The solution is just to replace the assignment with an memcpy(). Fixes: 1f5e31d7e55a ("fbmem: don't call copy_from/to_user() with mutex held") Signed-off-by: Dan Carpenter <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Andrea Righi <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Sam Ravnborg <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Daniel Thompson <[email protected]> Cc: Peter Rosin <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Gerd Hoffmann <[email protected]> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> * iio: si1133: read 24-bit signed integer for measurement commit 328b50e9a0ad1fe8accdf8c19923deebab5e0c01 upstream. The chip is configured in 24 bit mode. 
The values read from it must always be treated as is. This fixes the issue by replacing the previous 16 bits value by a 24 bits buffer. This changes affects the value output by previous version of the driver, since the least significant byte was missing. The upper half of 16 bit values previously output are now the upper half of a 24 bit value. Fixes: e01e7eaf37d8 ("iio: light: introduce si1133") Reported-by: Simon Goyette <[email protected]> Co-authored-by: Guillaume Champagne <[email protected]> Signed-off-by: Maxime Roussin-Bélanger <[email protected]> Signed-off-by: Guillaume Champagne <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * tty: evh_bytechan: Fix out of bounds accesses commit 3670664b5da555a2a481449b3baafff113b0ac35 upstream. ev_byte_channel_send() assumes that its third argument is a 16 byte array. Some places where it is called it may not be (or we can't easily tell if it is). Newer compilers have started producing warnings about this, so make sure we actually pass a 16 byte array. There may be more elegant solutions to this, but the driver is quite old and hasn't been updated in many years. 
The warnings (from a powerpc allyesconfig build) are: In file included from include/linux/byteorder/big_endian.h:5, from arch/powerpc/include/uapi/asm/byteorder.h:14, from include/asm-generic/bitops/le.h:6, from arch/powerpc/include/asm/bitops.h:250, from include/linux/bitops.h:29, from include/linux/kernel.h:12, from include/asm-generic/bug.h:19, from arch/powerpc/include/asm/bug.h:109, from include/linux/bug.h:5, from include/linux/mmdebug.h:5, from include/linux/gfp.h:5, from include/linux/slab.h:15, from drivers/tty/ehv_bytechan.c:24: drivers/tty/ehv_bytechan.c: In function ‘ehv_bc_udbg_putc’: arch/powerpc/include/asm/epapr_hcalls.h:298:20: warning: array subscript 1 is outside array bounds of ‘const char[1]’ [-Warray-bounds] 298 | r6 = be32_to_cpu(p[1]); include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of macro ‘__be32_to_cpu’ 40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x)) | ^ arch/powerpc/include/asm/epapr_hcalls.h:298:7: note: in expansion of macro ‘be32_to_cpu’ 298 | r6 = be32_to_cpu(p[1]); | ^~~~~~~~~~~ drivers/tty/ehv_bytechan.c:166:13: note: while referencing ‘data’ 166 | static void ehv_bc_udbg_putc(char c) | ^~~~~~~~~~~~~~~~ Fixes: dcd83aaff1c8 ("tty/powerpc: introduce the ePAPR embedded hypervisor byte channel driver") Signed-off-by: Stephen Rothwell <[email protected]> Tested-by: Laurentiu Tudor <[email protected]> [mpe: Trim warnings from change log] Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> * locktorture: Print ratio of acquisitions, not failures commit 80c503e0e68fbe271680ab48f0fe29bc034b01b7 upstream. The __torture_print_stats() function in locktorture.c carefully initializes local variable "min" to statp[0].n_lock_acquired, but then compares it to statp[i].n_lock_fail. 
Given that the .n_lock_fail field should normally be zero, and given the initialization, it seems reasonable to display the maximum and minimum number acquisitions instead of miscomputing the maximum and minimum number of failures. This commit therefore switches from failures to acquisitions. And this turns out to be not only a day-zero bug, but entirely my own fault. I hate it when that happens! Fixes: 0af3fe1efa53 ("locktorture: Add a lock-torture kernel module") Reported-by: Will Deacon <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * mtd: spinand: Explicitly use MTD_OPS_RAW to write the bad block marker to OOB commit 621a7b780bd8b7054647d53d5071961f2c9e0873 upstream. When writing the bad block marker to the OOB area the access mode should be set to MTD_OPS_RAW as it is done for reading the marker. Currently this only works because req.mode is initialized to MTD_OPS_PLACE_OOB (0) and spinand_write_to_cache_op() checks for req.mode != MTD_OPS_AUTO_OOB. Fix this by explicitly setting req.mode to MTD_OPS_RAW. Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs") Signed-off-by: Frieder Schrempf <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/linux-mtd/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> * mtd: lpddr: Fix a double free in probe() commit 4da0ea71ea934af18db4c63396ba2af1a679ef02 upstream. This function is only called from lpddr_probe(). We free "lpddr" both here and in the caller, so it's a double free. The best place to free "lpddr" is in lpddr_probe() so let's delete this one. 
Fixes: 8dc004395d5e ("[MTD] LPDDR qinfo probing.") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/linux-mtd/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> * mtd: phram: fix a double free issue in error path commit 49c64df880570034308e4a9a49c4bc95cf8cdb33 upstream. The variable 'name' is released multiple times in the error path, which may cause double free issues. This problem is avoided by adding a goto label to release the memory uniformly. And this change also makes the code a bit more cleaner. Fixes: 4f678a58d335 ("mtd: fix memory leaks in phram_setup") Signed-off-by: Wen Yang <[email protected]> Cc: Joern Engel <[email protected]> Cc: Miquel Raynal <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Vignesh Raghavendra <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Miquel Raynal <[email protected]> Link: https://lore.kernel.org/linux-mtd/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> * KEYS: Don't write out to userspace while holding key semaphore commit d3ec10aa95819bff18a0d936b18884c7816d0914 upstream. A lockdep circular locking dependency report was seen when running a keyutils test: [12537.027242] ====================================================== [12537.059309] WARNING: possible circular locking dependency detected [12537.088148] 4.18.0-147.7.1.el8_1.x86_64+debug #1 Tainted: G OE --------- - - [12537.125253] ------------------------------------------------------ [12537.153189] keyctl/25598 is trying to acquire lock: [12537.175087] 000000007c39f96c (&mm->mmap_sem){++++}, at: __might_fault+0xc4/0x1b0 [12537.208365] [12537.208365] but task is already holding lock: [12537.234507] 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12537.270476] [12537.270476] which lock already depends on the new lock. 
[12537.270476] [12537.307209] [12537.307209] the existing dependency chain (in reverse order) is: [12537.340754] [12537.340754] -> #3 (&type->lock_class){++++}: [12537.367434] down_write+0x4d/0x110 [12537.385202] __key_link_begin+0x87/0x280 [12537.405232] request_key_and_link+0x483/0xf70 [12537.427221] request_key+0x3c/0x80 [12537.444839] dns_query+0x1db/0x5a5 [dns_resolver] [12537.468445] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.496731] cifs_reconnect+0xe04/0x2500 [cifs] [12537.519418] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.546263] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.573551] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.601045] kthread+0x30c/0x3d0 [12537.617906] ret_from_fork+0x3a/0x50 [12537.636225] [12537.636225] -> #2 (root_key_user.cons_lock){+.+.}: [12537.664525] __mutex_lock+0x105/0x11f0 [12537.683734] request_key_and_link+0x35a/0xf70 [12537.705640] request_key+0x3c/0x80 [12537.723304] dns_query+0x1db/0x5a5 [dns_resolver] [12537.746773] dns_resolve_server_name_to_ip+0x1e1/0x4d0 [cifs] [12537.775607] cifs_reconnect+0xe04/0x2500 [cifs] [12537.798322] cifs_readv_from_socket+0x461/0x690 [cifs] [12537.823369] cifs_read_from_socket+0xa0/0xe0 [cifs] [12537.847262] cifs_demultiplex_thread+0x311/0x2db0 [cifs] [12537.873477] kthread+0x30c/0x3d0 [12537.890281] ret_from_fork+0x3a/0x50 [12537.908649] [12537.908649] -> #1 (&tcp_ses->srv_mutex){+.+.}: [12537.935225] __mutex_lock+0x105/0x11f0 [12537.954450] cifs_call_async+0x102/0x7f0 [cifs] [12537.977250] smb2_async_readv+0x6c3/0xc90 [cifs] [12538.000659] cifs_readpages+0x120a/0x1e50 [cifs] [12538.023920] read_pages+0xf5/0x560 [12538.041583] __do_page_cache_readahead+0x41d/0x4b0 [12538.067047] ondemand_readahead+0x44c/0xc10 [12538.092069] filemap_fault+0xec1/0x1830 [12538.111637] __do_fault+0x82/0x260 [12538.129216] do_fault+0x419/0xfb0 [12538.146390] __handle_mm_fault+0x862/0xdf0 [12538.167408] handle_mm_fault+0x154/0x550 [12538.187401] __do_page_fault+0x42f/0xa60 [12538.207395] 
do_page_fault+0x38/0x5e0 [12538.225777] page_fault+0x1e/0x30 [12538.243010] [12538.243010] -> #0 (&mm->mmap_sem){++++}: [12538.267875] lock_acquire+0x14c/0x420 [12538.286848] __might_fault+0x119/0x1b0 [12538.306006] keyring_read_iterator+0x7e/0x170 [12538.327936] assoc_array_subtree_iterate+0x97/0x280 [12538.352154] keyring_read+0xe9/0x110 [12538.370558] keyctl_read_key+0x1b9/0x220 [12538.391470] do_syscall_64+0xa5/0x4b0 [12538.410511] entry_SYSCALL_64_after_hwframe+0x6a/0xdf [12538.435535] [12538.435535] other info that might help us debug this: [12538.435535] [12538.472829] Chain exists of: [12538.472829] &mm->mmap_sem --> root_key_user.cons_lock --> &type->lock_class [12538.472829] [12538.524820] Possible unsafe locking scenario: [12538.524820] [12538.551431] CPU0 CPU1 [12538.572654] ---- ---- [12538.595865] lock(&type->lock_class); [12538.613737] lock(root_key_user.cons_lock); [12538.644234] lock(&type->lock_class); [12538.672410] lock(&mm->mmap_sem); [12538.687758] [12538.687758] *** DEADLOCK *** [12538.687758] [12538.714455] 1 lock held by keyctl/25598: [12538.732097] #0: 000000003de5b58d (&type->lock_class){++++}, at: keyctl_read_key+0x15a/0x220 [12538.770573] [12538.770573] stack backtrace: [12538.790136] CPU: 2 PID: 25598 Comm: keyctl Kdump: loaded Tainted: G [12538.844855] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 12/27/2015 [12538.881963] Call Trace: [12538.892897] dump_stack+0x9a/0xf0 [12538.907908] print_circular_bug.isra.25.cold.50+0x1bc/0x279 [12538.932891] ? save_trace+0xd6/0x250 [12538.948979] check_prev_add.constprop.32+0xc36/0x14f0 [12538.971643] ? keyring_compare_object+0x104/0x190 [12538.992738] ? check_usage+0x550/0x550 [12539.009845] ? sched_clock+0x5/0x10 [12539.025484] ? sched_clock_cpu+0x18/0x1e0 [12539.043555] __lock_acquire+0x1f12/0x38d0 [12539.061551] ? trace_hardirqs_on+0x10/0x10 [12539.080554] lock_acquire+0x14c/0x420 [12539.100330] ? 
__might_fault+0xc4/0x1b0 [12539.119079] __might_fault+0x119/0x1b0 [12539.135869] ? __might_fault+0xc4/0x1b0 [12539.153234] keyring_read_iterator+0x7e/0x170 [12539.172787] ? keyring_read+0x110/0x110 [12539.190059] assoc_array_subtree_iterate+0x97/0x280 [12539.211526] keyring_read+0xe9/0x110 [12539.227561] ? keyring_gc_check_iterator+0xc0/0xc0 [12539.249076] keyctl_read_key+0x1b9/0x220 [12539.266660] do_syscall_64+0xa5/0x4b0 [12539.283091] entry_SYSCALL_64_after_hwframe+0x6a/0xdf One way to prevent this deadlock scenario from happening is to not allow writing to userspace while holding the key semaphore. Instead, an internal buffer is allocated for getting the keys out from the read method first before copying them out to userspace without holding the lock. That requires taking out the __user modifier from all the relevant read methods as well as additional changes to not use any userspace write helpers. That is, 1) The put_user() call is replaced by a direct copy. 2) The copy_to_user() call is replaced by memcpy(). 3) All the fault handling code is removed. Compiling on a x86-64 system, the size of the rxrpc_read() function is reduced from 3795 bytes to 2384 bytes with this patch. Fixes: ^1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> * bpf: fix buggy r0 retval refinement for tracing helpers [ no upstream commit ] See the glory details in 100605035e15 ("bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly") for why 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") is buggy. The whole series however is not suitable for stable since it adds significant amount [0] of verifier complexity in order to add 32bit subreg tracking. Something simpler is needed. 
Unfortunately, reverting 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") or just cherry-picking 100605035e15 ("bpf: Verifier, do_refine_retval_range may clamp umin to 0 incorrectly") is not an option, since it would badly break existing tracing programs (at least those using the bpf_get_stack() and bpf_probe_read_str() helpers). Not fixing it in stable is also not an option: on 4.19 kernels an error will cause a soft-lockup from hitting a dead-code-sanitized branch, because old kernels do not hard-wire such branches yet. Even then, on 5.x, 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper") would produce wrong bounds in the verifier simulation when an error is hit.

In one of the earlier iterations of the mentioned patch series for upstream, there was the concern that just using smax_value in do_refine_retval_range() would nuke bounds through subsequent <<32 >>32 shifts before the comparison against 0 [1], which eventually led to the 32bit subreg tracking in the first place. While I initially went for implementing the idea [1] to pattern match the two shift operations, it turned out to be more complex than needed: we can simply treat do_refine_retval_range() the way we branch off verification for conditionals or under speculation, that is, by pushing a new reg state to the stack for later verification. This means that, instead of verifying the current path with ret_reg in the [S32MIN, msize_max_value] interval, where later bounds would get nuked, we split it into two:

i) the success case, where ret_reg can be in [0, msize_max_value], and
ii) the error case, where ret_reg is known to be in the interval [S32MIN, -1].

The latter preserves the bounds through these shift patterns and can match the reg < 0 test. test_progs also succeeds with this approach.
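A minimal userspace sketch of the splitting idea, assuming a simplified model where only a signed [smin, smax] range is tracked for r0. The `struct srange` type and the helper names are hypothetical and are not the verifier's code. The sketch shows that a `reg < 0` test cannot be decided for the merged interval, while each split state decides it statically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified signed range (not the kernel's bpf_reg_state). */
struct srange {
    int64_t smin;
    int64_t smax;
};

/*
 * Instead of one state with r0 in [S32_MIN, msize_max], build two states:
 * success with r0 in [0, msize_max] and error with r0 in [S32_MIN, -1].
 * A verifier would keep exploring one and push the other for later.
 */
static void split_retval_range(int64_t msize_max,
                               struct srange *success,
                               struct srange *error)
{
    assert(msize_max >= 0);
    success->smin = 0;
    success->smax = msize_max;
    error->smin = INT32_MIN;
    error->smax = -1;
}

/* True if "reg < 0" is taken for every value in r. */
static bool always_negative(struct srange r)
{
    return r.smax < 0;
}

/* True if "reg < 0" is not taken for any value in r. */
static bool never_negative(struct srange r)
{
    return r.smin >= 0;
}
```

For example, with `msize_max = 64`, the merged range [INT32_MIN, 64] satisfies neither predicate. The success state is never negative, and the error state is always negative.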
[0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/
[1] https://lore.kernel.org/bpf/158015334199.28573.4940395881683556537.stgit@john-XPS-13-9370/T/#m2e0ad1d5949131014748b6daa48a3495e7f0456d

Fixes: 849fa50662fb ("bpf/verifier: refine retval R0 state for bpf_get_stack helper")
Reported-by: Lorenzo Fontana
Reported-by: Leonardo Di Donato
Reported-by: John Fastabend
Signed-off-by: Daniel Borkmann
Acked-by: Alexei Starovoitov
Acked-by: John Fastabend
Tested-by: John Fastabend
Tested-by: Lorenzo Fontana
Tested-by: Leonardo Di Donato
Signed-off-by: Greg Kroah-Hartman

* Linux 4.19.118

* ext4: fix extent_status fragmentation for plain files

Extents are cached in read_extent_tree_block(); as a result, extents are not cached for inodes with depth == 0 when we try to find the extent using ext4_find_extent(). The result of the lookup is cached in ext4_map_blocks() but is only a subset of the extent on disk. As a result, the contents of the extent status cache can get very badly fragmented for certain workloads, such as a random 4k read workload.

File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    8191:      40960..     49151:   8192:             last,eof

$ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
$ perf script | grep ext4_es_insert_extent | head -n 10
 fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
 fio 131 [000] 13.975939: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
 fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
 fio 131 [000] 13.976937: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
 fio 131 [000] 13.977440: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
 fio 131 [000] 13.977931: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
 fio 131 [000] 13.978376: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
 fio 131 [000] 13.978957: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
 fio 131 [000] 13.979474: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W

Fix this by caching the extents for inodes with depth == 0 in ext4_find_extent().

[ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this newly added function is not in extents_cache.c, and to avoid potential visual confusion with ext4_es_cache_extent(). -TYT ]

Signed-off-by: Dmitry Monakhov
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

* ext4: fix extent_status fragmentation for plain files

[ Upstream commit 4068664e3cd2312610ceac05b74c4cf1853b8325 ]

Extents are cached in read_extent_tree_block(); as a result, extents are not cached for inodes with depth == 0 when we try to find the extent using ext4_find_extent().
The result of the lookup is cached in ext4_map_blocks() but is only a subset of the extent on disk. As a result, the contents of the extent status cache can get very badly fragmented for certain workloads, such as a random 4k read workload.

File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    8191:      40960..     49151:   8192:             last,eof

$ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
$ perf script | grep ext4_es_insert_extent | head -n 10
 fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
 fio 131 [000] 13.975939: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
 fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
 fio 131 [000] 13.976937: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
 fio 131 [000] 13.977440: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
 fio 131 [000] 13.977931: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
 fio 131 [000] 13.978376: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
 fio 131 [000] 13.978957: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
 fio 131 [000] 13.979474: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W

Fix this by caching the extents for inodes with depth == 0 in ext4_find_extent().

[ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this newly added function is not in extents_cache.c, and to avoid potential visual confusion with ext4_es_cache_extent(). -TYT ]

Signed-off-by: Dmitry Monakhov
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o
Signed-off-by: Sasha Levin

* drm/msm: Use the correct dma_sync calls harder

commit 9f614197c744002f9968e82c649fdf7fe778e1e7 upstream.

Looks like the dma_sync calls don't do what we want on armv7 either. Fixes:

  Unable to handle kernel paging request at virtual address 50001000
  pgd = (ptrval)
  [50001000] *pgd=00000000
  Internal error: Oops: 805 [#1] SMP ARM
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc6-00271-g9f159ae07f07 #4
  Hardware name: Freescale i.MX53 (Device Tree Support)
  PC is at v7_dma_clean_range+0x20/0x38
  LR is at __dma_page_cpu_to_dev+0x28/0x90
  pc : [<c011c76c>]    lr : [<c01181c4>]    psr: 20000013
  sp : d80b5a88  ip : de96c000  fp : d840ce6c
  r10: 00000000  r9 : 00000001  r8 : d843e010
  r7 : 00000000  r6 : 00008000  r5 : ddb6c000  r4 : 00000000
  r3 : 0000003f  r2 : 00000040  r1 : 50008000  r0 : 50001000
  Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
  Control: 10c5387d  Table: 70004019  DAC: 00000051
  Process swapper/0 (pid: 1, stack limit = 0x(ptrval))

Signed-off-by: Rob Clark
Fixes: 3de433c5b38a ("drm/msm: Use the correct dma_sync calls in msm_gem")
Tested-by: Fabio Estevam
Cc: Guenter Roeck
Signed-off-by: Greg Kroah-Hartman

* bpftool: Fix printing incorrect pointer in btf_dump_ptr

commit 555089fdfc37ad65e0ee9b42ca40c238ff546f83 upstream.

For plain text output, it incorrectly prints the pointer value "void *data". The "void *data" actually points to memory that contains a bpf-map's value. The intention is to print the content of the bpf-map's value, not the pointer to the bpf-map's value.

In this case, a member of the bpf-map's value is a pointer type. Thus, it should print "*(void **)data".
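To make the distinction concrete, here is a small userspace sketch. The function names are hypothetical and are not bpftool's. `data` points at the bytes of a map value whose member is itself a pointer. The buggy variant formats the address of the value, and the fixed variant formats the pointer stored in it. The sketch reads the stored pointer with `memcpy()`. That is equivalent to `*(void **)data` but does not assume the value is suitably aligned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Buggy behavior: prints where the map value lives, not what it holds. */
static int format_ptr_buggy(char *buf, size_t len, const void *data)
{
    return snprintf(buf, len, "%p", (void *)data);
}

/* Fixed behavior: prints the pointer stored inside the map value. */
static int format_ptr_fixed(char *buf, size_t len, const void *data)
{
    void *stored;

    assert(data != NULL);
    memcpy(&stored, data, sizeof(stored)); /* same value as *(void **)data */
    return snprintf(buf, len, "%p", stored);
}
```

For a value holding `int *p`, the fixed variant prints the same text as `printf("%p", (void *)p)`. The buggy variant prints `&p` instead.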
Fixes: 22c349e8db89 ("tools: bpftool: fix format strings and arguments for jsonw_printf()")
Signed-off-by: Martin KaFai Lau
Signed-off-by: Alexei Starovoitov
Reviewed-by: Quentin Monnet
Link: https://lore.kernel.org/bpf/[email protected]
Cc: Guenter Roeck
Signed-off-by: Greg Kroah-Hartman

* crypto: mxs-dcp - make symbols 'sha1_null_hash' and 'sha256_null_hash' static

commit ce4e45842de3eb54b8dd6e081765d741f5b92b56 upstream.

Fixes the following sparse warnings:

drivers/crypto/mxs-dcp.c:39:15: warning: symbol 'sha1_null_hash' was not declared. Should it be static?
drivers/crypto/mxs-dcp.c:43:15: warning: symbol 'sha256_null_hash' was not declared. Should it be static?

Fixes: c709eebaf5c5 ("crypto: mxs-dcp - Fix SHA null hashes and output length")
Signed-off-by: Wei Yongjun
Signed-off-by: Herbert Xu
Cc: Guenter Roeck
Signed-off-by: Greg Kroah-Hartman

* vti4: removed duplicate log message.

commit 01ce31c57b3f07c91c9d45bbaf126124cce83a5d upstream.

Removed the info log message emitted if ipip tunnel registration fails during module initialization: it adds nothing to the error message that is written on all failures.

Fixes: dd9ee3444014e ("vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel")
Signed-off-by: Jeremy Sowden
Signed-off-by: Steffen Klassert
Cc: Guenter Roeck
Signed-off-by: Greg Kroah-Hartman

* arm64: Add part number for Neoverse N1

[ Upstream commit 0cf57b86859c49381addb3ce47be70aadf5fd2c0 ]

New CPU, new part number. You know the drill.
Signed-off-by: Marc Zyngier
Signed-off-by: Will Deacon
Signed-off-by: James Morse
Signed-off-by: Sasha Levin

* arm64: errata: Hide CTR_EL0.DIC on systems affected by Neoverse-N1 #1542419

[ Upstream commit 05460849c3b51180d5ada3373d0449aea19075e4 ]

Cores affected by Neoverse-N1 #1542419 could execute a stale instruction when a branch is updated to point to freshly generated instructions.

To work around this issue, we need user-space to issue unnecessary icache maintenance that we can trap. Start by hiding CTR_EL0.DIC.

Reviewed-by: Suzuki K Poulose
Signed-off-by: James Morse
Signed-off-by: Catalin Marinas
[ Removed cpu_enable_trap_ctr_access() hunk due to no 4afe8e79da92]
Signed-off-by: James Morse
Signed-off-by: Sasha Levin

* arm64: Fake the IminLine size on systems affected by Neoverse-N1 #1542419

[ Upstream commit ee9d90be9ddace01b7fb126567e4b539fbe1f82f ]

Systems affected by Neoverse-N1 #1542419 support DIC, so they do not need to perform icache maintenance once new instructions are cleaned to the PoU. For the errata workaround, the kernel hides DIC from user-space so that the unnecessary cache maintenance can be trapped by firmware.

To reduce the number of traps, produce a fake IminLine value based on PAGE_SIZE.

Signed-off-by: James Morse
Reviewed-by: Suzuki K Poulose
Signed-off-by: Catalin Marinas
Signed-off-by: James Morse
Signed-off-by: Sasha Levin

* arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space

[ Upstream commit: 222fc0c8503d98cec3cb2bac2780cdd21a6e31c0 ]

Compat user-space is unable to perform ICIMVAU instructions from user-space. Instead, it uses a compat-syscall. Add the workaround for Neoverse-N1 #1542419 to this code path.
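The two Neoverse-N1 #1542419 workarounds above (hiding DIC and faking IminLine) can be modeled as a transform on the CTR_EL0 value that user space sees. This sketch assumes the architectural CTR_EL0 layout: DIC at bit 29, and IminLine in bits [3:0] as log2 of the number of 4-byte words. `fake_ctr_el0()` is a hypothetical helper for illustration and is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* CTR_EL0 fields: DIC is bit 29, IminLine is bits [3:0] (log2 of 4-byte words). */
#define CTR_DIC_BIT        (UINT64_C(1) << 29)
#define CTR_IMINLINE_MASK  UINT64_C(0xf)

/*
 * Model of the value reported to user space on an affected system:
 * clear DIC so user space issues (trappable) I-cache maintenance, and
 * report a one-page IminLine so a maintenance loop traps once per page
 * instead of once per cache line.
 */
static uint64_t fake_ctr_el0(uint64_t ctr, unsigned int page_shift)
{
    assert(page_shift >= 2 && page_shift - 2 <= CTR_IMINLINE_MASK);
    ctr &= ~CTR_DIC_BIT;
    ctr &= ~CTR_IMINLINE_MASK;
    ctr |= (uint64_t)(page_shift - 2); /* 4-byte words per "line" */
    return ctr;
}

/* I-cache line size in bytes implied by a CTR_EL0 value. */
static unsigned int icache_line_bytes(uint64_t ctr)
{
    return 4u << (ctr & CTR_IMINLINE_MASK);
}
```

With 4K pages (`page_shift = 12`), a CTR_EL0 that reports 64-byte lines is rewritten to report 4096-byte lines with DIC clear. User space therefore issues one trapped maintenance instruction per page rather than 64.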
Signed-off-by: James Morse
Signed-off-by: Catalin Marinas
Signed-off-by: James Morse
Signed-off-by: Sasha Levin

* arm64: Silence clang warning on mismatched value/register sizes

[ Upstream commit: 27a22fbdeedd6c5c451cf5f830d51782bf50c3a2 ]

Clang reports a warning on the __tlbi(aside1is, 0) macro expansion, since the value size does not match the register size specified in the inline asm. Construct the ASID value using the __TLBI_VADDR() macro.

Fixes: 222fc0c8503d ("arm64: compat: Workaround Neoverse-N1 #1542419 for compat user-space")
Reported-by: Nathan Chancellor
Cc: James Morse
Signed-off-by: Catalin Marinas
Signed-off-by: James Morse
Signed-off-by: Sasha Levin

* watchdog: reset last_hw_keepalive time at start

[ Upstream commit 982bb70517aef2225bad1d802887b733db492cc0 ]

Currently the watchdog core does not initialize the last_hw_keepalive time during watchdog startup. This causes the watchdog to be pinged immediately if enough time has passed since system boot-up, and some types of watchdogs, such as K3 RTI, do not like this.

To avoid the issue, set up the last_hw_keepalive time during watchdog startup.

Signed-off-by: Tero Kristo
Reviewed-by: Guenter Roeck
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck
Signed-off-by: Wim Van Sebroeck
Signed-off-by: Sasha Levin

* scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login

[ Upstream commit 38503943c89f0bafd9e3742f63f872301d44cbea ]

The following kasan bug was called out:

 BUG: KASAN: slab-out-of-bounds in lpfc_unreg_login+0x7c/0xc0 [lpfc]
 Read of size 2 at addr ffff889fc7c50a22 by task lpfc_worker_3/6676
 ...
 Call Trace:
  dump_stack+0x96/0xe0
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  print_address_description.constprop.6+0x1b/0x220
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  __kasan_report.cold.9+0x37/0x7c
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  kasan_report+0xe/0x20
  lpfc_unreg_login+0x7c/0xc0 [lpfc]
  lpfc_sli_def_mbox_cmpl+0x334/0x430 [lpfc]
  ...

When processing the completion of a "Reg Rpi" login mailbox command in lpfc_sli_def_mbox_cmpl, a call may be made to lpfc_unreg_login. The vpi is extracted from the completing mailbox context and passed as an input for the next. However, the vpi stored in the mailbox command context is an absolute vpi, which for SLI4 represents both base + offset. When used with a non-zero base component (function id > 0), this results in an out-of-range access beyond the allocated phba->vpi_ids array.

Fix by subtracting the function's base value to get an accurate vpi number.

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin

* scsi: lpfc: Fix crash in target side cable pulls hitting WAIT_FOR_UNREG

[ Upstream commit 807e7353d8a7105ce884d22b0dbc034993c6679c ]

Kernel is crashing with the following stacktrace:

 BUG: unable to handle kernel NULL pointer dereference at 00000000000005bc
 IP: lpfc_nvme_register_port+0x1a8/0x3a0 [lpfc]
 ...
 Call Trace:
  lpfc_nlp_state_cleanup+0x2b2/0x500 [lpfc]
  lpfc_nlp_set_state+0xd7/0x1a0 [lpfc]
  lpfc_cmpl_prli_prli_issue+0x1f7/0x450 [lpfc]
  lpfc_disc_state_machine+0x7a/0x1e0 [lpfc]
  lpfc_cmpl_els_prli+0x16f/0x1e0 [lpfc]
  lpfc_sli_sp_handle_rspiocb+0x5b2/0x690 [lpfc]
  lpfc_sli_handle_slow_ring_event_s4+0x182/0x230 [lpfc]
  lpfc_do_work+0x87f/0x1570 [lpfc]
  kthread+0x10d/0x130
  ret_from_fork+0x35/0x40

During target side fault injections, it is possible to hit the NLP_WAIT_FOR_UNREG case in lpfc_nvme_remoteport_delete.
A prior commit fixed a rebind and delete race condition, but called lpfc_nlp_put unconditionally. This triggered a deletion and the crash.

Fix by moving nlp_put inside the NLP_WAIT_FOR_UNREG case, where the nlp is being unregistered/removed. Leave the reference if the flag isn't set.

Link: https://lore.kernel.org/r/[email protected]
Fixes: b15bd3e6212e ("scsi: lpfc: Fix nvme remoteport registration race conditions")
Signed-off-by: James Smart
Signed-off-by: Dick Kennedy
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin

* ceph: return ceph_mdsc_do_request() errors from __get_parent()

[ Upstream commit c6d50296032f0b97473eb2e274dc7cc5d0173847 ]

Return the error returned by ceph_mdsc_do_request(). Otherwise, r_target_inode ends up being NULL, and the function returns ENOENT regardless of the actual error.

Signed-off-by: Qiujun Huang
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

* ceph: don't skip updating wanted caps when cap is stale

[ Upstream commit 0aa971b6fd3f92afef6afe24ef78d9bb14471519 ]

1. try_get_cap_refs() fails to get caps and finds that mds_wanted does not include what it wants. It returns -ESTALE.
2. ceph_get_caps() calls ceph_renew_caps(). ceph_renew_caps() finds that the inode has a cap, so it calls ceph_check_caps().
3. ceph_check_caps() finds that the issued caps (without checking whether the cap is stale) already include the caps wanted by the open file, so it skips updating wanted caps.

The events above can cause an infinite loop inside ceph_get_caps().
Signed-off-by: "Yan, Zheng"
Reviewed-by: Jeff Layton
Signed-off-by: Ilya Dryomov
Signed-off-by: Sasha Levin

* pwm: rcar: Fix late Runtime PM enablement

[ Upstream commit 1451a3eed24b5fd6a604683f0b6995e0e7e16c79 ]

Runtime PM should be enabled before calling pwmchip_add(), as PWM users can appear immediately after the PWM chip has been added. Likewise, Runtime PM should be disabled after the removal of the PWM chip.

Fixes: ed6c1476bf7f16d5 ("pwm: Add support for R-Car PWM Timer")
Signed-off-by: Geert Uytterhoeven
Reviewed-by: Uwe Kleine-König
Reviewed-by: Laurent Pinchart
Signed-off-by: Thierry Reding
Signed-off-by: Sasha Levin

* scsi: iscsi: Report unbind session event when the target has been removed

[ Upstream commit 13e60d3ba287d96eeaf1deaadba51f71578119a3 ]

If the daemon is restarted or crashes while logging out of a session, the unbind session event sent by the kernel is not processed and is lost. When the daemon starts again, the session can't be unbound because the daemon is waiting for the event message. However, the kernel has already logged out, and the event will not be resent.

When the iscsid restart is complete, logging out of the session reports an error:

Logging out of session [sid: 6, target: iqn.xxxxx, portal: xx.xx.xx.xx,3260]
iscsiadm: Could not logout of [sid: 6, target: iscsiadm -m node iqn.xxxxx, portal: xx.xx.xx.xx,3260].
iscsiadm: initiator reported error (9 - internal error)
iscsiadm: Could not logout of all requested sessions

Make sure the unbind event is emitted.

[mkp: commit desc and applied by hand since patch was mangled]

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Lee Duncan
Signed-off-by: Wu Bo
Signed-off-by: Martin K. Petersen
Signed-off-by: Sasha Levin

* ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map()

[ Upstream commit 81630dc042af998b9f58cd8e2c29dab9777ea176 ]

sst_send_slot_map() uses sst_fill_and_send_cmd_unlocked() because in some places it is called with the drv->lock mutex already held.

So it must always be called with the mutex locked. This commit adds the missing locking in the sst_set_be_modules() code path.

Fixes: 24c8d14192cc ("ASoC: Intel: mrfld: add DSP core controls")
Signed-off-by: Hans de Goede
Acked-by: Pierre-Louis Bossart
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin

* nvme: fix deadlock caused by ANA update wrong locking

[ Upstream commit 657f1975e9d9c880fa13030e88ba6cc84964f1db ]

The deadlock combines 4 flows in parallel:
- ns scanning (triggered from reconnect)
- request timeout
- ANA update (triggered from reconnect)
- I/O coming into the mpath device

(1) ns scanning triggers disk revalidation -> update disk info -> freeze queue -> but blocked, due to (2)

(2) timeout handler references the q_usage_counter -> but blocks in the transport .timeout() handler, due to (3)

(3) the transport timeout handler (indirectly) calls nvme_stop_queue() -> which takes the (down_read) namespaces_rwsem -> but blocks, due to (4)

(4) ANA update takes the (down_write) namespaces_rwsem -> calls nvme_mpath_set_live() -> which synchronizes the ns_head srcu (see commit 504db087aacc) -> but blocks, due to (5)

(5) I/O came into nvme_mpath_make_request -> took srcu_read_lock -> direct_make_request -> blk_queue_enter -> but blocked, due to (1)

==> the request queue is under freeze -> deadlock.

The fix is to make the ANA update take a read lock, because the namespaces list is not manipulated: only the ns and ns->head are updated, and those are protected by the ns->head lock.
Fixes: 0d0b660f214dc ("nvme: add ANA support")
Signed-off-by: Sagi Grimberg
Reviewed-by: Keith Busch
Reviewed-by: Hannes Reinecke
Signed-off-by: Christoph Hellwig
Signed-off-by: Sasha Levin

* kernel/gcov/fs.c: gcov_seq_next() should increase position index

[ Upstream commit f4d74ef6220c1eda0875da30457bef5c7111ab06 ]

If a seq_file .next function does not change the position index, a read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283

Signed-off-by: Vasily Averin
Signed-off-by: Andrew Morton
Acked-by: Peter Oberparleiter
Cc: Al Viro
Cc: Davidlohr Bueso
Cc: Ingo Molnar
Cc: Manfred Spraul
Cc: NeilBrown
Cc: Steven Rostedt
Cc: Waiman Long
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

* selftests: kmod: fix handling test numbers above 9

[ Upstream commit 6d573a07528308eb77ec072c010819c359bebf6e ]

get_test_count() and get_test_enabled() were broken for test numbers above 9 because awk interpreted a field specification like '$0010' as octal rather than decimal. Fix it by stripping the leading zeroes.
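The same class of bug is easy to reproduce in C, where `strtol()` with base 0 also treats a leading zero as an octal prefix. This is an analogy for the awk behavior described above, not the selftest's shell code. `strip_leading_zeroes()` is a hypothetical helper that keeps one digit, so "0" stays "0".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Skip leading '0' characters, but keep a final digit ("000" -> "0"). */
static const char *strip_leading_zeroes(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = strspn(s, "0");
    if (n > 0 && s[n] == '\0')
        n--;
    return s + n;
}

/* Parse with base 0: a leading "0" selects octal, as awk did here. */
static long parse_autobase(const char *s)
{
    return strtol(s, NULL, 0);
}
```

Here `parse_autobase("0010")` yields 8. After stripping, `parse_autobase(strip_leading_zeroes("0010"))` yields the intended 10.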
Signed-off-by: Eric Biggers
Signed-off-by: Andrew Morton
Acked-by: Luis Chamberlain
Cc: Alexei Starovoitov
Cc: Greg Kroah-Hartman
Cc: Jeff Vander Stoep
Cc: Jessica Yu
Cc: Kees Cook
Cc: NeilBrown
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

* ipc/util.c: sysvipc_find_ipc() should increase position index

[ Upstream commit 89163f93c6f969da5811af5377cc10173583123b ]

If a seq_file .next function does not change the position index, a read after some lseek can generate unexpected output.

https://bugzilla.kernel.org/show_bug.cgi?id=206283

Signed-off-by: Vasily Averin
Signed-off-by: Andrew Morton
Acked-by: Waiman Long
Cc: Davidlohr Bueso
Cc: Manfred Spraul
Cc: Al Viro
Cc: Ingo Molnar
Cc: NeilBrown
Cc: Peter Oberparleiter
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds
Signed-off-by: Sasha Levin

* kconfig: qconf: Fix a few alignment issues

[ Upstream commit 60969f02f07ae1445730c7b293c421d179da729c ]

There are a few items with wrong alignments. Solve them.

Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Masahiro Yamada
Signed-off-by: Sasha Levin

* s390/cio: avoid duplicated 'ADD' uevents

[ Upstream commit 05ce3e53f375295c2940390b2b429e506e07655c ]

The common I/O layer delays the ADD uevent for subchannels and delegates generating this uevent to the individual subchannel drivers.
The io_subchannel driver will do so when the associated ccw_device has been registered -- but unconditionally, so more ADD uevents will be generated if a subchannel has been unbound from the io_subchannel driver and later rebound.

To fix this, only generate the ADD event if uevents were still suppressed for the device.

Fixes: fa1a8c23eb7d ("s390: cio: Delay uevents for subchannels")
Message-Id: <[email protected]>
Reported-by: Boris Fiuczynski
Reviewed-by: Peter Oberparleiter
Reviewed-by: Boris Fiuczynski
Signed-off-by: Cornelia Huck
Signed-off-by: Vasily Gorbik
Signed-off-by: Sasha Levin

* loop: Better discard support for block devices

[ Upstream commit c52abf563049e787c1341cdf15c7dbe1bfbc951b ]

If the backing device for a loop device is itself a block device, then mirror the "write zeroes" capabilities of the underlying block device into the loop device. Copy this capability into both max_write_zeroes_sectors and max_discard_sectors of the loop device.

The reason for this is that REQ_OP_DISCARD on a loop device translates into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This presents a consistent interface for loop devices (that discarded data is zeroed), regardless of the backing device type of the loop device. There should be no behavior change for loop devices backed by regular files.

This change fixes blktest block/003, and removes an extraneous error print in block/013 when testing on a loop device backed by a block device that does not support discard.
Signed-off-by: Evan Green
Reviewed-by: Gwendal Grignou
Reviewed-by: Chaitanya Kulkarni
[used updated version of Evan's comment in loop_config_discard()]
[moved backingq to local scope, removed redundant braces]
Signed-off-by: Andrzej Pietrasiewicz
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin

* Revert "powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled"

[ Upstream commit abc3fce76adbdfa8f87272c784b388cd20b46049 ]

This reverts commit ebb37cf3ffd39fdb6ec5b07111f8bb2f11d92c5f.

That commit does not play well with soft-masked irq state manipulations in idle, interrupt replay, and possibly other paths, because tracing code sometimes uses irq_work_queue (e.g., in trace_hardirqs_on()). That can cause PACA_IRQ_DEC to become set when it is not expected, and then be ignored, be cleared, or cause warnings.

In the worst case, the net result seems to be a missed irq_work until the next timer interrupt. That is usually not going to be noticed; however, it could be a long time if the tick is disabled, which is against the spirit of irq_work and might cause real problems.

The idea is still solid, but it would need more work. It's not really clear whether it would be worth the added complexity, so revert this for now. This is not a straight revert: it replaces the code with a comment explaining why we might see interrupts happening, which also gives git blame something to find.
Fixes: ebb37cf3ffd3 ("powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled")
Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin

* pwm: renesas-tpu: Fix late Runtime PM enablement

[ Upstream commit d5a3c7a4536e1329a758e14340efd0e65252bd3d ]

Runtime PM should be enabled before calling pwmchip_add(), as PWM users can appear immediately after the PWM chip has been added. Likewise, Runtime PM should always be disabled after the removal of the PWM chip, even if the latter failed.

Fixes: 99b82abb0a35b073 ("pwm: Add Renesas TPU PWM driver")
Signed-off-by: Geert Uytterhoeven
Signed-off-by: Thierry Reding
Signed-off-by: Sasha Levin

* pwm: bcm2835: Dynamically allocate base

[ Upstream commit 2c25b07e5ec119cab609e41407a1fb3fa61442f5 ]

The newer 2711 and 7211 chips have two PWM controllers. Failing to dynamically allocate the PWM base would prevent the second PWM controller instance from probing successfully, because alloc_pwms() would return -EEXIST.

Fixes: e5a06dc5ac1f ("pwm: Add BCM2835 PWM driver")
Signed-off-by: Florian Fainelli
Acked-by: Uwe Kleine-König
Reviewed-by: Nicolas Saenz Julienne
Signed-off-by: Thierry Reding
Signed-off-by: Sasha Levin

* perf/core: Disable page faults when getting phys address

[ Upstream commit d3296fb372bf7497b0e5d0478c4e7a677ec6f6e9 ]

We hit the following warning when running tests on a kernel compiled with CONFIG_DEBUG_ATOMIC_SLEEP=y:

 WARNING: CPU: 19 PID: 4472 at mm/gup.c:2381 __get_user_pages_fast+0x1a4/0x200
 CPU: 19 PID: 4472 Comm: dummy Not tainted 5.6.0-rc6+ #3
 RIP: 0010:__get_user_pages_fast+0x1a4/0x200
 ...
 Call Trace:
  perf_prepare_sample+0xff1/0x1d90
  perf_event_output_forward+0xe8/0x210
  __perf_event_overflow+0x11a/0x310
  __intel_pmu_pebs_event+0x657/0x850
  intel_pmu_drain_pebs_nhm+0x7de/0x11d0
  handle_pmi_common+0x1b2/0x650
  intel_pmu_handle_irq+0x17b/0x370
  perf_event_nmi_handler+0x40/0x60
  nmi_handle+0x192/0x590
  default_do_nmi+0x6d/0x150
  do_nmi+0x2f9/0x3c0
  nmi+0x8e/0xd7

While __get_user_pages_fast() is IRQ-safe, it calls access_ok(), which warns on:

 WARN_ON_ONCE(!in_task() && !pagefault_disabled())

Peter suggested disabling page faults around __get_user_pages_fast(), which gets rid of the warning in the access_ok() call.

Suggested-by: Peter Zijlstra
Signed-off-by: Jiri Olsa
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Ingo Molnar
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sasha Levin

* ASoC: Intel: bytcr_rt5640: Add quirk for MPMAN MPWIN895CL tablet

[ Upstream commit c8b78f24c1247b7bd0882885c672d9dec5800bc6 ]

The MPMAN MPWIN895CL tablet almost fully works with our default settings. The only problem is that it has only 1 speaker, so any sounds playing only on the right channel get lost.

Add a quirk for this model using the default settings + MONO_SPEAKER.

Signed-off-by: Hans de Goede
Acked-by: Pierre-Louis Bossart
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown
Signed-off-by: Sasha Levin

* xhci: Ensure link state is U3 after setting USB_SS_PORT_LS_U3

[ Upstream commit eb002726fac7cefb98ff39ddb89e150a1c24fe85 ]

The xHCI spec doesn't specify an upper bound for the U3 transition time. For some devices 20ms is not enough, so we need to make sure the link state is in U3 before further actions.
I've tried to use U3 Entry Capability by setting U3 Entry Enable in config register, however the port change event for U3 transition interrupts the system suspend process. For now let's use the less ideal method by polling PLS. [use usleep_range(), and shorten the delay time while polling -Mathias] Signed-off-by: Kai-Heng Feng <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * drm/amd/display: Not doing optimize bandwidth if flip pending. [ Upstream commit 9941b8129030c9202aaf39114477a0e58c0d6ffc ] [Why] In some scenario like 1366x768 VSR enabled connected with a 4K monitor and playing 4K video in clone mode, underflow will be observed due to decrease dppclk when previouse surface scan isn't finished [How] In this use case, surface flip is switching between 4K and 1366x768, 1366x768 needs smaller dppclk, and when decrease the clk and previous surface scan is for 4K and scan isn't done, underflow will happen. Not doing optimize bandwidth in case of flip pending. Signed-off-by: Yongqiang Sun <[email protected]> Reviewed-by: Tony Cheng <[email protected]> Acked-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * tracing/selftests: Turn off timeout setting [ Upstream commit b43e78f65b1d35fd3e13c7b23f9b64ea83c9ad3a ] As the ftrace selftests can run for a long period of time, disable the timeout that the general selftests have. If a selftest hangs, then it probably means the machine will hang too. 
Link: https://lore.kernel.org/r/[email protected] Suggested-by: Miroslav Benes <[email protected]> Tested-by: Miroslav Benes <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * virtio-blk: improve virtqueue error to BLK_STS [ Upstream commit 3d973b2e9a625996ee997c7303cd793b9d197c65 ] Let's change the mapping between virtqueue_add errors to BLK_STS statuses, so that -ENOSPC, which indicates virtqueue full is still mapped to BLK_STS_DEV_RESOURCE, but -ENOMEM which indicates non-device specific resource outage is mapped to BLK_STS_RESOURCE. Signed-off-by: Halil Pasic <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * scsi: smartpqi: fix call trace in device discovery [ Upstream commit b969261134c1b990b96ea98fe5e0fcf8ec937c04 ] Use sas_phy_delete rather than sas_phy_free which, according to comments, should not be called for PHYs that have been set up successfully. Link: https://lore.kernel.org/r/157048748876.11757.17773443136670011786.stgit@brunhilda Reviewed-by: Scott Benesh <[email protected]> Reviewed-by: Scott Teel <[email protected]> Reviewed-by: Kevin Barnett <[email protected]> Signed-off-by: Murthy Bhat <[email protected]> Signed-off-by: Don Brace <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * PCI/ASPM: Allow re-enabling Clock PM [ Upstream commit 35efea32b26f9aacc99bf07e0d2cdfba2028b099 ] Previously Clock PM could not be re-enabled after being disabled by pci_disable_link_state() because clkpm_capable was reset. Change this by adding a clkpm_disable field similar to aspm_disable. 
Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * net: ipv6: add net argument to ip6_dst_lookup_flow commit c4e85f73afb6384123e5ef1bba3315b2e3ad031e upstream. This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow, as some modules currently pass a net argument without a socket to ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument"). Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]> [bwh: Backported to 4.19: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup commit 6c8991f41546c3c472503dff1ea9daaddf9331c2 upstream. ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <[email protected]> Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. 
Miller <[email protected]> [bwh: Backported to 4.19: - Drop change in lwt_bpf.c - Delete now-unused "ret" in mlx5e_route_lookup_ipv6() - Initialise "out_dev" in mlx5e_create_encap_header_ipv6() to avoid introducing a spurious "may be used uninitialised" warning - Adjust filenames, context, indentation] Signed-off-by: Ben Hutchings <[email protected]> Signed-off-by: Sasha Levin <[email protected]> * blktrace: Protect q->blk_trace with RCU commit c780e86dd48ef6467a1146cf7d0fe1e05a635039 upstream. KASAN is reporting that __blk_add_trace() has a use-after-free issue when accessing q->blk_trace. Indeed the switching of block tracing (and thus eventual freeing of q->blk_trace) is completely unsynchronized with the currently running tracing and thus it can happen that the blk_trace structure is being freed just while __blk_add_trace() works on it. Protect accesses to q->blk_trace by RCU during tracing and make sure we …

WSLUser · 2020-07-31T15:59:07Z

I did some digging and found that torvalds/linux@7ac8707#diff-8702464a90b6e9aa5d339c88a4345950 removed the HYPERV_TSCPAGE option along with every other custom option. Because that change is old enough to be part of the 5.4.x series, bumping to 5.4.x will remove this option. Supposedly, vclock_gettime.c should do an even better job than before.
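A small POSIX shell helper can check whether a given build still carries `CONFIG_HYPERV_TSCPAGE`, in the same spirit as the `zgrep` check on `/proc/config.gz` earlier in the thread. `kconfig_get` is a hypothetical name and not part of any kernel tooling. The sketch assumes the standard `.config` line format (`CONFIG_FOO=value` or `# CONFIG_FOO is not set`). It also assumes `/proc/config.gz` is only present when the kernel was built with `CONFIG_IKCONFIG_PROC=y`.

```shell
# Hypothetical helper: print the setting of a Kconfig option from a kernel config.
# Prints the value ("y", "m", a number or a quoted string), "n" for
# "# CONFIG_FOO is not set", or "absent" if the option is not mentioned at all.
# Usage: kconfig_get OPTION [CONFIG_FILE]   (OPTION may omit the CONFIG_ prefix)
kconfig_get() {
    opt="CONFIG_${1#CONFIG_}"                # accept "FOO" or "CONFIG_FOO"
    file="${2:-/proc/config.gz}"             # needs CONFIG_IKCONFIG_PROC=y
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    # Decompress gzip'd configs (such as /proc/config.gz); read others as-is.
    case "$file" in
        *.gz) decompress="gzip -dc" ;;
        *)    decompress="cat" ;;
    esac
    # Keep only the first line that sets or explicitly unsets the option.
    # index() == 1 is a plain prefix test, so CONFIG_HYPERV= does not match
    # CONFIG_HYPERV_TSCPAGE=.
    line=$($decompress "$file" | awk -v opt="$opt" '
        !found && (index($0, opt "=") == 1 || $0 == "# " opt " is not set") {
            print; found = 1
        }')
    case "$line" in
        "")   echo "absent" ;;               # option not present in the config
        "#"*) echo "n" ;;                    # explicitly disabled
        *)    echo "${line#*=}" ;;           # strip the "CONFIG_FOO=" prefix
    esac
}
```

Run `kconfig_get HYPERV_TSCPAGE` to query the running kernel, or `kconfig_get HYPERV_TSCPAGE config-wsl` to query a config file. Once an option has been dropped from Kconfig, a config regenerated against the newer tree should report `absent` rather than `n`.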

This reverts commit 2714567.

WSLUser · 2020-09-01T18:59:59Z

Closing in favor of #176 to reduce clutter and target the 5.4 branch.

bwarden mentioned this pull request Jun 9, 2020

Add a modified wslconfig to kernel for WSL2 support clearlinux/distribution#2004

Closed

This was referenced Jun 9, 2020

Add CONFIG_BPF_JIT to kernel config #76

Closed

Add iptables conntrack extension to config-wsl #70

Closed

enable CONFIG_SECURITY_YAMA=y #53

Closed

enable cryptsetup usage for luks #87

Closed

Support for strict confinement of snaps #60

Closed

WSLUser added 3 commits June 9, 2020 14:01

Change name of #Compiler to MSFT

2c8b92a

Update config-wsl

6aaaaec

Updated for 5.7.2 kernel. Adds ExFat support.

1f407b1

tycho reviewed Jun 23, 2020


Fixed config to remove errors

67289fc

Addressed PR feedback.

WSLUser changed the title ~~Clear Linux kernel optimizations~~ Optimized kernel config for better general kernel support and saner defaults Jul 13, 2020

WSLUser added 3 commits July 13, 2020 15:15

Returns CONFIG PPP back to wsl conig

1180e43

Got removed for some reason

Fixed more reversions.

b2d1559

Removed thermal stuff and last round of reversions

69f0fda

Fix another reversion that causes issues with 9P file server

ddedd2f

WSLUser mentioned this pull request Jul 17, 2020

fail after install on wsl2 - std::experimental::filesystem::v1::__cxx11::filesystem_error microsoft/ProcMon-for-Linux#4

Closed

Support BCC fully (includes networking)

f33fff6

Allows full performance analysis and network traffic control with bcc: https://github.com/iovisor/bcc
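bcc's installation notes list the kernel options it depends on. The hypothetical `bcc_missing_options` helper below gives a rough way to verify that a WSL config actually provides them. It reads a kernel config (by default `/proc/config.gz`) and prints every option from an assumed subset of that list that is not built in. The WSL kernel is monolithic and ships no loadable modules, so the sketch treats `=m` as missing and only accepts `=y`. The option list is an assumption; compare it against the current bcc INSTALL.md before relying on it.

```shell
# Hypothetical check: print each bcc-related option that is not built in (=y).
# The option list is an ASSUMED subset of bcc's documented kernel requirements.
# Exit status: 0 = nothing missing, 1 = something missing, 2 = unreadable file.
bcc_missing_options() {
    file="${1:-/proc/config.gz}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    case "$file" in
        *.gz) decompress="gzip -dc" ;;
        *)    decompress="cat" ;;
    esac
    # Single pass: remember every config line, then report the required
    # options whose exact "CONFIG_X=y" line was never seen. "=m" counts as
    # missing because the WSL kernel does not load modules.
    $decompress "$file" | awk '
        BEGIN {
            n = split("BPF BPF_SYSCALL BPF_JIT HAVE_EBPF_JIT BPF_EVENTS NET_CLS_BPF NET_ACT_BPF", req, " ")
        }
        { seen[$0] = 1 }
        END {
            missing = 0
            for (i = 1; i <= n; i++) {
                if (!(("CONFIG_" req[i] "=y") in seen)) {
                    print "CONFIG_" req[i]
                    missing++
                }
            }
            exit (missing > 0)
        }'
}
```

For example, `bcc_missing_options config-wsl && echo "all listed options are built in"` reports on a config file before building. A single `awk` pass keeps the check to one read of the (possibly compressed) config. It also avoids early-exiting pipelines such as `grep -q`.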

Revert "Linux msft wsl 4.19.y (#3)" (#6)

67af18a

This reverts commit 2714567.

WSLUser closed this Sep 1, 2020



WSLUser commented Jun 1, 2020

tycho left a comment

WSLUser commented Jun 24, 2020

tycho commented Jun 24, 2020

WSLUser commented Jun 24, 2020

miketheitguy commented Jun 24, 2020

WSLUser commented Jul 13, 2020

WSLUser commented Jul 13, 2020

tycho commented Jul 13, 2020

WSLUser commented Jul 13, 2020

microhobby commented Jul 14, 2020

WSLUser commented Jul 14, 2020

microhobby commented Jul 19, 2020

microhobby commented Jul 19, 2020

microhobby commented Jul 19, 2020

WSLUser commented Jul 19, 2020

miketheitguy commented Jul 19, 2020

WSLUser commented Jul 19, 2020

miketheitguy commented Jul 19, 2020

WSLUser commented Jul 31, 2020

WSLUser commented Sep 1, 2020
