Update README Version #176

Closed
wants to merge 1 commit into from

MAFLO321

Update Linux Kernel version in README file

@MAFLO321 MAFLO321 closed this Apr 13, 2015
@MAFLO321 MAFLO321 deleted the patch-1 branch April 13, 2015 19:50
ddstreet referenced this pull request in ddstreet/linux Jun 20, 2015
GIT c5901c20eb0341722035975a272a2a7b647fbb32

commit eeb64c14275e52740d6410632e62e0ad9b88ca70
Author: Samuel Thibault <[email protected]>
Date:   Sat Jun 6 11:44:39 2015 -0700

    tty/vt/keyboard: define LED triggers for VT keyboard lock states
    
    In addition to defining triggers for VT LED states, let's define triggers
    for VT keyboard lock states, such as "kbd-shiftlock", "kbd-altgrlock", etc.
    
    This makes it possible to fix #7063 from userland by using a modifier to
    implement proper CapsLock behavior and having the keyboard caps lock LED
    show that modifier state.
    
    Signed-off-by: Samuel Thibault <[email protected]>
    Tested-by: Pavel Machek <[email protected]>
    Acked-by: Pavel Machek <[email protected]>
    Signed-off-by: Dmitry Torokhov <[email protected]>

commit 5235552273e6b68abbed3b3047af6344e2e60c2c
Author: Samuel Thibault <[email protected]>
Date:   Mon Mar 16 21:19:44 2015 -0700

    tty/vt/keyboard: define LED triggers for VT LED states
    
    Now that the input core allows controlling keyboard LEDs via standard LED
    subsystem triggers, let's switch the VT keyboard code to make use of this
    feature. We will define the following standard triggers: "kbd-scrollock",
    "kbd-numlock", "kbd-capslock", and "kbd-kanalock", which are the default
    triggers for the respective LEDs on keyboards.
    
    Signed-off-by: Samuel Thibault <[email protected]>
    Tested-by: Pavel Machek <[email protected]>
    Acked-by: Pavel Machek <[email protected]>
    Signed-off-by: Dmitry Torokhov <[email protected]>
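
As a rough illustration of how userland might attach one of the triggers named in the two commits above to an LED, here is a minimal C sketch. It assumes the standard LED class sysfs interface, where writing a trigger name to an LED's `trigger` attribute selects that trigger. The helper name `set_led_trigger` and any concrete LED path are hypothetical and not taken from the commits.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a trigger name (for example "kbd-capslock" or "kbd-shiftlock")
 * to an LED's sysfs "trigger" attribute.
 * Returns 0 on success and -1 on any I/O error.
 * Hypothetical helper for illustration only.
 */
static int set_led_trigger(const char *trigger_path, const char *trigger)
{
	FILE *f;
	int ok;

	assert(trigger_path != NULL && trigger != NULL);

	f = fopen(trigger_path, "w");
	if (f == NULL)
		return -1;

	ok = fputs(trigger, f) >= 0;
	/* fclose() flushes the buffer, so a rejected write surfaces here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the `trigger` file of a specific LED under `/sys/class/leds/`; the exact LED directory name depends on the system, and writing to it normally requires root.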

commit 10e87dc42a086c256b25334b6c1c89214feba9a7
Author: Andrew Duggan <[email protected]>
Date:   Tue Jun 16 14:08:41 2015 -0700

    HID: rmi: Disable populating F30 when the touchpad has physical buttons
    
    Physical buttons do not use F30 to report their state and in some cases the
    data reported in F30 is incorrect and inconsistent with what is reported by
    the HID descriptor. When physical buttons are present, ignore F30 and let
    hid-input report buttons based on what is defined in the HID descriptor.
    
    Signed-off-by: Andrew Duggan <[email protected]>
    Reviewed-by: Benjamin Tissoires <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>

commit ba8d134e75deb1904b146a4decb0bc6a217333cd
Author: Nishanth Menon <[email protected]>
Date:   Mon Apr 20 19:51:34 2015 -0500

    rtc: ds1307: Enable the mcp794xx alarm after programming time
    
    The alarm interrupt enable register is at offset 0x7, and the time
    registers for the alarm follow it. If we program the alarm interrupt
    enable before programming the time, the previous time value may be
    close to or match the current time at the moment the alarm is enabled,
    triggering an unexpected interrupt (one that does not match the time
    we expect it to trigger).
    
    To prevent this scenario from occurring, program the ALM0_EN bit only
    after the alarm time is appropriately programmed.
    
    Of course, I2C programming is non-atomic, so there are loopholes where
    the interrupt won't trigger if the requested time is already in the
    past when the ALM0_EN bit is programmed. However, we will no longer get
    the unexpected interrupts caused by programming the time after the
    interrupt is enabled.
    
    Signed-off-by: Nishanth Menon <[email protected]>
    Reviewed-by: Grygorii Strashko <[email protected]>
    Signed-off-by: Alexandre Belloni <[email protected]>
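
The ordering described in the commit above can be sketched in userspace C against a mock register file. This is not the ds1307 driver code. Only the control register offset 0x7 and the ALM0_EN name come from the commit message; the alarm register base, the bit value, the register count, and the initial "disable first" step are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define REG_CONTROL   0x07 /* alarm interrupt enable register (from the commit) */
#define REG_ALM0_BASE 0x0a /* assumed start of the alarm 0 time registers */
#define ALM0_NREGS    6    /* assumed number of alarm 0 time registers */
#define BIT_ALM0_EN   0x10 /* assumed bit position of ALM0_EN */

static uint8_t mock_regs[0x20];          /* mock register file */
static unsigned int write_seq;           /* counts every register write */
static unsigned int last_time_write_seq; /* sequence of last alarm time write */
static unsigned int enable_seq;          /* sequence of the ALM0_EN write */

/* Stand-in for an I2C register write; records the order of writes. */
static void mock_write(uint8_t reg, uint8_t val)
{
	assert(reg < sizeof(mock_regs));
	write_seq++;
	if (reg >= REG_ALM0_BASE && reg < REG_ALM0_BASE + ALM0_NREGS)
		last_time_write_seq = write_seq;
	if (reg == REG_CONTROL && (val & BIT_ALM0_EN))
		enable_seq = write_seq;
	mock_regs[reg] = val;
}

static void program_alarm(const uint8_t time[ALM0_NREGS])
{
	int i;

	/* 1. Assumed step: keep the alarm off while stale time values remain. */
	mock_write(REG_CONTROL,
		   (uint8_t)(mock_regs[REG_CONTROL] & (uint8_t)~BIT_ALM0_EN));

	/* 2. Program the new alarm time. */
	for (i = 0; i < ALM0_NREGS; i++)
		mock_write((uint8_t)(REG_ALM0_BASE + i), time[i]);

	/* 3. Only now set ALM0_EN, so a stale time cannot match. */
	mock_write(REG_CONTROL, (uint8_t)(mock_regs[REG_CONTROL] | BIT_ALM0_EN));
}

/* Arbitrary example values, not a meaningful date. */
static const uint8_t demo_alarm_time[ALM0_NREGS] = {
	0x30, 0x15, 0x08, 0x02, 0x20, 0x04
};
```

The sequence counters exist only so the ordering can be checked; a real driver would simply issue the writes in this order.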

commit b7ae128d728c42583dac9db48dce9a44bc0fb900
Author: Robert Richter <[email protected]>
Date:   Fri Jun 5 19:49:26 2015 +0200

    ahci: Add support for Cavium's ThunderX host controller
    
    This patch adds support for Cavium's ThunderX host controller. The
    controller resides on the SoC and is an AHCI-compatible SATA controller
    with one port, compliant with Serial ATA 3.1 and AHCI Revision 1.31.
    There can be multiple SATA controllers on the SoC.
    
    The controller depends on MSI-X support since the PCI ECAM controller
    on the SoC implements neither MSI nor legacy INTx interrupt support.
    Thus, during device initialization, if MSI fails, MSI-X will be used to
    enable the device's interrupts.
    
    The controller uses non-standard BAR0 for its register range. The
    existing device lookup (vendor and device ID) that is already
    implemented for other host controllers is used to change the PCI BAR.
    
    Signed-off-by: Robert Richter <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>

commit ee2aad42e4b6eaa9721196f07f7d5d8d049e6530
Author: Robert Richter <[email protected]>
Date:   Fri Jun 5 19:49:25 2015 +0200

    ahci: Add generic MSI-X support for single interrupts to SATA PCI driver
    
    This patch adds generic MSI-X support for single interrupts to the
    SATA PCI driver. MSI-X support is needed for host controllers that
    implement only MSI-X, with no MSI or INTx. This patch only adds
    support for single interrupts; multiple per-port MSI-X interrupts
    are not yet implemented.
    
    The new implementation still initializes MSIs first. Only if that
    fails, the code tries to enable MSI-X. If that fails too, setup is
    continued with intx interrupts.
    
    To not break other chips by this generic code change, there are the
    following precautions:
    
     * Interrupt ranges are not enabled at all.
    
     * Only single interrupt mode is enabled for msix cap devices. Thus,
       only one interrupt will be set up.
    
     * During the discussion with Tejun we agreed to change the init
       sequence from msix-msi-intx to msi-msix-intx. Thus, if a device
       offers msi and init does not fail, the msix init code will not be
       executed. This is equivalent to current code.
    
    With this, the code only sets up single mode msix as a last resort if
    msi fails. No interrupt range is enabled at all. Only one interrupt
    will be enabled.
    
    tj: comment edits.
    
    Changes of the patch series:
    
    v5:
     * updated patch subject that the patch only implements single IRQ
     * moved Cavium specific code to a separate patch
     * detect Cavium ThunderX device with PCI_CLASS_STORAGE_SATA_AHCI
       instead of vendor/dev id
     * added more comments to the code
     * enable single msix support for all kind of devices (removing strict
       check)
     * rebased onto update libata/for-4.2 with patch 1, 2 applied
    
    v4:
     * removed implementation of ahci_init_intx()
     * improved patch descriptions
     * rebased onto libata/for-4.2
    
    v3:
     * store irq number in struct ahci_host_priv
     * change initialization order from msix-msi-intx to msi-msix-intx
     * improve comments in ahci_init_msix()
     * improve error message in ahci_init_msix()
     * do not enable MSI-X if MSI is actively disabled for the device
    
    v2:
     * determine irq vector from pci_dev->msi_list
    
    Based on a patch from Sunil Goutham <[email protected]>.
    
    Signed-off-by: Robert Richter <[email protected]>
    Signed-off-by: Tejun Heo <[email protected]>
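
The msi-msix-intx initialization order described in the commit above can be illustrated with a small userspace C sketch. It models only the decision order, not the actual ahci code. The type `mock_dev`, its fields, and the function `pick_irq_mode` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum irq_mode {
	IRQ_MODE_MSI,
	IRQ_MODE_MSIX_SINGLE, /* single-interrupt MSI-X only, no ranges */
	IRQ_MODE_INTX
};

/* Hypothetical description of which setup attempts would succeed. */
struct mock_dev {
	bool msi_works;
	bool msix_works;
};

/* Try MSI first, then single MSI-X, and fall back to INTx last. */
static enum irq_mode pick_irq_mode(const struct mock_dev *dev)
{
	assert(dev != NULL);

	if (dev->msi_works)
		return IRQ_MODE_MSI;      /* MSI-X code is never reached */
	if (dev->msix_works)
		return IRQ_MODE_MSIX_SINGLE;
	return IRQ_MODE_INTX;
}

/* Example devices: MSI-X only, both capabilities, and neither. */
static const struct mock_dev msix_only_dev = { false, true };
static const struct mock_dev msi_and_msix_dev = { true, true };
static const struct mock_dev legacy_dev = { false, false };
```

Because MSI is tried first, a device that offers both MSI and MSI-X behaves exactly as it did before the patch, which is the compatibility argument the commit message makes.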

commit e0dd268a2c983acf2b52130b489b3b5724e26b39
Author: Uwe Kleine-König <[email protected]>
Date:   Fri Jun 12 00:35:43 2015 -0700

    leds: aat1290: pass flags parameter to devm_gpiod_get
    
    Since 39b2bbe3d715 (gpio: add flags argument to gpiod_get*() functions),
    which appeared in v3.17-rc1, the gpiod_get* functions take an additional
    parameter that allows specifying the direction and initial value for output.
    
    In this case the driver cannot easily be simplified but as the flags
    parameter will become mandatory soon this change is necessary
    beforehand.
    
    Signed-off-by: Uwe Kleine-König <[email protected]>
    Acked-by: Jacek Anaszewski <[email protected]>
    Signed-off-by: Bryan Wu <[email protected]>

commit c8e27605c687d2d628217bef721e955d4baa1ce1
Author: Uwe Kleine-König <[email protected]>
Date:   Fri Jun 12 00:32:23 2015 -0700

    leds: ktd2692: pass flags parameter to devm_gpiod_get
    
    Since 39b2bbe3d715 (gpio: add flags argument to gpiod_get*() functions),
    which appeared in v3.17-rc1, the gpiod_get* functions take an additional
    parameter that allows specifying the direction and initial value for output.
    
    In this case the driver cannot easily be simplified but as the flags
    parameter will become mandatory soon this change is necessary
    beforehand.
    
    Signed-off-by: Uwe Kleine-König <[email protected]>
    Acked-by: Jacek Anaszewski <[email protected]>
    Signed-off-by: Bryan Wu <[email protected]>

commit bc3e452003d02b8ec21546490aaed36003a83864
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:13:42 2015 -0400

    module: relocate module_init from init.h to module.h
    
    Modular users will always be users of init functionality, but
    users of init functionality are not necessarily always modules.
    
    Hence any functionality like module_init and module_exit would
    be more at home in the module.h file.  And module.h should
    explicitly include init.h to make the dependency clear.
    
    We've already done all the legwork needed to ensure that this
    move does not cause any build regressions due to implicit
    header file include assumptions about where module_init lives.
    
    Cc: Rusty Russell <[email protected]>
    Acked-by: Rusty Russell <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit b0c6d93014c8f2f53b70e9362b9fbec13b8e3aa0
Author: Paul Gortmaker <[email protected]>
Date:   Mon Jun 15 09:56:26 2015 -0400

    MIPS: don't use module_init in non-modular cobalt/mtd.c file
    
    As of commit 34b1252bd91851f77f89fbb6829a04efad900f41 ("MIPS:
    Cobalt: Do not build MTD platform device registration code as module.")
    this file became built-in instead of modular.  So we should also
    stop using module_init as an alias for __initcall as that can be
    rather misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Direct use of __initcall is discouraged, vs prioritized ones.
    Use of device_initcall is consistent with what __initcall
    maps onto, and hence does not change the init order, making the
    impact of this change zero.
    
    Cc: Ralf Baechle <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 33d69ca12b44ef3c7be8f948ffa5a35652e1f2ff
Author: Paul Gortmaker <[email protected]>
Date:   Mon Jun 15 16:48:22 2015 -0500

    drivers/leds: don't use module_init in non-modular leds-cobalt-raq.c
    
    This file is built for a bool Kconfig variable, and hence this
    code is either present or absent.  It currently can never be
    modular, so using module_init as an alias for __initcall can be
    somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    And since it can't be modular, we remove all the __exitcall
    stuff related to module_exit() -- it is dead code that won't
    ever be executed.
    
    Cc: Bryan Wu <[email protected]>
    Cc: Richard Purdie <[email protected]>
    Cc: Jacek Anaszewski <[email protected]>
    Acked-by: Jacek Anaszewski <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 32e805e7c6a343894c95a3431973e8ddad4aa2cf
Author: Paul Gortmaker <[email protected]>
Date:   Fri Jun 5 09:37:19 2015 -0400

    tile: add init.h to usb.c to avoid compile failure
    
    Pending header cleanups will reveal that this file is using the
    init.h content implicitly, with the following failure:
    
    arch/tile/kernel/usb.c:69:1: warning: data definition has no type or storage class [enabled by default]
    arch/tile/kernel/usb.c:69:1: error: type defaults to 'int' in declaration of 'arch_initcall'
    arch/tile/kernel/usb.c:69:1: warning: parameter names (without types) in function declaration [enabled by default]
    arch/tile/kernel/usb.c:62:19: warning: 'tilegx_usb_init' defined but not used
    
    Explicitly add init.h to get arch_initcall and avoid this.
    
    Reported-by: kbuild test robot <[email protected]>
    Cc: Chris Metcalf <[email protected]>
    Acked-by: Chris Metcalf <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 9b9cf81a2d1f5336de2bebae71a9f2b8d5f1a8de
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:13:42 2015 -0400

    arm: fix implicit #include <linux/init.h> in entry asm.
    
    They use the "_INIT" macro and friends, and hence need to
    source this header file, vs. relying on getting it implicitly.
    
    Cc: Russell King <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 70c4f78b23c69013c908222d55a07c96fea4bba1
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:13:42 2015 -0400

    x86: replace __init_or_module with __init in non-modular vsmp_64.c
    
    The __init_or_module is from commit 05e12e1c4c09cd35ac9f4e6af1e
    ("x86: fix 27-rc crash on vsmp due to paravirt during module load").
    
    But as of commit 70511134f61bd6e5eed19f767381f9fb3e762d49
    ("Revert "x86: don't compile vsmp_64 for 32bit"") this file became
    obj-y and hence is now only for built-in.  That makes any
    "_or_module" support no longer necessary.
    
    We need to distinguish between the two in order to do some header
    reorganization between init.h and module.h and we don't want to
    be including module.h in non-modular code.
    
    Cc: Thomas Gleixner <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 77459a0feca4ae8757a905fd1791f039479e8e1e
Author: Paul Gortmaker <[email protected]>
Date:   Wed Jun 3 11:20:05 2015 -0400

    drivers/clk: convert sunxi/clk-mod0.c to use builtin_platform_driver
    
    This driver builds based on obj-y and hence will not ever be
    modular.  Change it to use the non-modular registration so that it
    won't suffer a compile fail once a header move places the modular
    registration within the module.h file.
    
    Cc: "Emilio López" <[email protected]>
    Cc: Mike Turquette <[email protected]>
    Cc: Stephen Boyd <[email protected]>
    Acked-by: Stephen Boyd <[email protected]>
    Cc: Maxime Ripard <[email protected]>
    Acked-by: Maxime Ripard <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit e35415e59f86d6b546a3681e2cda4f22b5b142c0
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:10:58 2015 -0400

    drivers/power: Convert non-modular syscon-reboot to use builtin_platform_driver
    
    This file depends on Kconfig options all of which are a bool, so
    we use the appropriate registration function, which avoids us
    relying on an implicit inclusion of <module.h> which we are
    doing currently.
    
    While this currently works, we really don't want to be including
    the module.h header in non-modular code, which we'd be forced
    to do, pending some upcoming code relocation from init.h into
    module.h.  So we fix it now by using the non-modular equivalent.
    
    Cc: Sebastian Reichel <[email protected]>
    Acked-By: Sebastian Reichel <[email protected]>
    Cc: Dmitry Eremin-Solenikov <[email protected]>
    Cc: David Woodhouse <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 0159ae95e6a923f565937f10518aa3c919527733
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:10:57 2015 -0400

    drivers/soc: Convert non-modular soc-realview to use builtin_platform_driver
    
    This file depends on Kconfig SOC_REALVIEW which is a bool, so
    we use the appropriate registration function, which avoids us
    relying on an implicit inclusion of <module.h> which we are
    doing currently.
    
    While this currently works, we really don't want to be including
    the module.h header in non-modular code, which we'd be forced
    to do, pending some upcoming code relocation from init.h into
    module.h.  So we fix it now by using the non-modular equivalent.
    
    Cc: Arnd Bergmann <[email protected]>
    Cc: Linus Walleij <[email protected]>
    Acked-by: Linus Walleij <[email protected]>
    Cc: Axel Lin <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 7d4d9ed6ef5219857865dd57d425f9729d0a39ff
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:10:57 2015 -0400

    drivers/soc: Convert non-modular tegra/pmc to use builtin_platform_driver
    
    This file depends on Kconfig ARCH_TEGRA which is a bool, so
    we use the appropriate registration function, which avoids us
    relying on an implicit inclusion of <module.h> which we are
    doing currently.
    
    While this currently works, we really don't want to be including
    the module.h header in non-modular code, which we'd be forced
    to do, pending some upcoming code relocation from init.h into
    module.h.  So we fix it now by using the non-modular equivalent.
    
    Cc: Stephen Warren <[email protected]>
    Cc: Thierry Reding <[email protected]>
    Cc: Alexandre Courbot <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 5b64127e0529387d4538ecc3dfd49248baf619c5
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:10:57 2015 -0400

    drivers/cpufreq: Convert non-modular s5pv210-cpufreq.c to use builtin_platform_driver
    
    This file depends on a Kconfig option which is a bool, so
    we use the appropriate registration function, which avoids us
    relying on an implicit inclusion of <module.h> which we are
    doing currently.
    
    While this currently works, we really don't want to be including
    the module.h header in non-modular code, which we'd be forced
    to do, pending some upcoming code relocation from init.h into
    module.h.  So we fix it now by using the non-modular equivalent.
    
    Cc: "Rafael J. Wysocki" <[email protected]>
    Cc: Viresh Kumar <[email protected]>
    Acked-by: Viresh Kumar <[email protected]>
    Cc: Kukjin Kim <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 090d1cf103725f583b3f41fc3185698ae5a7aa64
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:10:57 2015 -0400

    drivers/cpuidle: Convert non-modular drivers to use builtin_platform_driver
    
    All these drivers are configured with Kconfig options that are
    declared as bool.  Hence it is not possible for the code
    to be built as modular.  However the code is currently using the
    module_platform_driver() macro for driver registration.
    
    While this currently works, we really don't want to be including
    the module.h header in non-modular code, which we'll be forced
    to do, pending some upcoming code relocation from init.h into
    module.h.  So we fix it now by using the non-modular equivalent.
    
    Cc: "Rafael J. Wysocki" <[email protected]>
    Cc: Daniel Lezcano <[email protected]>
    Cc: Michal Simek <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 1dda2b42db1bbc788bf6de0a8141a305484f963b
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:10:57 2015 -0400

    drivers/platform: Convert non-modular pdev_bus to use builtin_platform_driver
    
    This driver is configured with a Kconfig option that is
    declared as a bool.  Hence it is not possible for the code
    to be built as modular.  However the code is currently using
    the module_platform_driver() macro for driver registration.
    
    While this currently works, we really don't want to be including
    the module.h header in non-modular code, which we'll be forced
    to do, pending some upcoming code relocation from init.h into
    module.h.  So we fix it now by using the non-modular equivalent.
    And since we've already established that the code is non-modular,
    we can completely drop any code relating to module_exit.
    
    Signed-off-by: Paul Gortmaker <[email protected]>

commit f309d4443130bf814e991f836e919dca22df37ae
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:10:57 2015 -0400

    platform_device: better support builtin boilerplate avoidance
    
    We have macros that help reduce the boilerplate for modules
    that register with no extra init/exit complexity other than the
    most standard use case.  However we see an increasing number of
    non-modular drivers using these modular_driver() type register
    functions.
    
    There are several downsides to this:
    1) The code can appear modular to a reader of the code, and they
       won't know if the code really is modular without checking the
       Makefile and Kconfig to see if compilation is governed by a
       bool or tristate.
    2) Coders of drivers may be tempted to code up an __exit function
       that is never used, just in order to satisfy the required three
       args of the modular registration function.
    3) Non-modular code ends up including the <module.h> which increases
       CPP overhead that they don't need.
    4) It hinders us from performing better separation of the module
       init code and the generic init code.
    
    Here we introduce similar macros, with the mapping from module_driver
    to builtin_driver and similar, so that simple changes of:
    
      module_platform_driver()       --->  builtin_platform_driver()
      module_platform_driver_probe() --->  builtin_platform_driver_probe().
    
    can help us avoid #3 above, without having to code up the same
    __init functions and device_initcall() boilerplate.
    
    For non modular code, module_init becomes __initcall.  But direct use
    of __initcall is discouraged, vs. one of the priority categorized
    subgroups.  As __initcall gets mapped onto device_initcall, our
    use of device_initcall directly in this change means that the
    runtime impact is zero -- drivers will remain at level 6 in the
    initcall ordering.
    
    Cc: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>
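
To make the boilerplate-avoidance idea in the commit above more concrete, here is a userspace C mock of the pattern. It is a sketch under stated assumptions, not the kernel's definition: the real macros register through device_initcall and use kernel types, whereas every `mock_` name below is hypothetical and the "initcall" is just a function pointer that can be called by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver structure. */
struct mock_driver {
	const char *name;
	int registered;
};

/* Hypothetical stand-in for a driver registration function. */
static int mock_driver_register(struct mock_driver *drv)
{
	assert(drv != NULL);
	drv->registered = 1;
	return 0;
}

typedef int (*mock_initcall_t)(void);

/* Mock of device_initcall(): just remember the init function. */
#define mock_device_initcall(fn) \
	static mock_initcall_t mock_initcall_##fn = fn

/*
 * Mock of the builtin_driver() idea: generate the init function and
 * hook it in, with no exit function at all.
 */
#define mock_builtin_driver(drv, reg) \
static int drv##_init(void) \
{ \
	return reg(&(drv)); \
} \
mock_device_initcall(drv##_init)

static struct mock_driver demo_driver = { .name = "demo" };

/* One line replaces the hand-written init function and its hookup. */
mock_builtin_driver(demo_driver, mock_driver_register);
```

The point of the design is visible in the last line: a non-modular driver gets its registration boilerplate generated without pulling in anything module-related and without an unused exit function.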

commit 5b00c1eb94e5936e5bf5cdd9ad1ddfbed0c39159
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 21:57:34 2015 -0400

    x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
    
    This was using module_init, but the current Kconfig situation is
    as follows:
    
    In arch/x86/kernel/cpu/Makefile:
    
      obj-$(CONFIG_CPU_SUP_INTEL)    += perf_event_intel_pt.o perf_event_intel_bts.o
    
    and in arch/x86/Kconfig.cpu:
    
      config CPU_SUP_INTEL
            default y
            bool "Support Intel processors" if PROCESSOR_SELECT
    
    So currently, the end user can not build this code into a module.
    If in the future, there is desire for this to be modular, then
    it can be changed to include <linux/module.h> and use module_init.
    
    But currently, in the non-modular case, a module_init becomes a
    device_initcall.  But this really isn't a device, so we should
    choose a more appropriate initcall bucket to put it in.
    
    The obvious choice here seems to be arch_initcall, but that does
    make it run earlier than it currently does through device_initcall.
    As long as perf_pmu_register() is functional, we should be OK.
    
    Cc: Peter Zijlstra <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Arnaldo Carvalho de Melo <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit ca41d24cf56458a699b44e918c5a19b7077df422
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 21:57:34 2015 -0400

    x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
    
    This was using module_init, but the current Kconfig situation is
    as follows:
    
    In arch/x86/kernel/cpu/Makefile:
    
      obj-$(CONFIG_CPU_SUP_INTEL)    += perf_event_intel_pt.o perf_event_intel_bts.o
    
    and in arch/x86/Kconfig.cpu:
    
      config CPU_SUP_INTEL
            default y
            bool "Support Intel processors" if PROCESSOR_SELECT
    
    So currently, the end user can not build this code into a module.
    If in the future, there is desire for this to be modular, then
    it can be changed to include <linux/module.h> and use module_init.
    
    But currently, in the non-modular case, a module_init becomes a
    device_initcall.  But this really isn't a device, so we should
    choose a more appropriate initcall bucket to put it in.
    
    The obvious choice here seems to be arch_initcall, but that does
    make it run earlier than it currently does through device_initcall.
    As long as perf_pmu_register() is functional, we should be OK.
    
    Cc: Peter Zijlstra <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Arnaldo Carvalho de Melo <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 44c5af96de8230ff7268500f48995f9fea5cffe7
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 21:57:34 2015 -0400

    mm/page_owner.c: use late_initcall to hook in enabling
    
    This was using module_init, but there is no way this code can
    be modular.  In the non-modular case, a module_init becomes a
    device_initcall, but this really isn't a device.   So we should
    choose a more appropriate initcall bucket to put it in.
    
    In order of execution, our close choices are:
    
     fs_initcall(fn)
     rootfs_initcall(fn)
     device_initcall(fn)
     late_initcall(fn)
    
    ..and since the initcall here goes after debugfs, we really
    should be post-rootfs, which means late_initcall makes the
    most sense here.
    
    Cc: Andrew Morton <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>
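
The execution order of the initcall buckets listed in the commit above (fs, rootfs, device, late) can be modelled with a short userspace C sketch. It only illustrates the relative ordering; the enum, the struct, the entry names, and the sort helper are hypothetical, and the kernel does not sort initcalls at run time like this.

```c
#include <assert.h>
#include <stddef.h>

/* Buckets in the execution order given in the commit message. */
enum bucket {
	BUCKET_FS,
	BUCKET_ROOTFS,
	BUCKET_DEVICE,
	BUCKET_LATE
};

struct initcall_entry {
	enum bucket bucket;
	const char *name; /* hypothetical label, for illustration */
};

/*
 * Stable insertion sort by bucket: earlier buckets run first, and
 * entries within one bucket keep their original relative order.
 */
static void sort_by_bucket(struct initcall_entry *e, size_t n)
{
	size_t i, j;

	assert(e != NULL || n == 0);

	for (i = 1; i < n; i++) {
		struct initcall_entry key = e[i];

		j = i;
		while (j > 0 && e[j - 1].bucket > key.bucket) {
			e[j] = e[j - 1];
			j--;
		}
		e[j] = key;
	}
}

static struct initcall_entry demo_calls[] = {
	{ BUCKET_LATE,   "page_owner_example" },
	{ BUCKET_DEVICE, "driver_example" },
	{ BUCKET_FS,     "fs_example" },
	{ BUCKET_ROOTFS, "rootfs_example" },
};
```

After sorting, the late entry ends up last, which matches the commit's reasoning that late_initcall places the code after everything in the rootfs and device buckets.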

commit 4c7217f1f0fe70af7b9e213ef16f1d2f4a4bacaf
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 21:57:34 2015 -0400

    lib/list_sort: use late_initcall to hook in self tests
    
    This was using module_init, but there is no way this code can
    be modular.  In the non-modular case, a module_init becomes a
    device_initcall, but this really isn't a device.   So we should
    choose a more appropriate initcall bucket to put it in.
    
    Assuming boot time self tests need to be observed over a console
    to be useful, and that the console device could possibly not be
    fully functional until after device_initcall, we move this to the
    late_initcall bucket, which is immediately after device_initcall.
    
    Cc: Andrew Morton <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 89f08f64408b630df7d559223f63e616d0814509
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:08:21 2015 -0400

    arm: use subsys_initcall in non-modular pl320 IPC code
    
    The drivers/mailbox/pl320-ipc.o is dependent on config PL320_MBOX
    which is declared as a bool.  Hence the code is never going to be
    modular.  So using module_init as an alias for __initcall can be
    somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.  Also add an inclusion of init.h, as
    that was previously implicit.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of subsys_initcall (which
    seems to make sense for IPC code) will thus change this
    registration from level 6-device to level 4-subsys (i.e. slightly
    earlier).  However no impact of that small difference is expected.
    
    Cc: Russell King <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 6f114281c4ad543392f5b7c8345e11e103675cee
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:08:21 2015 -0400

    powerpc: don't use module_init for non-modular core hugetlb code
    
    The hugetlbpage.o is obj-y (always built in).  It will never
    be modular, so using module_init as an alias for __initcall is
    somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of arch_initcall (which
    makes sense for arch code) will thus change this registration
    from level 6-device to level 3-arch (i.e. slightly earlier).
    However, no impact from that small difference was observed
    during testing, nor is any expected.
    
    Cc: Benjamin Herrenschmidt <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 383d14a5365879bc193d29ad2ed17ac5299753c3
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:08:21 2015 -0400

    powerpc: use subsys_initcall for Freescale Local Bus
    
    The FSL_SOC option is bool, and hence this code is either
    present or absent.  It will never be modular, so using
    module_init as an alias for __initcall is rather misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of subsys_initcall (which
    makes sense for bus code) will thus change this registration
    from level 6-device to level 4-subsys (i.e. slightly earlier).
    However no observable impact of that small difference has
    been observed during testing, or is expected.
    
    Cc: Benjamin Herrenschmidt <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Cc: Scott Wood <[email protected]>
    Acked-by: Scott Wood <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 1206f53589237b7e00b9b0a4e42815f14aedad2d
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:08:21 2015 -0400

    x86: don't use module_init for non-modular core bootflag code
    
    The bootflag.o is obj-y (always built in).  It will never be
    modular, so using module_init as an alias for __initcall is
    somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of arch_initcall (which
    makes sense for arch code) will thus change this registration
    from level 6-device to level 3-arch (i.e. slightly earlier).
    However no observable impact of that small difference has
    been observed during testing, or is expected.
    
    Cc: Thomas Gleixner <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 55331060096f0e9a57356ec36476a49e4bf22bc1
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:08:20 2015 -0400

    netfilter: don't use module_init/exit in core IPV4 code
    
    The file net/ipv4/netfilter.o is created based on whether
    CONFIG_NETFILTER is set.  However that is defined as a bool, and
    hence this file with the core netfilter hooks will never be
    modular.  So using module_init as an alias for __initcall can be
    somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.  Also add an inclusion of init.h, as
    that was previously implicit here in the netfilter.c file.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of subsys_initcall (which
    seems to make sense for netfilter code) will thus change this
    registration from level 6-device to level 4-subsys (i.e. slightly
    earlier).  However no observable impact of that small difference
    has been observed during testing, or is expected. (i.e. the
    location of the netfilter messages in dmesg remains unchanged
    with respect to all the other surrounding messages.)
    
    As for the module_exit, rather than replace it with __exitcall,
    we simply remove it, since it appears only UML does anything
    with those, and even for UML, there is no relevant cleanup
    to be done here.
    
    Cc: Pablo Neira Ayuso <[email protected]>
    Acked-by: Pablo Neira Ayuso <[email protected]>
    Cc: Patrick McHardy <[email protected]>
    Cc: Jozsef Kadlecsik <[email protected]>
    Cc: "David S. Miller" <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit c013d5a4581203e074a1065e17378984544fcaef
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:08:20 2015 -0400

    fs/notify: don't use module_init for non-modular inotify_user code
    
    The INOTIFY_USER option is bool, and hence this code is either
    present or absent.  It will never be modular, so using
    module_init as an alias for __initcall is rather misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of fs_initcall (which
    makes sense for fs code) will thus change this registration
    from level 6-device to level 5-fs (i.e. slightly earlier).
    However no observable impact of that small difference has
    been observed during testing, or is expected.
    
    Cc: John McCutchan <[email protected]>
    Cc: Robert Love <[email protected]>
    Cc: Eric Paris <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit a4bc6fc79f94c5b4f850aabca9c5249adc597094
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:08:20 2015 -0400

    mm: replace module_init usages with subsys_initcall in nommu.c
    
    Compiling some arm/m68k configs with "# CONFIG_MMU is not set" reveals
    two more instances of module_init being used for code that can't
    possibly be modular, as CONFIG_MMU is either on or off.
    
    We replace them with subsys_initcall as per what was done in other
    mmu-enabled code.
    
    Note that direct use of __initcall is discouraged, vs.  one of the
    priority categorized subgroups.  As __initcall gets mapped onto
    device_initcall, our use of subsys_initcall (which makes sense for these
    files) will thus change this registration from level 6-device to level
    4-subsys (i.e.  slightly earlier).
    
    One might think that core_initcall (l1) or postcore_initcall (l2) would
    be more appropriate for anything in mm/, but if we look at the actual init
    functions themselves, we see they are just sysctl setup stuff, and
    hence the choice of subsys_initcall (l4) seems reasonable.  At the same
    time it minimizes the risk of changing the priority too drastically all
    at once.  We can adjust further in the future.
    
    Also, a couple instances of missing ";" at EOL are fixed.
    
    Cc: Andrew Morton <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 84c3e5bf1defc035d63869bbb0f5f80d276c1fc7
Author: Paul Gortmaker <[email protected]>
Date:   Sun Jun 14 16:55:25 2015 -0400

    cris: don't use module_init for non-modular core eeprom.c code
    
    The eeprom.c code is compiled based on the Kconfig setting
    ETRAX_I2C_EEPROM, which is bool.  So the code is either built in
    or absent.  It will never be modular, so using module_init as an
    alias for __initcall is rather misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Direct use of __initcall is discouraged, vs prioritized ones.
    Use of device_initcall is consistent with what __initcall
    maps onto, and hence does not change the init order, making the
    impact of this change zero.   Should someone with real hardware
    for boot testing want to change it later to arch_initcall or
    something different, they can do that at a later date.
    
    Cc: Mikael Starvik <[email protected]>
    Cc: Jesper Nilsson <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 4d38e5c48f4095be21343869ad741676ab4e518f
Author: James Hogan <[email protected]>
Date:   Fri Jun 5 22:17:18 2015 +0100

    tty/metag_da: Avoid module_init/module_exit in non-modular code
    
    The metag_da TTY driver can't get built as a module at the moment, but
    it still uses module_init() and module_exit(). Those macros are moving
    to module.h which isn't included by metag_da.c, which will result in the
    following build warnings (remarkably no build errors) and an apparent
    failure to boot as the TTY driver won't be loaded.
    
    drivers/tty/metag_da.c:660: warning: data definition has no type or storage class
    drivers/tty/metag_da.c:660: warning: type defaults to ‘int’ in declaration of ‘module_init’
    drivers/tty/metag_da.c:660: warning: parameter names (without types) in function declaration
    drivers/tty/metag_da.c:661: warning: data definition has no type or storage class
    drivers/tty/metag_da.c:661: warning: type defaults to ‘int’ in declaration of ‘module_exit’
    drivers/tty/metag_da.c:661: warning: parameter names (without types) in function declaration
    drivers/tty/metag_da.c:572: warning: ‘dashtty_init’ defined but not used
    drivers/tty/metag_da.c:645: warning: ‘dashtty_exit’ defined but not used
    drivers/tty/metag_da.c: In function ‘dash_console_write’:
    drivers/tty/metag_da.c:670: warning: passing argument 4 of ‘chancall’ discards qualifiers from pointer target type
    
    Instead of just adding the module.h include, now would be a good time to
    remove the use of these macros, replacing the module_init with
    device_initcall, and removing the exit function altogether since it
    isn't needed. If module support is added later the code can always be
    resurrected.
    
    Reported-by: Guenter Roeck <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Signed-off-by: James Hogan <[email protected]>
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: Jiri Slaby <[email protected]>
    Cc: [email protected]
    Cc: Paul Gortmaker <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 791ed0bb5558dfdc4040563bd0b7dc24450fa732
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:51 2015 -0400

    drivers/clk: don't use module_init in clk-nomadik.c which is non-modular
    
    The clk-nomadik.o is built for ARCH_NOMADIK -- which is bool, and
    hence this code is either present or absent.  It will never be
    modular, so using module_init as an alias for __initcall can be
    somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    Cc: Mike Turquette <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>
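
The "zero runtime impact" argument rests on the alias chain the messages describe for built-in code: module_init stands in for __initcall, which maps onto device_initcall. A minimal sketch of that chain follows. The `model_*` macros, `struct entry`, and `example_init` are hypothetical stand-ins that build a plain struct instead of emitting a linker-section entry as the real macros do.

```c
typedef int (*initcall_t)(void);

/* Hypothetical record of what a registration macro produces. */
struct entry {
	int level;
	initcall_t fn;
};

/* Stand-ins for the alias chain described for built-in code. */
#define model_device_initcall(fn) ((struct entry){ 6, (fn) })
#define model_initcall(fn)        model_device_initcall(fn) /* __initcall */
#define model_module_init(fn)     model_initcall(fn)        /* module_init */

static int example_init(void)
{
	return 0;
}

/* Registration as written before such a change. */
struct entry registration_before(void)
{
	return model_module_init(example_init);
}

/* Registration as written after such a change. */
struct entry registration_after(void)
{
	return model_device_initcall(example_init);
}
```

Both helpers yield the same level and the same function pointer, so in this model swapping the macro changes only what the source says, not when the function runs.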

commit 30e3c6428f18b5b8e78602a5a7cc653aee3bfe99
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:50 2015 -0400

    xtensa: don't use module_init for non-modular core network.c code
    
    The network.c code is piggybacking off of the arch independent
    CONFIG_NET, which is bool.  So the code is either built in or
    absent.  It will never be modular, so using module_init as an
    alias for __initcall is rather misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Direct use of __initcall is discouraged, vs prioritized ones.
    Use of device_initcall is consistent with what __initcall
    maps onto, and hence does not change the init order, making the
    impact of this change zero.   Should someone with real hardware
    for boot testing want to change it later to arch_initcall or
    something different, they can do that at a later date.
    
    Cc: Chris Zankel <[email protected]>
    Cc: Max Filippov <[email protected]>
    Cc: Thomas Meyer <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit b205118bdb4b515b4b4f5058aa9f5a12668386c3
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:50 2015 -0400

    sh: don't use module_init in non-modular psw.c code
    
    The psw.o is built for obj-y -- and hence this code is always
    present.  It will never be modular, so using module_init as an alias
    for __initcall can be somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    Reported-by: kbuild test robot <[email protected]>
    Cc: Paul Mundt <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 1b4d5beecbeb4608a0fdb77c3b8ba182f0cfb4b6
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:50 2015 -0400

    mn10300: don't use module_init in non-modular flash.c code
    
    The flash.o is built for obj-y -- and hence this code is always
    present.  It will never be modular, so using module_init as an alias
    for __initcall can be somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    Reported-by: kbuild test robot <[email protected]>
    Cc: David Howells <[email protected]>
    Acked-by: David Howells <[email protected]>
    Cc: Koichi Yasutake <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 15becabd89fa3fec6aa864fbd1b50b5b1871eee2
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:50 2015 -0400

    parisc64: don't use module_init for non-modular core perf code
    
    The perf.c code depends on CONFIG_64BIT, so it is either built-in
    or absent.  It will never be modular, so using module_init as an
    alias for __initcall is rather misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.  Aside from it not making sense, it also
    causes a ~10% increase in CPP overhead due to module.h having a
    large list of headers itself -- for example compare line counts:
    
     device_initcall() and <linux/init.h>
    	20238 arch/parisc/kernel/perf.i
    
     module_init() and <linux/module.h>
    	22194 arch/parisc/kernel/perf.i
    
    Direct use of __initcall is discouraged, vs prioritized ones.
    Use of device_initcall is consistent with what __initcall
    maps onto, and hence does not change the init order, making the
    impact of this change zero.   Should someone with real hardware
    for boot testing want to change it later to arch_initcall or
    something different, they can do that at a later date.
    
    Cc: "James E.J. Bottomley" <[email protected]>
    Cc: Helge Deller <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>
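
The "~10%" figure can be checked against the two line counts quoted in the message. The helper below is a hypothetical illustration of that arithmetic, not part of the patch.

```c
/* Percentage growth from base to grown, e.g. preprocessed line counts. */
double percent_increase(long base, long grown)
{
	return 100.0 * (double)(grown - base) / (double)base;
}
```

With the quoted counts, `percent_increase(20238, 22194)` comes to roughly 9.7, consistent with the stated overhead.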

commit aed6850a1390c2b208b91b2fae0199fc14b94a26
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:50 2015 -0400

    parisc: don't use module_init for non-modular core pdc_cons code
    
    The pdc_cons.c code is always built in.  It will never be modular,
    so using module_init as an alias for __initcall is rather
    misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Direct use of __initcall is discouraged, vs prioritized ones.
    Use of device_initcall is consistent with what __initcall
    maps onto, and hence does not change the init order, making the
    impact of this change zero.   Should someone with real hardware
    for boot testing want to change it later to arch_initcall or
    something different, they can do that at a later date.
    
    Reported-by: kbuild test robot <[email protected]>
    Cc: "James E.J. Bottomley" <[email protected]>
    Cc: Helge Deller <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 73de14e8cdc733bbc8eda006f813d5aa51511139
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:50 2015 -0400

    cris: don't use module_init for non-modular core intmem.c code
    
    The intmem.c code is always built in.  It will never be modular,
    so using module_init as an alias for __initcall is rather
    misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Direct use of __initcall is discouraged, vs prioritized ones.
    Use of device_initcall is consistent with what __initcall
    maps onto, and hence does not change the init order, making the
    impact of this change zero.   Should someone with real hardware
    for boot testing want to change it later to arch_initcall or
    something different, they can do that at a later date.
    
    Reported-by: kbuild test robot <[email protected]>
    Cc: Mikael Starvik <[email protected]>
    Cc: Jesper Nilsson <[email protected]>
    Acked-by: Jesper Nilsson <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 2a177fd1d92f669f8f493a61e195ff4e3c50f95f
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:50 2015 -0400

    ia64: don't use module_init in non-modular sim/simscsi.c code
    
    The simscsi.o is built for HP_SIMSCSI -- which is bool, and hence
    this code is either present or absent.  It will never be modular,
    so using module_init as an alias for __initcall can be somewhat
    misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    And since it can't be modular, we remove all the __exitcall
    stuff related to module_exit() -- it is dead code that won't
    ever be executed.
    
    Cc: Tony Luck <[email protected]>
    Cc: Fenghua Yu <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 2e21fa2d11ab61e1827bd5bb1e0e2484931d68e1
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:49 2015 -0400

    ia64: don't use module_init for non-modular core kernel/mca.c code
    
    The mca.c code is always built in.  It will never be modular,
    so using module_init as an alias for __initcall is rather
    misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Direct use of __initcall is discouraged, vs prioritized ones.
    Use of device_initcall is consistent with what __initcall
    maps onto, and hence does not change the init order, making the
    impact of this change zero.   Should someone with real hardware
    for boot testing want to change it later to arch_initcall or
    something different, they can do that at a later date.
    
    Cc: Tony Luck <[email protected]>
    Cc: Fenghua Yu <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 4a0ece7ceceab251e92e7f98e7926642a065727b
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:49 2015 -0400

    arm: don't use module_init in non-modular mach-vexpress/spc.c code
    
    The spc.o is built for ARCH_VEXPRESS_SPC -- which is bool, and hence
    this code is either present or absent.  It will never be modular,
    so using module_init as an alias for __initcall can be somewhat
    misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    Cc: Russell King <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit a390a2f18147533359d4e45cb13438d42580da84
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:49 2015 -0400

    powerpc: don't use module_init in non-modular 83xx suspend code
    
    The suspend.o is built for SUSPEND -- which is bool, and hence
    this code is either present or absent.  It will never be modular,
    so using module_init as an alias for __initcall can be somewhat
    misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    Cc: Scott Wood <[email protected]>
    Cc: Benjamin Herrenschmidt <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 8f6b9512ceadc6bd52777c299111dc642b4c65b6
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:49 2015 -0400

    powerpc: use device_initcall for registering rtc devices
    
    Currently these two RTC devices are in core platform code
    where it is not possible for them to be modular.  It will
    never be modular, so using module_init as an alias for
    __initcall can be somewhat misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- they will remain at level 6 in initcall ordering.
    
    Cc: Benjamin Herrenschmidt <[email protected]>
    Cc: Paul Mackerras <[email protected]>
    Cc: Geoff Levand <[email protected]>
    Acked-by: Geoff Levand <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit d54b675a6b0007422dc13acbecdb1ca2b1a53aeb
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:49 2015 -0400

    x86: don't use module_init in non-modular devicetree.c code
    
    The devicetree.o is built for "OF" -- which is bool, and hence
    this code is either present or absent.  It will never be modular,
    so using module_init as an alias for __initcall can be somewhat
    misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    Reported-by: kbuild test robot <[email protected]>
    Cc: Thomas Gleixner <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 4711e2f9caedaa07e7cdcb5e058a18762d6be9b1
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:05:49 2015 -0400

    x86: don't use module_init in non-modular intel_mid_vrtc.c
    
    The X86_INTEL_MID option is bool, and hence this code is either
    present or absent.  It will never be modular, so using
    module_init as an alias for __initcall is rather misleading.
    
    Fix this up now, so that we can relocate module_init from
    init.h into module.h in the future.  If we don't do this, we'd
    have to add module.h to obviously non-modular code, and that
    would be a worse thing.
    
    Note that direct use of __initcall is discouraged, vs. one
    of the priority categorized subgroups.  As __initcall gets
    mapped onto device_initcall, our use of device_initcall
    directly in this change means that the runtime impact is
    zero -- it will remain at level 6 in initcall ordering.
    
    Cc: Thomas Gleixner <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: "H. Peter Anvin" <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 7cac34370a4dde12e6430c2f0985926d4ef0f459
Author: Paul Gortmaker <[email protected]>
Date:   Fri Jun 5 11:25:18 2015 -0400

    frv: add module.h to mb93090-mb00/flash.c to avoid compile fail
    
    This file is built off of a tristate Kconfig option and also contains
    modular function calls so it should explicitly include module.h to
    avoid compile breakage during header shuffles done in the future.
    
    Reported-by: kbuild test robot <[email protected]>
    Cc: David Howells <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 743492ccd53008736f169f242479bac6245f8379
Author: Paul Gortmaker <[email protected]>
Date:   Wed Jun 3 15:45:21 2015 -0400

    drivers/cpufreq: include <module.h> for modular exynos-cpufreq.c code
    
    This file is built off of a tristate Kconfig option ("ARM_EXYNOS_CPUFREQ")
    and also contains modular function calls so it should explicitly include
    module.h to avoid compile breakage during pending header shuffles.
    
    Cc: "Rafael J. Wysocki" <[email protected]>
    Cc: Viresh Kumar <[email protected]>
    Acked-by: Viresh Kumar <[email protected]>
    Cc: Kukjin Kim <[email protected]>
    Cc: Krzysztof Kozlowski <[email protected]>
    Acked-by: Krzysztof Kozlowski <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit a7e9bc55cc144dc40e809e579bd932ef2ec324de
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:02:31 2015 -0400

    drivers/staging: include <module.h> for modular android tegra_ion code
    
    This file is built off of a tristate Kconfig option and also contains
    modular function calls so it should explicitly include module.h to
    avoid compile breakage during header shuffles done in the future.
    
    Cc: Greg Kroah-Hartman <[email protected]>
    Cc: "Arve Hjønnevåg" <[email protected]>
    Cc: Riley Andrews <[email protected]>
    Cc: Stephen Warren <[email protected]>
    Cc: Thierry Reding <[email protected]>
    Cc: Alexandre Courbot <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 88775588b71d28a9020a7faa4ad95bbf76d8bb45
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 21:29:53 2015 -0400

    crypto/asymmetric_keys: pkcs7_key_type needs module.h
    
    This driver builds off of the tristate CONFIG_PKCS7_TEST_KEY and calls
    module_init and module_exit. So it should explicitly include module.h
    to avoid compile breakage during header shuffles done in the future.
    
    Cc: Herbert Xu <[email protected]>
    Cc: "David S. Miller" <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 0bbad249a6a4934203b50d574f5d5f9f480b389e
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:02:31 2015 -0400

    sh: mach-highlander/psw.c is tristate and should use module.h
    
    This file is controlled by a tristate Kconfig option, and hence
    needs to include module.h so that it can get module_init() once
    we relocate it from init.h into module.h in the future.
    
    Note that module_exit() appears to be missing from the driver, so
    it is questionable whether it would actually work for a removal
    and reload cycle if it was configured for a modular build.
    
    Cc: Paul Mundt <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit ca1c8e93c37e5a5e27e6149cd3612eb2247e0e4a
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:02:31 2015 -0400

    drivers/regulator: include <module.h> for modular max77802 code
    
    This file is built off of a tristate Kconfig option and also contains
    modular function calls so it should explicitly include module.h to
    avoid compile breakage during header shuffles done in the future.
    
    Cc: Liam Girdwood <[email protected]>
    Cc: Mark Brown <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 5468f887bc861b2fe2fa24a44bc6a616a5d33a73
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:02:31 2015 -0400

    drivers/pcmcia: include <module.h> for modular xxs1500_ss code
    
    This file is built off of a tristate Kconfig option and also contains
    modular function calls so it should explicitly include module.h to
    avoid compile breakage during header shuffles done in the future.
    
    Cc: Wolfram Sang <[email protected]>
    Acked-by: Wolfram Sang <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paul Gortmaker <[email protected]>

commit a1a0bec593623f49740d7900e4b862c534f219bf
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:02:30 2015 -0400

    drivers/hsi: include <module.h> for modular omap_ssi code
    
    These files are built off of a tristate Kconfig option and also contain
    modular function calls so they should explicitly include module.h to
    avoid compile breakage during header shuffles done in the future.
    
    We change the one header file which gives us coverage on both files:
       drivers/hsi/controllers/omap_ssi.c
       drivers/hsi/controllers/omap_ssi_port.c
    
    Cc: Sebastian Reichel <[email protected]>
    Signed-off-by: Paul Gortmaker <[email protected]>

commit 00fe614863eed7ca39fc72a307c6dff57b690476
Author: Paul Gortmaker <[email protected]>
Date:   Fri May 1 20:02:30 2015 -0400

    drivers/gpu: in…
torvalds pushed a commit that referenced this pull request Jun 26, 2015
Asynchronous firmware loading copies the pointer to the
name passed as an argument only to be scheduled later and
used. This behaviour works well for synchronous calling
but in asynchronous mode there's a chance the caller could
immediately free the passed string after making the
asynchronous call. This could trigger a use-after-free,
making the kernel look on disk for arbitrary file names.
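
The hazard described above can be modelled in userspace. The following is a minimal sketch, not the firmware_class code: `struct fw_work`, `dup_name()`, `fw_work_create()` and `fw_work_run()` are hypothetical names invented for illustration, and plain malloc()/free() stand in for the kernel allocators. It shows the ownership rule the fix relies on: the deferred work item keeps its own copy of the name, so the caller may free its buffer right after queuing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-request work item (not a kernel type). */
struct fw_work {
	char *name;	/* private copy, owned by the work item */
};

/* Portable string duplicate; plays the role kstrdup_const() has in the patch. */
static char *dup_name(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* "Schedule" a request: copy the name so the caller may free its buffer. */
struct fw_work *fw_work_create(const char *name)
{
	struct fw_work *w = calloc(1, sizeof(*w));

	if (!w)
		return NULL;
	w->name = dup_name(name);
	if (!w->name) {
		free(w);	/* error path releases the work item */
		return NULL;
	}
	return w;
}

/* Deferred worker: reports the name it sees, then frees name and work item. */
void fw_work_run(struct fw_work *w, char *out, size_t outlen)
{
	snprintf(out, outlen, "%s", w->name);
	free(w->name);
	free(w);
}
```

Had fw_work_create() stored the caller's pointer instead of a copy, fw_work_run() would read memory the caller may already have released, which is the situation this commit describes.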

To force-test the issue, you can use a test driver designed
to illustrate it, available on GitHub [0]; use the
next-20150505-fix-use-after-free branch.

With this patch applied you get:

[  283.512445] firmware name: test_module_stuff.bin
[  287.514020] firmware name: test_module_stuff.bin
[  287.532489] firmware found

Without this patch applied you can end up with something such as:

[  135.624216] firmware name: \xffffff80BJ
[  135.624249] platform fake-dev.0: Direct firmware load for \xffffff80Bi failed with error -2
[  135.624252] No firmware found
[  135.624252] firmware found

Unfortunately, in the worst and most common case you can
crash your system with a page fault by trying to free
something which you cannot, and/or hit a NULL pointer
dereference [1].

The issue and its fix for asynchronous runs using
schedule_work() are generalized in the following SmPL grammar
patch. When applied to next-20150505, only the firmware_class
code is affected. This grammar patch can and should be
generalized further to vet other kernel asynchronous
mechanisms.

@ calls_schedule_work @
type T;
T *priv_work;
identifier func, work_func;
identifier work;
identifier priv_name, name;
expression gfp;
@@

 func(..., const char *name, ...)
 {
 	...
 	priv_work = kzalloc(sizeof(T), gfp);
 	...
-	priv_work->priv_name = name;
+	priv_work->priv_name = kstrdup_const(name, gfp);
	...
(... when any
 	if (...)
 	{
 		...
+ 		kfree_const(priv_work->priv_name);
 		kfree(priv_work);
		...
 	}
) ... when any
 	INIT_WORK(&priv_work->work, work_func);
 	...
 	schedule_work(&priv_work->work);
 	...
 }

@ the_work_func depends on calls_schedule_work @
type calls_schedule_work.T;
T *priv_work;
identifier calls_schedule_work.work_func;
identifier calls_schedule_work.priv_name;
identifier calls_schedule_work.work;
identifier some_work;
@@

 work_func(...)
 {
 	...
 	priv_work = container_of(some_work, T, work);
 	...
+	kfree_const(priv_work->priv_name);
 	kfree(priv_work);
 	...
 }

[0] https://github.com/mcgrof/fake-firmware-test.git
[1] The following kernel ring buffer splat:

firmware name: test_module_stuff.bin
firmware name:
firmware found
general protection fault: 0000 [#1] SMP
Modules linked in: test(O) <...etc-it-does-not-matter>
 drm sr_mod cdrom xhci_pci xhci_hcd rtsx_pci mfd_core video button sg
CPU: 3 PID: 87 Comm: kworker/3:2 Tainted: G           O    4.0.0-00010-g22b5bb0-dirty #176
Hardware name: LENOVO 20AW000LUS/20AW000LUS, BIOS GLET43WW (1.18 ) 12/04/2013
Workqueue: events request_firmware_work_func
task: ffff8800c7f8e290 ti: ffff8800c7f94000 task.ti: ffff8800c7f94000
RIP: 0010:[<ffffffff814a586c>]  [<ffffffff814a586c>] fw_free_buf+0xc/0x40
RSP: 0000:ffff8800c7f97d78  EFLAGS: 00010286
RAX: ffffffff81ae3700 RBX: ffffffff816d1181 RCX: 0000000000000006
RDX: 0001ee850ff68500 RSI: 0000000000000246 RDI: c35d5f415e415d41
RBP: ffff8800c7f97d88 R08: 000000000000000a R09: 0000000000000000
R10: 0000000000000358 R11: ffff8800c7f97a7e R12: ffff8800c7ec1e80
R13: ffff88021e2d4cc0 R14: ffff88021e2dff00 R15: 00000000000000c0
FS:  0000000000000000(0000) GS:ffff88021e2c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000034b8cd8 CR3: 000000021073c000 CR4: 00000000001407e0
Stack:
 ffffffff816d1181 ffff8800c7ec1e80 ffff8800c7f97da8 ffffffff814a58f8
 000000000000000a ffffffff816d1181 ffff8800c7f97dc8 ffffffffa047002c
 ffff88021e2dff00 ffff8802116ac1c0 ffff8800c7f97df8 ffffffff814a65fe
Call Trace:
 [<ffffffff816d1181>] ? __schedule+0x361/0x940
 [<ffffffff814a58f8>] release_firmware+0x58/0x80
 [<ffffffff816d1181>] ? __schedule+0x361/0x940
 [<ffffffffa047002c>] test_mod_cb+0x2c/0x43 [test]
 [<ffffffff814a65fe>] request_firmware_work_func+0x5e/0x80
 [<ffffffff816d1181>] ? __schedule+0x361/0x940
 [<ffffffff8108d23a>] process_one_work+0x14a/0x3f0
 [<ffffffff8108d911>] worker_thread+0x121/0x460
 [<ffffffff8108d7f0>] ? rescuer_thread+0x310/0x310
 [<ffffffff810928f9>] kthread+0xc9/0xe0
 [<ffffffff81092830>] ? kthread_create_on_node+0x180/0x180
 [<ffffffff816d52d8>] ret_from_fork+0x58/0x90
 [<ffffffff81092830>] ? kthread_create_on_node+0x180/0x180
Code: c7 c6 dd ad a3 81 48 c7 c7 20 97 ce 81 31 c0 e8 0b b2 ed ff e9 78 ff ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <4c> 8b 67 38 48 89 fb 4c 89 e7 e8 85 f7 22 00 f0 83 2b 01 74 0f
RIP  [<ffffffff814a586c>] fw_free_buf+0xc/0x40
 RSP <ffff8800c7f97d78>
---[ end trace 4e62c56a58d0eac1 ]---
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff81093ee0>] kthread_data+0x10/0x20
PGD 1c13067 PUD 1c15067 PMD 0
Oops: 0000 [#2] SMP
Modules linked in: test(O) <...etc-it-does-not-matter>
 drm sr_mod cdrom xhci_pci xhci_hcd rtsx_pci mfd_core video button sg
CPU: 3 PID: 87 Comm: kworker/3:2 Tainted: G      D    O    4.0.0-00010-g22b5bb0-dirty #176
Hardware name: LENOVO 20AW000LUS/20AW000LUS, BIOS GLET43WW (1.18 ) 12/04/2013
task: ffff8800c7f8e290 ti: ffff8800c7f94000 task.ti: ffff8800c7f94000
RIP: 0010:[<ffffffff81092ee0>]  [<ffffffff81092ee0>] kthread_data+0x10/0x20
RSP: 0018:ffff8800c7f97b18  EFLAGS: 00010096
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000000d
RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff8800c7f8e290
RBP: ffff8800c7f97b18 R08: 000000000000bc00 R09: 0000000000007e76
R10: 0000000000000001 R11: 000000000000002f R12: ffff8800c7f8e290
R13: 00000000000154c0 R14: 0000000000000003 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88021e2c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 0000000210675000 CR4: 00000000001407e0
Stack:
 ffff8800c7f97b38 ffffffff8108dcd5 ffff8800c7f97b38 ffff88021e2d54c0
 ffff8800c7f97b88 ffffffff816d1500 ffff880213d42368 ffff8800c7f8e290
 ffff8800c7f97b88 ffff8800c7f97fd8 ffff8800c7f8e710 0000000000000246
Call Trace:
 [<ffffffff8108dcd5>] wq_worker_sleeping+0x15/0xa0
 [<ffffffff816d1500>] __schedule+0x6e0/0x940
 [<ffffffff816d1797>] schedule+0x37/0x90
 [<ffffffff810779bc>] do_exit+0x6bc/0xb40
 [<ffffffff8101898f>] oops_end+0x9f/0xe0
 [<ffffffff81018efb>] die+0x4b/0x70
 [<ffffffff81015622>] do_general_protection+0xe2/0x170
 [<ffffffff816d74e8>] general_protection+0x28/0x30
 [<ffffffff816d1181>] ? __schedule+0x361/0x940
 [<ffffffff814a586c>] ? fw_free_buf+0xc/0x40
 [<ffffffff816d1181>] ? __schedule+0x361/0x940
 [<ffffffff814a58f8>] release_firmware+0x58/0x80
 [<ffffffff816d1181>] ? __schedule+0x361/0x940
 [<ffffffffa047002c>] test_mod_cb+0x2c/0x43 [test]
 [<ffffffff814a65fe>] request_firmware_work_func+0x5e/0x80
 [<ffffffff816d1181>] ? __schedule+0x361/0x940
 [<ffffffff8108d23a>] process_one_work+0x14a/0x3f0
 [<ffffffff8108d911>] worker_thread+0x121/0x460
 [<ffffffff8108d7f0>] ? rescuer_thread+0x310/0x310
 [<ffffffff810928f9>] kthread+0xc9/0xe0
 [<ffffffff81092830>] ? kthread_create_on_node+0x180/0x180
 [<ffffffff816d52d8>] ret_from_fork+0x58/0x90
 [<ffffffff81092830>] ? kthread_create_on_node+0x180/0x180
Code: 00 48 89 e5 5d 48 8b 40 c8 48 c1 e8 02 83 e0 01 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 87 30 05 00 00 55 48 89 e5 <48> 8b 40 d8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RIP  [<ffffffff81092ee0>] kthread_data+0x10/0x20
 RSP <ffff8800c7f97b18>
CR2: ffffffffffffffd8
---[ end trace 4e62c56a58d0eac2 ]---
Fixing recursive fault but reboot is needed!

Cc: Linus Torvalds <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: David Howells <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Seth Forshee <[email protected]>
Cc: Kyle McMartin <[email protected]>
Generated-by: Coccinelle SmPL
Signed-off-by: Luis R. Rodriguez <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 21, 2016
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at
the device release, eventually leading to a crash when the module is
reloaded.

As described in the bug report below, the following procedure would
cause an Oops with zram:

- provision three zram devices via modprobe zram num_devices=3
- configure a size for each device
  + echo "1G" > /sys/block/$zram_name/disksize
- mkfs and mount zram0 only
- attempt to hot remove all three devices
  + echo 2 > /sys/class/zram-control/hot_remove
  + echo 1 > /sys/class/zram-control/hot_remove
  + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
- unmount zram0
- try zram0 hot remove again
  + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
- unload zram kernel module
  + completes successfully
- zram0 device node still exists
- attempt to mount /dev/zram0
  + mount command is killed
  + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: [<ffffffff812eead6>] get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 task: ffff88001a9f2800 task.stack: ffffc90000300000
 RIP: 0010:[<ffffffff812eead6>]  [<ffffffff812eead6>] get_disk+0x16/0x50
 Call Trace:
  [<ffffffff812eeb1c>] exact_lock+0xc/0x20
  [<ffffffff813b3e1c>] kobj_lookup+0xdc/0x160
  [<ffffffff812edce0>] ? disk_map_sector_rcu+0x70/0x70
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff812eef4f>] get_gendisk+0x2f/0x110
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff81126e2c>] __blkdev_get+0x10c/0x3c0
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff8112727d>] blkdev_get+0x19d/0x2e0
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff81127466>] blkdev_open+0x56/0x70
  [<ffffffff810f3e0f>] do_dentry_open.isra.19+0x1ff/0x310
  [<ffffffff810f4aa3>] vfs_open+0x43/0x60
  [<ffffffff81103009>] path_openat+0x2c9/0xf30
  [<ffffffff81023c00>] ? __save_stack_trace+0x40/0xd0
  [<ffffffff81104b79>] do_filp_open+0x79/0xd0
  [<ffffffff81538219>] ? kmemleak_alloc+0x49/0xa0
  [<ffffffff810f4e44>] do_sys_open+0x114/0x1e0
  [<ffffffff810f4f29>] SyS_open+0x19/0x20
  [<ffffffff8153c2e0>] entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to
call idr_remove() unconditionally.
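
The error check described above can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the zram driver: `struct dev`, `registry`, `dev_add()`, `dev_set_busy()`, `dev_remove()` and `hot_remove()` are hypothetical names, and a plain array stands in for the idr. The point it illustrates is that the registry entry is dropped only when removal actually succeeded.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_DEVS 4

/* Hypothetical device slot; "present" models an entry in the idr. */
struct dev {
	bool present;
	bool busy;	/* e.g. still mounted */
};

static struct dev registry[MAX_DEVS];

/* Register a device in slot id (id must be below MAX_DEVS). */
void dev_add(int id, bool busy)
{
	registry[id].present = true;
	registry[id].busy = busy;
}

/* Change the in-use state, e.g. after an unmount. */
void dev_set_busy(int id, bool busy)
{
	registry[id].busy = busy;
}

/* Models zram_remove(): refuses while the device is in use. */
static int dev_remove(struct dev *d)
{
	return d->busy ? -EBUSY : 0;
}

/* Models the fixed hot_remove_store(): unregister only on success. */
int hot_remove(int id)
{
	int ret;

	if (id < 0 || id >= MAX_DEVS || !registry[id].present)
		return -ENODEV;
	ret = dev_remove(&registry[id]);
	if (ret == 0)
		registry[id].present = false;	/* analogue of idr_remove() */
	return ret;
}
```

With the check in place, a second removal attempt on a busy device returns -EBUSY again rather than the unexpected -ENODEV seen in the reproduction steps, and the entry disappears only once the device is no longer in use.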

Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Reported-and-tested-by: David Disseldorp <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 25, 2016
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is reloaded.

As described in the bug report below, the following procedure would cause
an Oops with zram:

- provision three zram devices via modprobe zram num_devices=3
- configure a size for each device
  + echo "1G" > /sys/block/$zram_name/disksize
- mkfs and mount zram0 only
- attempt to hot remove all three devices
  + echo 2 > /sys/class/zram-control/hot_remove
  + echo 1 > /sys/class/zram-control/hot_remove
  + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
- unmount zram0
- try zram0 hot remove again
  + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
- unload zram kernel module
  + completes successfully
- zram0 device node still exists
- attempt to mount /dev/zram0
  + mount command is killed
  + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: [<ffffffff812eead6>] get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 task: ffff88001a9f2800 task.stack: ffffc90000300000
 RIP: 0010:[<ffffffff812eead6>]  [<ffffffff812eead6>] get_disk+0x16/0x50
 Call Trace:
  [<ffffffff812eeb1c>] exact_lock+0xc/0x20
  [<ffffffff813b3e1c>] kobj_lookup+0xdc/0x160
  [<ffffffff812edce0>] ? disk_map_sector_rcu+0x70/0x70
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff812eef4f>] get_gendisk+0x2f/0x110
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff81126e2c>] __blkdev_get+0x10c/0x3c0
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff8112727d>] blkdev_get+0x19d/0x2e0
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff81127466>] blkdev_open+0x56/0x70
  [<ffffffff810f3e0f>] do_dentry_open.isra.19+0x1ff/0x310
  [<ffffffff810f4aa3>] vfs_open+0x43/0x60
  [<ffffffff81103009>] path_openat+0x2c9/0xf30
  [<ffffffff81023c00>] ? __save_stack_trace+0x40/0xd0
  [<ffffffff81104b79>] do_filp_open+0x79/0xd0
  [<ffffffff81538219>] ? kmemleak_alloc+0x49/0xa0
  [<ffffffff810f4e44>] do_sys_open+0x114/0x1e0
  [<ffffffff810f4f29>] SyS_open+0x19/0x20
  [<ffffffff8153c2e0>] entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to
call idr_remove() unconditionally.

Fixes: 17ec4cd ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Cc: <[email protected]>    [4.4+]
Signed-off-by: Andrew Morton <[email protected]>
torvalds pushed a commit that referenced this pull request Dec 1, 2016
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.

As described in the bug report below, the following procedure would
cause an Oops with zram:

 - provision three zram devices via modprobe zram num_devices=3
 - configure a size for each device
   + echo "1G" > /sys/block/$zram_name/disksize
 - mkfs and mount zram0 only
 - attempt to hot remove all three devices
   + echo 2 > /sys/class/zram-control/hot_remove
   + echo 1 > /sys/class/zram-control/hot_remove
   + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
 - unmount zram0
 - try zram0 hot remove again
   + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
 - unload zram kernel module
   + completes successfully
 - zram0 device node still exists
 - attempt to mount /dev/zram0
   + mount command is killed
   + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 Call Trace:
   exact_lock+0xc/0x20
   kobj_lookup+0xdc/0x160
   get_gendisk+0x2f/0x110
   __blkdev_get+0x10c/0x3c0
   blkdev_get+0x19d/0x2e0
   blkdev_open+0x56/0x70
   do_dentry_open.isra.19+0x1ff/0x310
   vfs_open+0x43/0x60
   path_openat+0x2c9/0xf30
   do_filp_open+0x79/0xd0
   do_sys_open+0x114/0x1e0
   SyS_open+0x19/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.

Fixes: 17ec4cd ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Cc: <[email protected]>    [4.4+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Dec 2, 2016
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is reloaded.

As described in the bug report below, the following procedure would cause
an Oops with zram:

- provision three zram devices via modprobe zram num_devices=3
- configure a size for each device
  + echo "1G" > /sys/block/$zram_name/disksize
- mkfs and mount zram0 only
- attempt to hot remove all three devices
  + echo 2 > /sys/class/zram-control/hot_remove
  + echo 1 > /sys/class/zram-control/hot_remove
  + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
- unmount zram0
- try zram0 hot remove again
  + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
- unload zram kernel module
  + completes successfully
- zram0 device node still exists
- attempt to mount /dev/zram0
  + mount command is killed
  + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: [<ffffffff812eead6>] get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 task: ffff88001a9f2800 task.stack: ffffc90000300000
 RIP: 0010:[<ffffffff812eead6>]  [<ffffffff812eead6>] get_disk+0x16/0x50
 Call Trace:
  [<ffffffff812eeb1c>] exact_lock+0xc/0x20
  [<ffffffff813b3e1c>] kobj_lookup+0xdc/0x160
  [<ffffffff812edce0>] ? disk_map_sector_rcu+0x70/0x70
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff812eef4f>] get_gendisk+0x2f/0x110
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff81126e2c>] __blkdev_get+0x10c/0x3c0
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff8112727d>] blkdev_get+0x19d/0x2e0
  [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
  [<ffffffff81127466>] blkdev_open+0x56/0x70
  [<ffffffff810f3e0f>] do_dentry_open.isra.19+0x1ff/0x310
  [<ffffffff810f4aa3>] vfs_open+0x43/0x60
  [<ffffffff81103009>] path_openat+0x2c9/0xf30
  [<ffffffff81023c00>] ? __save_stack_trace+0x40/0xd0
  [<ffffffff81104b79>] do_filp_open+0x79/0xd0
  [<ffffffff81538219>] ? kmemleak_alloc+0x49/0xa0
  [<ffffffff810f4e44>] do_sys_open+0x114/0x1e0
  [<ffffffff810f4f29>] SyS_open+0x19/0x20
  [<ffffffff8153c2e0>] entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to
call idr_remove() unconditionally.

Fixes: 17ec4cd ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Cc: <[email protected]>    [4.4+]
Signed-off-by: Andrew Morton <[email protected]>
Noltari pushed a commit to Noltari/linux that referenced this pull request Dec 8, 2016
commit 529e71e upstream.

The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.

As described in the bug report below, the following procedure would
cause an Oops with zram:

 - provision three zram devices via modprobe zram num_devices=3
 - configure a size for each device
   + echo "1G" > /sys/block/$zram_name/disksize
 - mkfs and mount zram0 only
 - attempt to hot remove all three devices
   + echo 2 > /sys/class/zram-control/hot_remove
   + echo 1 > /sys/class/zram-control/hot_remove
   + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
 - unmount zram0
 - try zram0 hot remove again
   + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
 - unload zram kernel module
   + completes successfully
 - zram0 device node still exists
 - attempt to mount /dev/zram0
   + mount command is killed
   + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 Call Trace:
   exact_lock+0xc/0x20
   kobj_lookup+0xdc/0x160
   get_gendisk+0x2f/0x110
   __blkdev_get+0x10c/0x3c0
   blkdev_get+0x19d/0x2e0
   blkdev_open+0x56/0x70
   do_dentry_open.isra.19+0x1ff/0x310
   vfs_open+0x43/0x60
   path_openat+0x2c9/0xf30
   do_filp_open+0x79/0xd0
   do_sys_open+0x114/0x1e0
   SyS_open+0x19/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.

Fixes: 17ec4cd ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
heftig referenced this pull request in zen-kernel/zen-kernel Dec 8, 2016
commit 529e71e upstream.

The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.

As described in the bug report below, the following procedure would
cause an Oops with zram:

 - provision three zram devices via modprobe zram num_devices=3
 - configure a size for each device
   + echo "1G" > /sys/block/$zram_name/disksize
 - mkfs and mount zram0 only
 - attempt to hot remove all three devices
   + echo 2 > /sys/class/zram-control/hot_remove
   + echo 1 > /sys/class/zram-control/hot_remove
   + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
 - unmount zram0
 - try zram0 hot remove again
   + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
 - unload zram kernel module
   + completes successfully
 - zram0 device node still exists
 - attempt to mount /dev/zram0
   + mount command is killed
   + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 Call Trace:
   exact_lock+0xc/0x20
   kobj_lookup+0xdc/0x160
   get_gendisk+0x2f/0x110
   __blkdev_get+0x10c/0x3c0
   blkdev_get+0x19d/0x2e0
   blkdev_open+0x56/0x70
   do_dentry_open.isra.19+0x1ff/0x310
   vfs_open+0x43/0x60
   path_openat+0x2c9/0xf30
   do_filp_open+0x79/0xd0
   do_sys_open+0x114/0x1e0
   SyS_open+0x19/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.

Fixes: 17ec4cd ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
bgly pushed a commit to powervm/ibmvscsis that referenced this pull request Jan 31, 2017
BugLink: http://bugs.launchpad.net/bugs/1650604

commit 529e71e upstream.

The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.

As described in the bug report below, the following procedure would
cause an Oops with zram:

 - provision three zram devices via modprobe zram num_devices=3
 - configure a size for each device
   + echo "1G" > /sys/block/$zram_name/disksize
 - mkfs and mount zram0 only
 - attempt to hot remove all three devices
   + echo 2 > /sys/class/zram-control/hot_remove
   + echo 1 > /sys/class/zram-control/hot_remove
   + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
 - unmount zram0
 - try zram0 hot remove again
   + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
 - unload zram kernel module
   + completes successfully
 - zram0 device node still exists
 - attempt to mount /dev/zram0
   + mount command is killed
   + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 Call Trace:
   exact_lock+0xc/0x20
   kobj_lookup+0xdc/0x160
   get_gendisk+0x2f/0x110
   __blkdev_get+0x10c/0x3c0
   blkdev_get+0x19d/0x2e0
   blkdev_open+0x56/0x70
   do_dentry_open.isra.19+0x1ff/0x310
   vfs_open+0x43/0x60
   path_openat+0x2c9/0xf30
   do_filp_open+0x79/0xd0
   do_sys_open+0x114/0x1e0
   SyS_open+0x19/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.

Fixes: 17ec4cd ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Luis Henriques <[email protected]>
lukenels pushed a commit to lukenels/linux that referenced this pull request Feb 12, 2017
BugLink: http://bugs.launchpad.net/bugs/1650581

commit 529e71e upstream.

The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY).  This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.

As described in the bug report below, the following procedure would
cause an Oops with zram:

 - provision three zram devices via modprobe zram num_devices=3
 - configure a size for each device
   + echo "1G" > /sys/block/$zram_name/disksize
 - mkfs and mount zram0 only
 - attempt to hot remove all three devices
   + echo 2 > /sys/class/zram-control/hot_remove
   + echo 1 > /sys/class/zram-control/hot_remove
   + echo 0 > /sys/class/zram-control/hot_remove
     - zram0 removal fails with EBUSY, as expected
 - unmount zram0
 - try zram0 hot remove again
   + echo 0 > /sys/class/zram-control/hot_remove
     - fails with ENODEV (unexpected)
 - unload zram kernel module
   + completes successfully
 - zram0 device node still exists
 - attempt to mount /dev/zram0
   + mount command is killed
   + following BUG is encountered

 BUG: unable to handle kernel paging request at ffffffffa0002ba0
 IP: get_disk+0x16/0x50
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
 Call Trace:
   exact_lock+0xc/0x20
   kobj_lookup+0xdc/0x160
   get_gendisk+0x2f/0x110
   __blkdev_get+0x10c/0x3c0
   blkdev_get+0x19d/0x2e0
   blkdev_open+0x56/0x70
   do_dentry_open.isra.19+0x1ff/0x310
   vfs_open+0x43/0x60
   path_openat+0x2c9/0xf30
   do_filp_open+0x79/0xd0
   do_sys_open+0x114/0x1e0
   SyS_open+0x19/0x20
   entry_SYSCALL_64_fastpath+0x13/0x94

This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.

Fixes: 17ec4cd ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
Reviewed-by: David Disseldorp <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Signed-off-by: Tim Gardner <[email protected]>
Signed-off-by: Luis Henriques <[email protected]>
laijs pushed a commit to laijs/linux that referenced this pull request Feb 13, 2017
wzyy2 pushed a commit to wzyy2/linux that referenced this pull request Jun 19, 2017
(1) use the cpu id delivered by bl31;
(2) sp_el0 should point to a kernel address in EL1 mode.

On ARM64, the kernel uses sp_el0 to store current_thread_info().
We see a problem: when an FIQ occurs, the cpu is in EL1 mode but
sp_el0 points to a userspace address. At this moment, reading
'current_thread_info()->cpu' or other fields leads to an error.

We find that this situation happens when the cpu context is
saved/restored between system mode and user mode under heavy load.
For example, in 'ret_fast_syscall()' the kernel restores the user mode
context, but an FIQ occurs before the 'eret' instruction, which causes
the above situation.

Assembly code:

ffffff80080826c8 <ret_fast_syscall>:

...skipping...

ffffff80080826fc:       d503201f        nop
ffffff8008082700:       d5384100        mrs     x0, sp_el0
ffffff8008082704:       f9400c00        ldr     x0, [x0,#24]
ffffff8008082708:       d5182000        msr     ttbr0_el1, x0
ffffff800808270c:       d5033fdf        isb
ffffff8008082710:       f9407ff7        ldr     x23, [sp,#248]
ffffff8008082714:       d5184117        msr     sp_el0, x23
ffffff8008082718:       d503201f        nop
ffffff800808271c:       d503201f        nop
ffffff8008082720:       d5184035        msr     elr_el1, x21
ffffff8008082724:       d5184016        msr     spsr_el1, x22
ffffff8008082728:       a94007e0        ldp     x0, x1, [sp]
ffffff800808272c:       a9410fe2        ldp     x2, x3, [sp,#16]
ffffff8008082730:       a94217e4        ldp     x4, x5, [sp,#32]
ffffff8008082734:       a9431fe6        ldp     x6, x7, [sp,#48]
ffffff8008082738:       a94427e8        ldp     x8, x9, [sp,#64]
ffffff800808273c:       a9452fea        ldp     x10, x11, [sp,#80]
ffffff8008082740:       a94637ec        ldp     x12, x13, [sp,#96]
ffffff8008082744:       a9473fee        ldp     x14, x15, [sp,#112]
ffffff8008082748:       a94847f0        ldp     x16, x17, [sp,#128]
ffffff800808274c:       a9494ff2        ldp     x18, x19, [sp,#144]
ffffff8008082750:       a94a57f4        ldp     x20, x21, [sp,#160]
ffffff8008082754:       a94b5ff6        ldp     x22, x23, [sp,#176]
ffffff8008082758:       a94c67f8        ldp     x24, x25, [sp,#192]
ffffff800808275c:       a94d6ffa        ldp     x26, x27, [sp,#208]
ffffff8008082760:       a94e77fc        ldp     x28, x29, [sp,#224]
ffffff8008082764:       f9407bfe        ldr     x30, [sp,#240]
ffffff8008082768:       9104c3ff        add     sp, sp, #0x130
ffffff800808276c:       d69f03e0        eret

Change-Id: I071e899f8a407764e166ca0403199c9d87d6ce78
Signed-off-by: chenjh <[email protected]>
torvalds pushed a commit that referenced this pull request Feb 4, 2018
I ran into an issue on my laptop that triggered a bug on the
discard path:

WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
 Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
 CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G     U           4.15.0+ #176
 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
 RIP: 0010:nvme_setup_cmd+0x3d3/0x430
 RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
 RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
 RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
 RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
 R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
 R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
 FS:  0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  nvme_queue_rq+0x40/0xa00
  ? __sbitmap_queue_get+0x24/0x90
  ? blk_mq_get_tag+0xa3/0x250
  ? wait_woken+0x80/0x80
  ? blk_mq_get_driver_tag+0x97/0xf0
  blk_mq_dispatch_rq_list+0x7b/0x4a0
  ? deadline_remove_request+0x49/0xb0
  blk_mq_do_dispatch_sched+0x4f/0xc0
  blk_mq_sched_dispatch_requests+0x106/0x170
  __blk_mq_run_hw_queue+0x53/0xa0
  __blk_mq_delay_run_hw_queue+0x83/0xa0
  blk_mq_run_hw_queue+0x6c/0xd0
  blk_mq_sched_insert_request+0x96/0x140
  __blk_mq_try_issue_directly+0x3d/0x190
  blk_mq_try_issue_directly+0x30/0x70
  blk_mq_make_request+0x1a4/0x6a0
  generic_make_request+0xfd/0x2f0
  ? submit_bio+0x5c/0x110
  submit_bio+0x5c/0x110
  ? __blkdev_issue_discard+0x152/0x200
  submit_bio_wait+0x43/0x60
  ext4_process_freed_data+0x1cd/0x440
  ? account_page_dirtied+0xe2/0x1a0
  ext4_journal_commit_callback+0x4a/0xc0
  jbd2_journal_commit_transaction+0x17e2/0x19e0
  ? kjournald2+0xb0/0x250
  kjournald2+0xb0/0x250
  ? wait_woken+0x80/0x80
  ? commit_timeout+0x10/0x10
  kthread+0x111/0x130
  ? kthread_create_worker_on_cpu+0x50/0x50
  ? do_group_exit+0x3a/0xa0
  ret_from_fork+0x1f/0x30
 Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff
 ---[ end trace 50d361cc444506c8 ]---
 print_req_error: I/O error, dev nvme0n1, sector 847167488

Decoding the assembly, the request claims to have 0xffff segments,
while nvme counts two. This turns out to be because we don't check
for a data carrying request on the mq scheduler path, and since
blk_phys_contig_segment() returns true for a non-data request,
we decrement the initial segment count of 0 and end up with
0xffff in the unsigned short.
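
The wraparound described here is ordinary C unsigned arithmetic. A minimal userspace sketch, using a hypothetical helper name rather than the block layer's actual code, shows a segment count of 0 turning into 0xffff after a single decrement (assuming a 16-bit unsigned short):

```c
#include <assert.h>

/* Hypothetical stand-in for the merge path: it assumes two contiguous
 * segments were folded into one and drops the count by one. */
unsigned short ex_drop_merged_segment(unsigned short nr_segments)
{
	nr_segments--;		/* 0 wraps around to 0xffff */
	return nr_segments;
}

/* Usage: call from main() to confirm the behaviour. */
void ex_check_wraparound(void)
{
	/* A non-data request starts with 0 segments. */
	assert(ex_drop_merged_segment(0) == 0xffff);
	/* Starting from 1, as the fix does for discards, gives 0. */
	assert(ex_drop_merged_segment(1) == 0);
}
```

Starting the count at 1 for discards, as the fix proposes, keeps the same decrement from wrapping.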

There are a few issues here:

1) We should initialize the segment count for a discard to 1.
2) The discard merging is currently using the data limits for
   segments and sectors.

Fix this up by having attempt_merge() correctly identify the
request, and by initializing the segment count correctly
for discards.

This can only be triggered with mq-deadline on discard capable
devices right now, which isn't a common configuration.

Signed-off-by: Jens Axboe <[email protected]>
ldu4 pushed a commit to ldu4/linux that referenced this pull request Feb 16, 2018
GIT e237f98a9c134c3d600353f21e07db915516875b

commit 140995c9762dafd3247ce232273fe19cf9d8b38b
Author: Thierry Reding <[email protected]>
Date:   Mon Feb 5 13:54:36 2018 +0100

    net: mediatek: Explicitly include pinctrl headers
    
    The Mediatek ethernet driver fails to build after commit 23c35f48f5fb
    ("pinctrl: remove include file from <linux/device.h>") because it relies
    on the pinctrl/consumer.h and pinctrl/devinfo.h being pulled in by the
    device.h header implicitly.
    
    Include these headers explicitly to avoid the build failure.
    
    Cc: Linus Walleij <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 8fb572acb2191bd14fc1363bf73461a994842e6f
Author: Thierry Reding <[email protected]>
Date:   Mon Feb 5 13:47:50 2018 +0100

    mmc: meson-gx-mmc: Explicitly include pinctrl/consumer.h
    
    The Meson GX MMC driver fails to build after commit 23c35f48f5fb
    ("pinctrl: remove include file from <linux/device.h>") because it relies
    on the pinctrl/consumer.h being pulled in by the device.h header
    implicitly.
    
    Include the header explicitly to avoid the build failure.
    
    Cc: Linus Walleij <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 1c16a9ce01487a98052d37a94e4c411b4fd9617b
Author: Thierry Reding <[email protected]>
Date:   Mon Feb 5 13:47:49 2018 +0100

    drm/rockchip: lvds: Explicitly include pinctrl headers
    
    The Rockchip LVDS driver fails to build after commit 23c35f48f5fb
    ("pinctrl: remove include file from <linux/device.h>") because it relies
    on the pinctrl/consumer.h and pinctrl/devinfo.h being pulled in by the
    device.h header implicitly.
    
    Include these headers explicitly to avoid the build failure.
    
    Cc: Linus Walleij <[email protected]>
    Signed-off-by: Thierry Reding <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 567af7fc9d87df3228ef59864f77fe100ec0cee3
Author: Stephen Rothwell <[email protected]>
Date:   Mon Feb 5 09:24:30 2018 +1100

    pinctrl: files should directly include apis they use
    
    Fixes: 23c35f48f5fb ("pinctrl: remove include file from <linux/device.h>")
    Signed-off-by: Stephen Rothwell <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 9b6faee074702bbbc207e7027b9416c2d8fea9fe
Author: Amir Goldstein <[email protected]>
Date:   Tue Jan 30 13:54:45 2018 +0200

    ovl: check ERR_PTR() return value from ovl_encode_fh()
    
    Another fix for an issue reported by 0-day robot.
    
    Reported-by: Dan Carpenter <[email protected]>
    Fixes: 8ed5eec9d6c4 ("ovl: encode pure upper file handles")
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>

commit 2aed489d163a6559e07dbc238882c9970ae0f65b
Author: Amir Goldstein <[email protected]>
Date:   Sun Jan 28 02:35:48 2018 +0200

    ovl: fix regression in fsnotify of overlay merge dir
    
    A re-factoring patch in NFS export series has passed the wrong argument
    to ovl_get_inode() causing a regression in the very recent fix to
    fsnotify of overlay merge dir.
    
    The regression has caused merge directory inodes to be hashed by upper
    instead of lower real inode, when NFS export and directory indexing are
    disabled. That caused an inotify watch to become obsolete after directory
    copy up and drop caches.
    
    LTP test inotify07 was improved to catch this regression.
    The regression also caused multiple redirect dirs to same origin not to
    be detected on lookup with NFS export disabled. An xfstest was added to
    cover this case.
    
    Fixes: 0aceb53e73be ("ovl: do not pass overlay dentry to ovl_get_inode()")
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Miklos Szeredi <[email protected]>

commit 0ae7d327a64b262443b7d3ebee5831e4dde47b89
Author: Georgi Djakov <[email protected]>
Date:   Tue Dec 5 17:47:00 2017 +0200

    dt-bindings: mailbox: qcom: Document the APCS clock binding
    
    Update the binding documentation for APCS to mention that the APCS
    hardware block also exposes clock controller functionality.
    
    The APCS clock controller is a mux and half-integer divider. It has the
    main CPU PLL as an input and provides the clock for the application CPU.
    
    Signed-off-by: Georgi Djakov <[email protected]>
    Reviewed-by: Rob Herring <[email protected]>
    Acked-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Jassi Brar <[email protected]>

commit c815d769b598196bdbd104a7e049d07ae6fba0d2
Author: Georgi Djakov <[email protected]>
Date:   Tue Dec 5 17:46:57 2017 +0200

    mailbox: qcom: Create APCS child device for clock controller
    
    There is a clock controller functionality provided by the APCS hardware
    block of msm8916 devices. The device-tree would represent an APCS node
    with both mailbox and clock provider properties.
    Create a platform child device for the clock controller functionality so
    the driver can probe and use APCS as parent.
    
    Signed-off-by: Georgi Djakov <[email protected]>
    Acked-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Jassi Brar <[email protected]>

commit c6a8b171ca8e338a3012420041346f0e50f7f649
Author: Georgi Djakov <[email protected]>
Date:   Tue Dec 5 17:46:56 2017 +0200

    mailbox: qcom: Convert APCS IPC driver to use regmap
    
    This hardware block provides more functionality than just IPC. Convert
    it to regmap to allow other child platform devices to use the same regmap.
    
    Signed-off-by: Georgi Djakov <[email protected]>
    Acked-by: Bjorn Andersson <[email protected]>
    Signed-off-by: Jassi Brar <[email protected]>

commit b2ac58f90540e39324e7a29a7ad471407ae0bf48
Author: KarimAllah Ahmed <[email protected]>
Date:   Sat Feb 3 15:56:23 2018 +0100

    KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
    
    [ Based on a patch from Paolo Bonzini <[email protected]> ]
    
    ... basically doing exactly what we do for VMX:
    
    - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
    - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
      actually used it.
    
    Signed-off-by: KarimAllah Ahmed <[email protected]>
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Darren Kenny <[email protected]>
    Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Jun Nakajima <[email protected]>
    Cc: [email protected]
    Cc: Dave Hansen <[email protected]>
    Cc: Tim Chen <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Asit Mallick <[email protected]>
    Cc: Arjan Van De Ven <[email protected]>
    Cc: Greg KH <[email protected]>
    Cc: Paolo Bonzini <[email protected]>
    Cc: Dan Williams <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Ashok Raj <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit d28b387fb74da95d69d2615732f50cceb38e9a4d
Author: KarimAllah Ahmed <[email protected]>
Date:   Thu Feb 1 22:59:45 2018 +0100

    KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
    
    [ Based on a patch from Ashok Raj <[email protected]> ]
    
    Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
    guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
    be using a retpoline+IBPB based approach.
    
    To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
    guests that do not actually use the MSR, only start saving and restoring
    when a non-zero value is written to it.
    
    No attempt is made to handle STIBP here, intentionally. Filtering STIBP
    may be added in a future patch, which may require trapping all writes
    if we don't want to pass it through directly to the guest.
    
    [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
    
    Signed-off-by: KarimAllah Ahmed <[email protected]>
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Darren Kenny <[email protected]>
    Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
    Reviewed-by: Jim Mattson <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Jun Nakajima <[email protected]>
    Cc: [email protected]
    Cc: Dave Hansen <[email protected]>
    Cc: Tim Chen <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Asit Mallick <[email protected]>
    Cc: Arjan Van De Ven <[email protected]>
    Cc: Greg KH <[email protected]>
    Cc: Paolo Bonzini <[email protected]>
    Cc: Dan Williams <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Ashok Raj <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit 28c1c9fabf48d6ad596273a11c46e0d0da3e14cd
Author: KarimAllah Ahmed <[email protected]>
Date:   Thu Feb 1 22:59:44 2018 +0100

    KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
    
    Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
    (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
    contents will come directly from the hardware, but user-space can still
    override it.
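
As a rough illustration of the bit layout described here (RDCL_NO in bit 0, IBRS_ALL in bit 1), the following userspace sketch decodes a raw MSR value. The macro and function names are made up for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; the bit positions come from the text. */
#define EX_ARCH_CAP_RDCL_NO	(UINT64_C(1) << 0)
#define EX_ARCH_CAP_IBRS_ALL	(UINT64_C(1) << 1)

bool ex_has_rdcl_no(uint64_t caps)
{
	return (caps & EX_ARCH_CAP_RDCL_NO) != 0;
}

bool ex_has_ibrs_all(uint64_t caps)
{
	return (caps & EX_ARCH_CAP_IBRS_ALL) != 0;
}

/* Usage: call from main() to exercise the decoders. */
void ex_check_arch_caps(void)
{
	assert(ex_has_rdcl_no(0x1) && !ex_has_ibrs_all(0x1));
	assert(!ex_has_rdcl_no(0x2) && ex_has_ibrs_all(0x2));
	assert(ex_has_rdcl_no(0x3) && ex_has_ibrs_all(0x3));
	assert(!ex_has_rdcl_no(0x0) && !ex_has_ibrs_all(0x0));
}
```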
    
    [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
    
    Signed-off-by: KarimAllah Ahmed <[email protected]>
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Paolo Bonzini <[email protected]>
    Reviewed-by: Darren Kenny <[email protected]>
    Reviewed-by: Jim Mattson <[email protected]>
    Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Jun Nakajima <[email protected]>
    Cc: [email protected]
    Cc: Dave Hansen <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Asit Mallick <[email protected]>
    Cc: Arjan Van De Ven <[email protected]>
    Cc: Greg KH <[email protected]>
    Cc: Dan Williams <[email protected]>
    Cc: Tim Chen <[email protected]>
    Cc: Ashok Raj <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit 15d45071523d89b3fb7372e2135fbd72f6af9506
Author: Ashok Raj <[email protected]>
Date:   Thu Feb 1 22:59:43 2018 +0100

    KVM/x86: Add IBPB support
    
    The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
    control mechanism. It keeps earlier branches from influencing
    later ones.
    
    Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
    It's a command that ensures predicted branch targets aren't used after
    the barrier. Although IBRS and IBPB are enumerated by the same CPUID
    enumeration, IBPB is very different.
    
    IBPB helps mitigate against three potential attacks:
    
    * Mitigate guests from being attacked by other guests.
      - This is addressed by issuing IBPB when we do a guest switch.
    
    * Mitigate attacks from guest/ring3->host/ring3.
      These would require an IBPB during context switch in host, or after
      VMEXIT. The host process has two ways to mitigate:
      - Either it can be compiled with retpoline
      - If it's going through context switch, and has set !dumpable then
        there is an IBPB in that path.
        (Tim's patch: https://patchwork.kernel.org/patch/10192871)
      - The case where after a VMEXIT you return back to Qemu might make
        Qemu attackable from guest when Qemu isn't compiled with retpoline.
      There are issues reported when doing IBPB on every VMEXIT that resulted
      in some tsc calibration woes in guest.
    
    * Mitigate guest/ring0->host/ring0 attacks.
      When host kernel is using retpoline it is safe against these attacks.
      If host kernel isn't using retpoline we might need to do an IBPB flush on
      every VMEXIT.
    
    Even when using retpoline for indirect calls, in certain conditions 'ret'
    can use the BTB on Skylake-era CPUs. There are other mitigations
    available like RSB stuffing/clearing.
    
    * IBPB is issued only for SVM during svm_free_vcpu().
      VMX has a vmclear and SVM doesn't.  Follow discussion here:
      https://lkml.org/lkml/2018/1/15/146
    
    Please refer to the following spec for more details on the enumeration
    and control.
    
    Refer here to get documentation about mitigations.
    
    https://software.intel.com/en-us/side-channel-security-support
    
    [peterz: rebase and changelog rewrite]
    [karahmed: - rebase
               - vmx: expose PRED_CMD if guest has it in CPUID
               - svm: only pass through IBPB if guest has it in CPUID
               - vmx: support !cpu_has_vmx_msr_bitmap()
               - vmx: support nested]
    [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
            PRED_CMD is a write-only MSR]
    
    Signed-off-by: Ashok Raj <[email protected]>
    Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: KarimAllah Ahmed <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: [email protected]
    Cc: Asit Mallick <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: Arjan Van De Ven <[email protected]>
    Cc: Greg KH <[email protected]>
    Cc: Jun Nakajima <[email protected]>
    Cc: Paolo Bonzini <[email protected]>
    Cc: Dan Williams <[email protected]>
    Cc: Tim Chen <[email protected]>
    Link: http://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]

commit b7b27aa011a1df42728d1768fc181d9ce69e6911
Author: KarimAllah Ahmed <[email protected]>
Date:   Thu Feb 1 22:59:42 2018 +0100

    KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
    
    [dwmw2: Stop using KF() for bits in it, too]
    Signed-off-by: KarimAllah Ahmed <[email protected]>
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Paolo Bonzini <[email protected]>
    Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
    Reviewed-by: Jim Mattson <[email protected]>
    Cc: [email protected]
    Cc: Radim Krčmář <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit 23c35f48f5fbe33f68904138b23fee64df7d2f0f
Author: Linus Torvalds <[email protected]>
Date:   Fri Feb 2 16:44:14 2018 -0800

    pinctrl: remove include file from <linux/device.h>
    
    When pulling the recent pinctrl merge, I was surprised by how a
    pinctrl-only pull request ended up rebuilding basically the whole
    kernel.
    
    The reason for that ended up being that <linux/device.h> included
    <linux/pinctrl/devinfo.h>, so any change to that file ended up causing
    pretty much every driver out there to be rebuilt.
    
    The reason for that was because 'struct device' has this in it:
    
        #ifdef CONFIG_PINCTRL
            struct dev_pin_info     *pins;
        #endif
    
    but we already avoid header includes for these kinds of things in that
    header file, preferring to just use a forward-declaration of the
    structure instead.  Exactly to avoid this kind of header dependency.
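
The forward-declaration approach can be sketched in a few lines of standalone C. The struct and function names below are placeholders, not the kernel's. A structure only needs the name of another structure to hold a pointer to it, so the full definition (and its header) can stay out of the widely included file.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration: enough to declare pointers, no header needed. */
struct ex_pin_info;

struct ex_device {
	const char *name;
	struct ex_pin_info *pins;	/* pointer to an incomplete type is fine */
};

/* Only code that dereferences the pointer needs the full definition,
 * which would normally live in its own, rarely included header. */
struct ex_pin_info {
	int default_state;
};

int ex_device_default_state(const struct ex_device *dev)
{
	return dev->pins ? dev->pins->default_state : -1;
}

/* Usage: call from main() to exercise both cases. */
void ex_check_forward_decl(void)
{
	struct ex_pin_info info = { .default_state = 7 };
	struct ex_device with_pins = { .name = "a", .pins = &info };
	struct ex_device without_pins = { .name = "b", .pins = NULL };

	assert(ex_device_default_state(&with_pins) == 7);
	assert(ex_device_default_state(&without_pins) == -1);
}
```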
    
    Since some drivers seem to expect that <linux/pinctrl/devinfo.h> header
    to come in automatically, move the include to <linux/pinctrl/pinctrl.h>
    instead.  It might be better to just make the includes more targeted,
    but I'm not going to review every driver.
    
    It would definitely be good to have a tool for finding and minimizing
    header dependencies automatically - or at least help with them.  Right
    now we almost certainly end up having way too many of these things, and
    it's hard to test every single configuration.
    
    FWIW, you can get a sense of the "hotness" of a header file with something
    like this after doing a full build:
    
        find . -name '.*.o.cmd' -print0 |
            xargs -0 tail --lines=+2 |
            grep -v 'wildcard ' |
            tr ' \\' '\n' |
            sort | uniq -c | sort -n | less -S
    
    which isn't exact (there are other things in those '*.o.cmd' than just
    the dependencies, and the "--lines=+2" only removes the header), but
    might be a useful approximation.
    
    With this patch, <linux/pinctrl/devinfo.h> drops to "only" having 833
    users in the current x86-64 allmodconfig.  In contrast, <linux/device.h>
    has 14857 build files including it directly or indirectly.
    
    Of course, the headers that absolutely _everybody_ includes (things like
    <linux/types.h> etc) get a score of 23000+.
    
    Cc: Linus Walleij <[email protected]>
    Cc: Greg Kroah-Hartman <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit a81114d03e4a529c4b68293249f75438b3c1783f
Author: Ard Biesheuvel <[email protected]>
Date:   Sat Feb 3 11:25:20 2018 +0100

    firmware: dmi: handle missing DMI data gracefully
    
    Currently, when booting a kernel with DMI support on a platform that has
    no DMI tables, the following output is emitted into the kernel log:
    
      [    0.128818] DMI not present or invalid.
      ...
      [    1.306659] dmi: Firmware registration failed.
      ...
      [    2.908681] dmi-sysfs: dmi entry is absent.
    
    The first one is a pr_info(), but the subsequent ones are pr_err()s that
    complain about a condition that is not really an error to begin with.
    
    So let's clean this up, and give up silently if dmi_available is not set.
    
    Signed-off-by: Ard Biesheuvel <[email protected]>
    Acked-by: Martin Hundebøll <[email protected]>
    Signed-off-by: Jean Delvare <[email protected]>

commit a7770ae194569e96a93c48aceb304edded9cc648
Author: Jean Delvare <[email protected]>
Date:   Sat Feb 3 11:25:20 2018 +0100

    firmware: dmi_scan: Fix handling of empty DMI strings
    
    The handling of empty DMI strings looks quite broken to me:
    * Strings from 1 to 7 spaces are not considered empty.
    * True empty DMI strings (string index set to 0) are not considered
      empty, and result in allocating a 0-char string.
    * Strings with invalid index also result in allocating a 0-char
      string.
    * Strings starting with 8 spaces are all considered empty, even if
      non-space characters follow (sounds like a weird thing to do, but
      I have actually seen occurrences of this in DMI tables before.)
    * Strings which are considered empty are reported as 8 spaces,
      instead of being actually empty.
    
    Some of these issues are the result of an off-by-one error in memcmp,
    the rest is incorrect by design.
    
    So let's get it square: missing strings and strings made of only
    spaces, regardless of their length, should be treated as empty and
    no memory should be allocated for them. All other strings are
    non-empty and should be allocated.
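
The rule stated here (a missing string, or one made only of spaces, counts as empty) can be expressed as a small predicate. This is a userspace sketch with a hypothetical helper name, not the code from dmi_scan.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when s is missing (NULL) or contains nothing but spaces,
 * regardless of its length. */
bool ex_dmi_string_is_empty(const char *s)
{
	if (s == NULL)
		return true;
	while (*s == ' ')
		s++;
	return *s == '\0';
}

/* Usage: call from main(); the cases mirror the list in the changelog. */
void ex_check_dmi_strings(void)
{
	assert(ex_dmi_string_is_empty(NULL));		/* missing string */
	assert(ex_dmi_string_is_empty(""));		/* zero length */
	assert(ex_dmi_string_is_empty("   "));		/* 3 spaces */
	assert(ex_dmi_string_is_empty("        "));	/* 8 spaces */
	assert(!ex_dmi_string_is_empty("        x"));	/* text after 8 spaces */
	assert(!ex_dmi_string_is_empty("abc"));
}
```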
    
    Signed-off-by: Jean Delvare <[email protected]>
    Fixes: 79da4721117f ("x86: fix DMI out of memory problems")
    Cc: Parag Warudkar <[email protected]>
    Cc: Ingo Molnar <[email protected]>
    Cc: Thomas Gleixner <[email protected]>

commit 7117794feb1602ea5efca1c7bfd5b78c3278d29d
Author: Jean Delvare <[email protected]>
Date:   Sat Feb 3 11:25:20 2018 +0100

    firmware: dmi_scan: Drop dmi_initialized
    
    I don't think it makes sense to check for a possible bad
    initialization order at run time on every system when it is all
    decided at build time.
    
    A more efficient way to make sure developers do not introduce new
    calls to dmi_check_system() too early in the initialization sequence
    is to simply document the expected call order. That way, developers
    have a chance to get it right immediately, without having to
    test-boot their kernel, wonder why it does not work, and parse the
    kernel logs for a warning message. And we get rid of the run-time
    performance penalty as a nice side effect.
    
    Signed-off-by: Jean Delvare <[email protected]>
    Cc: Ingo Molnar <[email protected]>

commit 8cf4e6a04f734e831c2ac7f405071d1cde690ba8
Author: Jean Delvare <[email protected]>
Date:   Sat Feb 3 11:25:20 2018 +0100

    firmware: dmi: Optimize dmi_matches
    
    Function dmi_matches can be made a bit faster:
    
    * The documented purpose of dmi_initialized is to catch too early
      calls to dmi_check_system(). I'm not fully convinced it justifies
      slowing down the initialization of all systems out there, but at
      least the check should not have been moved from dmi_check_system()
      to dmi_matches(). dmi_matches() is being called for every entry of
      the table passed to dmi_check_system(), causing the same redundant
      check to be performed again and again. So move it back to
      dmi_check_system(), reverting this specific portion of commit
      d7b1956fed33 ("DMI: Introduce dmi_first_match to make the interface
      more flexible").
    
    * Don't check for the exact_match flag again when we already know its
      value.
    
    Signed-off-by: Jean Delvare <[email protected]>
    Fixes: d7b1956fed33 ("DMI: Introduce dmi_first_match to make the interface more flexible")
    Cc: Jani Nikula <[email protected]>
    Cc: Daniel Vetter <[email protected]>
    Cc: Rafael J. Wysocki <[email protected]>
    Cc: Jeff Garzik <[email protected]>

commit edbe69ef2c90fc86998a74b08319a01c508bd497
Author: Roman Gushchin <[email protected]>
Date:   Fri Feb 2 15:26:57 2018 +0000

    Revert "defer call to mem_cgroup_sk_alloc()"
    
    This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
    defer call to mem_cgroup_sk_alloc()").
    
    Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
    memcg socket memory accounting, as packets received before memcg
    pointer initialization are not accounted and are causing refcounting
    underflow on socket release.
    
    Actually the use-after-free problem was fixed by
    commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
    sk_clone_lock()") for the cgroup pointer.
    
    So, let's revert it and call mem_cgroup_sk_alloc() just before
    cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
    we're cloning, and it holds a reference to the memcg.
    
    Also, let's drop the BUG_ON(mem_cgroup_is_root()) check from
    mem_cgroup_sk_alloc(). I see no reason why bumping the root
    memcg counter should cause a panic, and there are no realistic
    ways to hit it.
    
    Signed-off-by: Roman Gushchin <[email protected]>
    Cc: Eric Dumazet <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Johannes Weiner <[email protected]>
    Cc: Tejun Heo <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 4db428a7c9ab07e08783e0fcdc4ca0f555da0567
Author: Eric Dumazet <[email protected]>
Date:   Fri Feb 2 10:27:27 2018 -0800

    soreuseport: fix mem leak in reuseport_add_sock()
    
    reuseport_add_sock() needs to deal with attaching a socket having
    its own sk_reuseport_cb, after a prior
    setsockopt(SO_ATTACH_REUSEPORT_?BPF)
    
    Without this fix, not only a WARN_ONCE() was issued, but we were also
    leaking memory.
    
    Thanks to syzbot and Eric Biggers for providing us nice C repros.
    
    ------------[ cut here ]------------
    socket already in reuseport group
    WARNING: CPU: 0 PID: 3496 at net/core/sock_reuseport.c:119 reuseport_add_sock+0x742/0x9b0 net/core/sock_reuseport.c:117
    Kernel panic - not syncing: panic_on_warn set ...
    
    CPU: 0 PID: 3496 Comm: syzkaller869503 Not tainted 4.15.0-rc6+ #245
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:17 [inline]
      dump_stack+0x194/0x257 lib/dump_stack.c:53
      panic+0x1e4/0x41c kernel/panic.c:183
      __warn+0x1dc/0x200 kernel/panic.c:547
      report_bug+0x211/0x2d0 lib/bug.c:184
      fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
      fixup_bug arch/x86/kernel/traps.c:247 [inline]
      do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
      do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
      invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079
    
    Fixes: ef456144da8e ("soreuseport: define reuseport groups")
    Signed-off-by: Eric Dumazet <[email protected]>
    Reported-by: [email protected]
    Acked-by: Craig Gallek <[email protected]>
    
    Signed-off-by: David S. Miller <[email protected]>

commit cfabb1779d725c6d719793e44f5c50382eae6227
Author: Arnd Bergmann <[email protected]>
Date:   Fri Feb 2 16:45:44 2018 +0100

    net: qlge: use memmove instead of skb_copy_to_linear_data
    
    gcc-8 points out that the skb_copy_to_linear_data() argument points to
    the skb itself, which makes it run into a problem with overlapping
    memcpy arguments:
    
    In file included from include/linux/ip.h:20,
                     from drivers/net/ethernet/qlogic/qlge/qlge_main.c:26:
    drivers/net/ethernet/qlogic/qlge/qlge_main.c: In function 'ql_realign_skb':
    include/linux/skbuff.h:3378:2: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
      memcpy(skb->data, from, len);
    
    It's unclear to me what the best solution is, maybe it ought to use a
    different helper that adjusts the skb data in a safe way. Simply using
    memmove() here seems like the easiest workaround.
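
To illustrate why memmove() is the safe choice when source and destination may overlap: memcpy() has undefined behaviour for overlapping regions, while memmove() is defined to behave as if it copied through a temporary buffer. The helper below is a hypothetical userspace analogue of shifting data within one buffer, not the qlge code.

```c
#include <assert.h>
#include <string.h>

/* Shift len bytes starting at buf + offset down to the start of buf.
 * The regions overlap whenever offset < len, so memmove() is required. */
void ex_realign(char *buf, size_t offset, size_t len)
{
	memmove(buf, buf + offset, len);
}

/* Usage: call from main() to see an overlapping shift work. */
void ex_check_realign(void)
{
	char buf[16] = "xxHELLO";

	/* Move "HELLO" and its terminating NUL (6 bytes) two bytes down;
	 * source [2, 8) and destination [0, 6) overlap. */
	ex_realign(buf, 2, 6);
	assert(strcmp(buf, "HELLO") == 0);
}
```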
    
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 11f711081af0eb54190dc0de96ba4a9cd494666b
Author: Arnd Bergmann <[email protected]>
Date:   Fri Feb 2 16:44:47 2018 +0100

    net: qed: use correct strncpy() size
    
    Passing the strlen() of the source string as the destination
    length is pointless, and gcc-8 now warns about it:
    
    drivers/net/ethernet/qlogic/qed/qed_debug.c: In function 'qed_grc_dump':
    include/linux/string.h:253: error: 'strncpy' specified bound depends on the length of the source argument [-Werror=stringop-overflow=]
    
    This changes qed_grc_dump_big_ram() to instead use the length of
    the destination buffer, and to use strscpy() to guarantee nul-termination.
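
The underlying point is that the bound passed to a string copy should describe the destination, not the source. strscpy() is a kernel helper, so the userspace sketch below substitutes snprintf() to get the same guaranteed NUL termination. The function name is illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst, never writing more than dst_size bytes and always
 * NUL-terminating when dst_size > 0. Returns 0 on success, -1 if the
 * source did not fit and was truncated. */
int ex_copy_name(char *dst, size_t dst_size, const char *src)
{
	int n = snprintf(dst, dst_size, "%s", src);

	return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}

/* Usage: call from main() to see both the fitting and truncating case. */
void ex_check_copy_name(void)
{
	char small[4];
	char big[16];

	assert(ex_copy_name(big, sizeof(big), "RAM") == 0);
	assert(strcmp(big, "RAM") == 0);

	/* Too long for a 4-byte buffer: truncated but still terminated. */
	assert(ex_copy_name(small, sizeof(small), "BIG_RAM") == -1);
	assert(strcmp(small, "BIG") == 0);
}
```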
    
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 1a91649fd35ff53a646981e212496f1ae92a8487
Author: Arnd Bergmann <[email protected]>
Date:   Fri Feb 2 16:18:37 2018 +0100

    net: cxgb4: avoid memcpy beyond end of source buffer
    
    Building with link-time-optimizations revealed that the cxgb4 driver does
    a fixed-size memcpy() from a variable-length constant string into the
    network interface name:
    
    In function 'memcpy',
        inlined from 'cfg_queues_uld.constprop' at drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:335:2,
        inlined from 'cxgb4_register_uld.constprop' at drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c:719:9:
    include/linux/string.h:350:3: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
       __read_overflow2();
       ^
    
    I can see two equally workable solutions: either we use a strncpy() instead
    of the memcpy() to stop at the end of the input, or we make the source buffer
    fixed length as well. This implements the latter.
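    
    A minimal sketch of the "fixed-length source" approach, with made-up
    struct names and a made-up NAME_LEN (none of these are the cxgb4
    definitions): once both buffers have the same fixed size, a fixed-size
    memcpy() cannot read past the end of the source.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16  /* hypothetical fixed name size */

struct uld_info {
    char name[NAME_LEN];  /* fixed length; the initializer zero-pads it */
};

struct queue_info {
    char name[NAME_LEN];
};

/* Both sides are NAME_LEN bytes, so the fixed-size copy stays in bounds. */
void copy_name(struct queue_info *q, const struct uld_info *u)
{
    assert(q != NULL && u != NULL);
    memcpy(q->name, u->name, sizeof(q->name));
}
```

    With a bare string literal as the source, the same memcpy() would read
    NAME_LEN bytes from an object that may be much shorter.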
    
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 058a6c033488494a6b1477b05fe8e1a16e344462
Author: Paolo Abeni <[email protected]>
Date:   Fri Feb 2 16:02:22 2018 +0100

    cls_u32: add missing RCU annotation.
    
    In a couple of places in the control path, n->ht_down is currently
    accessed without the required RCU annotation. The accesses are
    safe, but sparse complains. Since we already hold the
    rtnl lock, let's use rtnl_dereference().
    
    Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
    Fixes: de5df63228fc ("net: sched: cls_u32 changes to knode must appear atomic to readers")
    Signed-off-by: Paolo Abeni <[email protected]>
    Acked-by: Cong Wang <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit aece4770fba62102951891c2f349a255c83eacb9
Author: Hayes Wang <[email protected]>
Date:   Fri Feb 2 16:43:36 2018 +0800

    r8152: set rx mode early when linking on
    
    Set rx mode before calling netif_wake_queue() when linking on to avoid
    the device missing received packets.
    
    Transmission may start after netif_wake_queue() is called, and the
    response packets may arrive before rtl8152_set_rx_mode() has been
    called to let the device receive packets. Those response packets
    would then be missed.
    
    Signed-off-by: Hayes Wang <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit ea6499e160a74ea813e53e7bef2ccb22df1e4929
Author: Hayes Wang <[email protected]>
Date:   Fri Feb 2 16:43:35 2018 +0800

    r8152: fix wrong checksum status for received IPv4 packets
    
    The device can only check the checksum of TCP and UDP packets. Therefore,
    for IPv4 packets other than TCP and UDP, a checksum check is still necessary,
    even though the IP checksum is correct.
    
    Take ICMP for example: the IP checksum may be correct, but the ICMP checksum
    may be wrong.
    
    Signed-off-by: Hayes Wang <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 1d8ef0c07664dc48f2ff19a90b62dd3f6f425547
Author: Edwin Peer <[email protected]>
Date:   Thu Feb 1 19:41:43 2018 -0800

    nfp: fix TLV offset calculation
    
    The data pointer in the config space TLV parser already includes
    NFP_NET_CFG_TLV_BASE, it should not be added again. Incorrect
    offset values were only used in printed user output, rendering
    the bug merely cosmetic.
    
    Fixes: 73a0329b057e ("nfp: add TLV capabilities to the BAR")
    Signed-off-by: Edwin Peer <[email protected]>
    Reviewed-by: Jakub Kicinski <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 328008a72d38b5bde6491e463405c34a81a65d3e
Author: Arnd Bergmann <[email protected]>
Date:   Fri Feb 2 15:56:18 2018 +0100

    x86/power: Fix swsusp_arch_resume prototype
    
    The declaration for swsusp_arch_resume marks it as 'asmlinkage', but the
    definition in x86-32 does not, and it fails to include the header with the
    declaration. This leads to a warning when building with
    link-time-optimizations:
    
    kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch]
     extern asmlinkage int swsusp_arch_resume(void);
                           ^
    arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here
     int swsusp_arch_resume(void)
    
    This moves the declaration into a globally visible header file and fixes up
    both x86 definitions to match it.
    
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: Len Brown <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Nicolas Pitre <[email protected]>
    Cc: [email protected]
    Cc: "Rafael J. Wysocki" <[email protected]>
    Cc: Pavel Machek <[email protected]>
    Cc: Bart Van Assche <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit ebfc15019cfa72496c674ffcb0b8ef10790dcddc
Author: Arnd Bergmann <[email protected]>
Date:   Fri Feb 2 15:56:17 2018 +0100

    x86/dumpstack: Avoid uninitialized variable
    
    In some configurations, 'partial' does not get initialized, as shown by
    this gcc-8 warning:
    
    arch/x86/kernel/dumpstack.c: In function 'show_trace_log_lvl':
    arch/x86/kernel/dumpstack.c:156:4: error: 'partial' may be used uninitialized in this function [-Werror=maybe-uninitialized]
        show_regs_if_on_stack(&stack_info, regs, partial);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    This initializes it to false, to get the previous behavior in this case.
    
    Fixes: a9cdbe72c4e8 ("x86/dumpstack: Fix partial register dumps")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Nicolas Pitre <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Josh Poimboeuf <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit af189c95a371b59f493dbe0f50c0a09724868881
Author: Darren Kenny <[email protected]>
Date:   Fri Feb 2 19:12:20 2018 +0000

    x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
    
    Fixes: 117cc7a908c83 ("x86/retpoline: Fill return stack buffer on vmexit")
    Signed-off-by: Darren Kenny <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
    Cc: Tom Lendacky <[email protected]>
    Cc: Andi Kleen <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Arjan van de Ven <[email protected]>
    Cc: David Woodhouse <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit 4bf5d56d429cbc96c23d809a08f63cd29e1a702e
Author: Arnd Bergmann <[email protected]>
Date:   Fri Feb 2 22:39:23 2018 +0100

    x86/pti: Mark constant arrays as __initconst
    
    I'm seeing build failures from the two newly introduced arrays that
    are marked 'const' and '__initdata', which are mutually exclusive:
    
    arch/x86/kernel/cpu/common.c:882:43: error: 'cpu_no_speculation' causes a section type conflict with 'e820_table_firmware_init'
    arch/x86/kernel/cpu/common.c:895:43: error: 'cpu_no_meltdown' causes a section type conflict with 'e820_table_firmware_init'
    
    The correct annotation is __initconst.
    
    Fixes: fec9434a12f3 ("x86/pti: Do not enable PTI on CPUs which are not vulnerable to Meltdown")
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: Ricardo Neri <[email protected]>
    Cc: Andy Lutomirski <[email protected]>
    Cc: Borislav Petkov <[email protected]>
    Cc: Thomas Garnier <[email protected]>
    Cc: David Woodhouse <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]

commit 1d51877578799bfe0fcfe189d8233c9fccf05931
Author: Arnd Bergmann <[email protected]>
Date:   Fri Feb 2 16:03:04 2018 +0100

    block: skd: fix incorrect linux/slab_def.h inclusion
    
    skd includes slab_def.h to get access to the slab cache object size.
    However, including this header breaks when we use SLUB or SLOB instead of
    the SLAB allocator, since the structure layout is completely different,
    as shown by this warning when we build this driver in one of the invalid
    configurations with link-time optimizations enabled:
    
    include/linux/slab.h:715:0: error: type of 'kmem_cache_size' does not match original declaration [-Werror=lto-type-mismatch]
     unsigned int kmem_cache_size(struct kmem_cache *s);
    
    mm/slab_common.c:77:14: note: 'kmem_cache_size' was previously declared here
     unsigned int kmem_cache_size(struct kmem_cache *s)
                  ^
    mm/slab_common.c:77:14: note: code may be misoptimized unless -fno-strict-aliasing is used
    include/linux/slab.h:147:0: error: type of 'kmem_cache_destroy' does not match original declaration [-Werror=lto-type-mismatch]
     void kmem_cache_destroy(struct kmem_cache *);
    
    mm/slab_common.c:858:6: note: 'kmem_cache_destroy' was previously declared here
     void kmem_cache_destroy(struct kmem_cache *s)
          ^
    mm/slab_common.c:858:6: note: code may be misoptimized unless -fno-strict-aliasing is used
    include/linux/slab.h:140:0: error: type of 'kmem_cache_create' does not match original declaration [-Werror=lto-type-mismatch]
     struct kmem_cache *kmem_cache_create(const char *name, size_t size,
    
    mm/slab_common.c:534:1: note: 'kmem_cache_create' was previously declared here
     kmem_cache_create(const char *name, size_t size, size_t align,
     ^
    
    This removes the header inclusion and instead uses the kmem_cache_size()
    interface to get the size in a reliable way.
    
    Signed-off-by: Arnd Bergmann <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit 60f91826ca62bcf85d6d5fc90941337282787671
Author: Kemi Wang <[email protected]>
Date:   Tue Oct 24 09:16:42 2017 +0800

    buffer: Avoid setting buffer bits that are already set
    
    It's expensive to set buffer flags that are already set, because that
    causes a costly cache line transition.
    
    A common case is setting the "verified" flag during ext4 writes.
    This patch checks for the flag being set first.
    
    With the AIM7/creat-clo benchmark running on a 48G ramdisk-based ext4
    file system, we see a 3.3% (15431->15936) improvement in aim7.jobs-per-min on
    a 2-socket Broadwell platform.
    
    What the benchmark does: it forks 3000 processes, and each process does
    the following:
    a) open a new file
    b) close the file
    c) delete the file
    until loop=100*1000 times.
    
    The original patch is contributed by Andi Kleen.
    
    Signed-off-by: Andi Kleen <[email protected]>
    Tested-by: Kemi Wang <[email protected]>
    Signed-off-by: Kemi Wang <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit 9005c6834c0ffdfe46afa76656bd9276cca864f6
Author: KarimAllah Ahmed <[email protected]>
Date:   Thu Feb 1 11:27:21 2018 +0000

    x86/spectre: Simplify spectre_v2 command line parsing
    
    [dwmw2: Use ARRAY_SIZE]
    
    Signed-off-by: KarimAllah Ahmed <[email protected]>
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Link: https://lkml.kernel.org/r/[email protected]

commit 66f793099a636862a71c59d4a6ba91387b155e0c
Author: David Woodhouse <[email protected]>
Date:   Thu Feb 1 11:27:20 2018 +0000

    x86/retpoline: Avoid retpolines for built-in __init functions
    
    There's no point in building init code with retpolines, since it runs before
    any potentially hostile userspace does. And before the retpoline is actually
    ALTERNATIVEd into place, for much of it.
    
    Signed-off-by: David Woodhouse <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Cc: [email protected]
    Link: https://lkml.kernel.org/r/[email protected]

commit e2598077dc6a26c9644393e5c21f22a90dbdccdb
Author: Mimi Zohar <[email protected]>
Date:   Tue Jan 23 10:00:41 2018 -0500

    ima: re-initialize iint->atomic_flags
    
    Intermittently security.ima is not being written for new files.  This
    patch re-initializes the new slab iint->atomic_flags field before
    freeing it.
    
    Fixes: commit 0d73a55208e9 ("ima: re-introduce own integrity cache lock")
    Signed-off-by: Mimi Zohar <[email protected]>
    Signed-off-by: James Morris <[email protected]>

commit 7825cd83fad7a30328bc874062eb19bdb2fbb38b
Author: Mimi Zohar <[email protected]>
Date:   Wed Jan 31 22:14:36 2018 -0500

    maintainers: update trusted keys
    
    Adding James Bottomley as the new maintainer for trusted keys.
    
    Signed-off-by: Mimi Zohar <[email protected]>
    Signed-off-by: James Morris <[email protected]>

commit 76883f7988e6d06a97232e979bc7aaa7846a134b
Author: Darrick J. Wong <[email protected]>
Date:   Wed Jan 31 09:47:25 2018 -0800

    xfs: remove experimental tag for reverse mapping
    
    Reverse mapping has had a while to soak, so remove the experimental tag.
    Now that we've landed space metadata cross-referencing in scrub, the
    feature actually has a purpose.
    
    Reject rmap filesystems with an rt device until the code to support it
    is actually implemented.
    
    Signed-off-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Dave Chinner <[email protected]>
    Reviewed-by: Bill O'Donnell <[email protected]>

commit c14632ddac98dca7ab1740461fae330d09909560
Author: Darrick J. Wong <[email protected]>
Date:   Wed Jan 31 16:38:18 2018 -0800

    xfs: don't allow reflink + realtime filesystems
    
    We don't support realtime filesystems with reflink either, so fail
    those mounts.
    
    Signed-off-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Bill O'Donnell <[email protected]>

commit b6e03c10bf3ff08c7678a946a2208b60e66f4426
Author: Darrick J. Wong <[email protected]>
Date:   Wed Jan 31 14:21:56 2018 -0800

    xfs: don't allow DAX on reflink filesystems
    
    Now that reflink is no longer experimental, reject attempts to mount
    with DAX until that whole mess gets sorted out.
    
    Signed-off-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Bill O'Donnell <[email protected]>
    Reviewed-by: Dave Chinner <[email protected]>

commit 494370ccaae891de0a99b3c23b2df482c95cab8c
Author: Eric Sandeen <[email protected]>
Date:   Wed Jan 31 11:31:10 2018 -0800

    xfs: add scrub to XFS_BUILD_OPTIONS
    
    Advertise this config option along with the others.
    
    Signed-off-by: Eric Sandeen <[email protected]>
    Reviewed-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Darrick J. Wong <[email protected]>

commit bea99a500773fdfdb16b7dbfbaa00af7a6f0dc3b
Author: Keith Busch <[email protected]>
Date:   Thu Feb 1 14:41:15 2018 -0700

    blk-mq-sched: Enable merging discard bio into request
    
    Signed-off-by: Keith Busch <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>

commit 445251d0f4d329aa061f323546cd6388a3bb7ab5
Author: Jens Axboe <[email protected]>
Date:   Thu Feb 1 14:01:02 2018 -0700

    blk-mq: fix discard merge with scheduler attached
    
    I ran into an issue on my laptop that triggered a bug on the
    discard path:
    
    WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3/0x430
     Modules linked in: rfcomm fuse ctr ccm bnep arc4 binfmt_misc snd_hda_codec_hdmi nls_iso8859_1 nls_cp437 vfat snd_hda_codec_conexant fat snd_hda_codec_generic iwlmvm snd_hda_intel snd_hda_codec snd_hwdep mac80211 snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq x86_pkg_temp_thermal intel_powerclamp kvm_intel uvcvideo iwlwifi btusb snd_seq_device videobuf2_vmalloc btintel videobuf2_memops kvm snd_timer videobuf2_v4l2 bluetooth irqbypass videobuf2_core aesni_intel aes_x86_64 crypto_simd cryptd snd glue_helper videodev cfg80211 ecdh_generic soundcore hid_generic usbhid hid i915 psmouse e1000e ptp pps_core xhci_pci xhci_hcd intel_gtt
     CPU: 2 PID: 207 Comm: jbd2/nvme0n1p7- Tainted: G     U           4.15.0+ #176
     Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET59W (1.33 ) 12/19/2017
     RIP: 0010:nvme_setup_cmd+0x3d3/0x430
     RSP: 0018:ffff880423e9f838 EFLAGS: 00010217
     RAX: 0000000000000000 RBX: ffff880423e9f8c8 RCX: 0000000000010000
     RDX: ffff88022b200010 RSI: 0000000000000002 RDI: 00000000327f0000
     RBP: ffff880421251400 R08: ffff88022b200000 R09: 0000000000000009
     R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000ffff
     R13: ffff88042341e280 R14: 000000000000ffff R15: ffff880421251440
     FS:  0000000000000000(0000) GS:ffff880441500000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 000055b684795030 CR3: 0000000002e09006 CR4: 00000000001606e0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     Call Trace:
      nvme_queue_rq+0x40/0xa00
      ? __sbitmap_queue_get+0x24/0x90
      ? blk_mq_get_tag+0xa3/0x250
      ? wait_woken+0x80/0x80
      ? blk_mq_get_driver_tag+0x97/0xf0
      blk_mq_dispatch_rq_list+0x7b/0x4a0
      ? deadline_remove_request+0x49/0xb0
      blk_mq_do_dispatch_sched+0x4f/0xc0
      blk_mq_sched_dispatch_requests+0x106/0x170
      __blk_mq_run_hw_queue+0x53/0xa0
      __blk_mq_delay_run_hw_queue+0x83/0xa0
      blk_mq_run_hw_queue+0x6c/0xd0
      blk_mq_sched_insert_request+0x96/0x140
      __blk_mq_try_issue_directly+0x3d/0x190
      blk_mq_try_issue_directly+0x30/0x70
      blk_mq_make_request+0x1a4/0x6a0
      generic_make_request+0xfd/0x2f0
      ? submit_bio+0x5c/0x110
      submit_bio+0x5c/0x110
      ? __blkdev_issue_discard+0x152/0x200
      submit_bio_wait+0x43/0x60
      ext4_process_freed_data+0x1cd/0x440
      ? account_page_dirtied+0xe2/0x1a0
      ext4_journal_commit_callback+0x4a/0xc0
      jbd2_journal_commit_transaction+0x17e2/0x19e0
      ? kjournald2+0xb0/0x250
      kjournald2+0xb0/0x250
      ? wait_woken+0x80/0x80
      ? commit_timeout+0x10/0x10
      kthread+0x111/0x130
      ? kthread_create_worker_on_cpu+0x50/0x50
      ? do_group_exit+0x3a/0xa0
      ret_from_fork+0x1f/0x30
     Code: 73 89 c1 83 ce 10 c1 e1 10 09 ca 83 f8 04 0f 87 0f ff ff ff 8b 4d 20 48 8b 7d 00 c1 e9 09 48 01 8c c7 00 08 00 00 e9 f8 fe ff ff <0f> ff 4c 89 c7 41 bc 0a 00 00 00 e8 0d 78 d6 ff e9 a1 fc ff ff
     ---[ end trace 50d361cc444506c8 ]---
     print_req_error: I/O error, dev nvme0n1, sector 847167488
    
    Decoding the assembly, the request claims to have 0xffff segments,
    while nvme counts two. This turns out to be because we don't check
    for a data carrying request on the mq scheduler path, and since
    blk_phys_contig_segment() returns true for a non-data request,
    we decrement the initial segment count of 0 and end up with
    0xffff in the unsigned short.
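    
    The wraparound itself is easy to reproduce in isolation. The function
    below is a simplified, hypothetical model of the segment accounting
    (not the block layer code), assuming a 16-bit unsigned short:

```c
#include <assert.h>
#include <limits.h>

static_assert(USHRT_MAX == 0xffff, "sketch assumes a 16-bit unsigned short");

/*
 * Hypothetical model of the merge accounting: merging two physically
 * contiguous requests drops one segment from the combined count.
 */
unsigned short merged_segments(unsigned short a, unsigned short b,
                               int contiguous)
{
    unsigned short total = (unsigned short)(a + b);

    if (contiguous)
        total--;  /* wraps to 0xffff when total == 0 */
    return total;
}
```

    Two requests that carry no data segments, reported as contiguous, give
    merged_segments(0, 0, 1) == 0xffff, which matches the bogus count in
    the trace above.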
    
    There are a few issues here:
    
    1) We should initialize the segment count for a discard to 1.
    2) The discard merging is currently using the data limits for
       segments and sectors.
    
    Fix this up by having attempt_merge() correctly identify the
    request, and by initializing the segment count correctly
    for discards.
    
    This can only be triggered with mq-deadline on discard capable
    devices right now, which isn't a common configuration.
    
    Signed-off-by: Jens Axboe <[email protected]>

commit babcbbc7c4e2fa7fa76417ece7c57083bee971f1
Author: Andrey Ryabinin <[email protected]>
Date:   Thu Feb 1 21:00:52 2018 +0300

    fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports"
    
    This reverts commit df4c0e36f1b1782b0611a77c52cc240e5c4752dd.
    
    It's no longer needed since dentry_string_cmp() now uses
    read_word_at_a_time() to avoid kasan's reports.
    
    Signed-off-by: Andrey Ryabinin <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit bfe7aa6c39b12a6ab1e95f50271c53e47d6dd060
Author: Andrey Ryabinin <[email protected]>
Date:   Thu Feb 1 21:00:51 2018 +0300

    fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()
    
    dentry_string_cmp() performs word-at-a-time reads from 'cs' and may
    read slightly more than was requested in kmalloc().  Normally this
    would make KASAN report an out-of-bounds access, but this was
    worked around by commit df4c0e36f1b1 ("fs: dcache: manually unpoison
    dname after allocation to shut up kasan's reports").
    
    This workaround is not perfect, since it allows out-of-bounds access to
    dentry's name for all the code, not just in dentry_string_cmp().
    
    So it would be better to use read_word_at_a_time() instead and revert
    commit df4c0e36f1b1.
    
    Signed-off-by: Andrey Ryabinin <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 1a3241ff10d038ecd096d03380327f2a0b5840a6
Author: Andrey Ryabinin <[email protected]>
Date:   Thu Feb 1 21:00:50 2018 +0300

    lib/strscpy: Shut up KASAN false-positives in strscpy()
    
    strscpy() performs word-at-a-time optimistic reads, so it may
    access memory past the end of the object, which is perfectly fine
    since strscpy() doesn't use that (past-the-end) data and makes sure the
    optimistic read won't cross a page boundary.
    
    Use the new read_word_at_a_time() to shut up KASAN.
    
    Note that this could potentially hide some bugs.  In the example below,
    strscpy() will copy more than it should (1-3 extra uninitialized bytes):
    
            char dst[8];
            char *src;
    
            src = kmalloc(5, GFP_KERNEL);
            memset(src, 0xff, 5);
            strscpy(dst, src, 8);
    
    Signed-off-by: Andrey Ryabinin <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 7f1e541fc8d57a143dd5df1d0a1276046e08c083
Author: Andrey Ryabinin <[email protected]>
Date:   Thu Feb 1 21:00:49 2018 +0300

    compiler.h: Add read_word_at_a_time() function.
    
    Sometimes we know that it's safe to do potentially out-of-bounds access
    because we know it won't cross a page boundary.  Still, KASAN will
    report this as a bug.
    
    Add a read_word_at_a_time() function, which is supposed to be used in such
    cases.  In read_word_at_a_time() KASAN performs a relaxed check: only the
    first byte of the access is validated.
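    
    The "won't cross a page boundary" condition can be written down as a
    small check. This is a userspace sketch with an assumed 4096-byte page
    size and an invented helper name; it is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/*
 * Return true if reading `size` bytes starting at `addr` stays within
 * a single page, i.e. the first and last byte share a page number.
 */
bool read_stays_in_page(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= DEMO_PAGE_SIZE);
    return (addr / DEMO_PAGE_SIZE) == ((addr + size - 1) / DEMO_PAGE_SIZE);
}
```

    A naturally aligned word read always passes this check as long as the
    page size is a multiple of the word size, which is why aligned
    word-at-a-time reads cannot fault on the following page.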
    
    Signed-off-by: Andrey Ryabinin <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit bdb5ac801af3d81d36732c2f640d6a1d3df83826
Author: Andrey Ryabinin <[email protected]>
Date:   Thu Feb 1 21:00:48 2018 +0300

    compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
    
    Instead of having two identical __read_once_size_nocheck() functions
    with different attributes, consolidate the differences in a new macro
    __no_kasan_or_inline and use it. No functional changes.
    
    Signed-off-by: Andrey Ryabinin <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>

commit 743ffffefac1c670c6618742c923f6275d819604
Author: Alexander Monakov <[email protected]>
Date:   Thu Feb 1 22:45:17 2018 +0300

    net: pxa168_eth: add netconsole support
    
    This implements the ndo_poll_controller callback, which is necessary to
    enable netconsole.
    
    Signed-off-by: Alexander Monakov <[email protected]>
    Cc: Russell King <[email protected]>
    Cc: Sebastian Hesselbarth <[email protected]>
    Cc: Florian Fainelli <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit e7aadb27a5415e8125834b84a74477bfbee4eff5
Author: Eric Dumazet <[email protected]>
Date:   Thu Feb 1 10:26:57 2018 -0800

    net: igmp: add a missing rcu locking section
    
    Newly added igmpv3_get_srcaddr() needs to be called under rcu lock.
    
    Timer callbacks do not ensure this locking.
    
    =============================
    WARNING: suspicious RCU usage
    4.15.0+ #200 Not tainted
    -----------------------------
    ./include/linux/inetdevice.h:216 suspicious rcu_dereference_check() usage!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 2, debug_locks = 1
    3 locks held by syzkaller616973/4074:
     #0:  (&mm->mmap_sem){++++}, at: [<00000000bfce669e>] __do_page_fault+0x32d/0xc90 arch/x86/mm/fault.c:1355
     #1:  ((&im->timer)){+.-.}, at: [<00000000619d2f71>] lockdep_copy_map include/linux/lockdep.h:178 [inline]
     #1:  ((&im->timer)){+.-.}, at: [<00000000619d2f71>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1316
     #2:  (&(&im->lock)->rlock){+.-.}, at: [<000000005f833c5c>] spin_lock_bh include/linux/spinlock.h:315 [inline]
     #2:  (&(&im->lock)->rlock){+.-.}, at: [<000000005f833c5c>] igmpv3_send_report+0x98/0x5b0 net/ipv4/igmp.c:600
    
    stack backtrace:
    CPU: 0 PID: 4074 Comm: syzkaller616973 Not tainted 4.15.0+ #200
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Call Trace:
     <IRQ>
     __dump_stack lib/dump_stack.c:17 [inline]
     dump_stack+0x194/0x257 lib/dump_stack.c:53
     lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4592
     __in_dev_get_rcu include/linux/inetdevice.h:216 [inline]
     igmpv3_get_srcaddr net/ipv4/igmp.c:329 [inline]
     igmpv3_newpack+0xeef/0x12e0 net/ipv4/igmp.c:389
     add_grhead.isra.27+0x235/0x300 net/ipv4/igmp.c:432
     add_grec+0xbd3/0x1170 net/ipv4/igmp.c:565
     igmpv3_send_report+0xd5/0x5b0 net/ipv4/igmp.c:605
     igmp_send_report+0xc43/0x1050 net/ipv4/igmp.c:722
     igmp_timer_expire+0x322/0x5c0 net/ipv4/igmp.c:831
     call_timer_fn+0x228/0x820 kernel/time/timer.c:1326
     expire_timers kernel/time/timer.c:1363 [inline]
     __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666
     run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
     __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
     invoke_softirq kernel/softirq.c:365 [inline]
     irq_exit+0x1cc/0x200 kernel/softirq.c:405
     exiting_irq arch/x86/include/asm/apic.h:541 [inline]
     smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
     apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:938
    
    Fixes: a46182b00290 ("net: igmp: Use correct source address on IGMPv3 reports")
    Signed-off-by: Eric Dumazet <[email protected]>
    Reported-by: syzbot <[email protected]>
    
    Signed-off-by: David S. Miller <[email protected]>

commit a107311d7fdf6b826f3737c4a90fd0e0046e7a3a
Author: Desnes Augusto Nunes do Rosario <[email protected]>
Date:   Thu Feb 1 16:04:30 2018 -0200

    ibmvnic: fix firmware version when no firmware level has been provided by the VIOS server
    
    Older versions of VIOS servers do not send the firmware level in the VPD
    buffer for the ibmvnic driver. Thus, not only is the current message
    misleading, but the firmware version in ethtool will be NULL. Therefore,
    this patch fixes the firmware string and its warning.
    
    Fixes: 4e6759be28e4 ("ibmvnic: Feature implementation of VPD for the ibmvnic driver")
    Signed-off-by: Desnes A. Nunes do Rosario <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 5e264e2b532966bfcfe8869a3fccc9876ec2122c
Author: Colin Ian King <[email protected]>
Date:   Thu Feb 1 17:29:21 2018 +0000

    vmxnet3: remove redundant initialization of pointer 'rq'
    
    Pointer rq is initialized but this value is never read; it
    is updated inside a for-loop. Remove the initialization and
    move the declaration into the scope of the for-loop.
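    
    The shape of this kind of cleanup, shown on a made-up queue type
    rather than the vmxnet3 structures, is roughly:

```c
#include <assert.h>
#include <stddef.h>

struct demo_queue {
    int pending;  /* hypothetical per-queue counter */
};

int total_pending(const struct demo_queue *queues, size_t count)
{
    int total = 0;

    assert(queues != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* rq is declared where it gets its value: no dead initial store */
        const struct demo_queue *rq = &queues[i];

        total += rq->pending;
    }
    return total;
}
```

    Declaring the pointer inside the loop also narrows its scope, so it
    cannot be used by accident after the loop ends.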
    
    Cleans up clang warning:
    drivers/net/vmxnet3/vmxnet3_drv.c:2763:27: warning: Value stored
    to 'rq' during its initialization is never read
    
    Signed-off-by: Colin Ian King <[email protected]>
    Acked-by: Shrikrishna Khare <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 3b51cc75eba28a7b2ca013f8255a4fd425b12b26
Author: Colin Ian King <[email protected]>
Date:   Thu Feb 1 17:10:18 2018 +0000

    lan78xx: remove redundant initialization of pointer 'phydev'
    
    Pointer phydev is initialized and this value is never read; phydev
    is immediately updated to a new value, hence this initialization
    is redundant and can be removed.
    
    Cleans up clang warning:
    drivers/net/usb/lan78xx.c:2009:21: warning: Value stored to 'phydev'
    during its initialization is never read
    
    Signed-off-by: Colin Ian King <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit f14d244f6147066c65dd98caa08aab0135ab1cc4
Author: Colin Ian King <[email protected]>
Date:   Thu Feb 1 16:58:42 2018 +0000

    net: jme: remove unused initialization of 'rxdesc'
    
    Pointer rxdesc is assigned a value that is never read; it is overwritten
    by a new assignment inside a while loop, hence the initial assignment
    is redundant and can be removed.
    
    Cleans up clang warning:
    drivers/net/ethernet/jme.c:1074:17: warning: Value stored to 'rxdesc'
    during its initialization is never read
    
    Signed-off-by: Colin Ian King <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 7ac07fdaf840f9b141c6d5c286805107227c0e68
Author: Andreas Gruenbacher <[email protected]>
Date:   Mon Jan 8 22:35:43 2018 +0100

    gfs2: Glock dump performance regression fix
    
    Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks
    dump": keep the glock hash table iterator active while the glock dump
    file is held open.  This avoids having to rescan the hash table from the
    start for each read, with quadratically rising runtime.
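    
    To make the quadratic cost concrete, here is a small counting sketch.
    It assumes, purely for illustration, that each read returns one entry;
    the function names are invented and nothing here touches rhashtable.

```c
#include <assert.h>

/*
 * Entries visited when every read restarts at the head of the table and
 * walks forward to the position reached so far (one entry per read).
 */
unsigned long steps_with_rescan(unsigned long n)
{
    unsigned long steps = 0;

    assert(n <= 65535UL);  /* keeps the sum within a 32-bit unsigned long */
    for (unsigned long pos = 1; pos <= n; pos++)
        steps += pos;  /* walk pos entries to reach the pos-th one */
    return steps;
}

/* Entries visited when the iterator is kept alive between reads. */
unsigned long steps_with_kept_iterator(unsigned long n)
{
    return n;
}
```

    Under that assumption, rescanning costs n * (n + 1) / 2 visits for n
    entries, while a kept iterator costs n.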
    
    In addition, use rhashtable_walk_peek for resuming a glock dump at the
    current position: when a glock doesn't fit in the provided buffer
    anymore, the next read must revisit the same glock.
    
    Finally, also restart the dump from the first entry when we notice that
    the hash table has been resized in gfs2_glock_seq_start.
    
    Signed-off-by: Andreas Gruenbacher <[email protected]>
    Signed-off-by: Bob Peterson <[email protected]>

commit dcb2cd55cf43fe06ada66265c1e088a4b08d3e3d
Author: Andreas Gruenbacher <[email protected]>
Date:   Thu Feb 1 11:12:13 2018 +0100

    gfs2: Fix the crc32c dependency
    
    Depend on LIBCRC32C which uses the crypto API to select the appropriate
    crc32c implementation.  With the CRYPTO and CRYPTO_CRC32C dependencies,
    gfs2 would still need to use the crypto API directly like ext4 and btrfs
    do, which isn't necessary.
    
    Signed-off-by: Andreas Gruenbacher <[email protected]>
    Signed-off-by: Bob Peterson <[email protected]>

commit 0b1dfa4cc6c60052b2c30ead316fa84c46d3c43c
Author: Eric Biggers <[email protected]>
Date:   Fri Jan 19 13:45:24 2018 -0800

    fscrypt: fix build with pre-4.6 gcc versions
    
    gcc versions prior to 4.6 require an extra level of braces when using a
    designated initializer for a member in an anonymous struct or union.
    This caused a compile error with the 'struct qstr' initialization in
    __fscrypt_encrypt_symlink().
    
    Fix it by using QSTR_INIT().
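    
    The extra brace levels can be seen on a stand-alone struct. demo_str
    and DEMO_STR_INIT() below are hypothetical names, only loosely shaped
    like struct qstr and QSTR_INIT(); the sketch needs a C11 compiler for
    the anonymous members.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct with an anonymous union wrapping an anonymous struct. */
struct demo_str {
    union {
        struct {
            unsigned int hash;
            unsigned int len;
        };
        unsigned long long hash_len;
    };
    const char *name;
};

/*
 * Initializer macro with the brace level for each anonymous member spelled
 * out, so callers never write a designator that reaches into them directly.
 */
#define DEMO_STR_INIT(n, l) { { { .len = (l) } }, .name = (n) }

unsigned int demo_len(const char *s)
{
    struct demo_str q = DEMO_STR_INIT(s, (unsigned int)strlen(s));

    assert(q.name == s);
    return q.len;
}
```

    Hiding the braces in one macro keeps every call site identical no
    matter which compiler version builds it.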
    
    Reported-by: Andrew Morton <[email protected]>
    Fixes: 76e81d6d5048 ("fscrypt: new helper functions for ->symlink()")
    Signed-off-by: Eric Biggers <[email protected]>
    Signed-off-by: Theodore Ts'o <[email protected]>

commit 1640eea35e8dcf0cb437f03c56868a97d0666df3
Author: Julia Lawall <[email protected]>
Date:   Thu Feb 1 10:20:55 2018 +0100

    Coccinelle: coccicheck: fix typo
    
    Correct spelling of "coccinelle".
    
    Signed-off-by: Julia Lawall <[email protected]>
    Signed-off-by: Masahiro Yamada <[email protected]>

commit 7973bfd8758d05c85ee32052a3d7d5d0549e91b4
Author: Christian Brauner <[email protected]>
Date:   Thu Feb 1 12:56:00 2018 +0100

    rtnetlink: remove check for IFLA_IF_NETNSID
    
    RTM_NEWLINK has supported the IFLA_IF_NETNSID property since commit
    5bb8ed075428b71492734af66230aa0c07fcc515, so we should not error out
    when it is passed.
    
    Signed-off-by: Christian Brauner <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit a83165f00f16c0e0ef5b7cec3cbd0d4788699265
Author: Jiri Pirko <[email protected]>
Date:   Thu Feb 1 12:21:15 2018 +0100

    rocker: fix possible null pointer dereference in rocker_router_fib_event_work
    
    Currently, a rocker user may experience the following null pointer
    dereference bug:
    
    [    3.062141] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
    [    3.065163] IP: rocker_router_fib_event_work+0x36/0x110 [rocker]
    
    The problem is that the rocker->wops pointer is only set when the first
    port is initialized. So move the port initialization before registering
    the fib events.
    
    Fixes: 936bd486564a ("rocker: use FIB notifications instead of switchdev calls")
    Signed-off-by: Jiri Pirko <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

commit 0ba987181028ab41cdc68fa91b74c98d97b93ff3
Author: Geert Uytterhoeven <[email protected]>
Date:   Thu Feb 1 11:26:23 2018 +0100

    inet: Avoid uninitialized variable warning in inet_unhash()
    
    With gcc-4.1.2:
    
        net/ipv4/inet_hashtables.c: In function ‘inet_unhash’:
        net/ipv4/inet_hashtables.c:628: warning: ‘ilb’ may be used uninitialized in this function
    
    While this is a false positive, it can easily be avoided by using the
    pointer itself as the canary variable.
    
    Signed-off-by: Geert Uytterhoeven <[email protected]>
    Acked-by: Arnd Bergmann <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
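
The "pointer as canary" idea from the inet_unhash() commit above can be sketched in plain C. This is only an illustration under assumptions: `struct demo_bucket` and `demo_unhash()` are hypothetical names, not the kernel's code. Instead of tracking a separate boolean next to a pointer that is assigned on only one path, the pointer starts as NULL and is tested directly.

```c
#include <stddef.h>

/* Hypothetical bucket type, standing in for a listen hash bucket. */
struct demo_bucket {
	int count;
};

/* Returns the remaining count, or -1 when no bucket was involved. */
static int demo_unhash(struct demo_bucket *table, int is_listener)
{
	/* NULL doubles as "not a listener", so no extra flag is needed. */
	struct demo_bucket *b = NULL;

	if (is_listener)
		b = &table[0];

	/* ... work common to both paths would go here ... */

	if (b) {
		b->count--;
		return b->count;
	}
	return -1;
}
```

Because the pointer is always initialized, a compiler that cannot correlate a flag with the pointer has nothing left to warn about.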

commit 367dc6586d2d9c0c347b567f7efec57f59c376fd
Author: Geert Uytterhoeven <[email protected]>
Date:   Thu Feb 1 11:25:27 2018 +0100

    net: bridge: Fix uninitialized error in br_fdb_sync_static()
    
    With gcc-4.1.2:
    
        net/bridge/br_fdb.c: In function ‘br_fdb_sync_static’:
        net/bridge/br_fdb.c:996: warning: ‘err’ may be used uninitialized in this function
    
    Indeed, if the list is empty, err will be uninitialized, and will be
    propagated up as the function return value.
    
    Fix this by preinitializing err to zero.
    
    Fixes: eb7935830d00b9e0 ("net: bridge: use rhashtable for fdbs")
    Signed-off-by: Geert Uytterhoeven <[email protected]>
    Acked-by: Nikolay Aleksandrov <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
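
The empty-list case described in the bridge commit above is easy to reproduce in a small sketch. The list type and helpers below are hypothetical and only mirror the shape of the problem: if the loop body never runs, the return value must already hold a defined result.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct demo_node {
	int value;
	struct demo_node *next;
};

/* Hypothetical per-entry operation: fails for negative values. */
static int demo_sync_one(const struct demo_node *n)
{
	return n->value < 0 ? -1 : 0;
}

static int demo_sync_all(const struct demo_node *head)
{
	const struct demo_node *n;
	int err = 0;	/* defined result even when the list is empty */

	for (n = head; n; n = n->next) {
		err = demo_sync_one(n);
		if (err)
			break;
	}
	return err;
}
```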

commit 9382fe71c0058465e942a633869629929102843d
Author: Ed Swierk <[email protected]>
Date:   Wed Jan 31 18:48:02 2018 -0800

    openvswitch: Remove padding from packet before L3+ conntrack processing
    
    IPv4 and IPv6 packets may arrive with lower-layer padding that is not
    included in the L3 length. For example, a short IPv4 packet may have
    up to 6 bytes of padding following the IP payload when received on an
    Ethernet device with a minimum packet length of 64 bytes.
    
    Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
    and help() in nf_conntrack_ftp) assume skb->len reflects the length of
    the L3 header and payload, rather than referring back to
    ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
    lower-layer padding.
    
    In the normal IPv4 receive path, ip_rcv() trims the packet to
    ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
    path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
    in the br_netfilter receive path, br_validate_ipv4() and
    br_validate_ipv6() trim the packet to the L3 length before invoking
    netfilter hooks.
    
    Currently in the OVS conntrack receive path, ovs_ct_execute() pulls
    the skb to the L3 header but does not trim it to the L3 length before
    calling nf_conntrack_in(NF_INET_PRE_ROUTING). When
    nf_conntrack_proto_tcp encounters a packet with lower-layer padding,
    nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log
    message. While extra zero bytes don't affect the checksum, the length
    in the IP pseudoheader does. That length is based on skb->len, and
    without trimming, it doesn't match the length the sender used when
    computing the checksum.
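
    As a rough user-space illustration of the length mismatch described
    above (not the kernel code; demo_ipv4_l3_len() is a hypothetical
    helper), the L3 length can be read from the IPv4 header and compared
    with the buffer length. A 40-byte IPv4/TCP packet carried in a 46-byte
    minimum Ethernet payload leaves 6 trailing padding bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the number of bytes that belong to the IPv4 packet (header plus
 * payload) according to its total-length field, or 0 if the header is
 * inconsistent with the buffer. Bytes beyond the returned length are
 * lower-layer padding and would be trimmed.
 */
static size_t demo_ipv4_l3_len(const uint8_t *pkt, size_t buf_len)
{
	size_t ihl, tot_len;

	if (buf_len < 20)			/* minimum IPv4 header */
		return 0;
	if ((pkt[0] >> 4) != 4)			/* version field */
		return 0;

	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* header length in bytes */
	tot_len = ((size_t)pkt[2] << 8) | pkt[3];	/* network byte order */

	if (ihl < 20 || tot_len < ihl || tot_len > buf_len)
		return 0;
	return tot_len;
}
```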
    
    In ovs_ct_execute(), trim the skb to…
mmayer pushed a commit to mmayer/linux that referenced this pull request Oct 6, 2018
team's ndo_add_slave() acquires 'team->lock' and later tries to open the
newly enslaved device via dev_open(). This emits a 'NETDEV_UP' event
that causes the VLAN driver to add VLAN 0 on the team device. team's
ndo_vlan_rx_add_vid() will also try to acquire 'team->lock' and
deadlock.

Fix this by checking early in the enslavement function that a team
device is not being enslaved to itself.

A similar check was added to the bond driver in commit 09a89c2
("bonding: disallow enslaving a bond to itself").
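
The shape of such an early check can be sketched in user space as follows. This is a simplified model under assumptions: `struct demo_netdev` and `demo_team_add_port()` are hypothetical, and the `-EINVAL` return value is assumed for illustration rather than taken from the patch.

```c
#include <errno.h>

/* Hypothetical stand-in for a network device. */
struct demo_netdev {
	const char *name;
};

static int demo_team_add_port(struct demo_netdev *team,
			      struct demo_netdev *port)
{
	/*
	 * Reject self-enslavement before any lock is taken, so that later
	 * notifier callbacks on the same device cannot re-enter and try to
	 * take the lock a second time.
	 */
	if (team == port)
		return -EINVAL;

	/* ... take the team lock, open the port, release the lock ... */
	return 0;
}
```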

WARNING: possible recursive locking detected
4.18.0-rc7+ #176 Not tainted
--------------------------------------------
syz-executor4/6391 is trying to acquire lock:
(____ptrval____) (&team->lock){+.+.}, at: team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868

but task is already holding lock:
(____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&team->lock);
  lock(&team->lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by syz-executor4/6391:
 #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
 #0: (____ptrval____) (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4662
 #1: (____ptrval____) (&team->lock){+.+.}, at: team_add_slave+0xdb/0x1c30 drivers/net/team/team.c:1947

stack backtrace:
CPU: 1 PID: 6391 Comm: syz-executor4 Not tainted 4.18.0-rc7+ #176
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_deadlock_bug kernel/locking/lockdep.c:1765 [inline]
 check_deadlock kernel/locking/lockdep.c:1809 [inline]
 validate_chain kernel/locking/lockdep.c:2405 [inline]
 __lock_acquire.cold.64+0x1fb/0x486 kernel/locking/lockdep.c:3435
 lock_acquire+0x1e4/0x540 kernel/locking/lockdep.c:3924
 __mutex_lock_common kernel/locking/mutex.c:757 [inline]
 __mutex_lock+0x176/0x1820 kernel/locking/mutex.c:894
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
 team_vlan_rx_add_vid+0x3b/0x1e0 drivers/net/team/team.c:1868
 vlan_add_rx_filter_info+0x14a/0x1d0 net/8021q/vlan_core.c:210
 __vlan_vid_add net/8021q/vlan_core.c:278 [inline]
 vlan_vid_add+0x63e/0x9d0 net/8021q/vlan_core.c:308
 vlan_device_event.cold.12+0x2a/0x2f net/8021q/vlan.c:381
 notifier_call_chain+0x180/0x390 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394 [inline]
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1735
 call_netdevice_notifiers net/core/dev.c:1753 [inline]
 dev_open+0x173/0x1b0 net/core/dev.c:1433
 team_port_add drivers/net/team/team.c:1219 [inline]
 team_add_slave+0xa8b/0x1c30 drivers/net/team/team.c:1948
 do_set_master+0x1c9/0x220 net/core/rtnetlink.c:2248
 do_setlink+0xba4/0x3e10 net/core/rtnetlink.c:2382
 rtnl_setlink+0x2a9/0x400 net/core/rtnetlink.c:2636
 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4665
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2455
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4683
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0xa18/0xfd0 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:642 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:652
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2126
 __sys_sendmsg+0x11d/0x290 net/socket.c:2164
 __do_sys_sendmsg net/socket.c:2173 [inline]
 __se_sys_sendmsg net/socket.c:2171 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2171
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x456b29
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9706bf8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f9706bf96d4 RCX: 0000000000456b29
RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000004
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d3548 R14: 00000000004c8227 R15: 0000000000000000

Fixes: 87002b0 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Noltari pushed a commit to Noltari/linux that referenced this pull request Oct 18, 2018
[ Upstream commit 471b83b ]

Fixes: 87002b0 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
frank-w referenced this pull request in frank-w/BPI-Router-Linux Oct 19, 2018
[ Upstream commit 471b83b ]

Fixes: 87002b0 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Noltari pushed a commit to Noltari/linux that referenced this pull request Oct 20, 2018
[ Upstream commit 471b83b ]

Fixes: 87002b0 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
gabrielesvelto pushed a commit to gabrielesvelto/CI20_linux that referenced this pull request Nov 23, 2018
[ Upstream commit 471b83b ]

Fixes: 87002b0 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Noltari pushed a commit to Noltari/linux that referenced this pull request Dec 17, 2018
commit 471b83b upstream.

Fixes: 87002b0 ("net: introduce vlan_vid_[add/del] and use them instead of direct [add/kill]_vid ndo calls")
Signed-off-by: Ido Schimmel <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: David S. Miller <[email protected]>
[bwh: Backported to 3.16: drop the extack message]
Signed-off-by: Ben Hutchings <[email protected]>
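The lockdep report above shows one task taking `team->lock` in team_add_slave() and then reaching team_vlan_rx_add_vid(), which takes the same lock again. A minimal user-space sketch of that shape follows. It is only an analogy: `toy_lock`, `toy_team`, and the `toy_*` functions are hypothetical names, the lock is a single-threaded model rather than a kernel mutex, and the second acquisition returns -EDEADLK where a real mutex would block forever.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock: a single-threaded model, not a real mutex. */
struct toy_lock {
    bool held;
};

/* Returns 0 on success, or -EDEADLK if the lock is already held
 * (a real non-recursive mutex would simply never return here). */
static int toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return -EDEADLK;
    l->held = true;
    return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

/* Hypothetical stand-in for struct team; only the lock matters here. */
struct toy_team {
    struct toy_lock lock;
};

/* Plays the role of team_vlan_rx_add_vid(): takes the team lock itself. */
static int toy_vlan_rx_add_vid(struct toy_team *team)
{
    int err = toy_lock_acquire(&team->lock);

    if (err)
        return err;
    /* ... per-port VLAN filter updates would go here ... */
    toy_lock_release(&team->lock);
    return 0;
}

/* Plays the role of team_add_slave(): holds the team lock while opening
 * the port, which re-enters the team code through the notifier chain. */
static int toy_add_slave(struct toy_team *team)
{
    int err = toy_lock_acquire(&team->lock);

    if (err)
        return err;
    err = toy_vlan_rx_add_vid(team);    /* second acquisition, same task */
    toy_lock_release(&team->lock);
    return err;
}
```

Calling toy_vlan_rx_add_vid() on its own succeeds, while calling it through toy_add_slave() fails, which mirrors the two `lock(&team->lock)` lines in the "Possible unsafe locking scenario" box.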
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Feb 4, 2020
claim_swapfile() currently keeps the inode locked when it succeeds, or when
the file is already a swapfile (with -EBUSY). In the other error cases,
it does not lock the inode.

This inconsistency between the lock state and the return value is confusing
and actually causes the bad unlock balance shown below in the "bad_swap"
section of __do_sys_swapon().

This commit fixes this issue by unlocking the inode on the error path. It
also reverts blocksize and releases bdev, so that the caller can safely
forget about the inode.

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
    [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     ? __do_sys_swapon+0x94b/0x3550
     print_unlock_imbalance_bug.cold+0x114/0x123
     ? __do_sys_swapon+0x94b/0x3550
     lock_release+0x562/0xed0
     ? kvfree+0x31/0x40
     ? lock_downgrade+0x770/0x770
     ? kvfree+0x31/0x40
     ? rcu_read_lock_sched_held+0xa1/0xd0
     ? rcu_read_lock_bh_held+0xb0/0xb0
     up_write+0x2d/0x490
     ? kfree+0x293/0x2f0
     __do_sys_swapon+0x94b/0x3550
     ? putname+0xb0/0xf0
     ? kmem_cache_free+0x2e7/0x370
     ? do_sys_open+0x184/0x3e0
     ? generic_max_swapfile_size+0x40/0x40
     ? do_syscall_64+0x27/0x4b0
     ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
     ? lockdep_hardirqs_on+0x38c/0x590
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Fixes: 1638045 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <[email protected]>
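The approach in this version of the patch is a contract change: the callee leaves the inode locked only on success and undoes the lock on every error return, so the caller's error path has nothing to unlock. A rough user-space sketch of that contract is below. All names (`toy_inode`, `toy_claim_swapfile`, `toy_swapon`) are hypothetical, and a depth counter stands in for the real inode lock.

```c
#include <errno.h>

/* Toy inode: a depth counter stands in for the real inode lock. */
struct toy_inode {
    int lock_depth;     /* 1 while "locked", 0 otherwise */
    int is_swapfile;
};

static void toy_inode_lock(struct toy_inode *inode)   { inode->lock_depth++; }
static void toy_inode_unlock(struct toy_inode *inode) { inode->lock_depth--; }

/* Contract in this sketch: the inode stays locked only when 0 is
 * returned; every error return leaves it unlocked. */
static int toy_claim_swapfile(struct toy_inode *inode)
{
    toy_inode_lock(inode);
    if (inode->is_swapfile) {
        toy_inode_unlock(inode);    /* undo before reporting the error */
        return -EBUSY;
    }
    return 0;
}

/* Caller: any error from the callee means "nothing to unlock". */
static int toy_swapon(struct toy_inode *inode)
{
    int err = toy_claim_swapfile(inode);

    if (err)
        return err;                 /* already unlocked by the callee */
    inode->is_swapfile = 1;
    toy_inode_unlock(inode);        /* balanced with the callee's lock */
    return 0;
}
```

With this contract the lock depth is back to zero after both the successful call and the -EBUSY call, so no unlock can run without a matching lock.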
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Feb 7, 2020
claim_swapfile() currently keeps the inode locked when it succeeds, or when
the file is already a swapfile (with -EBUSY). In the other error cases,
it does not lock the inode.

This inconsistency between the lock state and the return value is confusing
and actually causes the bad unlock balance shown below in the "bad_swap"
section of __do_sys_swapon().

This commit fixes the issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile(). The inode is unlocked in the
"bad_swap_unlock_inode" section, so that it is guaranteed to be
unlocked at "bad_swap". Thus, error-handling code after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
    [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     ? __do_sys_swapon+0x94b/0x3550
     print_unlock_imbalance_bug.cold+0x114/0x123
     ? __do_sys_swapon+0x94b/0x3550
     lock_release+0x562/0xed0
     ? kvfree+0x31/0x40
     ? lock_downgrade+0x770/0x770
     ? kvfree+0x31/0x40
     ? rcu_read_lock_sched_held+0xa1/0xd0
     ? rcu_read_lock_bh_held+0xb0/0xb0
     up_write+0x2d/0x490
     ? kfree+0x293/0x2f0
     __do_sys_swapon+0x94b/0x3550
     ? putname+0xb0/0xf0
     ? kmem_cache_free+0x2e7/0x370
     ? do_sys_open+0x184/0x3e0
     ? generic_max_swapfile_size+0x40/0x40
     ? do_syscall_64+0x27/0x4b0
     ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
     ? lockdep_hardirqs_on+0x38c/0x590
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Fixes: 1638045 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <[email protected]>
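The two-label layout described in this version can be sketched in user space as follows. This is an illustration of the pattern only, not the kernel code: `toy_inode`, `toy_swapon`, and the `fail_before_lock` / `fail_after_lock` knobs are hypothetical, and a depth counter stands in for the inode lock. The label names are taken from the commit message.

```c
#include <errno.h>

/* Toy inode: a depth counter stands in for the real inode lock. */
struct toy_inode {
    int lock_depth;
    int is_swapfile;
};

static void toy_inode_lock(struct toy_inode *inode)   { inode->lock_depth++; }
static void toy_inode_unlock(struct toy_inode *inode) { inode->lock_depth--; }

/* The caller takes the lock itself. Errors before the lock jump to
 * bad_swap; errors after it jump to bad_swap_unlock_inode, which falls
 * through to bad_swap once the inode is unlocked. */
static int toy_swapon(struct toy_inode *inode, int fail_before_lock,
                      int fail_after_lock)
{
    int err;

    if (fail_before_lock) {
        err = -EINVAL;
        goto bad_swap;                  /* lock not taken yet */
    }

    toy_inode_lock(inode);
    if (inode->is_swapfile) {
        err = -EBUSY;
        goto bad_swap_unlock_inode;
    }
    if (fail_after_lock) {
        err = -ENOMEM;
        goto bad_swap_unlock_inode;
    }

    inode->is_swapfile = 1;
    toy_inode_unlock(inode);
    return 0;

bad_swap_unlock_inode:
    toy_inode_unlock(inode);
bad_swap:
    /* common cleanup; always reached with the inode unlocked */
    return err;
}
```

Because every jump target is chosen by whether the lock has been taken, the lock depth returns to zero on all four paths (early failure, late failure, success, and -EBUSY).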
ruscur pushed a commit to ruscur/linux that referenced this pull request Feb 17, 2020
claim_swapfile() currently keeps the inode locked when it succeeds,
or when the file is already a swapfile (with -EBUSY).  In the other error
cases, it does not lock the inode.

This inconsistency between the lock state and the return value is confusing
and actually causes the bad unlock balance shown below in the "bad_swap"
section of __do_sys_swapon().

This commit fixes the issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile().  The inode is unlocked in the
"bad_swap_unlock_inode" section, so that it is guaranteed to be
unlocked at "bad_swap".  Thus, error-handling code after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
    [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     ? __do_sys_swapon+0x94b/0x3550
     print_unlock_imbalance_bug.cold+0x114/0x123
     ? __do_sys_swapon+0x94b/0x3550
     lock_release+0x562/0xed0
     ? kvfree+0x31/0x40
     ? lock_downgrade+0x770/0x770
     ? kvfree+0x31/0x40
     ? rcu_read_lock_sched_held+0xa1/0xd0
     ? rcu_read_lock_bh_held+0xb0/0xb0
     up_write+0x2d/0x490
     ? kfree+0x293/0x2f0
     __do_sys_swapon+0x94b/0x3550
     ? putname+0xb0/0xf0
     ? kmem_cache_free+0x2e7/0x370
     ? do_sys_open+0x184/0x3e0
     ? generic_max_swapfile_size+0x40/0x40
     ? do_syscall_64+0x27/0x4b0
     ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
     ? lockdep_hardirqs_on+0x38c/0x590
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 1638045 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
ruscur pushed a commit to ruscur/linux that referenced this pull request Feb 21, 2020
ruscur pushed a commit to ruscur/linux that referenced this pull request Feb 24, 2020
ruscur pushed a commit to ruscur/linux that referenced this pull request Feb 26, 2020
alistair23 pushed a commit to alistair23/linux that referenced this pull request Jan 31, 2021
Update 5.4-2.2.x-imx to v5.4.78 from stable
chombourger pushed a commit to chombourger/linux that referenced this pull request Feb 16, 2021
…from plsdk-2948 to processor-sdk-linux-4.19.y

* commit '105757bf774c6f646fc416d636d192f982e3a02f':
  arm64: dts: ti: k3-am65-mcu: fix power-domains format for WDT device node
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Mar 15, 2021
This commit fixes the following checkpatch.pl errors:

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #170: FILE: ./hal/HalBtc8723b1Ant.h:170:
    +void EXhalbtc8723b1ant_PowerOnSetting(struct BTC_COEXIST * pBtCoexist);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #171: FILE: ./hal/HalBtc8723b1Ant.h:171:
    +void EXhalbtc8723b1ant_InitHwConfig(struct BTC_COEXIST * pBtCoexist, bool bWifiOnly);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #172: FILE: ./hal/HalBtc8723b1Ant.h:172:
    +void EXhalbtc8723b1ant_InitCoexDm(struct BTC_COEXIST * pBtCoexist);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #173: FILE: ./hal/HalBtc8723b1Ant.h:173:
    +void EXhalbtc8723b1ant_IpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #174: FILE: ./hal/HalBtc8723b1Ant.h:174:
    +void EXhalbtc8723b1ant_LpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #175: FILE: ./hal/HalBtc8723b1Ant.h:175:
    +void EXhalbtc8723b1ant_ScanNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #176: FILE: ./hal/HalBtc8723b1Ant.h:176:
    +void EXhalbtc8723b1ant_ConnectNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #177: FILE: ./hal/HalBtc8723b1Ant.h:177:
    +void EXhalbtc8723b1ant_MediaStatusNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #178: FILE: ./hal/HalBtc8723b1Ant.h:178:
    +void EXhalbtc8723b1ant_SpecialPacketNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #180: FILE: ./hal/HalBtc8723b1Ant.h:180:
    +	struct BTC_COEXIST * pBtCoexist, u8 *tmpBuf, u8 length

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #182: FILE: ./hal/HalBtc8723b1Ant.h:182:
    +void EXhalbtc8723b1ant_HaltNotify(struct BTC_COEXIST * pBtCoexist);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #183: FILE: ./hal/HalBtc8723b1Ant.h:183:
    +void EXhalbtc8723b1ant_PnpNotify(struct BTC_COEXIST * pBtCoexist, u8 pnpState);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #184: FILE: ./hal/HalBtc8723b1Ant.h:184:
    +void EXhalbtc8723b1ant_DisplayCoexInfo(struct BTC_COEXIST * pBtCoexist);

Signed-off-by: Marco Cesati <[email protected]>
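For readers unfamiliar with the POINTER_LOCATION rule, a small sketch of the preferred spelling follows. The names here (`toy_coexist`, `toy_init_hw_config`, `toy_second_declarator_size`) are hypothetical and are not taken from the driver. The second function shows one common rationale for the style: in C the `*` belongs to the declarator, not to the type, so writing it next to the name matches how the compiler reads it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's struct BTC_COEXIST. */
struct toy_coexist {
    bool wifi_only;
};

/* Flagged by checkpatch ("foo * bar"):
 *     void toy_init_hw_config(struct toy_coexist * coexist, bool wifi_only);
 * Preferred kernel style ("foo *bar"), with the '*' next to the name: */
void toy_init_hw_config(struct toy_coexist *coexist, bool wifi_only);

void toy_init_hw_config(struct toy_coexist *coexist, bool wifi_only)
{
    if (coexist != NULL)
        coexist->wifi_only = wifi_only;
}

/* The '*' binds to the declarator: here p is a pointer to int,
 * while n is a plain int. */
static size_t toy_second_declarator_size(void)
{
    int *p = NULL, n = 0;

    (void)p;
    return sizeof(n);
}
```

Both spellings compile to the same thing, so a change like this one is purely cosmetic.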
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Mar 16, 2021
This commit fixes the following checkpatch.pl errors:

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #170: FILE: ./hal/HalBtc8723b1Ant.h:170:
    +void EXhalbtc8723b1ant_PowerOnSetting(struct BTC_COEXIST * pBtCoexist);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #171: FILE: ./hal/HalBtc8723b1Ant.h:171:
    +void EXhalbtc8723b1ant_InitHwConfig(struct BTC_COEXIST * pBtCoexist, bool bWifiOnly);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #172: FILE: ./hal/HalBtc8723b1Ant.h:172:
    +void EXhalbtc8723b1ant_InitCoexDm(struct BTC_COEXIST * pBtCoexist);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #173: FILE: ./hal/HalBtc8723b1Ant.h:173:
    +void EXhalbtc8723b1ant_IpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #174: FILE: ./hal/HalBtc8723b1Ant.h:174:
    +void EXhalbtc8723b1ant_LpsNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #175: FILE: ./hal/HalBtc8723b1Ant.h:175:
    +void EXhalbtc8723b1ant_ScanNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #176: FILE: ./hal/HalBtc8723b1Ant.h:176:
    +void EXhalbtc8723b1ant_ConnectNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #177: FILE: ./hal/HalBtc8723b1Ant.h:177:
    +void EXhalbtc8723b1ant_MediaStatusNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #178: FILE: ./hal/HalBtc8723b1Ant.h:178:
    +void EXhalbtc8723b1ant_SpecialPacketNotify(struct BTC_COEXIST * pBtCoexist, u8 type);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #180: FILE: ./hal/HalBtc8723b1Ant.h:180:
    +	struct BTC_COEXIST * pBtCoexist, u8 *tmpBuf, u8 length

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #182: FILE: ./hal/HalBtc8723b1Ant.h:182:
    +void EXhalbtc8723b1ant_HaltNotify(struct BTC_COEXIST * pBtCoexist);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #183: FILE: ./hal/HalBtc8723b1Ant.h:183:
    +void EXhalbtc8723b1ant_PnpNotify(struct BTC_COEXIST * pBtCoexist, u8 pnpState);

    ERROR:POINTER_LOCATION: "foo * bar" should be "foo *bar"
    #184: FILE: ./hal/HalBtc8723b1Ant.h:184:
    +void EXhalbtc8723b1ant_DisplayCoexInfo(struct BTC_COEXIST * pBtCoexist);

Reviewed-by: Dan Carpenter <[email protected]>
Signed-off-by: Marco Cesati <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
ojeda added a commit to ojeda/linux that referenced this pull request Apr 11, 2021
Normalize semaphore samples and add them to the CI
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Aug 9, 2021
…rogs

Currently, if the bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called from a sleepable program, e.g., a sleepable
fentry/fmod_ret/fexit/lsm program, an RCU warning
may appear. For example, if I add the following
hack to the test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I hit the following RCU warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get a similar warning using the bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for a syscall program. The helpers
bpf_get_current_cgroup_id() and bpf_get_current_ancestor_cgroup_id()
have the following call chain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since neither cgroup_mutex nor css_set_lock is held, the task
is not exiting, and rcu_read_lock() is not held, a warning
is issued. Note that a sleepable bpf program is protected by
rcu_read_lock_trace() rather than rcu_read_lock().

To fix the issue, let us make these two helpers unavailable
to sleepable programs. I marked the patch as fixing 95b861a
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
which added bpf_get_current_ancestor_cgroup_id() to
5.14. I think backporting to 5.14 is probably good enough, as sleepable
programs are not widely used.

This patch should fix [1] as well, since a syscall program is a sleepable
program and bpf_get_current_cgroup_id() is no longer available to
syscall programs.

 [1] https://lore.kernel.org/bpf/[email protected]/

Reported-by: [email protected]
Fixes: 95b861a ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Signed-off-by: Yonghong Song <[email protected]>
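The idea in this version, hiding the two helpers from sleepable programs, can be modeled as a lookup that declines to hand out a helper description. The sketch below is a toy and assumes nothing about the real verifier code: `toy_prog`, `toy_helper_proto`, `toy_helper_id`, and `toy_get_func_proto` are all hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper identifiers for this sketch only. */
enum toy_helper_id {
    TOY_GET_CURRENT_CGROUP_ID,
    TOY_GET_CURRENT_ANCESTOR_CGROUP_ID,
    TOY_COPY_FROM_USER,
};

struct toy_prog {
    bool sleepable;
};

struct toy_helper_proto {
    const char *name;
};

static const struct toy_helper_proto toy_protos[] = {
    [TOY_GET_CURRENT_CGROUP_ID]          = { "get_current_cgroup_id" },
    [TOY_GET_CURRENT_ANCESTOR_CGROUP_ID] = { "get_current_ancestor_cgroup_id" },
    [TOY_COPY_FROM_USER]                 = { "copy_from_user" },
};

/* Returns the helper description, or NULL when the helper is not
 * offered to this program (the load would then be rejected). */
static const struct toy_helper_proto *
toy_get_func_proto(enum toy_helper_id id, const struct toy_prog *prog)
{
    switch (id) {
    case TOY_GET_CURRENT_CGROUP_ID:
    case TOY_GET_CURRENT_ANCESTOR_CGROUP_ID:
        /* These rely on rcu_read_lock() protection, which a
         * sleepable program does not have. */
        return prog->sleepable ? NULL : &toy_protos[id];
    default:
        return &toy_protos[id];
    }
}
```

The trade-off of this design is that sleepable programs lose the helpers entirely instead of getting a safe version of them.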
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Aug 9, 2021
Currently, if the bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called from a sleepable program, e.g., a sleepable
fentry/fmod_ret/fexit/lsm program, an RCU warning
may appear. For example, if I add the following
hack to the test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I hit the following RCU warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get a similar warning using the bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for a syscall program. The helpers
bpf_get_current_cgroup_id() and bpf_get_current_ancestor_cgroup_id()
have the following call chain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since neither cgroup_mutex nor css_set_lock is held, the task
is not exiting, and rcu_read_lock() is not held, a warning
is issued. Note that a sleepable bpf program is protected by
rcu_read_lock_trace() rather than rcu_read_lock().

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch as fixing 95b861a
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting to 5.14 is probably good enough, as sleepable
programs are not widely used.

This patch should fix [1] as well, since a syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/[email protected]/

Reported-by: [email protected]
Fixes: 95b861a ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Signed-off-by: Yonghong Song <[email protected]>
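The condition in task_css_set_check() and the effect of adding rcu_read_lock() inside the helpers can be modeled with plain flags. The sketch below is a user-space toy under stated assumptions: `toy_task` and its fields are hypothetical, the counter only imitates RCU read-side nesting, and the check mirrors the shape of rcu_dereference_check(), which accepts the access when an RCU read-side section is active or when the extra condition is true.

```c
#include <stdbool.h>

/* Toy per-task state; field names are hypothetical, not the kernel's. */
struct toy_task {
    int  rcu_read_depth;        /* nesting of classic rcu_read_lock() */
    bool in_rcu_trace;          /* inside rcu_read_lock_trace() */
    bool holds_cgroup_mutex;
    bool holds_css_set_lock;
    bool exiting;               /* models (task->flags & PF_EXITING) */
};

static void toy_rcu_read_lock(struct toy_task *t)   { t->rcu_read_depth++; }
static void toy_rcu_read_unlock(struct toy_task *t) { t->rcu_read_depth--; }

/* True when the access would be accepted without a warning. Note that
 * in_rcu_trace is deliberately not consulted: the trace flavor does
 * not satisfy this check. */
static bool toy_css_set_check_ok(const struct toy_task *t)
{
    return t->rcu_read_depth > 0 ||
           t->holds_cgroup_mutex ||
           t->holds_css_set_lock ||
           t->exiting;
}

/* Helper before the change: relies on the caller for RCU protection. */
static bool toy_get_cgroup_id_unlocked(struct toy_task *t)
{
    return toy_css_set_check_ok(t);
}

/* Helper after the change: brackets the access with its own
 * read-side critical section. */
static bool toy_get_cgroup_id_locked(struct toy_task *t)
{
    bool ok;

    toy_rcu_read_lock(t);
    ok = toy_css_set_check_ok(t);
    toy_rcu_read_unlock(t);
    return ok;
}
```

For a task that is only inside the trace flavor, the unlocked variant fails the check and the locked variant passes it, which is the behavior change the commit message describes.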
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Aug 10, 2021
Currently, if bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called with sleepable programs e.g., sleepable
fentry/fmod_ret/fexit/lsm programs, a rcu warning
may appear. For example, if I added the following
hack to test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ torvalds#176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get similar warning using bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for syscall program. Helper
bpf_get_current_cgroup_id() or bpf_get_current_ancestor_cgroup_id()
has the following callchain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since neither cgroup_mutex nor css_set_lock is held, the task
is not exiting, and rcu_read_lock() is not held, a warning
is issued. Note that a sleepable bpf program is protected by
rcu_read_lock_trace(), which does not satisfy this check.
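
The condition above can be modelled in user space to see why holding only
rcu_read_lock_trace() is not enough. This is a sketch under stated
assumptions, not kernel code: struct lock_state and every model_* or
helper_* name is hypothetical, and the lockdep queries are reduced to plain
flags and counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lockdep state consulted by the check. */
struct lock_state {
	bool cgroup_mutex_held;
	bool css_set_lock_held;
	bool task_exiting;	/* models (task)->flags & PF_EXITING */
	int rcu_depth;		/* nesting of classic rcu_read_lock() */
	int rcu_trace_depth;	/* nesting of rcu_read_lock_trace() */
};

static void model_rcu_read_lock(struct lock_state *s)
{
	s->rcu_depth++;
}

static void model_rcu_read_unlock(struct lock_state *s)
{
	assert(s->rcu_depth > 0);
	s->rcu_depth--;
}

/*
 * True when the modelled dereference would pass without a warning.
 * rcu_trace_depth is deliberately ignored: only the classic read lock,
 * one of the two locks, or an exiting task satisfies the condition.
 */
static bool css_set_check_ok(const struct lock_state *s)
{
	return s->rcu_depth > 0 || s->cgroup_mutex_held ||
	       s->css_set_lock_held || s->task_exiting;
}

/* Models a helper that dereferences without taking the RCU read lock. */
static bool helper_without_rcu(const struct lock_state *s)
{
	return css_set_check_ok(s);
}

/* Models a helper that wraps the dereference in an RCU read-side section. */
static bool helper_with_rcu(struct lock_state *s)
{
	bool ok;

	model_rcu_read_lock(s);
	ok = css_set_check_ok(s);
	model_rcu_read_unlock(s);
	return ok;
}
```

With only rcu_trace_depth set, as in a sleepable program, the first helper
fails the modelled check and the second passes it.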

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch as fixing 95b861a
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting to 5.14 is probably good enough, as sleepable
programs are not widely used.

This patch should fix [1] as well, since a syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/[email protected]/

Reported-by: [email protected]
Fixes: 95b861a ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Signed-off-by: Yonghong Song <[email protected]>
roxell pushed a commit to roxell/linux that referenced this pull request Aug 12, 2021
Currently, if the bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called from sleepable programs, e.g., sleepable
fentry/fmod_ret/fexit/lsm programs, an RCU warning
may appear. For example, if I add the following
hack to the test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get a similar warning using the bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for a syscall program. The helpers
bpf_get_current_cgroup_id() and bpf_get_current_ancestor_cgroup_id()
have the following callchain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since neither cgroup_mutex nor css_set_lock is held, the task
is not exiting, and rcu_read_lock() is not held, a warning
is issued. Note that a sleepable bpf program is protected by
rcu_read_lock_trace(), which does not satisfy this check.

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch as fixing 95b861a
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting to 5.14 is probably good enough, as sleepable
programs are not widely used.

This patch should fix [1] as well, since a syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/[email protected]/

Fixes: 95b861a ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Reported-by: [email protected]
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
ammarfaizi2 pushed a commit to ammarfaizi2/linux-fork that referenced this pull request Nov 20, 2021
claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already a swapfile (with -EBUSY).  And, on the other error
cases, it does not lock the inode.

This inconsistency of the lock state and return value is quite confusing
and actually causes a bad unlock balance, as shown below, in the "bad_swap"
section of __do_sys_swapon().

This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile().  The inode is unlocked in the
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap".  Thus, error-handling code after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".
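
The label ordering described above can be sketched in user space with a
lock-depth counter standing in for the inode lock. This is an illustrative
model only: struct inode_model, claim_model(), swapon_model() and the
specific error codes chosen for the failure cases are assumptions, not the
kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inode stand-in: a lock-depth counter and a swapfile flag. */
struct inode_model {
	int lock_depth;
	bool is_swapfile;
};

static void inode_lock_model(struct inode_model *inode)
{
	inode->lock_depth++;
}

static void inode_unlock_model(struct inode_model *inode)
{
	/* An unlock with nothing held is the "bad unlock balance". */
	assert(inode->lock_depth > 0);
	inode->lock_depth--;
}

/* After the change, the claim step never touches the inode lock. */
static int claim_model(bool fail)
{
	return fail ? -EINVAL : 0;
}

static int swapon_model(struct inode_model *inode, bool claim_fails,
			bool later_fails)
{
	int err;

	err = claim_model(claim_fails);
	if (err)
		goto bad_swap;			/* lock not taken yet */

	inode_lock_model(inode);
	if (inode->is_swapfile) {
		err = -EBUSY;
		goto bad_swap_unlock_inode;	/* lock is held here */
	}
	if (later_fails) {
		err = -EIO;
		goto bad_swap_unlock_inode;
	}

	inode->is_swapfile = true;
	inode_unlock_model(inode);
	return 0;

bad_swap_unlock_inode:
	inode_unlock_model(inode);
bad_swap:
	return err;
}
```

Every path leaves lock_depth at zero, because the lock is only released
through the label that is reachable with the lock held.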

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
    [<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     ? __do_sys_swapon+0x94b/0x3550
     print_unlock_imbalance_bug.cold+0x114/0x123
     ? __do_sys_swapon+0x94b/0x3550
     lock_release+0x562/0xed0
     ? kvfree+0x31/0x40
     ? lock_downgrade+0x770/0x770
     ? kvfree+0x31/0x40
     ? rcu_read_lock_sched_held+0xa1/0xd0
     ? rcu_read_lock_bh_held+0xb0/0xb0
     up_write+0x2d/0x490
     ? kfree+0x293/0x2f0
     __do_sys_swapon+0x94b/0x3550
     ? putname+0xb0/0xf0
     ? kmem_cache_free+0x2e7/0x370
     ? do_sys_open+0x184/0x3e0
     ? generic_max_swapfile_size+0x40/0x40
     ? do_syscall_64+0x27/0x4b0
     ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
     ? lockdep_hardirqs_on+0x38c/0x590
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 1638045 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Tested-by: Qais Youef <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: <[email protected]>
vlsunil pushed a commit to ventana-micro-systems/RISC-V-Linux that referenced this pull request Nov 23, 2021
Currently, if the bpf_get_current_cgroup_id() or
bpf_get_current_ancestor_cgroup_id() helper is
called from sleepable programs, e.g., sleepable
fentry/fmod_ret/fexit/lsm programs, an RCU warning
may appear. For example, if I add the following
hack to the test_progs/test_lsm sleepable fentry program
test_sys_setdomainname:

  --- a/tools/testing/selftests/bpf/progs/lsm.c
  +++ b/tools/testing/selftests/bpf/progs/lsm.c
  @@ -168,6 +168,10 @@ int BPF_PROG(test_sys_setdomainname, struct pt_regs *regs)
          int buf = 0;
          long ret;

  +       __u64 cg_id = bpf_get_current_cgroup_id();
  +       if (cg_id == 1000)
  +               copy_test++;
  +
          ret = bpf_copy_from_user(&buf, sizeof(buf), ptr);
          if (len == -2 && ret == 0 && buf == 1234)
                  copy_test++;

I will hit the following rcu warning:

  include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
  other info that might help us debug this:
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by test_progs/260:
      #0: ffffffffa5173360 (rcu_read_lock_trace){....}-{0:0}, at: __bpf_prog_enter_sleepable+0x0/0xa0
    stack backtrace:
    CPU: 1 PID: 260 Comm: test_progs Tainted: G           O      5.14.0-rc2+ #176
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
    Call Trace:
      dump_stack_lvl+0x56/0x7b
      bpf_get_current_cgroup_id+0x9c/0xb1
      bpf_prog_a29888d1c6706e09_test_sys_setdomainname+0x3e/0x89c
      bpf_trampoline_6442469132_0+0x2d/0x1000
      __x64_sys_setdomainname+0x5/0x110
      do_syscall_64+0x3a/0x80
      entry_SYSCALL_64_after_hwframe+0x44/0xae

I can get a similar warning using the bpf_get_current_ancestor_cgroup_id() helper.
syzbot reported a similar issue in [1] for a syscall program. The helpers
bpf_get_current_cgroup_id() and bpf_get_current_ancestor_cgroup_id()
have the following callchain:
   task_dfl_cgroup
     task_css_set
       task_css_set_check
and we have
   #define task_css_set_check(task, __c)                                   \
           rcu_dereference_check((task)->cgroups,                          \
                   lockdep_is_held(&cgroup_mutex) ||                       \
                   lockdep_is_held(&css_set_lock) ||                       \
                   ((task)->flags & PF_EXITING) || (__c))
Since neither cgroup_mutex nor css_set_lock is held, the task
is not exiting, and rcu_read_lock() is not held, a warning
is issued. Note that a sleepable bpf program is protected by
rcu_read_lock_trace(), which does not satisfy this check.

The above sleepable bpf programs are already protected
by migrate_disable(). Adding rcu_read_lock() in these
two helpers will silence the above warning.
I marked the patch as fixing 95b861a
("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing"),
which added bpf_get_current_ancestor_cgroup_id() to tracing programs
in 5.14. I think backporting to 5.14 is probably good enough, as sleepable
programs are not widely used.

This patch should fix [1] as well, since a syscall program is a sleepable
program protected with migrate_disable().

 [1] https://lore.kernel.org/bpf/[email protected]/

Fixes: 95b861a ("bpf: Allow bpf_get_current_ancestor_cgroup_id for tracing")
Reported-by: [email protected]
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 4, 2022
If sas_phy_alloc() returns an error in sas_register_phys(), the phys that
have already been added are not deleted, so their memory is leaked. This also
leaves the phy_attr_cont list non-empty, which triggers a BUG when
sas_release_transport() is called from hisi_sas_exit() on module removal.

kernel BUG at ./include/linux/transport_class.h:92!
CPU: 8 PID: 38014 Comm: rmmod Kdump: loaded Not tainted 6.1.0-rc1+ #176
Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : sas_release_transport+0x78/0x84 [scsi_transport_sas]
lr : sas_release_transport+0x2c/0x84 [scsi_transport_sas]
Call trace:
 sas_release_transport+0x78/0x84 [scsi_transport_sas]
 hisi_sas_exit+0x1c/0x9a8 [hisi_sas_main]
 __arm64_sys_delete_module+0x19c/0x358

Fix this by deleting the phys that have been added if sas_phy_alloc()
returns an error.

Besides, if sas_phy_add() fails in sas_register_phys(), the phy->dev
is not added to the klist_children of shost_gendev, so the phy cannot
be deleted in sas_remove_children(), and the phy and name memory allocated
in sas_phy_alloc() is leaked.

Fix this by checking the return value of sas_phy_add() in
sas_register_phys() and calling sas_phy_free() in the error path.
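
The two error paths described above follow a common unwind pattern, which
can be sketched in user space as below. This is a model under stated
assumptions: phy_alloc_model(), phy_add_model(), phy_free_model() and
register_phys_model() are hypothetical names, the injected failure indices
exist only for illustration, and deleting an already added phy is collapsed
into a plain free.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int live_phys;	/* objects allocated and not yet freed */

struct phy_model {
	int id;
};

/* Allocates phy number id, or fails when id equals fail_at. */
static struct phy_model *phy_alloc_model(int id, int fail_at)
{
	struct phy_model *p;

	if (id == fail_at)
		return NULL;
	p = malloc(sizeof(*p));
	if (!p)
		return NULL;
	p->id = id;
	live_phys++;
	return p;
}

static void phy_free_model(struct phy_model *p)
{
	assert(p != NULL);
	free(p);
	live_phys--;
}

/* Registers the phy, or fails when its id equals fail_at. */
static int phy_add_model(const struct phy_model *p, int fail_at)
{
	return p->id == fail_at ? -EIO : 0;
}

static int register_phys_model(struct phy_model **phys, int n,
			       int alloc_fail_at, int add_fail_at)
{
	int i, err;

	for (i = 0; i < n; i++) {
		phys[i] = phy_alloc_model(i, alloc_fail_at);
		if (!phys[i]) {
			err = -ENOMEM;
			goto rollback;
		}
		err = phy_add_model(phys[i], add_fail_at);
		if (err) {
			/* Never added, so it must be freed directly. */
			phy_free_model(phys[i]);
			phys[i] = NULL;
			goto rollback;
		}
	}
	return 0;

rollback:
	/* Undo every phy that was fully registered before the failure. */
	while (--i >= 0) {
		phy_free_model(phys[i]);
		phys[i] = NULL;
	}
	return err;
}
```

Whichever step fails, live_phys returns to zero, which is the property the
commit message is after.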

Fixes: 2908d77 ("[SCSI] aic94xx: new driver")
Signed-off-by: Yang Yingliang <[email protected]>
gatieme pushed a commit to gatieme/linux that referenced this pull request Nov 24, 2022
ANBZ: #1075

commit d795a90 upstream.

claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already a swapfile (with -EBUSY).  And, on the other error
cases, it does not lock the inode.

This inconsistency of the lock state and return value is quite confusing
and actually causes a bad unlock balance, as shown below, in the "bad_swap"
section of __do_sys_swapon().

This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile().  The inode is unlocked in the
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap".  Thus, error-handling code after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".

    =====================================
    WARNING: bad unlock balance detected!
    5.5.0-rc7+ #176 Not tainted
    -------------------------------------
    swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550
    but there are no more locks to release!

    other info that might help us debug this:
    no locks held by swapon/4294.

    stack backtrace:
    CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
    Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
    Call Trace:
     dump_stack+0xa1/0xea
     print_unlock_imbalance_bug.cold+0x114/0x123
     lock_release+0x562/0xed0
     up_write+0x2d/0x490
     __do_sys_swapon+0x94b/0x3550
     __x64_sys_swapon+0x54/0x80
     do_syscall_64+0xa4/0x4b0
     entry_SYSCALL_64_after_hwframe+0x49/0xbe
    RIP: 0033:0x7f15da0a0dc7

Fixes: 1638045 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Tested-by: Qais Youef <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Hongnan Li <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Mar 8, 2023
The current code calls stmmac_finalize_xdp_rx() whether or not the net
device has XDP programs attached, so the finalize function is called every
time the RX bottom half runs. That costs a few machine instructions for
nothing when no XDP programs are attached.

Call the function only when the xdp_status variable is non-zero. A non-zero
value means XDP programs are attached to the net device and the RX path
should be finalized based on that variable.

The following instructions show that it's better than calling the function
unconditionally.

  0.31 │6b8:   ldr     w0, [sp, #196]
       │    ┌──cbz     w0, 6cc
       │    │  mov     x1, x0
       │    │  mov     x0, x27
       │    │→ bl     stmmac_finalize_xdp_rx
       │6cc:└─→ldr    x1, [sp, #176]

With the 'if (xdp_status)' statement, execution jumps to the '6cc' label
when xdp_status is zero.
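
A minimal user-space sketch of the guard described above, counting how
often the finalize step runs. The names finalize_xdp_rx_model(),
rx_poll_end_model() and count_finalize_calls() are hypothetical stand-ins,
not the driver's functions, and the finalize body is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

static int finalize_calls;	/* how many times finalize actually ran */

/* Stand-in for the finalize step; only reached when there is work to do. */
static void finalize_xdp_rx_model(unsigned int xdp_status)
{
	assert(xdp_status != 0);
	finalize_calls++;
}

/* Models the end of one RX poll: skip the call when no XDP action occurred. */
static void rx_poll_end_model(unsigned int xdp_status)
{
	if (xdp_status)
		finalize_xdp_rx_model(xdp_status);
}

/* Runs one modelled poll per status value and reports the finalize count. */
static int count_finalize_calls(const unsigned int *statuses, size_t n)
{
	size_t i;

	finalize_calls = 0;
	for (i = 0; i < n; i++)
		rx_poll_end_model(statuses[i]);
	return finalize_calls;
}
```

Polls whose status is zero never enter the finalize function, which
corresponds to the cbz branch over the call in the listing.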

Signed-off-by: Leesoo Ahn <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 3, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK
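
For readers unfamiliar with the ioctl the test exercises, the sketch below
shows what FIONREAD reports on an ordinary socket: the number of unread
bytes, which drops back to zero once the data has been consumed. It is an
assumption-laden illustration only: it uses an AF_UNIX socketpair on Linux
with no BPF program or sockmap involved, and pending_bytes() and
fionread_demo() are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the byte count reported by FIONREAD for fd, or -1 on error. */
static int pending_bytes(int fd)
{
	int avail = 0;

	assert(fd >= 0);
	if (ioctl(fd, FIONREAD, &avail) < 0)
		return -1;
	return avail;
}

/*
 * Writes len bytes into one end of a socketpair and records what FIONREAD
 * reports on the other end before and after the data is read.
 */
static int fionread_demo(size_t len, int *before, int *after)
{
	int sv[2];
	char buf[512];
	int ret = -1;

	if (len > sizeof(buf))
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	memset(buf, 'x', len);
	if (write(sv[0], buf, len) != (ssize_t)len)
		goto out;
	*before = pending_bytes(sv[1]);

	if (read(sv[1], buf, len) != (ssize_t)len)
		goto out;
	*after = pending_bytes(sv[1]);
	ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The failing run above reported a huge value instead of the expected 256,
which is the kind of mismatch this ioctl makes visible when the sequence
accounting is off.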

Signed-off-by: John Fastabend <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 5, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Signed-off-by: John Fastabend <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 6, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Signed-off-by: John Fastabend <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 7, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Signed-off-by: John Fastabend <[email protected]>
ammarfaizi2 pushed a commit to ammarfaizi2/linux-fork that referenced this pull request Apr 13, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Signed-off-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 2, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Signed-off-by: John Fastabend <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 17, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Reviewed-by: Jakub Sitnicki <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 19, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Reviewed-by: Jakub Sitnicki <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 23, 2023
…rops

When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.

Original patch series broke this resulting in

test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:FAIL

After updated patch with fix.

#176/16  sockmap_basic/sockmap skb_verdict fionread:OK
#176/17  sockmap_basic/sockmap skb_verdict fionread on drop:OK

Reviewed-by: Jakub Sitnicki <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 27, 2024
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning may
be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
for which no obj_exts had been allocated yet.

In such a case, the two threads will notice the NULL obj_exts and after one
assigns slab->obj_exts, the second one will happily do the exchange if it
reads this newly assigned value.

In order to avoid that, verify that the read obj_exts does not point to an
allocated obj_exts before doing the exchange.
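
A user-space sketch of the check described above, written with C11 atomics
instead of the kernel's primitives. Everything named here is an assumption
for illustration: publish_exts(), EXTS_FLAGS_MASK and the choice of three
low flag bits are hypothetical, and the slot stands in for slab->obj_exts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the low bits of the slot carry flags, the rest a pointer. */
#define EXTS_FLAGS_MASK ((uintptr_t)0x7)

/*
 * Tries to publish new_exts into *slot, given the value the caller read
 * earlier (old_seen). Returns true only if this caller installed its vector.
 */
static bool publish_exts(atomic_uintptr_t *slot, uintptr_t old_seen,
			 uintptr_t new_exts)
{
	/* The new value must carry a real pointer. */
	assert(new_exts & ~EXTS_FLAGS_MASK);

	/*
	 * If the value we read already points at an allocated vector,
	 * another thread won the race: do not exchange over it.
	 */
	if (old_seen & ~EXTS_FLAGS_MASK)
		return false;

	/* Otherwise install only if the slot still holds what we read. */
	return atomic_compare_exchange_strong(slot, &old_seen, new_exts);
}
```

Without the pointer-bits test, a thread that reads the slot just after the
winner stored its vector would pass that fresh value as the expected one,
and the exchange would succeed and overwrite it.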

Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request May 30, 2024
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
for which no obj_exts had been allocated yet.

In such a case, the two threads will notice the NULL obj_exts and after
one assigns slab->obj_exts, the second one will happily do the exchange if
it reads this newly assigned value.

In order to avoid that, verify that the read obj_exts does not point to an
allocated obj_exts before doing the exchange.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jun 1, 2024
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
which did not yet have an obj_exts allocated.

In such a case, both threads notice the NULL obj_exts, and after one of
them assigns slab->obj_exts, the second one will happily do the exchange if
it reads the newly assigned value.

To avoid that, verify that the obj_exts value that was read does not point
to an allocated obj_exts before doing the exchange.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jun 5, 2024
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
which did not yet have an obj_exts allocated.

In such a case, both threads notice the NULL obj_exts, and after one of
them assigns slab->obj_exts, the second one will happily do the exchange if
it reads the newly assigned value.

To avoid that, verify that the obj_exts value that was read does not point
to an allocated obj_exts before doing the exchange.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
ioworker0 pushed a commit to ioworker0/linux that referenced this pull request Jun 6, 2024
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
which did not yet have an obj_exts allocated.

In such a case, both threads notice the NULL obj_exts, and after one of
them assigns slab->obj_exts, the second one will happily do the exchange if
it reads the newly assigned value.

To avoid that, verify that the obj_exts value that was read does not point
to an allocated obj_exts before doing the exchange.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
torvalds pushed a commit that referenced this pull request Jun 8, 2024
When CONFIG_MEM_ALLOC_PROFILING_DEBUG is enabled, the following warning
may be noticed:

[   48.299584] ------------[ cut here ]------------
[   48.300092] alloc_tag was not set
[   48.300528] WARNING: CPU: 2 PID: 1361 at include/linux/alloc_tag.h:130 alloc_tagging_slab_free_hook+0x84/0xc7
[   48.301305] Modules linked in:
[   48.301553] CPU: 2 PID: 1361 Comm: systemd-udevd Not tainted 6.10.0-rc1-00003-gac8755535862 #176
[   48.302196] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   48.302752] RIP: 0010:alloc_tagging_slab_free_hook+0x84/0xc7
[   48.303169] Code: 8d 1c c4 48 85 db 74 4d 48 83 3b 00 75 1e 80 3d 65 02 86 04 00 75 15 48 c7 c7 11 48 1d 85 c6 05 55 02 86 04 01 e8 64 44 a5 ff <0f> 0b 48 8b 03 48 85 c0 74 21 48 83 f8 01 74 14 48 8b 50 20 48 f7
[   48.304411] RSP: 0018:ffff8880111b7d40 EFLAGS: 00010282
[   48.304916] RAX: 0000000000000000 RBX: ffff88800fcc9008 RCX: 0000000000000000
[   48.305455] RDX: 0000000080000000 RSI: ffff888014060000 RDI: ffffed1002236f97
[   48.305979] RBP: 0000000000001100 R08: fffffbfff0aa73a1 R09: 0000000000000000
[   48.306473] R10: ffffffff814515e5 R11: 0000000000000003 R12: ffff88800fcc9000
[   48.306943] R13: ffff88800b2e5cc0 R14: ffff8880111b7d90 R15: 0000000000000000
[   48.307529] FS:  00007faf5d1908c0(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000
[   48.308223] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.308710] CR2: 000058fb220c9118 CR3: 00000000110cc000 CR4: 0000000000750ef0
[   48.309274] PKRU: 55555554
[   48.309804] Call Trace:
[   48.310029]  <TASK>
[   48.310290]  ? show_regs+0x84/0x8d
[   48.310722]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311298]  ? __warn+0x13b/0x2ff
[   48.311580]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.311987]  ? report_bug+0x2ce/0x3ab
[   48.312292]  ? handle_bug+0x8c/0x107
[   48.312563]  ? exc_invalid_op+0x34/0x6f
[   48.312842]  ? asm_exc_invalid_op+0x1a/0x20
[   48.313173]  ? this_cpu_in_panic+0x1c/0x72
[   48.313503]  ? alloc_tagging_slab_free_hook+0x84/0xc7
[   48.313880]  ? putname+0x143/0x14e
[   48.314152]  kmem_cache_free+0xe9/0x214
[   48.314454]  putname+0x143/0x14e
[   48.314712]  do_unlinkat+0x413/0x45e
[   48.315001]  ? __pfx_do_unlinkat+0x10/0x10
[   48.315388]  ? __check_object_size+0x4d7/0x525
[   48.315744]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316167]  ? __sanitizer_cov_trace_pc+0x20/0x4a
[   48.316757]  ? getname_flags+0x4ed/0x500
[   48.317261]  __x64_sys_unlink+0x42/0x4a
[   48.317741]  do_syscall_64+0xe2/0x149
[   48.318171]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   48.318602] RIP: 0033:0x7faf5d8850ab
[   48.318891] Code: fd ff ff e8 27 dd 01 00 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 5f 00 00 00 0f 05 c3 0f 1f 40 00 f3 0f 1e fa b8 57 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 41 2d 0e 00 f7 d8
[   48.320649] RSP: 002b:00007ffc44982b38 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
[   48.321182] RAX: ffffffffffffffda RBX: 00005ba344a44680 RCX: 00007faf5d8850ab
[   48.321667] RDX: 0000000000000000 RSI: 00005ba344a44430 RDI: 00007ffc44982b40
[   48.322139] RBP: 00007ffc44982c00 R08: 0000000000000000 R09: 0000000000000007
[   48.322598] R10: 00005ba344a44430 R11: 0000000000000246 R12: 0000000000000000
[   48.323071] R13: 00007ffc44982b40 R14: 0000000000000000 R15: 0000000000000000
[   48.323596]  </TASK>

This is due to a race when two objects are allocated from the same slab,
which did not yet have an obj_exts allocated.

In such a case, both threads notice the NULL obj_exts, and after one of
them assigns slab->obj_exts, the second one will happily do the exchange if
it reads the newly assigned value.

To avoid that, verify that the obj_exts value that was read does not point
to an allocated obj_exts before doing the exchange.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 09c4656 ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Gustavo A. R. Silva <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Thadeu Lima de Souza Cascardo <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
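
The race described in the commit message above can be sketched in user-space C11. This is only a model under stated assumptions, not the kernel code: `FLAGS_MASK_MODEL`, `install_exts_unguarded()` and `install_exts_guarded()` are hypothetical names, and a plain atomic word stands in for slab->obj_exts. The "second thread" is simulated by a second call that reads the slot after the first call has already installed its vector.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: the low bits of the slot carry flags, the rest is a pointer. */
#define FLAGS_MASK_MODEL ((uintptr_t)0x7)

/*
 * Unguarded install: read the slot, then compare-and-exchange against
 * whatever was read. If the read happens after another thread already
 * installed its vector, the exchange still succeeds and replaces it.
 */
static int install_exts_unguarded(_Atomic uintptr_t *slot, uintptr_t new_exts)
{
    uintptr_t old = atomic_load(slot);

    assert((new_exts & FLAGS_MASK_MODEL) == 0); /* vectors are aligned */
    return atomic_compare_exchange_strong(slot, &old, new_exts);
}

/*
 * Guarded install: refuse to exchange when the value that was read
 * already points to an allocated vector (any non-flag bit set).
 * Returns 1 if our vector was installed, 0 if the caller must drop it.
 */
static int install_exts_guarded(_Atomic uintptr_t *slot, uintptr_t new_exts)
{
    uintptr_t old = atomic_load(slot);

    assert((new_exts & FLAGS_MASK_MODEL) == 0); /* vectors are aligned */
    if (old & ~FLAGS_MASK_MODEL)
        return 0; /* somebody else won the race; keep their vector */
    return atomic_compare_exchange_strong(slot, &old, new_exts);
}
```

With the unguarded variant, two back-to-back installs both report success and the first vector is lost, which is the situation that would later leave an object without its tag. With the guarded variant, the second install backs off and the first vector stays in place.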
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Nov 11, 2024
Hi,

I stumbled over this lockdep splat during PCI hotplug:
[   26.016648] ======================================================
[   26.019646] WARNING: possible circular locking dependency detected
[   26.022785] 6.12.0-rc6+ #176 Not tainted
[   26.024776] ------------------------------------------------------
[   26.027909] irq/50-pciehp/57 is trying to acquire lock:
[   26.030559] ffff0000c02ad700 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_configure_device+0xe4/0x1a0
[   26.035423]
[   26.035423] but task is already holding lock:
[   26.038505] ffff800082f819f8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x24/0x38
[   26.043512]
[   26.043512] which lock already depends on the new lock.
[   26.043512]
[   26.047863]
[   26.047863] the existing dependency chain (in reverse order) is:
[   26.051823]
[   26.051823] -> #1 (pci_rescan_remove_lock){+.+.}-{3:3}:
[   26.056209]        __mutex_lock+0x90/0x3a0
[   26.057946]        mutex_lock_nested+0x2c/0x40
[   26.059848]        pci_lock_rescan_remove+0x24/0x38
[   26.062560]        pciehp_configure_device+0x48/0x1a0
[   26.065592]        pciehp_handle_presence_or_link_change+0x1e0/0x4a0
[   26.069044]        pciehp_ist+0x21c/0x268
[   26.071186]        irq_thread_fn+0x34/0xb8
[   26.073368]        irq_thread+0x154/0x2d0
[   26.075503]        kthread+0x108/0x120
[   26.077504]        ret_from_fork+0x10/0x20
[   26.079695]
[   26.079695] -> #0 (&ctrl->reset_lock){.+.+}-{3:3}:
[   26.083164]        __lock_acquire+0x12bc/0x1eb8
[   26.085592]        lock_acquire+0x1e0/0x358
[   26.087831]        down_read_nested+0x54/0x160
[   26.090198]        pciehp_configure_device+0xe4/0x1a0
[   26.092895]        pciehp_handle_presence_or_link_change+0x1e0/0x4a0
[   26.096225]        pciehp_ist+0x21c/0x268
[   26.098337]        irq_thread_fn+0x34/0xb8
[   26.100509]        irq_thread+0x154/0x2d0
[   26.102668]        kthread+0x108/0x120
[   26.104660]        ret_from_fork+0x10/0x20
[   26.106790]
[   26.106790] other info that might help us debug this:
[   26.106790]
[   26.111033]  Possible unsafe locking scenario:
[   26.111033]
[   26.114184]        CPU0                    CPU1
[   26.116607]        ----                    ----
[   26.119023]   lock(pci_rescan_remove_lock);
[   26.121776]                                lock(&ctrl->reset_lock);
[   26.123924]                                lock(pci_rescan_remove_lock);
[   26.126098]   rlock(&ctrl->reset_lock);
[   26.127349]
[   26.127349]  *** DEADLOCK ***
[   26.127349]
[   26.129274] 1 lock held by irq/50-pciehp/57:
[   26.130664]  #0: ffff800082f819f8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x24/0x38
[   26.135941]
[   26.135941] stack backtrace:
[   26.138240] CPU: 0 UID: 0 PID: 57 Comm: irq/50-pciehp Not tainted 6.12.0-rc6+ #176
[   26.142223] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20230524-3.fc37 05/24/2023
[   26.146514] Call trace:
[   26.147795]  dump_backtrace+0xa4/0x130
[   26.149759]  show_stack+0x20/0x38
[   26.151504]  dump_stack_lvl+0x90/0xd0
[   26.153429]  dump_stack+0x18/0x28
[   26.155209]  print_circular_bug+0x28c/0x370
[   26.157601]  check_noncircular+0x140/0x150
[   26.159830]  __lock_acquire+0x12bc/0x1eb8
[   26.161949]  lock_acquire+0x1e0/0x358
[   26.163990]  down_read_nested+0x54/0x160
[   26.166172]  pciehp_configure_device+0xe4/0x1a0
[   26.168586]  pciehp_handle_presence_or_link_change+0x1e0/0x4a0
[   26.171755]  pciehp_ist+0x21c/0x268
[   26.173672]  irq_thread_fn+0x34/0xb8
[   26.175523]  irq_thread+0x154/0x2d0
[   26.177336]  kthread+0x108/0x120
[   26.179000]  ret_from_fork+0x10/0x20

I don't think this could actually happen, since this is only called by a
single irq thread, but the splat is kinda annoying, and pciehp_configure_device()
doesn't seem to do much that needs the reset_lock. How about this?

---->8
[PATCH] pciehp: fix lockdep warning

Call pciehp_configure_device() without reset_lock being held to
fix the following lockdep warning. The only action that seems to
require the reset_lock is writing to ctrl->dsn, so move that to
the caller that holds the lock.

[   26.019646] WARNING: possible circular locking dependency detected
[   26.022785] 6.12.0-rc6+ #176 Not tainted
[   26.024776] ------------------------------------------------------
[   26.027909] irq/50-pciehp/57 is trying to acquire lock:
[   26.030559] ffff0000c02ad700 (&ctrl->reset_lock){.+.+}-{3:3}, at: pciehp_configure_device+0xe4/0x1a0
[   26.035423]
[   26.035423] but task is already holding lock:
[   26.038505] ffff800082f819f8 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_lock_rescan_remove+0x24/0x38
[   26.043512]
[   26.043512] which lock already depends on the new lock.

Signed-off-by: Sebastian Ott <[email protected]>
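
The locking change proposed in the patch description above can be illustrated with a small single-threaded model in C. It is a sketch under assumptions, not the pciehp driver: `struct ctrl_model`, the `*_old`/`*_new` functions and the `inversions` counter are all hypothetical, and plain counters stand in for ctrl->reset_lock and pci_rescan_remove_lock. The model assumes, as the description states, that the caller already holds reset_lock and that storing ctrl->dsn is the only step that needs it.

```c
#include <assert.h>
#include <stdint.h>

struct ctrl_model {
    int reset_readers; /* read-side depth of the modeled ctrl->reset_lock */
    int rescan_held;   /* 1 while the modeled pci_rescan_remove_lock is held */
    int inversions;    /* reset_lock acquisitions made under the rescan lock */
    uint64_t dsn;      /* models ctrl->dsn */
};

static void reset_lock_read(struct ctrl_model *c)
{
    if (c->rescan_held)
        c->inversions++; /* the ordering the lockdep splat reports */
    c->reset_readers++;
}

static void reset_unlock_read(struct ctrl_model *c)
{
    assert(c->reset_readers > 0);
    c->reset_readers--;
}

static void rescan_lock(struct ctrl_model *c)
{
    assert(!c->rescan_held);
    c->rescan_held = 1;
}

static void rescan_unlock(struct ctrl_model *c)
{
    assert(c->rescan_held);
    c->rescan_held = 0;
}

/* Old shape: the callee re-takes reset_lock under the rescan lock to store dsn. */
static void configure_device_old(struct ctrl_model *c, uint64_t probed_dsn)
{
    rescan_lock(c);
    reset_lock_read(c);
    c->dsn = probed_dsn;
    reset_unlock_read(c);
    rescan_unlock(c);
}

/* New shape: the callee only reports the value and never touches reset_lock. */
static uint64_t configure_device_new(struct ctrl_model *c, uint64_t probed_dsn)
{
    rescan_lock(c);
    /* device enumeration would happen here */
    rescan_unlock(c);
    return probed_dsn;
}

/* Caller modeled on the interrupt thread, which already holds reset_lock. */
static void handle_presence_change(struct ctrl_model *c, uint64_t probed_dsn,
                                   int use_new)
{
    reset_lock_read(c);
    if (use_new)
        c->dsn = configure_device_new(c, probed_dsn); /* store under caller's lock */
    else
        configure_device_old(c, probed_dsn);
    reset_unlock_read(c);
}
```

Running the old shape records one acquisition of reset_lock under the rescan lock, which corresponds to the dependency lockdep warns about. The new shape records none while still leaving the same value in dsn.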