Compile fixes for vc4-kms-v3d-rpi2 #4

gohai · 2015-06-18T20:22:00Z

Signed-off-by: Gottfried Haider [email protected]

Signed-off-by: popcornmix <[email protected]> Signed-off-by: Noralf Trønnes <[email protected]>

Signed-off-by: popcornmix <[email protected]> BCM270x: power: Change initcall level to subsys Load ordering of modules are determined by the initcall used. If it's the same initcall level, makefile ordering decides. Now that the mailbox driver is being moved, it's no longer placed before the power driver by the linker. So use a later initcall level to let the mailbox driver load first. Signed-off-by: Noralf Trønnes <[email protected]> BCM270x: Move power module Make the power module available on ARCH_BCM2835 by moving it. The module turns on USB power making it possible to boot ARCH_BCM2835 directly with the VC bootloader. Signed-off-by: Noralf Trønnes <[email protected]>

Signed-off-by: popcornmix <[email protected]> bcm2708: Add extension to configure internal pulls The bcm2708 gpio controller supports internal pulls to be used as pull-up, pull-down or being entirely disabled. As it can be useful for a driver to change the pull configuration from it's default pull-down state, add an extension which allows configuring the pull per gpio. Signed-off-by: Julian Scheel <[email protected]> bcm2708-gpio: Revert the use of pinctrl_request_gpio In non-DT systems, pinctrl_request_gpio always fails causing "requests probe deferral" messages. In DT systems, it isn't useful because the reference counting is independent of the normal pinctrl pin reservations. gpio: Only clear the currently occurring interrupt. Avoids losing interrupts See: linux raspberrypi#760 bcm2708_gpio: Avoid calling irq_unmask for all interrupts When setting up the interrupts, specify that the handle_simple_irq handler should be used. This leaves interrupt acknowledgement to the caller, and prevents irq_unmask from being called for all interrupts. Issue: linux raspberrypi#760

Signed-off-by: popcornmix <[email protected]> Copy the arch vcio.c driver to drivers/mailbox. This is done to make it available on ARCH_BCM2835. Signed-off-by: Noralf Trønnes <[email protected]> mailbox: bcm2708-vcio: Allocation does not need to be atomic No need to do atomic allocation in a context that can sleep. Signed-off-by: Noralf Trønnes <[email protected]> mailbox: bcm2708-vcio: Check the correct status register before writing With the VC reader blocked and the ARM writing, MAIL0_STA reads empty permanently while MAIL1_STA goes from empty (0x40000000) to non-empty (0x00000001-0x00000007) to full (0x80000008). Suggested-by: Phil Elwell <[email protected]> Signed-off-by: Noralf Trønnes <[email protected]>

Signed-off-by: popcornmix <[email protected]> usb: dwc: fix lockdep false positive Signed-off-by: Kari Suvanto <[email protected]> usb: dwc: fix inconsistent lock state Signed-off-by: Kari Suvanto <[email protected]> Add FIQ patch to dwc_otg driver. Enable with dwc_otg.fiq_fix_enable=1. Should give about 10% more ARM performance. Thanks to Gordon and Costas Avoid dynamic memory allocation for channel lock in USB driver. Thanks ddv2005. Add NAK holdoff scheme. Enabled by default, disable with dwc_otg.nak_holdoff_enable=0. Thanks gsh Make sure we wait for the reset to finish dwc_otg: fix bug in dwc_otg_hcd.c resulting in silent kernel memory corruption, escalating to OOPS under high USB load. dwc_otg: Fix unsafe access of QTD during URB enqueue In dwc_otg_hcd_urb_enqueue during qtd creation, it was possible that the transaction could complete almost immediately after the qtd was assigned to a host channel during URB enqueue, which meant the qtd pointer was no longer valid having been completed and removed. Usually, this resulted in an OOPS during URB submission. By predetermining whether transactions need to be queued or not, this unsafe pointer access is avoided. This bug was only evident on the Pi model A where a device was attached that had no periodic endpoints (e.g. USB pendrive or some wlan devices). dwc_otg: Fix incorrect URB allocation error handling If the memory allocation for a dwc_otg_urb failed, the kernel would OOPS because for some reason a member of the *unallocated* struct was set to zero. Error handling changed to fail correctly. dwc_otg: fix potential use-after-free case in interrupt handler If a transaction had previously aborted, certain interrupts are enabled to track error counts and reset where necessary. On IN endpoints the host generates an ACK interrupt near-simultaneously with completion of transfer. In the case where this transfer had previously had an error, this results in a use-after-free on the QTD memory space with a 1-byte length being overwritten to 0x00. dwc_otg: add handling of SPLIT transaction data toggle errors Previously a data toggle error on packets from a USB1.1 device behind a TT would result in the Pi locking up as the driver never handled the associated interrupt. Patch adds basic retry mechanism and interrupt acknowledgement to cater for either a chance toggle error or for devices that have a broken initial toggle state (FT8U232/FT232BM). dwc_otg: implement tasklet for returning URBs to usbcore hcd layer The dwc_otg driver interrupt handler for transfer completion will spend a very long time with interrupts disabled when a URB is completed - this is because usb_hcd_giveback_urb is called from within the handler which for a USB device driver with complicated processing (e.g. webcam) will take an exorbitant amount of time to complete. This results in missed completion interrupts for other USB packets which lead to them being dropped due to microframe overruns. This patch splits returning the URB to the usb hcd layer into a high-priority tasklet. This will have most benefit for isochronous IN transfers but will also have incidental benefit where multiple periodic devices are active at once. dwc_otg: fix NAK holdoff and allow on split transactions only This corrects a bug where if a single active non-periodic endpoint had at least one transaction in its qh, on frnum == MAX_FRNUM the qh would get skipped and never get queued again. This would result in a silent device until error detection (automatic or otherwise) would either reset the device or flush and requeue the URBs. Additionally the NAK holdoff was enabled for all transactions - this would potentially stall a HS endpoint for 1ms if a previous error state enabled this interrupt and the next response was a NAK. Fix so that only split transactions get held off. dwc_otg: Call usb_hcd_unlink_urb_from_ep with lock held in completion handler usb_hcd_unlink_urb_from_ep must be called with the HCD lock held. Calling it asynchronously in the tasklet was not safe (regression in c4564d4). This change unlinks it from the endpoint prior to queueing it for handling in the tasklet, and also adds a check to ensure the urb is OK to be unlinked before doing so. NULL pointer dereference kernel oopses had been observed in usb_hcd_giveback_urb when a USB device was unplugged/replugged during data transfer. This effect was reproduced using automated USB port power control, hundreds of replug events were performed during active transfers to confirm that the problem was eliminated. USB fix using a FIQ to implement split transactions This commit adds a FIQ implementaion that schedules the split transactions using a FIQ so we don't get held off by the interrupt latency of Linux dwc_otg: fix device attributes and avoid kernel warnings on boot dcw_otg: avoid logging function that can cause panics See: raspberrypi/firmware#21 Thanks to cleverca22 for fix dwc_otg: mask correct interrupts after transaction error recovery The dwc_otg driver will unmask certain interrupts on a transaction that previously halted in the error state in order to reset the QTD error count. The various fine-grained interrupt handlers do not consider that other interrupts besides themselves were unmasked. By disabling the two other interrupts only ever enabled in DMA mode for this purpose, we can avoid unnecessary function calls in the IRQ handler. This will also prevent an unneccesary FIQ interrupt from being generated if the FIQ is enabled. dwc_otg: fiq: prevent FIQ thrash and incorrect state passing to IRQ In the case of a transaction to a device that had previously aborted due to an error, several interrupts are enabled to reset the error count when a device responds. This has the side-effect of making the FIQ thrash because the hardware will generate multiple instances of a NAK on an IN bulk/interrupt endpoint and multiple instances of ACK on an OUT bulk/interrupt endpoint. Make the FIQ mask and clear the associated interrupts. Additionally, on non-split transactions make sure that only unmasked interrupts are cleared. This caused a hard-to-trigger but serious race condition when you had the combination of an endpoint awaiting error recovery and a transaction completed on an endpoint - due to the sequencing and timing of interrupts generated by the dwc_otg core, it was possible to confuse the IRQ handler. Fix function tracing dwc_otg: whitespace cleanup in dwc_otg_urb_enqueue dwc_otg: prevent OOPSes during device disconnects The dwc_otg_urb_enqueue function is thread-unsafe. In particular the access of urb->hcpriv, usb_hcd_link_urb_to_ep, dwc_otg_urb->qtd and friends does not occur within a critical section and so if a device was unplugged during activity there was a high chance that the usbcore hub_thread would try to disable the endpoint with partially- formed entries in the URB queue. This would result in BUG() or null pointer dereferences. Fix so that access of urb->hcpriv, enqueuing to the hardware and adding to usbcore endpoint URB lists is contained within a single critical section. dwc_otg: prevent BUG() in TT allocation if hub address is > 16 A fixed-size array is used to track TT allocation. This was previously set to 16 which caused a crash because dwc_otg_hcd_allocate_port would read past the end of the array. This was hit if a hub was plugged in which enumerated as addr > 16, due to previous device resets or unplugs. Also add #ifdef FIQ_DEBUG around hcd->hub_port_alloc[], which grows to a large size if 128 hub addresses are supported. This field is for debug only for tracking which frame an allocate happened in. dwc_otg: make channel halts with unknown state less damaging If the IRQ received a channel halt interrupt through the FIQ with no other bits set, the IRQ would not release the host channel and never complete the URB. Add catchall handling to treat as a transaction error and retry. dwc_otg: fiq_split: use TTs with more granularity This fixes certain issues with split transaction scheduling. - Isochronous multi-packet OUT transactions now hog the TT until they are completed - this prevents hubs aborting transactions if they get a periodic start-split out-of-order - Don't perform TT allocation on non-periodic endpoints - this allows simultaneous use of the TT's bulk/control and periodic transaction buffers This commit will mainly affect USB audio playback. dwc_otg: fix potential sleep while atomic during urb enqueue Fixes a regression introduced with eb1b482. Kmalloc called from dwc_otg_hcd_qtd_add / dwc_otg_hcd_qtd_create did not always have the GPF_ATOMIC flag set. Force this flag when inside the larger critical section. dwc_otg: make fiq_split_enable imply fiq_fix_enable Failing to set up the FIQ correctly would result in "IRQ 32: nobody cared" errors in dmesg. dwc_otg: prevent crashes on host port disconnects Fix several issues resulting in crashes or inconsistent state if a Model A root port was disconnected. - Clean up queue heads properly in kill_urbs_in_qh_list by removing the empty QHs from the schedule lists - Set the halt status properly to prevent IRQ handlers from using freed memory - Add fiq_split related cleanup for saved registers - Make microframe scheduling reclaim host channels if active during a disconnect - Abort URBs with -ESHUTDOWN status response, informing device drivers so they respond in a more correct fashion and don't try to resubmit URBs - Prevent IRQ handlers from attempting to handle channel interrupts if the associated URB was dequeued (and the driver state was cleared) dwc_otg: prevent leaking URBs during enqueue A dwc_otg_urb would get leaked if the HCD enqueue function failed for any reason. Free the URB at the appropriate points. dwc_otg: Enable NAK holdoff for control split transactions Certain low-speed devices take a very long time to complete a data or status stage of a control transaction, producing NAK responses until they complete internal processing - the USB2.0 spec limit is up to 500mS. This causes the same type of interrupt storm as seen with USB-serial dongles prior to c8edb23. In certain circumstances, usually while booting, this interrupt storm could cause SD card timeouts. dwc_otg: Fix for occasional lockup on boot when doing a USB reset dwc_otg: Don't issue traffic to LS devices in FS mode Issuing low-speed packets when the root port is in full-speed mode causes the root port to stop responding. Explicitly fail when enqueuing URBs to a LS endpoint on a FS bus. Fix ARM architecture issue with local_irq_restore() If local_fiq_enable() is called before a local_irq_restore(flags) where the flags variable has the F bit set, the FIQ will be erroneously disabled. Fixup arch_local_irq_restore to avoid trampling the F bit in CPSR. Also fix some of the hacks previously implemented for previous dwc_otg incarnations. dwc_otg: fiq_fsm: Base commit for driver rewrite This commit removes the previous FIQ fixes entirely and adds fiq_fsm. This rewrite features much more complete support for split transactions and takes into account several OTG hardware bugs. High-speed isochronous transactions are also capable of being performed by fiq_fsm. All driver options have been removed and replaced with: - dwc_otg.fiq_enable (bool) - dwc_otg.fiq_fsm_enable (bool) - dwc_otg.fiq_fsm_mask (bitmask) - dwc_otg.nak_holdoff (unsigned int) Defaults are specified such that fiq_fsm behaves similarly to the previously implemented FIQ fixes. fiq_fsm: Push error recovery into the FIQ when fiq_fsm is used If the transfer associated with a QTD failed due to a bus error, the HCD would retry the transfer up to 3 times (implementing the USB2.0 three-strikes retry in software). Due to the masking mechanism used by fiq_fsm, it is only possible to pass a single interrupt through to the HCD per-transfer. In this instance host channels would fall off the radar because the error reset would function, but the subsequent channel halt would be lost. Push the error count reset into the FIQ handler. fiq_fsm: Implement timeout mechanism For full-speed endpoints with a large packet size, interrupt latency runs the risk of the FIQ starting a transaction too late in a full-speed frame. If the device is still transmitting data when EOF2 for the downstream frame occurs, the hub will disable the port. This change is not reflected in the hub status endpoint and the device becomes unresponsive. Prevent high-bandwidth transactions from being started too late in a frame. The mechanism is not guaranteed: a combination of bit stuffing and hub latency may still result in a device overrunning. fiq_fsm: fix bounce buffer utilisation for Isochronous OUT Multi-packet isochronous OUT transactions were subject to a few bounday bugs. Fix them. Audio playback is now much more robust: however, an issue stands with devices that have adaptive sinks - ALSA plays samples too fast. dwc_otg: Return full-speed frame numbers in HS mode The frame counter increments on every *microframe* in high-speed mode. Most device drivers expect this number to be in full-speed frames - this caused considerable confusion to e.g. snd_usb_audio which uses the frame counter to estimate the number of samples played. fiq_fsm: save PID on completion of interrupt OUT transfers Also add edge case handling for interrupt transports. Note that for periodic split IN, data toggles are unimplemented in the OTG host hardware - it unconditionally accepts any PID. fiq_fsm: add missing case for fiq_fsm_tt_in_use() Certain combinations of bitrate and endpoint activity could result in a periodic transaction erroneously getting started while the previous Isochronous OUT was still active. fiq_fsm: clear hcintmsk for aborted transactions Prevents the FIQ from erroneously handling interrupts on a timed out channel. fiq_fsm: enable by default fiq_fsm: fix dequeues for non-periodic split transactions If a dequeue happened between the SSPLIT and CSPLIT phases of the transaction, the HCD would never receive an interrupt. fiq_fsm: Disable by default fiq_fsm: Handle HC babble errors The HCTSIZ transfer size field raises a babble interrupt if the counter wraps. Handle the resulting interrupt in this case. dwc_otg: fix interrupt registration for fiq_enable=0 Additionally make the module parameter conditional for wherever hcd->fiq_state is touched. fiq_fsm: Enable by default dwc_otg: Fix various issues with root port and transaction errors Process the host port interrupts correctly (and don't trample them). Root port hotplug now functional again. Fix a few thinkos with the transaction error passthrough for fiq_fsm. fiq_fsm: Implement hack for Split Interrupt transactions Hubs aren't too picky about which endpoint we send Control type split transactions to. By treating Interrupt transfers as Control, it is possible to use the non-periodic queue in the OTG core as well as the non-periodic FIFOs in the hub itself. This massively reduces the microframe exclusivity/contention that periodic split transactions otherwise have to enforce. It goes without saying that this is a fairly egregious USB specification violation, but it works. Original idea by Hans Petter Selasky @ FreeBSD.org. dwc_otg: FIQ support on SMP. Set up FIQ stack and handler on Core 0 only. dwc_otg: introduce fiq_fsm_spin(un|)lock() SMP safety for the FIQ relies on register read-modify write cycles being completed in the correct order. Several places in the DWC code modify registers also touched by the FIQ. Protect these by a bare-bones lock mechanism. This also makes it possible to run the FIQ and IRQ handlers on different cores. fiq_fsm: fix build on bcm2708 and bcm2709 platforms dwc_otg: put some barriers back where they should be for UP bcm2709/dwc_otg: Setup FIQ on core 1 if >1 core active dwc_otg: fixup read-modify-write in critical paths Be more careful about read-modify-write on registers that the FIQ also touches. Guard fiq_fsm_spin_lock with fiq_enable check fiq_fsm: Falling out of the state machine isn't fatal This edge case can be hit if the port is disabled while the FIQ is in the middle of a transaction. Make the effects less severe. Also get rid of the useless return value. squash: dwc_otg: Allow to build without SMP usb: core: make overcurrent messages more prominent Hub overcurrent messages are more serious than "debug". Increase loglevel. usb: dwc_otg: Don't use dma_to_virt() Commit 6ce0d20 changes dma_to_virt() which breaks this driver. Open code the old dma_to_virt() implementation to work around this. Limit the use of __bus_to_virt() to cases where transfer_buffer_length is set and transfer_buffer is not set. This is done to increase the chance that this driver will also work on ARCH_BCM2835. transfer_buffer should not be NULL if the length is set, but the comment in the code indicates that there are situations where this might happen. drivers/usb/isp1760/isp1760-hcd.c also has a similar comment pointing to a possible: 'usb storage / SCSI bug'. Signed-off-by: Noralf Trønnes <[email protected]> dwc_otg: Fix crash when fiq_enable=0

Signed-off-by: popcornmix <[email protected]>

Signed-off-by: popcornmix <[email protected]> bcm2708_fb : Implement blanking support using the mailbox property interface bcm2708_fb: Add pan and vsync controls bcm2708_fb: DMA acceleration for fb_copyarea Based on http://www.raspberrypi.org/phpBB3/viewtopic.php?p=62425#p62425 Also used Simon's dmaer_master module as a reference for tweaking DMA settings for better performance. For now busylooping only. IRQ support might be added later. With non-overclocked Raspberry Pi, the performance is ~360 MB/s for simple copy or ~260 MB/s for two-pass copy (used when dragging windows to the right). In the case of using DMA channel 0, the performance improves to ~440 MB/s. For comparison, VFP optimized CPU copy can only do ~114 MB/s in the same conditions (hindered by reading uncached source buffer). Signed-off-by: Siarhei Siamashka <[email protected]> bcm2708_fb: report number of dma copies Add a counter (exported via debugfs) reporting the number of dma copies that the framebuffer driver has done, in order to help evaluate different optimization strategies. Signed-off-by: Luke Diamand <[email protected]> bcm2708_fb: use IRQ for DMA copies The copyarea ioctl() uses DMA to speed things along. This was busy-waiting for completion. This change supports using an interrupt instead for larger transfers. For small transfers, busy-waiting is still likely to be faster. Signed-off-by: Luke Diamand <[email protected]> bcm2708: Make ioctl logging quieter video: fbdev: bcm2708_fb: Don't panic on error No need to panic the kernel if the video driver fails. Just print a message and return an error. Signed-off-by: Noralf Trønnes <[email protected]> fbdev: bcm2708_fb: Add ARCH_BCM2835 support Add Device Tree support. Pass the device to dma_alloc_coherent() in order to get the correct bus address on ARCH_BCM2835. Use the new DMA legacy API header file. Including <mach/platform.h> is not necessary. Signed-off-by: Noralf Trønnes <[email protected]> BCM270x_DT: Add bcm2708-fb device Add bcm2708-fb to Device Tree and don't add the platform device when booting in DT mode. Signed-off-by: Noralf Trønnes <[email protected]>

Add support for DMA controller of BCM2708 as used in the Raspberry Pi. Currently it only supports cyclic DMA. Signed-off-by: Florian Meier <[email protected]> dmaengine: expand functionality by supporting scatter/gather transfers sdhci-bcm2708 and dma.c: fix for LITE channels DMA: fix cyclic LITE length overflow bug dmaengine: bcm2708: Remove chancnt affectations Mirror bcm2835-dma.c commit 9eba553: chancnt is already filled by dma_async_device_register, which uses the channel list to know how much channels there is. Since it's already filled, we can safely remove it from the drivers' probe function. Signed-off-by: Noralf Trønnes <[email protected]> dmaengine: bcm2708: overwrite dreq only if it is not set dreq is set when the DMA channel is fetched from Device Tree. slave_id is set using dmaengine_slave_config(). Only overwrite dreq with slave_id if it is not set. dreq/slave_id in the cyclic DMA case is not touched, because I don't have hardware to test with. Signed-off-by: Noralf Trønnes <[email protected]> dmaengine: bcm2708: do device registration in the board file Don't register the device in the driver. Do it in the board file. Signed-off-by: Noralf Trønnes <[email protected]> dmaengine: bcm2708: don't restrict DT support to ARCH_BCM2835 Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now. Add Device Tree support to the non ARCH_BCM2835 case. Use the same driver name regardless of architecture. Signed-off-by: Noralf Trønnes <[email protected]> BCM270x_DT: add bcm2835-dma entry Add Device Tree entry for bcm2835-dma. The entry doesn't contain any resources since they are handled by the arch/arm/mach-bcm270x/dma.c driver. In non-DT mode, don't add the device in the board file. Signed-off-by: Noralf Trønnes <[email protected]> bcm2708-dmaengine: Add debug options BCM270x: Add memory and irq resources to dmaengine device and DT Prepare for merging of the legacy DMA API arch driver dma.c with bcm2708-dmaengine by adding memory and irq resources both to platform file device and Device Tree node. Don't use BCM_DMAMAN_DRIVER_NAME so we don't have to include mach/dma.h Signed-off-by: Noralf Trønnes <[email protected]> dmaengine: bcm2708: Merge with arch dma.c driver and disable dma.c Merge the legacy DMA API driver with bcm2708-dmaengine. This is done so we can use bcm2708_fb on ARCH_BCM2835 (mailbox driver is also needed). Changes to the dma.c code: - Use BIT() macro. - Cutdown some comments to one line. - Add mutex to vc_dmaman and use this, since the dev lock is locked during probing of the engine part. - Add global g_dmaman variable since drvdata is used by the engine part. - Restructure for readability: vc_dmaman_chan_alloc() vc_dmaman_chan_free() bcm_dma_chan_free() - Restructure bcm_dma_chan_alloc() to simplify error handling. - Use device irq resources instead of hardcoded bcm_dma_irqs table. - Remove dev_dmaman_register() and code it directly. - Remove dev_dmaman_deregister() and code it directly. - Simplify bcm_dmaman_probe() using devm_* functions. - Get dmachans from DT if available. - Keep 'dma.dmachans' module argument name for backwards compatibility. Make it available on ARCH_BCM2835 as well. Signed-off-by: Noralf Trønnes <[email protected]> dmaengine: bcm2708: set residue_granularity field bcm2708-dmaengine supports residue reporting at burst level but didn't report this via the residue_granularity field. Without this field set properly we get playback issues with I2S cards.

mmc: Disable CMD23 transfers on all cards Pending wire-level investigation of these types of transfers and associated errors on bcm2835-mmc, disable for now. Fallback of CMD18/CMD25 transfers will be used automatically by the MMC layer. Reported/Tested-by: Gellert Weisz <[email protected]> mmc: bcm2835-mmc: enable DT support for all architectures Both ARCH_BCM2835 and ARCH_BCM270x are built with OF now. Enable Device Tree support for all architectures. Signed-off-by: Noralf Trønnes <[email protected]> mmc: bcm2835-mmc: fix probe error handling Probe error handling is broken in several places. Simplify error handling by using device managed functions. Replace pr_{err,info} with dev_{err,info}. Signed-off-by: Noralf Trønnes <[email protected]> bcm2835-mmc: Add locks when accessing sdhost registers bcm2835-mmc: Add range of debug options for slowing things down bcm2835-mmc: Add option to disable some delays bcm2835-mmc: Add option to disable MMC_QUIRK_BLK_NO_CMD23 bcm2835-mmc: Default to disabling MMC_QUIRK_BLK_NO_CMD23 bcm2835-mmc: Adding overclocking option Allow a different clock speed to be substitued for a requested 50MHz. This option is exposed using the "overclock_50" DT parameter. Note that the mmc interface is restricted to EVEN integer divisions of 250MHz, and the highest sensible option is 63 (250/4 = 62.5), the next being 125 (250/2) which is much too high. Use at your own risk. bcm2835-mmc: Round up the overclock, so 62 works for 62.5Mhz Also only warn once for each overclock setting. mmc: bcm2835-mmc: Make available on ARCH_BCM2835 Make the bcm2835-mmc driver available for use on ARCH_BCM2835. Signed-off-by: Noralf Trønnes <[email protected]> BCM270x_DT: add bcm2835-mmc entry Add Device Tree entry for bcm2835-mmc. In non-DT mode, don't add the device in the board file. Signed-off-by: Noralf Trønnes <[email protected]>

BCM2835 has two SD card interfaces. This driver uses the other one. bcm2835-sdhost: Error handling fix, and code clarification bcm2835-sdhost: Adding overclocking option Allow a different clock speed to be substitued for a requested 50MHz. This option is exposed using the "overclock_50" DT parameter. Note that the sdhost interface is restricted to integer divisions of core_freq, and the highest sensible option for a core_freq of 250MHz is 84 (250/3 = 83.3MHz), the next being 125 (250/2) which is much too high. Use at your own risk. bcm2835-sdhost: Round up the overclock, so 62 works for 62.5Mhz Also only warn once for each overclock setting.

Signed-off-by: popcornmix <[email protected]> vc_cma: Make the vc_cma area the default contiguous DMA area

Signed-off-by: popcornmix <[email protected]> alsa: add mmap support and some cleanups to bcm2835 ALSA driver snd-bcm2835: Add support for spdif/hdmi passthrough This adds a dedicated subdevice which can be used for passthrough of non-audio formats (ie encoded a52) through the hdmi audio link. In addition to this driver extension an appropriate card config is required to make alsa-lib support the AES parameters for this device. snd-bcm2708: Add mutex, improve logging Fix for ALSA driver crash Avoids an issue when closing and opening vchiq where a message can arrive before service handle has been written alsa: reduce severity of expected warning message snd-bcm2708: Fix dmesg spam for non-error case alsa: Ensure mutexes are released through error paths alsa: Make interrupted close paths quieter BCM270x: Add onboard sound device to Device Tree Add Device Tree support to alsa driver. Add device to Device Tree. Don't add platform devices when booting in DT mode. Signed-off-by: Noralf Trønnes <[email protected]>

Signed-off-by: popcornmix <[email protected]> vchiq: create_pagelist copes with vmalloc memory Signed-off-by: Daniel Stone <[email protected]> vchiq: fix the shim message release Signed-off-by: Daniel Stone <[email protected]> vchiq: export additional symbols Signed-off-by: Daniel Stone <[email protected]> VCHIQ: Make service closure fully synchronous (drv) This is one half of a two-part patch, the other half of which is to the vchiq_lib user library. With these patches, calls to vchiq_close_service and vchiq_remove_service won't return until any associated callbacks have been delivered to the callback thread. VCHIQ: Add per-service tracing The new service option VCHIQ_SERVICE_OPTION_TRACE is a boolean that toggles tracing for the specified service. This commit also introduces vchi_service_set_option and the associated option VCHI_SERVICE_OPTION_TRACE. vchiq: Make the synchronous-CLOSE logic more tolerant vchiq: Move logging control into debugfs vchiq: Take care of a corner case tickled by VCSM Closing a connection that isn't fully open requires care, since one side does not know the other side's port number. Code was present to handle the case where a CLOSE is sent immediately after an OPEN, i.e. before the OPENACK has been received, but this was incorrectly being used when an OPEN from a client using port 0 was rejected. (In the observed failure, the host was attempting to use the VCSM service, which isn't present in the 'cutdown' firmware. The failure was intermittent because sometimes the keepalive service would grab port 0.) This case can be distinguished because the client's remoteport will still be VCHIQ_PORT_FREE, and the srvstate will be OPENING. Either condition is sufficient to differentiate it from the special case described above. vchiq: Avoid high load when blocked and unkillable vchiq: Include SIGSTOP and SIGCONT in list of signals not-masked by vchiq to allow gdb to work vchiq_arm: Complete support for SYNCHRONOUS mode vchiq: Remove inline from suspend/resume vchiq: Allocation does not need to be atomic vchiq: Fix wrong condition check The log level is checked from within the log call. Remove the check in the call. Signed-off-by: Pranith Kumar <[email protected]> BCM270x: Add vchiq device to platform file and Device Tree Prepare to turn the vchiq module into a driver. Signed-off-by: Noralf Trønnes <[email protected]> bcm2708: vchiq: Add Device Tree support Turn vchiq into a driver and stop hardcoding resources. Use devm_* functions in probe path to simplify cleanup. A global variable is used to hold the register address. This is done to keep this patch as small as possible. Also make available on ARCH_BCM2835. Based on work by Lubomir Rintel. Signed-off-by: Noralf Trønnes <[email protected]> vchiq: Change logging level for inbound data

Signed-off-by: popcornmix <[email protected]> BCM270x: Move vc_mem Make the vc_mem module available for ARCH_BCM2835 by moving it. Signed-off-by: Noralf Trønnes <[email protected]>

Add experimental support for the VideoCore shared memory service. This allows user processes to allocate memory from VideoCore's GPU relocatable heap and mmap the buffers. Additionally, the memory handles can passed to other VideoCore services such as MMAL, OpenMax and DispmanX TODO * This driver was originally released for BCM28155 which has a different cache architecture to BCM2835. Consequently, in this release only uncached mappings are supported. However, there's no fundamental reason which cached mappings cannot be support or BCM2835 * More refactoring is required to remove the typedefs. * Re-enable the some of the commented out debug-fs statistics which were disabled when migrating code from proc-fs. * There's a lot of code to support sharing of VCSM in order to support Android. This could probably done more cleanly or perhaps just removed. Signed-off-by: Tim Gover <[email protected]> config: Disable VC_SM for now to fix hang with cutdown kernel vcsm: Use boolean as it cannot be built as module On building the bcm_vc_sm as a module we get the following error: v7_dma_flush_range and do_munmap are undefined in vc-sm.ko. Fix by making it not an option to build as module vcsm: Add ioctl for custom cache flushing

lirc_rpi: Use read_current_timer to determine transmitter delay. Thanks to jjmz and others See: raspberrypi#525 lirc: Remove restriction on gpio pins that can be used with lirc Compute Module, for example could use different pins lirc_rpi: Add parameter to specify input pin pull Depending on the connected IR circuitry it might be desirable to change the gpios internal pull from it pull-down default behaviour. Add a module parameter to allow the user to set it explicitly. Signed-off-by: Julian Scheel <[email protected]> lirc-rpi: Use the higher-level irq control functions This module used to access the irq_chip methods of the gpio controller directly, rather than going through the standard enable_irq/irq_set_irq_type functions. This caused problems on pinctrl-bcm2835 which only implements the irq_enable/disable methods and not irq_unmask/mask. lirc-rpi: Correct the interrupt usage 1) Correct the use of enable_irq (i.e. don't call it so often) 2) Correct the shutdown sequence. 3) Avoid a bcm2708_gpio driver quirk by setting the irq flags earlier lirc-rpi: use getnstimeofday instead of read_current_timer read_current_timer isn't guaranteed to return values in microseconds, and indeed it doesn't on a Pi2. Issue: linux#827 lirc-rpi: Add device tree support, and a suitable overlay The overlay supports DT parameters that match the old module parameters, except that gpio_in_pull should be set using the strings "up", "down" or "off". lirc-rpi: Also support pinctrl-bcm2835 in non-DT mode

Signed-off-by: popcornmix <[email protected]>

BCM270x: Move thermal sensor to Device Tree Add Device Tree support to bcm2835-thermal driver. Add thermal sensor device to Device Tree. Don't add platform device when booting in DT mode. Signed-off-by: Noralf Trønnes <[email protected]>

spi: bcm2708: add device tree support Add DT support to driver and add to .dtsi file. Setup pins and spidev in .dts file. SPI is disabled by default. Signed-off-by: Noralf Tronnes <[email protected]> BCM2708: don't register SPI controller when using DT The device for the SPI controller is in the Device Tree. Only register the device when not using DT. Signed-off-by: Noralf Tronnes <[email protected]> spi: bcm2835: make driver available on ARCH_BCM2708 Make this driver available on ARCH_BCM2708 Signed-off-by: Noralf Tronnes <[email protected]> bcm2708: Remove the prohibition on mixing SPIDEV and DT spi-bcm2708: Prepare for Common Clock Framework migration As part of migrating to use the Common Clock Framework, replace clk_enable() with clk_prepare_enable() and clk_disable() with clk_disable_unprepare(). This does not affect behaviour under the current clock implementation. Also add a missing clk_disable_unprepare() in the probe error path. Signed-off-by: Noralf Tronnes <[email protected]> spi: bcm2835: fix all checkpath --strict messages The following errors/warnings issued by checkpatch.pl --strict have been fixed: drivers/spi/spi-bcm2835.c:182: CHECK: Alignment should match open parenthesis drivers/spi/spi-bcm2835.c:191: CHECK: braces {} should be used on all arms of this statement drivers/spi/spi-bcm2835.c:234: CHECK: Alignment should match open parenthesis drivers/spi/spi-bcm2835.c:256: CHECK: Alignment should match open parenthesis drivers/spi/spi-bcm2835.c:271: CHECK: Alignment should match open parenthesis drivers/spi/spi-bcm2835.c:346: CHECK: Alignment should match open parenthesis total: 0 errors, 0 warnings, 6 checks, 403 lines checked In 2 locations the arguments had to get split/moved to the next line so that the line width stays below 80 chars. Signed-off-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: fill/drain SPI-fifo as much as possible during interrupt Implement the recommendation from the BCM2835 data-sheet with regards to polling drivers to fill/drain the FIFO as much data as possible also for the interrupt-driven case (which this driver is making use of). This means that for long transfers (>64bytes) we need one interrupt every 64 bytes instead of every 12 bytes, as the FIFO is 16 words (not bytes) wide. Tested with mcp251x (can bus), fb_st7735 (TFT framebuffer device) and enc28j60 (ethernet) drivers. Signed-off-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: clock divider can be a multiple of 2 The official documentation is wrong in this respect. Has been tested empirically for dividers 2-1024 Signed-off-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: enable support of 3-wire mode Signed-off-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: move to the transfer_one driver model This also allows for GPIO-CS to get used removing the limitation of 2/3 SPI devises on the SPI bus. Fixes: spi-cs-high with native CS with multiple devices on the spi-bus resetting the chip selects to "normal" polarity after a finished transfer. No other functionality/improvements added. Tested with the following 4 devices on the spi-bus: * mcp2515 with native CS * mcp2515 with gpio CS * fb_st7735r with native CS (plus spi-cs-high via transistor inverting polarity) * enc28j60 with gpio-CS Tested-by: Martin Sperl <[email protected]> Signed-off-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: fix code formatting issue Signed-off-by: Martin Sperl <[email protected]> Tested-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: fill FIFO before enabling interrupts to reduce interrupts/message To reduce the number of interrupts/message we fill the FIFO before enabling interrupts - for short messages this reduces the interrupt count from 2 to 1 interrupt. There have been rare cases where short (<200ns) chip-select switches with native CS have been observed during such operation, this is why this optimization is only enabled for GPIO-CS. Signed-off-by: Martin Sperl <[email protected]> Tested-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: transform native-cs to gpio-cs on first spi_setup Transforms the bcm-2835 native SPI-chip select to their gpio-cs equivalent. This allows for some support of some optimizations that are not possible due to HW-gliches on the CS line - especially filling the FIFO before enabling SPI interrupts (by writing to CS register) while the transfer is already in progress (See commit: e3a2be3) This patch also works arround some issues in bcm2835-pinctrl which does not set the value when setting the GPIO as output - it just sets up output and (typically) leaves the GPIO as low. When a fix for this is merged then this gpio_set_value can get removed from bcm2835_spi_setup. Signed-off-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: enabling polling mode for transfers shorter than 30us In cases of short transfer times the CPU is spending lots of time in the interrupt handler and scheduler to reschedule the worker thread. Measurements show that we have times where it takes 29.32us to between the last clock change and the time that the worker-thread is running again returning from wait_for_completion_timeout(). During this time the interrupt-handler is running calling complete() and then also the scheduler is rescheduling the worker thread. This time can vary depending on how much of the code is still in CPU-caches, when there is a burst of spi transfers the subsequent delays are in the order of 25us, so the value of 30us seems reasonable. With polling the whole transfer of 4 bytes at 10MHz finishes after 6.16us (CS down to up) with the real transfer (clock running) taking 3.56us. So the efficiency has much improved and is also freeing CPU cycles, reducing interrupts and context switches. Because of the above 30us seems to be a reasonable limit for polling. Signed-off-by: Martin Sperl <[email protected]> Signed-off-by: Mark Brown <[email protected]> spi: bcm2835: change timeout of polling driver to 1s The way that the timeout code is written in the polling function the timeout does also trigger when interrupted or rescheduled while in the polling loop. This patch changes the timeout from effectively 20ms (=2 jiffies) to 1 second and removes the time that the transfer really takes out of the computation, as - per design - this is <30us and the jiffie resolution is 10ms so that does not make any difference what so ever. Signed-off-by: Martin Sperl <[email protected]>

i2c-bcm2708: fixed baudrate Fixed issue where the wrong CDIV value was set for baudrates below 3815 Hz (for 250MHz bus clock). In that case the computed CDIV value was more than 0xffff. However the CDIV register width is only 16 bits. This resulted in incorrect setting of CDIV and higher baudrate than intended. Example: 3500Hz -> CDIV=0x11704 -> CDIV(16bit)=0x1704 -> 42430Hz After correction: 3500Hz -> CDIV=0x11704 -> CDIV(16bit)=0xffff -> 3815Hz The correct baudrate is shown in the log after the cdiv > 0xffff correction. Perform I2C combined transactions when possible Perform I2C combined transactions whenever possible, within the restrictions of the Broadcomm Serial Controller. Disable DONE interrupt during TA poll Prevent interrupt from being triggered if poll is missed and transfer starts and finishes. i2c: Make combined transactions optional and disabled by default i2c: bcm2708: add device tree support Add DT support to driver and add to .dtsi file. Setup pins in .dts file. i2c is disabled by default. Signed-off-by: Noralf Tronnes <[email protected]> bcm2708: don't register i2c controllers when using DT The devices for the i2c controllers are in the Device Tree. Only register devices when not using DT. Signed-off-by: Noralf Tronnes <[email protected]> I2C: Only register the I2C device for the current board revision i2c_bcm2708: Fix clock reference counting Fix grabbing lock from atomic context in i2c driver 2 main changes: - check for timeouts in the bcm2708_bsc_setup function as indicated by this comment: /* poll for transfer start bit (should only take 1-20 polls) */ This implies that the setup function can now fail so account for this everywhere it's called - Removed the clk_get_rate call from inside the setup function as it locks a mutex and that's not ok since we call it from under a spin lock. i2c-bcm2708: When using DT, leave the GPIO setup to pinctrl

- Supports raw YUV capture, preview, JPEG and H264. - Uses videobuf2 for data transfer, using dma_buf. - Uses 3.6.10 timestamping - Camera power based on use - Uses immutable input mode on video encoder Signed-off-by: Daniel Stone <[email protected]> Signed-off-by: Luke Diamand <[email protected]> V4L2: Fixes from 6by9 V4L2: Fix EV values. Add manual shutter speed control V4L2 EV values should be in units of 1/1000. Corrected. Add support for V4L2_CID_EXPOSURE_ABSOLUTE which should give manual shutter control. Requires manual exposure mode to be selected first. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Correct JPEG Q-factor range Should be 1-100, not 0-100 Signed-off-by: Dave Stevenson <[email protected]> V4L2: Fix issue of driver jamming if STREAMON failed. Fix issue where the driver was left in a partially enabled state if STREAMON failed, and would then reject many IOCTLs as it thought it was streaming. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Fix ISO controls. Driver was passing the index to the GPU, and not the desired ISO value. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add flicker avoidance controls Add support for V4L2_CID_POWER_LINE_FREQUENCY to set flicker avoidance frequencies. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add support for frame rate control. Add support for frame rate (or time per frame as V4L2 inverts it) control via s_parm. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Improve G_FBUF handling so we pass conformance Return some sane numbers for get framebuffer so that we pass conformance. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Fix information advertised through g_vidfmt Width and height were being stored based on incorrect values. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add support for inline H264 headers Add support for V4L2_CID_MPEG_VIDEO_REPEAT_SEQ_HEADER to control H264 inline headers. Requires firmware fix to work correctly, otherwise format has to be set to H264 before this parameter is set. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Fix JPEG timestamp issue JPEG images were coming through from the GPU with timestamp of 0. Detect this and give current system time instead of some invalid value. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Fix issue when switching down JPEG resolution. JPEG buffer size calculation is based on input resolution. Input resolution was being configured after output port format. Caused failures if switching from one JPEG resolution to a smaller one. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Enable MJPEG encoding Requires GPU firmware update to support MJPEG encoder. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Correct flag settings for compressed formats Set flags field correctly on enum_fmt_vid_cap for compressed image formats. Signed-off-by: Dave Stevenson <[email protected]> V4L2: H264 profile & level ctrls, FPS control and auto exp pri Several control handling updates. H264 profile and level controls. Timeperframe/FPS reworked to add V4L2_CID_EXPOSURE_AUTO_PRIORITY to select whether AE is allowed to override the framerate specified. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Correct BGR24 to RGB24 in format table Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add additional pixel formats. Correct colourspace Adds the other flavours of YUYV, and NV12. Corrects the overlay advertised colourspace. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Drop logging msg from info to debug Signed-off-by: Dave Stevenson <[email protected]> V4L2: Initial pass at scene modes. Only supports exposure mode and metering modes. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add manual white balance control. Adds support for V4L2_CID_RED_BALANCE and V4L2_CID_BLUE_BALANCE. Only has an effect if V4L2_CID_AUTO_N_PRESET_WHITE_BALANCE has V4L2_WHITE_BALANCE_MANUAL selected. Signed-off-by: Dave Stevenson <[email protected]> config: Enable V4L / MMAL driver V4L2: Increase the MMAL timeout to 3sec MJPEG codec flush is now taking longer and results in a kernel panic if the driver has stopped waiting for the result when it finally completes. Increase the timeout value from 1 to 3secs. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add support for setting H264_I_PERIOD Adds support for the parameter V4L2_CID_MPEG_VIDEO_H264_I_PERIOD to set the frequency with which I frames are produced. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Enable GPU function for removing padding from images. GPU can now support arbitrary strides, although may require additional processing to achieve it. Enable this feature so that the images delivered are the size requested. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add support for V4L2_PIX_FMT_BGR32 Signed-off-by: Dave Stevenson <[email protected]> V4L2: Set the colourspace to avoid odd YUV-RGB conversions Removes the amiguity from the conversion routines and stops them dropping back to the SD vs HD choice of coeffs. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Make video/still threshold a run-time param Move the define for at what resolution the driver switches from a video mode capture to a stills mode capture to module parameters. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Fix incorrect pool sizing Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add option to disable enum_framesizes. Gstreamer's handling of a driver that advertises V4L2_FRMSIZE_TYPE_STEPWISE to define the supported resolutions is broken. See bug https://bugzilla.gnome.org/show_bug.cgi?id=726521 Optional parameter of gst_v4l2src_is_broken added. If non-zero, the driver claims not to support that ioctl, and gstreamer should be happy again (it guesses a set of defaults for itself). Signed-off-by: Dave Stevenson <[email protected]> V4L2: Add support for more image formats Adds YVU420 (YV12), YVU420SP (NV21), and BGR888. Signed-off-by: Dave Stevenson <[email protected]> V4L2: Extend range for V4L2_CID_MPEG_VIDEO_H264_I_PERIOD Request to extend the range from the fairly arbitrary 1000 frames (33 seconds at 30fps). Extend out to the max range supported (int32 value). Also allow 0, which is handled by the codec as only send an I-frame on the first frame and never again. There may be an exception if it detects a significant scene change, but there's no easy way around that. Signed-off-by: Dave Stevenson <[email protected]> bcm2835-camera: stop_streaming now has a void return BCM2835-V4L2: Fix compliance test failures VIDIOC_TRY_FMT and VIDIOC_S_FMT tests were faling due to reporting V4L2_COLORSPACE_JPEG when the colour format wasn't V4L2_PIX_FMT_JPEG. Now reports V4L2_COLORSPACE_SMPTE170M for YUV formats.

The Raspberry Pi firmware looks for a trailer on the kernel image to determine whether it was compiled with Device Tree support enabled. If the firmware finds a kernel without this trailer, or which has a trailer indicating that it isn't DT-capable, it disables DT support and reverts to using ATAGs. The mkknlimg utility adds that trailer, having first analysed the image to look for signs of DT support and the kernel version string. knlinfo displays the contents of the trailer in the given kernel image. scripts/mkknlimg: Add support for ARCH_BCM2835 Add a new trailer field indicating whether this is an ARCH_BCM2835 build, as opposed to MACH_BCM2708/9. If the loader finds this flag is set it changes the default base dtb file name from bcm270x... to bcm283y... Also update knlinfo to show the status of the field.

Add the bare minimum needed to boot BCM2708 from a Device Tree. Signed-off-by: Noralf Tronnes <[email protected]> BCM2708: DT: change 'axi' nodename to 'soc' Change DT node named 'axi' to 'soc' so it matches ARCH_BCM2835. The VC4 bootloader fills in certain properties in the 'axi' subtree, but since this is part of an upstreaming effort, the name is changed. Signed-off-by: Noralf Tronnes [email protected] BCM2708_DT: Correct length of the peripheral space

Based on the patch authored by Ali Gholami Rudi at https://lkml.org/lkml/2009/7/13/153 Provide an ioctl for userspace applications, but only if this operation is hardware accelerated (otherwide it does not make any sense). Signed-off-by: Siarhei Siamashka <[email protected]>

…9000 as this is widely used. Disabled older rtlwifi driver 8192cu needs old wireless extensions The obsolete WIRELESS_EXT configuration is used by the old Realtek code and is needed for AP support. 8192cu: CONFIG_AP_MODE hardcoded in autoconf.h

Especially on platforms with a slower CPU but a relatively high framebuffer fill bandwidth, like current ARM devices, the existing console monochrome imageblit function used to draw console text is suboptimal for common pixel depths such as 16bpp and 32bpp. The existing code is quite general and can deal with several pixel depths. By creating special case functions for 16bpp and 32bpp, by far the most common pixel formats used on modern systems, a significant speed-up is attained which can be readily felt on ARM-based devices like the Raspberry Pi and the Allwinner platform, but should help any platform using the fb layer. The special case functions allow constant folding, eliminating a number of instructions including divide operations, and allow the use of an unrolled loop, eliminating instructions with a variable shift size, reducing source memory access instructions, and eliminating excessive branching. These unrolled loops also allow much better code optimization by the C compiler. The code that selects which optimized variant is used is also simplified, eliminating integer divide instructions. The speed-up, measured by timing 'cat file.txt' in the console, varies between 40% and 70%, when testing on the Raspberry Pi and Allwinner ARM-based platforms, depending on font size and the pixel depth, with the greater benefit for 32bpp. Signed-off-by: Harm Hanemaaijer <[email protected]>

Scenario: 1. Port down and do fail over 2. Ap do rds_bind syscall PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6" #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518 #3 [ffff898e35f15b60] no_context at ffffffff8104854c #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95 [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282 RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88 RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00 RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0 R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm] #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6 PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap" #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm] #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma] #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds] #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds] #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670 PID: 45659 PID: 47039 rds_ib_laddr_check /* create id_priv with a null event_handler */ rdma_create_id rdma_bind_addr cma_acquire_dev /* add id_priv to cma_dev->id_list */ cma_attach_to_dev cma_ndev_work_handler /* event_hanlder is null */ id_priv->id.event_handler Signed-off-by: Guanglei Li <[email protected]> Signed-off-by: Honglei Wang <[email protected]> Reviewed-by: Junxiao Bi <[email protected]> Reviewed-by: Yanjun Zhu <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Acked-by: Doug Ledford <[email protected]> Signed-off-by: David S. Miller <[email protected]>

commit bb3ffb7 upstream. rxe_qp_cleanup() can sleep so it must be run in thread context and not in atomic context. This patch avoids that the following bug is triggered: Kernel BUG at 00000000560033f3 [verbose debug info unavailable] BUG: sleeping function called from invalid context at net/core/sock.c:2761 in_atomic(): 1, irqs_disabled(): 0, pid: 7, name: ksoftirqd/0 INFO: lockdep is turned off. Preemption disabled at: [<00000000b6e69628>] __do_softirq+0x4e/0x540 CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.15.0-rc7-dbg+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x85/0xbf ___might_sleep+0x177/0x260 lock_sock_nested+0x1d/0x90 inet_shutdown+0x2e/0xd0 rxe_qp_cleanup+0x107/0x140 [rdma_rxe] rxe_elem_release+0x18/0x80 [rdma_rxe] rxe_requester+0x1cf/0x11b0 [rdma_rxe] rxe_do_task+0x78/0xf0 [rdma_rxe] tasklet_action+0x99/0x270 __do_softirq+0xc0/0x540 run_ksoftirqd+0x1c/0x70 smpboot_thread_fn+0x1be/0x270 kthread+0x117/0x130 ret_from_fork+0x24/0x30 Signed-off-by: Bart Van Assche <[email protected]> Cc: Moni Shoua <[email protected]> Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit d0e312f upstream. The RDMA netlink core code checks validity of messages by ensuring that type and operand are in range. It works well for almost all clients except NLDEV, which has cb_table less than number of operands. Request to access such operand will trigger the following kernel panic. This patch updates all places where cb_table is declared for the consistency, but only NLDEV is actually need it. general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 0 PID: 522 Comm: syz-executor6 Not tainted 4.13.0+ #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 task: ffff8800657799c0 task.stack: ffff8800695d000 RIP: 0010:rdma_nl_rcv_msg+0x13a/0x4c0 RSP: 0018:ffff8800695d7838 EFLAGS: 00010207 RAX: dffffc0000000000 RBX: 1ffff1000d2baf0b RCX: 00000000704ff4d7 RDX: 0000000000000000 RSI: ffffffff81ddb03c RDI: 00000003827fa6bc RBP: ffff8800695d7900 R08: ffffffff82ec0578 R09: 0000000000000000 R10: ffff8800695d7900 R11: 0000000000000001 R12: 000000000000001c R13: ffff880069d31e00 R14: 00000000ffffffff R15: ffff880069d357c0 FS: 00007fee6acb8700(0000) GS:ffff88006ca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000201a9000 CR3: 0000000059766000 CR4: 00000000000006b0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? rdma_nl_multicast+0x80/0x80 rdma_nl_rcv+0x36b/0x4d0 ? ibnl_put_attr+0xc0/0xc0 netlink_unicast+0x4bd/0x6d0 ? netlink_sendskb+0x50/0x50 ? drop_futex_key_refs.isra.4+0x68/0xb0 netlink_sendmsg+0x9ab/0xbd0 ? nlmsg_notify+0x140/0x140 ? wake_up_q+0xa1/0xf0 ? drop_futex_key_refs.isra.4+0x68/0xb0 sock_sendmsg+0x88/0xd0 sock_write_iter+0x228/0x3c0 ? sock_sendmsg+0xd0/0xd0 ? do_futex+0x3e5/0xb20 ? iov_iter_init+0xaf/0x1d0 __vfs_write+0x46e/0x640 ? sched_clock_cpu+0x1b/0x190 ? __vfs_read+0x620/0x620 ? __fget+0x23a/0x390 ? rw_verify_area+0xca/0x290 vfs_write+0x192/0x490 SyS_write+0xde/0x1c0 ? SyS_read+0x1c0/0x1c0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0xad RIP: 0033:0x7fee6a74a219 RSP: 002b:00007fee6acb7d58 EFLAGS: 00000212 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000638000 RCX: 00007fee6a74a219 RDX: 0000000000000078 RSI: 0000000020141000 RDI: 0000000000000006 RBP: 0000000000000046 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000212 R12: ffff8800695d7f98 R13: 0000000020141000 R14: 0000000000000006 R15: 00000000ffffffff Code: d6 48 b8 00 00 00 00 00 fc ff df 66 41 81 e4 ff 03 44 8d 72 ff 4a 8d 3c b5 c0 a6 7f 82 44 89 b5 4c ff ff ff 48 89 f9 48 c1 e9 03 <0f> b6 0c 01 48 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RIP: rdma_nl_rcv_msg+0x13a/0x4c0 RSP: ffff8800695d7838 ---[ end trace ba085d123959c8ec ]--- Kernel panic - not syncing: Fatal exception Cc: syzkaller <[email protected]> Fixes: b4c598a ("RDMA/netlink: Implement nldev device dumpit calback") Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Doug Ledford <[email protected]>

commit fba4adb upstream. One I2C bus on my Atom E3845 board has been broken since 4.9. It has two devices, both declared by ACPI and with built-in drivers. There are two back-to-back transactions originating from the kernel, one targeting each device. The first transaction works, the second one locks up the I2C controller. The controller never recovers. These kernel logs show up whenever an I2C transaction is attempted after this failure. i2c-designware-pci 0000:00:18.3: timeout in disabling adapter i2c-designware-pci 0000:00:18.3: timeout waiting for bus ready Waiting for the I2C controller status to indicate that it is enabled before programming it fixes the issue. I have tested this patch on 4.14 and 4.15. Fixes: commit 2702ea7 ("i2c: designware: wait for disable/enable only if necessary") Cc: linux-stable <[email protected]> #4.13+ Signed-off-by: Ben Gardner <[email protected]> Acked-by: Jarkko Nikula <[email protected]> Reviewed-by: José Roberto de Souza <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 7ba7166 upstream. It was reported by Sergey Senozhatsky that if THP (Transparent Huge Page) and frontswap (via zswap) are both enabled, when memory goes low so that swap is triggered, segfault and memory corruption will occur in random user space applications as follow, kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x00007fc08889ae0d _int_malloc (libc.so.6) #1 0x00007fc08889c2f3 malloc (libc.so.6) #2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) #3 0x0000560e6005e75c n/a (urxvt) #4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) #5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) #6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) #7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt) #8 0x0000560e6005cb55 ev_run (urxvt) #9 0x0000560e6003b9b9 main (urxvt) #10 0x00007fc08883af4a __libc_start_main (libc.so.6) #11 0x0000560e6003f9da _start (urxvt) After bisection, it was found the first bad commit is bd4c82c ("mm, THP, swap: delay splitting THP after swapped out"). The root cause is as follows: When the pages are written to swap device during swapping out in swap_writepage(), zswap (fontswap) is tried to compress the pages to improve performance. But zswap (frontswap) will treat THP as a normal page, so only the head page is saved. After swapping in, tail pages will not be restored to their original contents, causing memory corruption in the applications. This is fixed by refusing to save page in the frontswap store functions if the page is a THP. So that the THP will be swapped out to swap device. Another choice is to split THP if frontswap is enabled. But it is found that the frontswap enabling isn't flexible. For example, if CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if zswap itself isn't enabled. Frontswap has multiple backends, to make it easy for one backend to enable THP support, the THP checking is put in backend frontswap store functions instead of the general interfaces. Link: http://lkml.kernel.org/r/[email protected] Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: "Huang, Ying" <[email protected]> Reported-by: Sergey Senozhatsky <[email protected]> Tested-by: Sergey Senozhatsky <[email protected]> Suggested-by: Minchan Kim <[email protected]> [put THP checking in backend] Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: <[email protected]> [4.14] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

ACM driver may accept data to transmit while system is not fully resumed. In this case ACM driver buffers data and prepare URBs on usb anchor list. There is a little chance that two tasks put a char and initiate acm_tty_flush_chars(). In such a case, driver will put one URB twice on usb anchor list. This patch also reset length of data before resue of a buffer. This not only prevent sending rubbish, but also lower risc of race. Without this patch we hit following kernel panic in one of our stabilty/stress tests. [ 46.884442] *list_add double add*: new=ffff9b2ab7289330, prev=ffff9b2ab7289330, next=ffff9b2ab81e28e0. [ 46.884476] Modules linked in: hci_uart btbcm bluetooth rfkill_gpio igb_avb(O) cfg80211 snd_soc_sst_bxt_tdf8532 snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_sst_acpi snd_soc_sst_match snd_hda_ext_core snd_hda_core trusty_timer trusty_wall trusty_log trusty_virtio trusty_ipc trusty_mem trusty_irq trusty virtio_ring virtio intel_ipu4_mmu_bxtB0 lib2600_mod_bxtB0 intel_ipu4_isys_mod_bxtB0 lib2600psys_mod_bxtB0 intel_ipu4_psys_mod_bxtB0 intel_ipu4_mod_bxtB0 intel_ipu4_wrapper_bxtB0 intel_ipu4_acpi videobuf2_dma_contig as3638 dw9714 lm3643 crlmodule smiapp smiapp_pll [ 46.884480] CPU: 1 PID: 33 Comm: kworker/u8:1 Tainted: G U W O 4.9.56-quilt-2e5dc0ac-g618ed69ced6e-dirty anholt#4 [ 46.884489] Workqueue: events_unbound flush_to_ldisc [ 46.884494] ffffb98ac012bb08 ffffffffad3e82e5 ffffb98ac012bb58 0000000000000000 [ 46.884497] ffffb98ac012bb48 ffffffffad0a23d1 00000024ad6374dd ffff9b2ab7289330 [ 46.884500] ffff9b2ab81e28e0 ffff9b2ab7289330 0000000000000002 0000000000000000 [ 46.884501] Call Trace: [ 46.884507] [<ffffffffad3e82e5>] dump_stack+0x67/0x92 [ 46.884511] [<ffffffffad0a23d1>] __warn+0xd1/0xf0 [ 46.884513] [<ffffffffad0a244f>] warn_slowpath_fmt+0x5f/0x80 [ 46.884516] [<ffffffffad407443>] __list_add+0xb3/0xc0 [ 46.884521] [<ffffffffad71133c>] *usb_anchor_urb*+0x4c/0xa0 [ 46.884524] [<ffffffffad782c6f>] *acm_tty_flush_chars*+0x8f/0xb0 [ 46.884527] [<ffffffffad782cd1>] *acm_tty_put_char*+0x41/0x100 [ 46.884530] [<ffffffffad4ced34>] tty_put_char+0x24/0x40 [ 46.884533] [<ffffffffad4d3bf5>] do_output_char+0xa5/0x200 [ 46.884535] [<ffffffffad4d3e98>] __process_echoes+0x148/0x290 [ 46.884538] [<ffffffffad4d654c>] n_tty_receive_buf_common+0x57c/0xb00 [ 46.884541] [<ffffffffad4d6ae4>] n_tty_receive_buf2+0x14/0x20 [ 46.884543] [<ffffffffad4d9662>] tty_ldisc_receive_buf+0x22/0x50 [ 46.884545] [<ffffffffad4d9c05>] flush_to_ldisc+0xc5/0xe0 [ 46.884549] [<ffffffffad0bcfe8>] process_one_work+0x148/0x440 [ 46.884551] [<ffffffffad0bdc19>] worker_thread+0x69/0x4a0 [ 46.884554] [<ffffffffad0bdbb0>] ? max_active_store+0x80/0x80 [ 46.884556] [<ffffffffad0c2e10>] kthread+0x110/0x130 [ 46.884559] [<ffffffffad0c2d00>] ? kthread_park+0x60/0x60 [ 46.884563] [<ffffffffadad9917>] ret_from_fork+0x27/0x40 [ 46.884566] ---[ end trace 3bd599058b8a9eb3 ]--- Signed-off-by: Dominik Bozek <[email protected]> Signed-off-by: Kuppuswamy Sathyanarayanan <[email protected]> Acked-by: Oliver Neukum <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

One I2C bus on my Atom E3845 board has been broken since 4.9. It has two devices, both declared by ACPI and with built-in drivers. There are two back-to-back transactions originating from the kernel, one targeting each device. The first transaction works, the second one locks up the I2C controller. The controller never recovers. These kernel logs show up whenever an I2C transaction is attempted after this failure. i2c-designware-pci 0000:00:18.3: timeout in disabling adapter i2c-designware-pci 0000:00:18.3: timeout waiting for bus ready Waiting for the I2C controller status to indicate that it is enabled before programming it fixes the issue. I have tested this patch on 4.14 and 4.15. Fixes: commit 2702ea7 ("i2c: designware: wait for disable/enable only if necessary") Cc: linux-stable <[email protected]> anholt#4.13+ Signed-off-by: Ben Gardner <[email protected]> Acked-by: Jarkko Nikula <[email protected]> Reviewed-by: José Roberto de Souza <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>

It was reported by Sergey Senozhatsky that if THP (Transparent Huge Page) and frontswap (via zswap) are both enabled, when memory goes low so that swap is triggered, segfault and memory corruption will occur in random user space applications as follow, kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x00007fc08889ae0d _int_malloc (libc.so.6) anholt#1 0x00007fc08889c2f3 malloc (libc.so.6) anholt#2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) anholt#3 0x0000560e6005e75c n/a (urxvt) anholt#4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) anholt#5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) anholt#6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) anholt#7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt) anholt#8 0x0000560e6005cb55 ev_run (urxvt) anholt#9 0x0000560e6003b9b9 main (urxvt) anholt#10 0x00007fc08883af4a __libc_start_main (libc.so.6) anholt#11 0x0000560e6003f9da _start (urxvt) After bisection, it was found the first bad commit is bd4c82c ("mm, THP, swap: delay splitting THP after swapped out"). The root cause is as follows: When the pages are written to swap device during swapping out in swap_writepage(), zswap (fontswap) is tried to compress the pages to improve performance. But zswap (frontswap) will treat THP as a normal page, so only the head page is saved. After swapping in, tail pages will not be restored to their original contents, causing memory corruption in the applications. This is fixed by refusing to save page in the frontswap store functions if the page is a THP. So that the THP will be swapped out to swap device. Another choice is to split THP if frontswap is enabled. But it is found that the frontswap enabling isn't flexible. For example, if CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if zswap itself isn't enabled. Frontswap has multiple backends, to make it easy for one backend to enable THP support, the THP checking is put in backend frontswap store functions instead of the general interfaces. Link: http://lkml.kernel.org/r/[email protected] Fixes: bd4c82c ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: "Huang, Ying" <[email protected]> Reported-by: Sergey Senozhatsky <[email protected]> Tested-by: Sergey Senozhatsky <[email protected]> Suggested-by: Minchan Kim <[email protected]> [put THP checking in backend] Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: <[email protected]> [4.14] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

Reported by syzkaller: WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel] CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ anholt#4 RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel] Call Trace: vmx_handle_exit+0xbd/0xe20 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm] kvm_vcpu_ioctl+0x3e9/0x720 [kvm] do_vfs_ioctl+0xa4/0x6a0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x25/0x9c The testcase creates a first thread to issue KVM_SMI ioctl, and then creates a second thread to mmap and operate on the same vCPU. This triggers a race condition when running the testcase with multiple threads. Sometimes one thread exits with a triple fault while another thread mmaps and operates on the same vCPU. Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE in kvm_handle_bad_page(), which will go on to cause an emulation failure and an exit with KVM_EXIT_INTERNAL_ERROR. Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: [email protected] Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

This patch fixes NULL pointer crash due to active timer running for abort IOCB. From crash dump analysis it was discoverd that get_next_timer_interrupt() encountered a corrupted entry on the timer list. anholt#9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8 [exception RIP: get_next_timer_interrupt+440] RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013 RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600 RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6 RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001 R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6 R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Looking at the assembly of get_next_timer_interrupt(), address came from %r8 (ffff95e1f6451188) which is pointing to list_head with single entry at ffff95e5ff621178. 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1) 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx) crash> rd ffff95e1f6451188 10 ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b..... ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E..... ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E..... ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|.... ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E..... crash> rd ffff95e5ff621178 10 ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y.... ffff95e5ff621188: 0000000000000000 00000000ffffffff ................ ffff95e5ff621198: 00000000000000a0 0000000000000010 ................ ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b............. ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q.... ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080. CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13 FREE / [ALLOCATED] ffff95e5ff621080 (cpu 6 cache) Examining the contents of that memory reveals a pointer to a constant string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). crash> rd ffffffffc059277c 20 ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma crash> struct -ox srb_iocb struct srb_iocb { union { struct {...} logio; struct {...} els_logo; struct {...} tmf; struct {...} fxiocb; struct {...} abt; struct ct_arg ctarg; struct {...} mbx; struct {...} nack; [0x0 ] } u; [0xb8] struct timer_list timer; [0x108] void (*timeout)(void *); } SIZE: 0x110 crash> ! bc ibase=16 obase=10 B8+40 F8 The object is a srb_t, and at offset 0xf8 within that structure (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list. Cc: <[email protected]> anholt#4.4+ Fixes: 4440e46 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.") Signed-off-by: Himanshu Madhani <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>

iproute2 print_skbmod() prints the configured ethertype using format 0x%X: therefore, test 9aa8 systematically fails, because it configures action anholt#4 using ethertype 0x0031, and expects 0x0031 when it reads it back. Changing the expected value to 0x31 lets the test result 'not ok' become 'ok'. tested with: # ./tdc.py -e 9aa8 Test 9aa8: Get a single skbmod action from a list All test results: 1..1 ok 1 9aa8 Get a single skbmod action from a list Fixes: cf797ac ("tc-testing: Add test cases for police and skbmod") Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Currently we can crash perf record when running in pipe mode, like: $ perf record ls | perf report # To display the perf.data header info, please use --header/--header-only options. # perf: Segmentation fault Error: The - file has no samples! The callstack of the crash is: 0x0000000000515242 in perf_event__synthesize_event_update_name 3513 ev = event_update_event__new(len + 1, PERF_EVENT_UPDATE__NAME, evsel->id[0]); (gdb) bt #0 0x0000000000515242 in perf_event__synthesize_event_update_name anholt#1 0x00000000005158a4 in perf_event__synthesize_extra_attr anholt#2 0x0000000000443347 in record__synthesize anholt#3 0x00000000004438e3 in __cmd_record anholt#4 0x000000000044514e in cmd_record anholt#5 0x00000000004cbc95 in run_builtin anholt#6 0x00000000004cbf02 in handle_internal_command anholt#7 0x00000000004cc054 in run_argv anholt#8 0x00000000004cc422 in main The reason of the crash is that the evsel does not have ids array allocated and the pipe's synthesize code tries to access it. We don't force evsel ids allocation when we have single event, because it's not needed. However we need it when we are in pipe mode even for single event as a key for evsel update event. Fixing this by forcing evsel ids allocation event for single event, when we are in pipe mode. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

After v4.12 commit e2460f2 ("dm: mark targets that pass integrity data"), dm-multipath, e.g. on DIF+DIX SCSI disk paths, does not support block integrity any more. So add it to the whitelist. This is also a pre-requisite to use block integrity with other dm layer(s) on top of multipath, such as kpartx partitions (dm-linear) or LVM. Also, bump target version to reflect this fix. Fixes: e2460f2 ("dm: mark targets that pass integrity data") Cc: <[email protected]> anholt#4.12+ Bisected-by: Fedor Loshakov <[email protected]> Signed-off-by: Steffen Maier <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>

commit 95e057e upstream. Reported by syzkaller: WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel] CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4 RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel] Call Trace: vmx_handle_exit+0xbd/0xe20 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm] kvm_vcpu_ioctl+0x3e9/0x720 [kvm] do_vfs_ioctl+0xa4/0x6a0 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x25/0x9c The testcase creates a first thread to issue KVM_SMI ioctl, and then creates a second thread to mmap and operate on the same vCPU. This triggers a race condition when running the testcase with multiple threads. Sometimes one thread exits with a triple fault while another thread mmaps and operates on the same vCPU. Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE in kvm_handle_bad_page(), which will go on to cause an emulation failure and an exit with KVM_EXIT_INTERNAL_ERROR. Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com Cc: Paolo Bonzini <[email protected]> Cc: Radim Krčmář <[email protected]> Cc: [email protected] Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 1514839 upstream. This patch fixes NULL pointer crash due to active timer running for abort IOCB. From crash dump analysis it was discoverd that get_next_timer_interrupt() encountered a corrupted entry on the timer list. #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8 [exception RIP: get_next_timer_interrupt+440] RIP: ffffffff90ea3088 RSP: ffff95e1f6f0fdf0 RFLAGS: 00010013 RAX: ffff95e1f6451028 RBX: 000218e2389e5f40 RCX: 00000001232ad600 RDX: 0000000000000001 RSI: ffff95e1f6f0fdf0 RDI: 0000000001232ad6 RBP: ffff95e1f6f0fe40 R8: ffff95e1f6451188 R9: 0000000000000001 R10: 0000000000000016 R11: 0000000000000016 R12: 00000001232ad5f6 R13: ffff95e1f6450000 R14: ffff95e1f6f0fdf8 R15: ffff95e1f6f0fe10 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 Looking at the assembly of get_next_timer_interrupt(), address came from %r8 (ffff95e1f6451188) which is pointing to list_head with single entry at ffff95e5ff621178. 0xffffffff90ea307a <get_next_timer_interrupt+426>: mov (%r8),%rdx 0xffffffff90ea307d <get_next_timer_interrupt+429>: cmp %r8,%rdx 0xffffffff90ea3080 <get_next_timer_interrupt+432>: je 0xffffffff90ea30a7 <get_next_timer_interrupt+471> 0xffffffff90ea3082 <get_next_timer_interrupt+434>: nopw 0x0(%rax,%rax,1) 0xffffffff90ea3088 <get_next_timer_interrupt+440>: testb $0x1,0x18(%rdx) crash> rd ffff95e1f6451188 10 ffff95e1f6451188: ffff95e5ff621178 ffff95e5ff621178 x.b.....x.b..... ffff95e1f6451198: ffff95e1f6451198 ffff95e1f6451198 ..E.......E..... ffff95e1f64511a8: ffff95e1f64511a8 ffff95e1f64511a8 ..E.......E..... ffff95e1f64511b8: ffff95e77cf509a0 ffff95e77cf509a0 ...|.......|.... ffff95e1f64511c8: ffff95e1f64511c8 ffff95e1f64511c8 ..E.......E..... crash> rd ffff95e5ff621178 10 ffff95e5ff621178: 0000000000000001 ffff95e15936aa00 ..........6Y.... ffff95e5ff621188: 0000000000000000 00000000ffffffff ................ ffff95e5ff621198: 00000000000000a0 0000000000000010 ................ ffff95e5ff6211a8: ffff95e5ff621198 000000000000000c ..b............. ffff95e5ff6211b8: 00000f5800000000 ffff95e751f8d720 ....X... ..Q.... ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080. CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE ffff95dc7fd74d00 mnt_cache 384 19785 24948 594 16k SLAB MEMORY NODE TOTAL ALLOCATED FREE ffffdc5dabfd8800 ffff95e5ff620000 1 42 29 13 FREE / [ALLOCATED] ffff95e5ff621080 (cpu 6 cache) Examining the contents of that memory reveals a pointer to a constant string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). crash> rd ffffffffc059277c 20 ffffffffc059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. ffffffffc059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i ffffffffc059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi ffffffffc05927ac: 636976656420676e 786c252074612065 ng device at %lx ffffffffc05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod ffffffffc05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. ffffffffc05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A ffffffffc05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking ffffffffc05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl ffffffffc059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma crash> struct -ox srb_iocb struct srb_iocb { union { struct {...} logio; struct {...} els_logo; struct {...} tmf; struct {...} fxiocb; struct {...} abt; struct ct_arg ctarg; struct {...} mbx; struct {...} nack; [0x0 ] } u; [0xb8] struct timer_list timer; [0x108] void (*timeout)(void *); } SIZE: 0x110 crash> ! bc ibase=16 obase=10 B8+40 F8 The object is a srb_t, and at offset 0xf8 within that structure (i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list. Cc: <[email protected]> #4.4+ Fixes: 4440e46 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.") Signed-off-by: Himanshu Madhani <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 3efc31f upstream. During error test case where switch port status is toggled from enable to disable, following stack trace is seen which indicates recursion trying to send terminate exchange. This regression was introduced by commit 82de802 ("scsi: qla2xxx: Preparation for Target MQ.") BUG: stack guard page was hit at ffffb96488383ff8 (stack is ffffb96488384000..ffffb96488387fff) BUG: stack guard page was hit at ffffb964886c3ff8 (stack is ffffb964886c4000..ffffb964886c7fff) kernel stack overflow (double-fault): 0000 [#1] SMP qlt_term_ctio_exchange+0x9c/0xb0 [qla2xxx] qlt_term_ctio_exchange+0x9c/0xb0 [qla2xxx] qlt_term_ctio_exchange+0x9c/0xb0 [qla2xxx] qlt_term_ctio_exchange+0x9c/0xb0 [qla2xxx] qlt_term_ctio_exchange+0x9c/0xb0 [qla2xxx] Fixes: 82de802 ("scsi: qla2xxx: Preparation for Target MQ.") Cc: <[email protected]> #4.10 Signed-off-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 8c5c147 upstream. After v4.12 commit e2460f2 ("dm: mark targets that pass integrity data"), dm-multipath, e.g. on DIF+DIX SCSI disk paths, does not support block integrity any more. So add it to the whitelist. This is also a pre-requisite to use block integrity with other dm layer(s) on top of multipath, such as kpartx partitions (dm-linear) or LVM. Also, bump target version to reflect this fix. Fixes: e2460f2 ("dm: mark targets that pass integrity data") Cc: <[email protected]> #4.12+ Bisected-by: Fedor Loshakov <[email protected]> Signed-off-by: Steffen Maier <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 72d5481 ] It is unlikely request_threaded_irq will fail, but if it does for some reason we should clear iommu->pr_irq in the error path. Also intel_svm_finish_prq shouldn't try to clean up the page request interrupt if pr_irq is 0. Without these, if request_threaded_irq were to fail the following occurs: fail with no fixes: [ 0.683147] ------------[ cut here ]------------ [ 0.683148] NULL pointer, cannot free irq [ 0.683158] WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1632 irq_domain_free_irqs+0x126/0x140 [ 0.683160] Modules linked in: [ 0.683163] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2 #3 [ 0.683165] Hardware name: /NUC7i3BNB, BIOS BNKBL357.86A.0036.2017.0105.1112 01/05/2017 [ 0.683168] RIP: 0010:irq_domain_free_irqs+0x126/0x140 [ 0.683169] RSP: 0000:ffffc90000037ce8 EFLAGS: 00010292 [ 0.683171] RAX: 000000000000001d RBX: ffff880276283c00 RCX: ffffffff81c5e5e8 [ 0.683172] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246 [ 0.683174] RBP: ffff880276283c00 R08: 0000000000000000 R09: 000000000000023c [ 0.683175] R10: 0000000000000007 R11: 0000000000000000 R12: 000000000000007a [ 0.683176] R13: 0000000000000001 R14: 0000000000000000 R15: 0000010010000000 [ 0.683178] FS: 0000000000000000(0000) GS:ffff88027ec80000(0000) knlGS:0000000000000000 [ 0.683180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.683181] CR2: 0000000000000000 CR3: 0000000001c09001 CR4: 00000000003606e0 [ 0.683182] Call Trace: [ 0.683189] intel_svm_finish_prq+0x3c/0x60 [ 0.683191] free_dmar_iommu+0x1ac/0x1b0 [ 0.683195] init_dmars+0xaaa/0xaea [ 0.683200] ? klist_next+0x19/0xc0 [ 0.683203] ? pci_do_find_bus+0x50/0x50 [ 0.683205] ? pci_get_dev_by_id+0x52/0x70 [ 0.683208] intel_iommu_init+0x498/0x5c7 [ 0.683211] pci_iommu_init+0x13/0x3c [ 0.683214] ? e820__memblock_setup+0x61/0x61 [ 0.683217] do_one_initcall+0x4d/0x1a0 [ 0.683220] kernel_init_freeable+0x186/0x20e [ 0.683222] ? set_debug_rodata+0x11/0x11 [ 0.683225] ? rest_init+0xb0/0xb0 [ 0.683226] kernel_init+0xa/0xff [ 0.683229] ret_from_fork+0x1f/0x30 [ 0.683259] Code: 89 ee 44 89 e7 e8 3b e8 ff ff 5b 5d 44 89 e7 44 89 ee 41 5c 41 5d 41 5e e9 a8 84 ff ff 48 c7 c7 a8 71 a7 81 31 c0 e8 6a d3 f9 ff <0f> ff 5b 5d 41 5c 41 5d 41 5 e c3 0f 1f 44 00 00 66 2e 0f 1f 84 [ 0.683285] ---[ end trace f7650e42792627ca ]--- with iommu->pr_irq = 0, but no check in intel_svm_finish_prq: [ 0.669561] ------------[ cut here ]------------ [ 0.669563] Trying to free already-free IRQ 0 [ 0.669573] WARNING: CPU: 3 PID: 1 at kernel/irq/manage.c:1546 __free_irq+0xa4/0x2c0 [ 0.669574] Modules linked in: [ 0.669577] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2 #4 [ 0.669579] Hardware name: /NUC7i3BNB, BIOS BNKBL357.86A.0036.2017.0105.1112 01/05/2017 [ 0.669581] RIP: 0010:__free_irq+0xa4/0x2c0 [ 0.669582] RSP: 0000:ffffc90000037cc0 EFLAGS: 00010082 [ 0.669584] RAX: 0000000000000021 RBX: 0000000000000000 RCX: ffffffff81c5e5e8 [ 0.669585] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000046 [ 0.669587] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000023c [ 0.669588] R10: 0000000000000007 R11: 0000000000000000 R12: ffff880276253960 [ 0.669589] R13: ffff8802762538a4 R14: ffff880276253800 R15: ffff880276283600 [ 0.669593] FS: 0000000000000000(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000 [ 0.669594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.669596] CR2: 0000000000000000 CR3: 0000000001c09001 CR4: 00000000003606e0 [ 0.669602] Call Trace: [ 0.669616] free_irq+0x30/0x60 [ 0.669620] intel_svm_finish_prq+0x34/0x60 [ 0.669623] free_dmar_iommu+0x1ac/0x1b0 [ 0.669627] init_dmars+0xaaa/0xaea [ 0.669631] ? klist_next+0x19/0xc0 [ 0.669634] ? pci_do_find_bus+0x50/0x50 [ 0.669637] ? pci_get_dev_by_id+0x52/0x70 [ 0.669639] intel_iommu_init+0x498/0x5c7 [ 0.669642] pci_iommu_init+0x13/0x3c [ 0.669645] ? e820__memblock_setup+0x61/0x61 [ 0.669648] do_one_initcall+0x4d/0x1a0 [ 0.669651] kernel_init_freeable+0x186/0x20e [ 0.669653] ? set_debug_rodata+0x11/0x11 [ 0.669656] ? rest_init+0xb0/0xb0 [ 0.669658] kernel_init+0xa/0xff [ 0.669661] ret_from_fork+0x1f/0x30 [ 0.669662] Code: 7a 08 75 0e e9 c3 01 00 00 4c 39 7b 08 74 57 48 89 da 48 8b 5a 18 48 85 db 75 ee 89 ee 48 c7 c7 78 67 a7 81 31 c0 e8 4c 37 fa ff <0f> ff 48 8b 34 24 4c 89 ef e 8 0e 4c 68 00 49 8b 46 40 48 8b 80 [ 0.669688] ---[ end trace 58a470248700f2fc ]--- Cc: Alex Williamson <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Ashok Raj <[email protected]> Signed-off-by: Jerry Snitselaar <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit d754941 ] If, for any reason, userland shuts down iscsi transport interfaces before proper logouts - like when logging in to LUNs manually, without logging out on server shutdown, or when automated scripts can't umount/logout from logged LUNs - kernel will hang forever on its sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all still existent paths. PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow" #0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee #1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5 #2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199 #3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604 #4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c #5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10 #6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7 #7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe #8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7 #9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c This happens because iscsi_eh_cmd_timed_out(), the transport layer timeout helper, would tell the queue timeout function (scsi_times_out) to reset the request timer over and over, until the session state is back to logged in state. Unfortunately, during server shutdown, this might never happen again. Other option would be "not to handle" the issue in the transport layer. That would trigger the error handler logic, which would also need the session state to be logged in again. Best option, for such case, is to tell upper layers that the command was handled during the transport layer error handler helper, marking it as DID_NO_CONNECT, which will allow completion and inform about the problem. After the session was marked as ISCSI_STATE_FAILED, due to the first timeout during the server shutdown phase, all subsequent cmds will fail to be queued, allowing upper logic to fail faster. Signed-off-by: Rafael David Tinoco <[email protected]> Reviewed-by: Lee Duncan <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 2c0aa08 ] Scenario: 1. Port down and do fail over 2. Ap do rds_bind syscall PID: 47039 TASK: ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6" #0 [ffff898e35f159f0] machine_kexec at ffffffff8103abf9 #1 [ffff898e35f15a60] crash_kexec at ffffffff810b96e3 #2 [ffff898e35f15b30] oops_end at ffffffff8150f518 #3 [ffff898e35f15b60] no_context at ffffffff8104854c #4 [ffff898e35f15ba0] __bad_area_nosemaphore at ffffffff81048675 #5 [ffff898e35f15bf0] bad_area_nosemaphore at ffffffff810487d3 #6 [ffff898e35f15c00] do_page_fault at ffffffff815120b8 #7 [ffff898e35f15d10] page_fault at ffffffff8150ea95 [exception RIP: unknown or invalid address] RIP: 0000000000000000 RSP: ffff898e35f15dc8 RFLAGS: 00010282 RAX: 00000000fffffffe RBX: ffff889b77f6fc00 RCX:ffffffff81c99d88 RDX: 0000000000000000 RSI: ffff896019ee08e8 RDI:ffff889b77f6fc00 RBP: ffff898e35f15df0 R8: ffff896019ee08c8 R9:0000000000000000 R10: 0000000000000400 R11: 0000000000000000 R12:ffff896019ee08c0 R13: ffff889b77f6fe68 R14: ffffffff81c99d80 R15: ffffffffa022a1e0 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff898e35f15dc8] cma_ndev_work_handler at ffffffffa022a228 [rdma_cm] #9 [ffff898e35f15df8] process_one_work at ffffffff8108a7c6 #10 [ffff898e35f15e58] worker_thread at ffffffff8108bda0 #11 [ffff898e35f15ee8] kthread at ffffffff81090fe6 PID: 45659 TASK: ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap" #0 [ffff881024ccfc98] __schedule at ffffffff8150bac4 #1 [ffff881024ccfd40] schedule at ffffffff8150c2cf #2 [ffff881024ccfd50] __mutex_lock_slowpath at ffffffff8150cee7 #3 [ffff881024ccfdc0] mutex_lock at ffffffff8150cdeb #4 [ffff881024ccfde0] rdma_destroy_id at ffffffffa022a027 [rdma_cm] #5 [ffff881024ccfe10] rds_ib_laddr_check at ffffffffa0357857 [rds_rdma] #6 [ffff881024ccfe50] rds_trans_get_preferred at ffffffffa0324c2a [rds] #7 [ffff881024ccfe80] rds_bind at ffffffffa031d690 [rds] #8 [ffff881024ccfeb0] sys_bind at ffffffff8142a670 PID: 45659 PID: 47039 rds_ib_laddr_check /* create id_priv with a null event_handler */ rdma_create_id rdma_bind_addr cma_acquire_dev /* add id_priv to cma_dev->id_list */ cma_attach_to_dev cma_ndev_work_handler /* event_hanlder is null */ id_priv->id.event_handler Signed-off-by: Guanglei Li <[email protected]> Signed-off-by: Honglei Wang <[email protected]> Reviewed-by: Junxiao Bi <[email protected]> Reviewed-by: Yanjun Zhu <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Acked-by: Doug Ledford <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Shakeel reported a crash in mem_cgroup_protected(), which can be triggered by memcg reclaim if the legacy cgroup v1 use_hierarchy=0 mode is used: BUG: unable to handle kernel NULL pointer dereference at 0000000000000120 PGD 8000001ff55da067 P4D 8000001ff55da067 PUD 1fdc7df067 PMD 0 Oops: 0000 [#4] SMP PTI CPU: 0 PID: 15581 Comm: bash Tainted: G D 4.17.0-smp-clean #5 Hardware name: ... RIP: 0010:mem_cgroup_protected+0x54/0x130 Code: 4c 8b 8e 00 01 00 00 4c 8b 86 08 01 00 00 48 8d 8a 08 ff ff ff 48 85 d2 ba 00 00 00 00 48 0f 44 ca 48 39 c8 0f 84 cf 00 00 00 <48> 8b 81 20 01 00 00 4d 89 ca 4c 39 c8 4c 0f 46 d0 4d 85 d2 74 05 RSP: 0000:ffffabe64dfafa58 EFLAGS: 00010286 RAX: ffff9fb6ff03d000 RBX: ffff9fb6f5b1b000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9fb6f5b1b000 RDI: ffff9fb6f5b1b000 RBP: ffffabe64dfafb08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000c800 R12: ffffabe64dfafb88 R13: ffff9fb6f5b1b000 R14: ffffabe64dfafb88 R15: ffff9fb77fffe000 FS: 00007fed1f8ac700(0000) GS:ffff9fb6ff400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000120 CR3: 0000001fdcf86003 CR4: 00000000001606f0 Call Trace: ? shrink_node+0x194/0x510 do_try_to_free_pages+0xfd/0x390 try_to_free_mem_cgroup_pages+0x123/0x210 try_charge+0x19e/0x700 mem_cgroup_try_charge+0x10b/0x1a0 wp_page_copy+0x134/0x5b0 do_wp_page+0x90/0x460 __handle_mm_fault+0x8e3/0xf30 handle_mm_fault+0xfe/0x220 __do_page_fault+0x262/0x500 do_page_fault+0x28/0xd0 ? page_fault+0x8/0x30 page_fault+0x1e/0x30 RIP: 0033:0x485b72 The problem happens because parent_mem_cgroup() returns a NULL pointer, which is dereferenced later without a check. As cgroup v1 has no memory guarantee support, let's make mem_cgroup_protected() immediately return MEMCG_PROT_NONE, if the given cgroup has no parent (non-hierarchical mode is used). Link: http://lkml.kernel.org/r/[email protected] Fixes: bf8d5d5 ("memcg: introduce memory.min") Signed-off-by: Roman Gushchin <[email protected]> Reported-by: Shakeel Butt <[email protected]> Tested-by: Shakeel Butt <[email protected]> Tested-by: John Stultz <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

Since commit 1bb8866 ("mtd: nand: denali: handle timing parameters by setup_data_interface()"), denali_dt.c gets the clock rate from the clock driver. The driver expects the frequency of the bus interface clock, whereas the clock driver of SOCFPGA provides the core clock. Thus, the setup_data_interface() hook calculates timing parameters based on a wrong frequency. To make it work without relying on the clock driver, hard-code the clock frequency, 200MHz. This is fine for existing DT of UniPhier, and also fixes the issue of SOCFPGA because both platforms use 200 MHz for the bus interface clock. Fixes: 1bb8866 ("mtd: nand: denali: handle timing parameters by setup_data_interface()") Cc: linux-stable <[email protected]> #4.14+ Reported-by: Philipp Rosenberger <[email protected]> Suggested-by: Boris Brezillon <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Tested-by: Richard Weinberger <[email protected]> Signed-off-by: Boris Brezillon <[email protected]>

Currently, whenever a new node is created/re-used from the memhotplug path, we call free_area_init_node()->free_area_init_core(). But there is some code that we do not really need to run when we are coming from such path. free_area_init_core() performs the following actions: 1) Initializes pgdat internals, such as spinlock, waitqueues and more. 2) Account # nr_all_pages and # nr_kernel_pages. These values are used later on when creating hash tables. 3) Account number of managed_pages per zone, substracting dma_reserved and memmap pages. 4) Initializes some fields of the zone structure data 5) Calls init_currently_empty_zone to initialize all the freelists 6) Calls memmap_init to initialize all pages belonging to certain zone When called from memhotplug path, free_area_init_core() only performs actions #1 and #4. Action #2 is pointless as the zones do not have any pages since either the node was freed, or we are re-using it, eitherway all zones belonging to this node should have 0 pages. For the same reason, action #3 results always in manages_pages being 0. Action #5 and #6 are performed later on when onlining the pages: online_pages()->move_pfn_range_to_zone()->init_currently_empty_zone() online_pages()->move_pfn_range_to_zone()->memmap_init_zone() This patch does two things: First, moves the node/zone initializtion to their own function, so it allows us to create a small version of free_area_init_core, where we only perform: 1) Initialization of pgdat internals, such as spinlock, waitqueues and more 4) Initialization of some fields of the zone structure data These two functions are: pgdat_init_internals() and zone_init_internals(). The second thing this patch does, is to introduce free_area_init_core_hotplug(), the memhotplug version of free_area_init_core(): Currently, we call free_area_init_node() from the memhotplug path. In there, we set some pgdat's fields, and call calculate_node_totalpages(). calculate_node_totalpages() calculates the # of pages the node has. Since the node is either new, or we are re-using it, the zones belonging to this node should not have any pages, so there is no point to calculate this now. Actually, we re-set these values to 0 later on with the calls to: reset_node_managed_pages() reset_node_present_pages() The # of pages per node and the # of pages per zone will be calculated when onlining the pages: online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_zone_range() online_pages()->move_pfn_range()->move_pfn_range_to_zone()->resize_pgdat_range() Also, since free_area_init_core/free_area_init_node will now only get called during early init, let us replace __paginginit with __init, so their code gets freed up. [[email protected]: fix section usage] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: v6] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

Patch series "add support for relative references in special sections", v10. This adds support for emitting special sections such as initcall arrays, PCI fixups and tracepoints as relative references rather than absolute references. This reduces the size by 50% on 64-bit architectures, but more importantly, it removes the need for carrying relocation metadata for these sections in relocatable kernels (e.g., for KASLR) that needs to be fixed up at boot time. On arm64, this reduces the vmlinux footprint of such a reference by 8x (8 byte absolute reference + 24 byte RELA entry vs 4 byte relative reference) Patch #3 was sent out before as a single patch. This series supersedes the previous submission. This version makes relative ksymtab entries dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather than trying to infer from kbuild test robot replies for which architectures it should be blacklisted. Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS, and sets it for the main architectures that are expected to benefit the most from this feature, i.e., 64-bit architectures or ones that use runtime relocations. Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of ksymtab/kcrctab sections in decompressor and EFI stub objects when rebuilding existing C files to run in a different context. Patches #4 - #6 implement relative references for initcalls, PCI fixups and tracepoints, respectively, all of which produce sections with order ~1000 entries on an arm64 defconfig kernel with tracing enabled. This means we save about 28 KB of vmlinux space for each of these patches. [From the v7 series blurb, which included the jump_label patches as well]: For the arm64 kernel, all patches combined reduce the memory footprint of vmlinux by about 1.3 MB (using a config copied from Ubuntu that has KASLR enabled), of which ~1 MB is the size reduction of the RELA section in .init, and the remaining 300 KB is reduction of .text/.data. This patch (of 6): Before updating certain subsystems to use place relative 32-bit relocations in special sections, to save space and reduce the number of absolute relocations that need to be processed at runtime by relocatable kernels, introduce the Kconfig symbol and define it for some architectures that should be able to support and benefit from it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ard Biesheuvel <[email protected]> Acked-by: Michael Ellerman <[email protected]> Reviewed-by: Will Deacon <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Kees Cook <[email protected]> Cc: Thomas Garnier <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Russell King <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Petr Mladek <[email protected]> Cc: James Morris <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Sergey Senozhatsky <[email protected]>, Cc: James Morris <[email protected]> Cc: Jessica Yu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

The writeback thread would exit with a lock held when the cache device is detached via sysfs interface, fix it by releasing the held lock before exiting the while-loop. Fixes: fadd94e (bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set) Signed-off-by: Shan Hai <[email protected]> Signed-off-by: Coly Li <[email protected]> Tested-by: Shenghui Wang <[email protected]> Cc: [email protected] #4.17+ Signed-off-by: Jens Axboe <[email protected]>

Avoid calling cgroup_threadgroup_change_end() without having called cgroup_threadgroup_change_begin() first. During process creation we need to check whether the cgroup we are in allows us to fork. To perform this check the cgroup needs to guard itself against threadgroup changes and takes a lock. Prior to CLONE_PIDFD the cleanup target "bad_fork_free_pid" would also need to call cgroup_threadgroup_change_end() because said lock had already been taken. However, this is not the case anymore with the addition of CLONE_PIDFD. We are now allocating a pidfd before we check whether the cgroup we're in can fork and thus prior to taking the lock. So when copy_process() fails at the right step it would release a lock we haven't taken. This bug is not even very subtle to be honest. It's just not very clear from the naming of cgroup_threadgroup_change_{begin,end}() that a lock is taken. Here's the relevant splat: entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7fec849 Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078 RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000 RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(depth <= 0) WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 __lock_release kernel/locking/lockdep.c:4052 [inline] WARNING: CPU: 1 PID: 7744 at kernel/locking/lockdep.c:4052 lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 7744 Comm: syz-executor007 Not tainted 5.1.0+ #4 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 panic+0x2cb/0x65c kernel/panic.c:214 __warn.cold+0x20/0x45 kernel/panic.c:566 report_bug+0x263/0x2b0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:179 [inline] fixup_bug arch/x86/kernel/traps.c:174 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972 RIP: 0010:__lock_release kernel/locking/lockdep.c:4052 [inline] RIP: 0010:lock_release+0x667/0xa00 kernel/locking/lockdep.c:4321 Code: 0f 85 a0 03 00 00 8b 35 77 66 08 08 85 f6 75 23 48 c7 c6 a0 55 6b 87 48 c7 c7 40 25 6b 87 4c 89 85 70 ff ff ff e8 b7 a9 eb ff <0f> 0b 4c 8b 85 70 ff ff ff 4c 89 ea 4c 89 e6 4c 89 c7 e8 52 63 ff RSP: 0018:ffff888094117b48 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 1ffff11012822f6f RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815af236 RDI: ffffed1012822f5b RBP: ffff888094117c00 R08: ffff888092bfc400 R09: fffffbfff113301d R10: fffffbfff113301c R11: ffffffff889980e3 R12: ffffffff8a451df8 R13: ffffffff8142e71f R14: ffffffff8a44cc80 R15: ffff888094117bd8 percpu_up_read.constprop.0+0xcb/0x110 include/linux/percpu-rwsem.h:92 cgroup_threadgroup_change_end include/linux/cgroup-defs.h:712 [inline] copy_process.part.0+0x47ff/0x6710 kernel/fork.c:2222 copy_process kernel/fork.c:1772 [inline] _do_fork+0x25d/0xfd0 kernel/fork.c:2338 __do_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:240 [inline] __se_compat_sys_x86_clone arch/x86/ia32/sys_ia32.c:236 [inline] __ia32_compat_sys_x86_clone+0xbc/0x140 arch/x86/ia32/sys_ia32.c:236 do_syscall_32_irqs_on arch/x86/entry/common.c:334 [inline] do_fast_syscall_32+0x281/0xd54 arch/x86/entry/common.c:405 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 RIP: 0023:0xf7fec849 Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90 RSP: 002b:00000000ffed5a8c EFLAGS: 00000246 ORIG_RAX: 0000000000000078 RAX: ffffffffffffffda RBX: 0000000000003ffc RCX: 0000000000000000 RDX: 00000000200005c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. Reported-and-tested-by: [email protected] Fixes: b3e5838 ("clone: add CLONE_PIDFD") Signed-off-by: Christian Brauner <[email protected]>

Patch series "lib/sort & lib/list_sort: faster and smaller", v2. Because CONFIG_RETPOLINE has made indirect calls much more expensive, I thought I'd try to reduce the number made by the library sort functions. The first three patches apply to lib/sort.c. Patch #1 is a simple optimization. The built-in swap has special cases for aligned 4- and 8-byte objects. But those are almost never used; most calls to sort() work on larger structures, which fall back to the byte-at-a-time loop. This generalizes them to aligned *multiples* of 4 and 8 bytes. (If nothing else, it saves an awful lot of energy by not thrashing the store buffers as much.) Patch #2 grabs a juicy piece of low-hanging fruit. I agree that nice simple solid heapsort is preferable to more complex algorithms (sorry, Andrey), but it's possible to implement heapsort with far fewer comparisons (50% asymptotically, 25-40% reduction for realistic sizes) than the way it's been done up to now. And with some care, the code ends up smaller, as well. This is the "big win" patch. Patch #3 adds the same sort of indirect call bypass that has been added to the net code of late. The great majority of the callers use the builtin swap functions, so replace the indirect call to sort_func with a (highly preditable) series of if() statements. Rather surprisingly, this decreased code size, as the swap functions were inlined and their prologue & epilogue code eliminated. lib/list_sort.c is a bit trickier, as merge sort is already close to optimal, and we don't want to introduce triumphs of theory over practicality like the Ford-Johnson merge-insertion sort. Patch #4, without changing the algorithm, chops 32% off the code size and removes the part[MAX_LIST_LENGTH+1] pointer array (and the corresponding upper limit on efficiently sortable input size). Patch #5 improves the algorithm. The previous code is already optimal for power-of-two (or slightly smaller) size inputs, but when the input size is just over a power of 2, there's a very unbalanced final merge. There are, in the literature, several algorithms which solve this, but they all depend on the "breadth-first" merge order which was replaced by commit 835cc0c with a more cache-friendly "depth-first" order. Some hard thinking came up with a depth-first algorithm which defers merges as little as possible while avoiding bad merges. This saves 0.2*n compares, averaged over all sizes. The code size increase is minimal (64 bytes on x86-64, reducing the net savings to 26%), but the comments expanded significantly to document the clever algorithm. TESTING NOTES: I have some ugly user-space benchmarking code which I used for testing before moving this code into the kernel. Shout if you want a copy. I'm running this code right now, with CONFIG_TEST_SORT and CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last round of minor edits to quell checkpatch. I figure there will be at least one round of comments and final testing. This patch (of 5): Rather than having special-case swap functions for 4- and 8-byte objects, special-case aligned multiples of 4 or 8 bytes. This speeds up most users of sort() by avoiding fallback to the byte copy loop. Despite what ca96ab8 ("lib/sort: Add 64 bit swap function") claims, very few users of sort() sort pointers (or pointer-sized objects); most sort structures containing at least two words. (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct acpi_fan_fps.) The functions also got renamed to reflect the fact that they support multiple words. In the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) With feedback from Andy Shevchenko, Rasmus Villemoes and Geert Uytterhoeven. Link: http://lkml.kernel.org/r/f24f932df3a7fa1973c1084154f1cea596bcf341.1552704200.git.lkml@sdf.org Signed-off-by: George Spelvin <[email protected]> Acked-by: Andrey Abramov <[email protected]> Acked-by: Rasmus Villemoes <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Daniel Wagner <[email protected]> Cc: Don Mullis <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

Observe a segmentation fault when 'perf stat' is asked to repeat forever with the interval option. Without fix: # perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.000211692 3,13,89,82,34,157 cycles 10.000380119 1,53,98,52,22,294 cycles 10.040467280 17,16,79,265 cycles Segmentation fault This problem was only observed when we use forever option aka -r 0 and works with limited repeats. Calling print_counter with ts being set to NULL, is not a correct option when interval is set. Hence avoid print_counter(NULL,..) if interval is set. With fix: # perf stat -r 0 -I 5000 -e cycles -a sleep 10 # time counts unit events 5.019866622 3,15,14,43,08,697 cycles 10.039865756 3,15,16,31,95,261 cycles 10.059950628 1,26,05,47,158 cycles 5.009902655 3,14,52,62,33,932 cycles 10.019880228 3,14,52,22,89,154 cycles 10.030543876 66,90,18,333 cycles 5.009848281 3,14,51,98,25,437 cycles 10.029854402 3,15,14,93,04,918 cycles 5.009834177 3,14,51,95,92,316 cycles Committer notes: Did the 'git bisect' to find the cset introducing the problem to add the Fixes tag below, and at that time the problem reproduced as: (gdb) run stat -r0 -I500 sleep 1 <SNIP> Program received signal SIGSEGV, Segmentation fault. print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 866 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep); (gdb) bt #0 print_interval (prefix=prefix@entry=0x7fffffffc8d0 "", ts=ts@entry=0x0) at builtin-stat.c:866 #1 0x000000000041860a in print_counters (ts=ts@entry=0x0, argc=argc@entry=2, argv=argv@entry=0x7fffffffd640) at builtin-stat.c:938 #2 0x0000000000419a7f in cmd_stat (argc=2, argv=0x7fffffffd640, prefix=<optimized out>) at builtin-stat.c:1411 #3 0x000000000045c65a in run_builtin (p=p@entry=0x6291b8 <commands+216>, argc=argc@entry=5, argv=argv@entry=0x7fffffffd640) at perf.c:370 #4 0x000000000045c893 in handle_internal_command (argc=5, argv=0x7fffffffd640) at perf.c:429 #5 0x000000000045c8f1 in run_argv (argcp=argcp@entry=0x7fffffffd4ac, argv=argv@entry=0x7fffffffd4a0) at perf.c:473 #6 0x000000000045cac9 in main (argc=<optimized out>, argv=<optimized out>) at perf.c:588 (gdb) Mostly the same as just before this patch: Program received signal SIGSEGV, Segmentation fault. 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 964 sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep); (gdb) bt #0 0x00000000005874a7 in print_interval (config=0xa1f2a0 <stat_config>, evlist=0xbc9b90, prefix=0x7fffffffd1c0 "`", ts=0x0) at util/stat-display.c:964 #1 0x0000000000588047 in perf_evlist__print_counters (evlist=0xbc9b90, config=0xa1f2a0 <stat_config>, _target=0xa1f0c0 <target>, ts=0x0, argc=2, argv=0x7fffffffd670) at util/stat-display.c:1172 #2 0x000000000045390f in print_counters (ts=0x0, argc=2, argv=0x7fffffffd670) at builtin-stat.c:656 #3 0x0000000000456bb5 in cmd_stat (argc=2, argv=0x7fffffffd670) at builtin-stat.c:1960 #4 0x00000000004dd2e0 in run_builtin (p=0xa30e00 <commands+288>, argc=5, argv=0x7fffffffd670) at perf.c:310 #5 0x00000000004dd54d in handle_internal_command (argc=5, argv=0x7fffffffd670) at perf.c:362 #6 0x00000000004dd694 in run_argv (argcp=0x7fffffffd4cc, argv=0x7fffffffd4c0) at perf.c:406 #7 0x00000000004dda11 in main (argc=5, argv=0x7fffffffd670) at perf.c:531 (gdb) Fixes: d4f63a4 ("perf stat: Introduce print_counters function") Signed-off-by: Srikar Dronamraju <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Tested-by: Ravi Bangoria <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: [email protected] # v4.2+ Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

We release wrong pointer on error path in cpu_cache_level__read function, leading to segfault: (gdb) r record ls Starting program: /root/perf/tools/perf/perf record ls ... [ perf record: Woken up 1 times to write data ] double free or corruption (out) Thread 1 "perf" received signal SIGABRT, Aborted. 0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6 (gdb) bt #0 0x00007ffff7463798 in raise () from /lib64/power9/libc.so.6 #1 0x00007ffff7443bac in abort () from /lib64/power9/libc.so.6 #2 0x00007ffff74af8bc in __libc_message () from /lib64/power9/libc.so.6 #3 0x00007ffff74b92b8 in malloc_printerr () from /lib64/power9/libc.so.6 #4 0x00007ffff74bb874 in _int_free () from /lib64/power9/libc.so.6 #5 0x0000000010271260 in __zfree (ptr=0x7fffffffa0b0) at ../../lib/zalloc.. #6 0x0000000010139340 in cpu_cache_level__read (cache=0x7fffffffa090, cac.. #7 0x0000000010143c90 in build_caches (cntp=0x7fffffffa118, size=<optimiz.. ... Releasing the proper pointer. Fixes: 720e98b ("perf tools: Add perf data cache feature") Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected]: # v4.6+ Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

[BUG] One user reported a reproducible KASAN report about use-after-free: BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265 BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0 ================================================================== BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs] Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579 CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P OE 5.2.11-arch1-1-kasan #4 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018 Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] Call Trace: dump_stack+0x7b/0xba print_address_description+0x6c/0x22e ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs] __kasan_report.cold+0x1b/0x3b ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs] kasan_report+0x12/0x17 __asan_report_store8_noabort+0x17/0x20 btrfs_init_reloc_root+0x2cd/0x340 [btrfs] record_root_in_trans+0x2a0/0x370 [btrfs] btrfs_record_root_in_trans+0xf4/0x140 [btrfs] start_transaction+0x1ab/0xe90 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs] ? lock_repin_lock+0x400/0x400 ? __kmem_cache_shutdown.cold+0x140/0x1ad ? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs] finish_ordered_fn+0x15/0x20 [btrfs] normal_work_helper+0x1bd/0xca0 [btrfs] ? process_one_work+0x819/0x1720 ? kasan_check_read+0x11/0x20 btrfs_endio_write_helper+0x12/0x20 [btrfs] process_one_work+0x8c9/0x1720 ? pwq_dec_nr_in_flight+0x2f0/0x2f0 ? worker_thread+0x1d9/0x1030 worker_thread+0x98/0x1030 kthread+0x2bb/0x3b0 ? process_one_work+0x1720/0x1720 ? kthread_park+0x120/0x120 ret_from_fork+0x35/0x40 Allocated by task 369692: __kasan_kmalloc.part.0+0x44/0xc0 __kasan_kmalloc.constprop.0+0xba/0xc0 kasan_kmalloc+0x9/0x10 kmem_cache_alloc_trace+0x138/0x260 btrfs_read_tree_root+0x92/0x360 [btrfs] btrfs_read_fs_root+0x10/0xb0 [btrfs] create_reloc_root+0x47d/0xa10 [btrfs] btrfs_init_reloc_root+0x1e2/0x340 [btrfs] record_root_in_trans+0x2a0/0x370 [btrfs] btrfs_record_root_in_trans+0xf4/0x140 [btrfs] start_transaction+0x1ab/0xe90 [btrfs] btrfs_start_transaction+0x1e/0x20 [btrfs] __btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs] btrfs_prealloc_file_range+0x13/0x20 [btrfs] prealloc_file_extent_cluster+0x29f/0x570 [btrfs] relocate_file_extent_cluster+0x193/0xc30 [btrfs] relocate_data_extent+0x1f8/0x490 [btrfs] relocate_block_group+0x600/0x1060 [btrfs] btrfs_relocate_block_group+0x3a0/0xa00 [btrfs] btrfs_relocate_chunk+0x9e/0x180 [btrfs] btrfs_balance+0x14e4/0x2fc0 [btrfs] btrfs_ioctl_balance+0x47f/0x640 [btrfs] btrfs_ioctl+0x119d/0x8380 [btrfs] do_vfs_ioctl+0x9f5/0x1060 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x73/0xb0 do_syscall_64+0xa5/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 369692: __kasan_slab_free+0x14f/0x210 kasan_slab_free+0xe/0x10 kfree+0xd8/0x270 btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs] clean_dirty_subvols+0x227/0x340 [btrfs] relocate_block_group+0x972/0x1060 [btrfs] btrfs_relocate_block_group+0x3a0/0xa00 [btrfs] btrfs_relocate_chunk+0x9e/0x180 [btrfs] btrfs_balance+0x14e4/0x2fc0 [btrfs] btrfs_ioctl_balance+0x47f/0x640 [btrfs] btrfs_ioctl+0x119d/0x8380 [btrfs] do_vfs_ioctl+0x9f5/0x1060 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x73/0xb0 do_syscall_64+0xa5/0x370 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88856f671100 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 1552 bytes inside of 4096-byte region [ffff88856f671100, ffff88856f672100) The buggy address belongs to the page: page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0 flags: 0x2ffff0000010200(slab|head) raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600 raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== BTRFS info (device sdi1): 1 enospc errors during balance BTRFS info (device sdi1): balance: ended with status: -28 [CAUSE] The problem happens when finish_ordered_io() get called with balance still running, while the reloc root of that subvolume is already dead. (Tree is swap already done, but tree not yet deleted for possible qgroup usage.) That means root->reloc_root still exists, but that reloc_root can be under btrfs_drop_snapshot(), thus we shouldn't access it. The following race could cause the use-after-free problem: CPU1 | CPU2 -------------------------------------------------------------------------- | relocate_block_group() | |- unset_reloc_control(rc) | |- btrfs_commit_transaction() btrfs_finish_ordered_io() | |- clean_dirty_subvols() |- btrfs_join_transaction() | | |- record_root_in_trans() | | |- btrfs_init_reloc_root() | | |- if (root->reloc_root) | | | | |- root->reloc_root = NULL | | |- btrfs_drop_snapshot(reloc_root); |- reloc_root->last_trans| = trans->transid | ^^^^^^^^^^^^^^^^^^^^^^ Use after free [FIX] Fix it by the following modifications: - Test if the root has dead reloc tree before accessing root->reloc_root If the root has BTRFS_ROOT_DEAD_RELOC_TREE, then we don't need to create or update root->reloc_tree - Clear the BTRFS_ROOT_DEAD_RELOC_TREE flag until we have fully dropped reloc tree To co-operate with above modification, so as long as BTRFS_ROOT_DEAD_RELOC_TREE is still set, we won't try to re-create reloc tree at record_root_in_trans(). Reported-by: Cebtenzzre <[email protected]> Fixes: d2311e6 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") CC: [email protected] # 5.1+ Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

popcornmix and others added 30 commits June 17, 2015 22:32

Main bcm2708/bcm2709 linux port

8e2bdce

Signed-off-by: popcornmix <[email protected]> Signed-off-by: Noralf Trønnes <[email protected]>

bcm2708 watchdog driver

560486d

Signed-off-by: popcornmix <[email protected]>

cma: Add vc_cma driver to enable use of CMA

34fb9ba

Signed-off-by: popcornmix <[email protected]> vc_cma: Make the vc_cma area the default contiguous DMA area

vc_mem: Add vc_mem driver

6290b45

Signed-off-by: popcornmix <[email protected]> BCM270x: Move vc_mem Make the vc_mem module available for ARCH_BCM2835 by moving it. Signed-off-by: Noralf Trønnes <[email protected]>

Add hwrng (hardware random number generator) driver

7f769de

Add cpufreq driver

b359d4f

Signed-off-by: popcornmix <[email protected]>

scripts/dtc: Update to upstream version with overlay patches

2cc5e0d

fdt: Add support for the CONFIG_CMDLINE_EXTEND option

885d18f

rtl8192cu: Add PID for D-Link DWA 131

6759149

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compile fixes for vc4-kms-v3d-rpi2 #4

Compile fixes for vc4-kms-v3d-rpi2 #4

gohai commented Jun 18, 2015

Compile fixes for vc4-kms-v3d-rpi2 #4

Compile fixes for vc4-kms-v3d-rpi2 #4

Conversation

gohai commented Jun 18, 2015