forked from armbian/linux-rockchip
Panthor driver backport #7
Closed
Arm has introduced a new v10 GPU architecture that replaces the Job Manager interface with a new Command Stream Frontend. It adds firmware driven command stream queues that can be used by kernel and user space to submit jobs to the GPU. Add the initial schema for the device tree that is based on support for RK3588 SoC. The minimum number of clocks is one for the IP, but on Rockchip platforms they will tend to expose the semi-independent clocks for better power management. v3: - Cleanup commit message to remove redundant text - Added opp-table property and re-ordered entries - Clarified power-domains and power-domain-names requirements for RK3588. - Cleaned up example Note: power-domains and power-domain-names requirements for other platforms are still work in progress, hence the bindings are left incomplete here. v2: - New commit Signed-off-by: Liviu Dudau <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Cc: Rob Herring <[email protected]> Cc: Conor Dooley <[email protected]> Cc: [email protected]
This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTM execbuf util and intended to replace it in the long term. The basic functionality is that it abstracts the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. v2: drop xarray and use a dynamically resized array instead, the locking overhead is unnecessary and measurable. v3: drop duplicate tracking, radeon is really the only one needing that. v4: fixes issues pointed out by Danilo, some typos in comments and a helper for locking arrays of GEM objects. v5: some suggestions by Boris Brezillon, especially just use one retry macro, drop loop in prepare_array, use flags instead of bool v6: minor changes suggested by Thomas, Boris and Danilo v7: minor typos pointed out by checkpatch.pl fixed Signed-off-by: Christian König <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Reviewed-by: Danilo Krummrich <[email protected]> Tested-by: Danilo Krummrich <[email protected]> Acked-by: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
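The "lock many buffers with automated deadlock and duplicate handling" loop can be modelled in plain C. This is a single-threaded userspace sketch, not the drm_exec API: `struct obj`, `struct exec_ctx` and `ctx_lock_all()` are hypothetical, contention is simulated by a counter instead of a real ww_mutex, and the back-off simply drops every lock and takes the contended object first on the next pass.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 8

/* Hypothetical stand-in for a GEM object's reservation lock. */
struct obj {
	bool locked;       /* held by our context */
	int busy_attempts; /* simulated contention: this many trylocks fail */
};

/* Hypothetical stand-in for an execution context. */
struct exec_ctx {
	struct obj *held[MAX_OBJS];
	size_t nr_held;
	struct obj *contended; /* lock this one first on the next pass */
	int retries;
};

/* Returns 0 on success (or if already held), -1 on simulated contention.
 * A blocking lock models the slow path, which waits instead of failing. */
static int ctx_lock(struct exec_ctx *ctx, struct obj *o, bool blocking)
{
	if (o->locked)
		return 0; /* duplicate entry: we already hold it */
	if (!blocking && o->busy_attempts > 0) {
		o->busy_attempts--;
		return -1;
	}
	o->locked = true;
	ctx->held[ctx->nr_held++] = o;
	return 0;
}

static void ctx_unlock_all(struct exec_ctx *ctx)
{
	while (ctx->nr_held)
		ctx->held[--ctx->nr_held]->locked = false;
}

/* Lock every object in objs[] (at most MAX_OBJS distinct objects),
 * backing off and retrying whenever one of them is contended. */
static void ctx_lock_all(struct exec_ctx *ctx, struct obj **objs, size_t n)
{
	for (;;) {
		bool contention = false;

		if (ctx->contended) {
			ctx_lock(ctx, ctx->contended, true);
			ctx->contended = NULL;
		}
		for (size_t i = 0; i < n; i++) {
			if (ctx_lock(ctx, objs[i], false)) {
				ctx->contended = objs[i];
				contention = true;
				break;
			}
		}
		if (!contention)
			return;
		ctx_unlock_all(ctx); /* drop everything, then retry */
		ctx->retries++;
	}
}
```

In the kernel, the equivalent of the `blocking` path relies on ww_mutex's slow-path acquire, which is what makes the retry converge instead of live-locking.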
Add infrastructure to keep track of GPU virtual address (VA) mappings with a dedicated VA space manager implementation. New UAPIs, motivated by the Vulkan sparse memory bindings that graphics drivers are starting to implement, allow userspace applications to request multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is intended to serve the following purposes in this context. 1) Provide infrastructure to track GPU VA allocations and mappings, using an interval tree (RB-tree). 2) Generically connect GPU VA mappings to their backing buffers, in particular DRM GEM objects. 3) Provide a common implementation to perform more complex mapping operations on the GPU VA space. In particular splitting and merging of GPU VA mappings, e.g. for intersecting mapping requests or partial unmap requests. Acked-by: Thomas Hellström <[email protected]> Acked-by: Matthew Brost <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Tested-by: Matthew Brost <[email protected]> Tested-by: Donald Robson <[email protected]> Suggested-by: Dave Airlie <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
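Purpose 3) can be illustrated with its simplest case: a partial unmap that punches a hole into one existing mapping and leaves up to two remainders. The sketch below is a hypothetical userspace model (`struct va_map` and `split_for_unmap()` are not the DRM API); it only shows the address and buffer-offset arithmetic such a split involves.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mapping: GPU VA [addr, addr + range) backed by a buffer
 * object starting at byte offset `offset`. */
struct va_map {
	uint64_t addr;
	uint64_t range;
	uint64_t offset;
};

/* Up to two surviving pieces: "prev" before the hole, "next" after it. */
struct remap {
	bool has_prev, has_next;
	struct va_map prev, next;
};

/* Unmap [req_addr, req_addr + req_range) from m.
 * Assumes the request overlaps m. */
static struct remap split_for_unmap(struct va_map m, uint64_t req_addr,
				    uint64_t req_range)
{
	struct remap r = { 0 };
	uint64_t m_end = m.addr + m.range;
	uint64_t req_end = req_addr + req_range;

	if (req_addr > m.addr) {
		/* Head survives and keeps the original buffer offset. */
		r.has_prev = true;
		r.prev = (struct va_map){ m.addr, req_addr - m.addr, m.offset };
	}
	if (req_end < m_end) {
		/* Tail survives; its buffer offset advances by the same amount
		 * as its start address. */
		r.has_next = true;
		r.next = (struct va_map){ req_end, m_end - req_end,
					  m.offset + (req_end - m.addr) };
	}
	return r;
}
```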
vm_flags are among VMA attributes which affect decisions like VMA merging and splitting. Therefore all vm_flags modifications are performed after taking exclusive mmap_lock to prevent vm_flags updates racing with such operations. Introduce modifier functions for vm_flags to be used whenever flags are updated. This way we can better check and control correct locking behavior during these updates. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Suren Baghdasaryan <[email protected]> Reviewed-by: Davidlohr Bueso <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Mike Rapoport (IBM) <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: David Rientjes <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Oskolkov <[email protected]> Cc: Peter Xu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Punit Agrawal <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Song Liu <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
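The point of funnelling every update through modifier functions is that the locking rule gets checked in one place. A minimal userspace model of that idea follows; `struct vma_model` and the `model_flags_*()` helpers are hypothetical, and a plain `bool` stands in for holding mmap_lock exclusively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified VMA. */
struct vma_model {
	unsigned long flags;
	bool mmap_write_locked; /* stands in for holding mmap_lock exclusively */
};

/* Single choke point: every modification asserts the locking rule. */
static void model_flags_mod(struct vma_model *vma, unsigned long set,
			    unsigned long clear)
{
	assert(vma->mmap_write_locked && "flags updated without exclusive lock");
	vma->flags |= set;
	vma->flags &= ~clear;
}

static void model_flags_set(struct vma_model *vma, unsigned long f)
{
	model_flags_mod(vma, f, 0);
}

static void model_flags_clear(struct vma_model *vma, unsigned long f)
{
	model_flags_mod(vma, 0, f);
}
```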
Rename struct drm_gpuva_manager to struct drm_gpuvm including corresponding functions. This way the GPUVA manager's structures align very well with the documentation of VM_BIND [1] and VM_BIND locking [2]. It also provides a better foundation for the naming of data structures and functions introduced for implementing a common dma-resv per GPU-VM including tracking of external and evicted objects in subsequent patches. [1] Documentation/gpu/drm-vm-bind-async.rst [2] Documentation/gpu/drm-vm-bind-locking.rst Cc: Thomas Hellström <[email protected]> Cc: Matthew Brost <[email protected]> Acked-by: Dave Airlie <[email protected]> Acked-by: Christian König <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Currently, the DRM GPUVM does not have any core dependencies preventing a module build. Also, new features from subsequent patches require helpers (namely drm_exec) which can be built as module. Reviewed-by: Christian König <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Use drm_WARN() and drm_WARN_ON() variants to tell drivers which context the failing VM resides in. Reviewed-by: Boris Brezillon <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]>
Don't always WARN in drm_gpuvm_check_overflow() and separate it into a drm_gpuvm_check_overflow() and a dedicated drm_gpuvm_warn_check_overflow() variant. This avoids printing warnings due to invalid userspace requests. Reviewed-by: Thomas Hellström <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]>
Drivers may use this function to validate userspace requests in advance, hence export it. Reviewed-by: Thomas Hellström <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]>
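The split between a quiet overflow check (for untrusted userspace input) and a warning variant (for kernel-internal callers), together with a range check drivers can run in advance, can be sketched as below. All names are hypothetical; the exact set of checks performed by the exported helper is an assumption here (no overflow, inside the managed window, no overlap with a kernel-reserved region). `__builtin_add_overflow()` is the GCC/Clang builtin that the kernel's `check_add_overflow()` wraps.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Quiet check: an overflowing userspace range is a user error, not a
 * kernel bug, so nothing is printed. */
static bool va_check_overflow(uint64_t addr, uint64_t range)
{
	uint64_t end;

	return __builtin_add_overflow(addr, range, &end);
}

/* Warning variant for kernel-internal callers, where overflow means a bug. */
static bool va_warn_check_overflow(uint64_t addr, uint64_t range)
{
	bool overflow = va_check_overflow(addr, range);

	if (overflow)
		fprintf(stderr, "WARN: VA range 0x%llx + 0x%llx overflows\n",
			(unsigned long long)addr, (unsigned long long)range);
	return overflow;
}

/* Hypothetical VA space description. */
struct va_space {
	uint64_t mm_start, mm_range;         /* window managed by the VM */
	uint64_t kernel_start, kernel_range; /* sub-range reserved for the kernel */
};

/* Can a driver accept [addr, addr + range) from userspace? */
static bool va_range_valid(const struct va_space *vs, uint64_t addr,
			   uint64_t range)
{
	if (va_check_overflow(addr, range))
		return false;

	uint64_t end = addr + range;
	uint64_t mm_end = vs->mm_start + vs->mm_range;
	uint64_t k_end = vs->kernel_start + vs->kernel_range;

	/* Must lie inside the managed window... */
	if (addr < vs->mm_start || end > mm_end)
		return false;
	/* ...and must not intersect the kernel-reserved region. */
	if (vs->kernel_range && addr < k_end && vs->kernel_start < end)
		return false;
	return true;
}
```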
Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. This is used in a subsequent patch to generalize dma-resv, external and evicted object handling and GEM validation. Reviewed-by: Boris Brezillon <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]>
Introduce flags for struct drm_gpuvm; these are required by subsequent commits. Reviewed-by: Boris Brezillon <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]>
Implement reference counting for struct drm_gpuvm. Signed-off-by: Danilo Krummrich <[email protected]>
Add an abstraction layer between the drm_gpuva mappings of a particular drm_gem_object and this GEM object itself. The abstraction represents a combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds a list of drm_gpuvm_bo structures (the structure representing this abstraction), while each drm_gpuvm_bo contains a list of mappings of this GEM object. This has multiple advantages: 1) We can use the drm_gpuvm_bo structure to attach it to various lists of the drm_gpuvm. This is useful for tracking external and evicted objects per VM, which is introduced in subsequent patches. 2) Finding mappings of a certain drm_gem_object mapped in a certain drm_gpuvm becomes much cheaper. 3) Drivers can derive and extend the structure to easily represent driver-specific states of a BO for a certain GPUVM. The idea of this abstraction was taken from amdgpu, hence the credit for this idea goes to the developers of amdgpu. Cc: Christian König <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]>
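Advantage 2) follows from the shape of the data structure: the GEM object keeps one entry per VM, and each entry owns that VM's mappings. The sketch below models this with hypothetical types and singly linked lists (the kernel uses `list_head` and proper locking); it is not the drm_gpuvm_bo API.

```c
#include <stddef.h>

/* Hypothetical, simplified types; no locking. */
struct gpuvm {
	int id;
};

struct mapping {
	unsigned long addr, range;
	struct mapping *next;
};

/* One (GEM object, VM) pair. */
struct vm_bo {
	struct gpuvm *vm;
	struct mapping *mappings; /* all mappings of this BO in this VM */
	struct vm_bo *next;       /* next vm_bo on the same GEM object */
};

struct gem_obj {
	struct vm_bo *vm_bos; /* one entry per VM the object is mapped in */
};

/* Walks only the VMs this object is mapped in, not every mapping the
 * object has across all VMs. */
static struct vm_bo *gem_find_vm_bo(struct gem_obj *obj, struct gpuvm *vm)
{
	for (struct vm_bo *b = obj->vm_bos; b; b = b->next)
		if (b->vm == vm)
			return b;
	return NULL;
}
```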
Currently the DRM GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVM represent a basis for GPU-VM implementations. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock the dma-resv of all GEM objects the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convenience functions for common patterns. Big thanks to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Reviewed-by: Boris Brezillon <[email protected]> Reviewed-by: Thomas Hellström <[email protected]> Suggested-by: Matthew Brost <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]>
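Element 4) speeds up validation because the VM keeps a list of just the evicted objects, so revalidation before a submission touches only those instead of scanning everything the VM maps. The following is a hypothetical, lock-free userspace model of that bookkeeping; the names are illustrative and setting `resident = true` stands in for the driver's validate callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer state as seen by one VM. */
struct bo_model {
	bool resident;
	bool evicted;                /* currently on the VM's evict list */
	struct bo_model *evict_next;
};

struct vm_model {
	struct bo_model *evict_head;
};

/* Called when the memory manager evicts bo; duplicates are ignored. */
static void vm_bo_evict(struct vm_model *vm, struct bo_model *bo)
{
	bo->resident = false;
	if (!bo->evicted) {
		bo->evicted = true;
		bo->evict_next = vm->evict_head;
		vm->evict_head = bo;
	}
}

/* Revalidate only the evicted BOs; returns how many were handled. */
static size_t vm_validate(struct vm_model *vm)
{
	size_t n = 0;

	while (vm->evict_head) {
		struct bo_model *bo = vm->evict_head;

		vm->evict_head = bo->evict_next;
		bo->evict_next = NULL;
		bo->evicted = false;
		bo->resident = true; /* stand-in for the driver's validate hook */
		n++;
	}
	return n;
}
```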
This will be useful for GPU drivers who want to keep page tables in a pool so they can: - keep freed page tables in a free pool and speed-up upcoming page table allocations - batch page table allocation instead of allocating one page at a time - pre-reserve pages for page tables needed for map/unmap operations, to ensure map/unmap operations don't try to allocate memory in paths where they're not allowed to block or fail It might also be valuable for other aspects of GPU and similar use-cases, like fine-grained memory accounting and resource limiting. We will extend the Arm LPAE format to support custom allocators in a separate commit. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]>
We need that in order to implement the VM_BIND ioctl in the GPU driver targeting new Mali GPUs. VM_BIND is about executing MMU map/unmap requests asynchronously, possibly after waiting for external dependencies encoded as dma_fences. We intend to use the drm_sched framework to automate the dependency tracking and VM job dequeuing logic, but this comes with its own set of constraints, one of them being the fact we are not allowed to allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this sort of deadlock: - VM_BIND map job needs to allocate a page table to map some memory to the VM. No memory available, so kswapd is kicked - GPU driver shrinker backend ends up waiting on the fence attached to the VM map job or any other job fence depending on this VM operation. With custom allocators, we will be able to pre-reserve enough pages to guarantee the map/unmap operations we queued will take place without going through the system allocator. But we can also optimize allocation/reservation by not freeing pages immediately, so any upcoming page table allocation requests can be serviced by some free page table pool kept at the driver level. It might also be valuable for other aspects of GPU and similar use-cases, like fine-grained memory accounting and resource limiting. Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]>
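The pre-reservation idea can be sketched as a driver-level pool: pages are allocated up front, where blocking is allowed, and later handed out from run_job()-like paths without touching the system allocator. This is a hypothetical userspace model (`struct pt_pool` and its functions are illustrative and not the io-pgtable allocator interface); `aligned_alloc()` stands in for page allocation.

```c
#include <stdbool.h>
#include <stdlib.h>

#define PT_SIZE 4096

/* Hypothetical driver-level page-table pool. */
struct pt_pool {
	void **pages;
	size_t count;    /* pages currently available */
	size_t capacity;
};

/* Allocation is allowed here: fill a fresh pool with n pages. On failure,
 * pages reserved so far stay in the pool; call pt_pool_destroy(). */
static bool pt_pool_reserve(struct pt_pool *pool, size_t n)
{
	pool->pages = calloc(n, sizeof(*pool->pages));
	if (!pool->pages)
		return false;
	pool->capacity = n;
	pool->count = 0;
	for (size_t i = 0; i < n; i++) {
		void *p = aligned_alloc(PT_SIZE, PT_SIZE);

		if (!p)
			return false;
		pool->pages[pool->count++] = p;
	}
	return true;
}

/* No allocation: safe where waiting on memory reclaim could deadlock.
 * Returns NULL if the reservation was too small. */
static void *pt_pool_alloc(struct pt_pool *pool)
{
	return pool->count ? pool->pages[--pool->count] : NULL;
}

/* Freed tables go back to the pool to serve later requests. */
static void pt_pool_free(struct pt_pool *pool, void *page)
{
	if (pool->count < pool->capacity)
		pool->pages[pool->count++] = page;
	else
		free(page);
}

static void pt_pool_destroy(struct pt_pool *pool)
{
	while (pool->count)
		free(pool->pages[--pool->count]);
	free(pool->pages);
	pool->pages = NULL;
	pool->capacity = 0;
}
```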
Ease debugging of a multi-GPU system by using drm_WARN_*() and drm_dbg_kms() helpers that print out DRM device name corresponding to shmem GEM. Reviewed-by: Thomas Zimmermann <[email protected]> Suggested-by: Thomas Zimmermann <[email protected]> Signed-off-by: Dmitry Osipenko <[email protected]> Link: https://lore.kernel.org/all/[email protected]/
DMA-buf core has its own refcounting of vmaps, use it instead of drm-shmem counting. This change prepares drm-shmem for addition of memory shrinker support where drm-shmem will use a single dma-buf reservation lock for all operations performed over dma-bufs. Reviewed-by: Thomas Zimmermann <[email protected]> Signed-off-by: Dmitry Osipenko <[email protected]> Link: https://lore.kernel.org/all/[email protected]/
Replace all drm-shmem locks with a GEM reservation lock. This makes locks consistent with dma-buf locking convention where importers are responsible for holding reservation lock for all operations performed over dma-bufs, preventing deadlock between dma-buf importers and exporters. Suggested-by: Daniel Vetter <[email protected]> Acked-by: Thomas Zimmermann <[email protected]> Signed-off-by: Dmitry Osipenko <[email protected]> Link: https://lore.kernel.org/all/[email protected]/
The dma-buf backend is supposed to provide its own vm_ops, but some implementations just have nothing special to do and leave vm_ops untouched, probably expecting this field to be zero initialized (this is the case with the system_heap implementation for instance). Let's reset vma->vm_ops to NULL to keep things working with these implementations. Fixes: 26d3ac3 ("drm/shmem-helpers: Redirect mmap for imported dma-buf") Cc: <[email protected]> Cc: Daniel Vetter <[email protected]> Reported-by: Roman Stratiienko <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Tested-by: Roman Stratiienko <[email protected]> Reviewed-by: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Replace all drm-shmem locks with a GEM reservation lock. This makes locks consistent with dma-buf locking convention where importers are responsible for holding reservation lock for all operations performed over dma-bufs, preventing deadlock between dma-buf importers and exporters. Suggested-by: Daniel Vetter <[email protected]> Acked-by: Thomas Zimmermann <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Dmitry Osipenko <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
This way we can grab a pages ref without acquiring the resv lock when pages_use_count > 0. This is needed to implement asynchronous map using the drm_gpuva_mgr when the map/unmap operation triggers a mapping split, requiring the new left/right regions to grab an additional page ref to guarantee that the pages stay pinned when the middle section is unmapped. Signed-off-by: Boris Brezillon <[email protected]> Signed-off-by: Steven Price <[email protected]>
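"Grab a pages ref without acquiring the resv lock when pages_use_count > 0" is the classic increment-if-not-zero pattern. A minimal C11 sketch follows; the function name is hypothetical (the kernel has its own `refcount_inc_not_zero()`/`atomic_inc_not_zero()` helpers), and a `false` return means the caller must take the locked path, which may have to pin the pages first.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the count is already non-zero, without a lock. */
static bool refcount_inc_not_zero_model(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	do {
		if (old == 0)
			return false; /* pages not pinned: use the locked path */
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(count, &old, old + 1));
	return true;
}
```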
When many entities are competing for the same run queue on the same scheduler, we observe unusually long wait times and some jobs get starved. This has been observed on GPUVis. The issue is due to the Round Robin policy used by schedulers to pick up the next entity's job queue for execution. Under stress of many entities and long job queues within an entity, some jobs could be stuck for a very long time in their entity's queue before being popped from the queue and executed, while for other entities with smaller job queues a job might execute earlier even though that job arrived later than the job in the long queue. Fix: Add a FIFO selection policy to entities in the run queue, and choose the next entity on the run queue in such order that if a job on one entity arrived earlier than a job on another entity, the first job will start executing earlier regardless of the length of the entity's job queue. v2: Switch to rb tree structure for entities based on TS of oldest job waiting in the job queue of an entity. Improves next entity extraction to O(1). Entity TS update O(log N) where N is the number of entities in the run-queue Drop default option in module control parameter. v3: Various cosmetic fixes and minor refactoring of fifo update function. (Luben) v4: Switch drm_sched_rq_select_entity_fifo to in order search (Luben) v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben) v6: Add missing drm_sched_rq_remove_fifo_locked v7: Fix ts sampling bug and more cosmetic stuff (Luben) v8: Fix module parameter string (Luben) Cc: Luben Tuikov <[email protected]> Cc: Christian König <[email protected]> Cc: Direct Rendering Infrastructure - Development <[email protected]> Cc: AMD Graphics <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Tested-by: Yunxiang Li (Teddy) <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Otherwise we would crash if the job is not resubmitted. v2: fix second usage of s_fence->parent as well. Signed-off-by: Christian König <[email protected]> Reviewed-by: Steven Price <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
The current default Round-Robin GPU scheduling can result in starvation of entities which have a large number of jobs, over entities which have a very small number of jobs (single digit). This can be illustrated in the following diagram, where jobs are alphabetized to show their chronological order of arrival, where job A is the oldest, B is the second oldest, and so on, to J, the most recent job to arrive.
        ---> entities
    j | H-F-----A--E--I--
    o | --G-----B-----J--
    b | --------C--------
    s\/ --------D--------
WLOG, assuming all jobs are "ready", R-R scheduling will execute them in the following order (a slice off of the top of the entities' list):
    H, F, A, E, I, G, B, J, C, D.
However, to mitigate job starvation, we'd rather execute C and D before E, and so on, given, of course, that they're all ready to be executed. So, if all jobs are ready at this instant, the order of execution for this and the next 9 instances of picking the next job to execute should really be:
    A, B, C, D, E, F, G, H, I, J,
which is their chronological order. The only reason for this order to be broken is if an older job is not yet ready, but a younger job is ready, at an instant of picking a new job to execute. For instance, if job C wasn't ready at time 2, but job D was ready, then we'd pick job D, like this:
    0 +1 +2 ...
    A, B, D, ...
And from then on, C would be preferred before all other jobs, if it is ready at the time when a new job for execution is picked. So, if C became ready two steps later, the execution order would look like this:
    0 +1 +2 ...
    A, B, D, E, C, F, G, H, I, J
This is what the FIFO GPU scheduling algorithm achieves. It uses a Red-Black tree to keep jobs sorted in chronological order, where picking the oldest job is O(1) (we use the "cached" structure), and balancing the tree is O(log n). IOW, it picks the *oldest ready* job to execute now. The implementation is already in the kernel, and this commit only changes the default GPU scheduling algorithm to use.
This was tested and achieves about 1% faster performance over the Round Robin algorithm. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Direct Rendering Infrastructure - Development <[email protected]> Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
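The two selection policies can be compared on the diagram's example. This is a hypothetical userspace model, not drm_sched code: it treats every job as ready and uses a linear scan over the entities' queue heads instead of the scheduler's rb-tree, so it reproduces only the resulting order, not the O(1)/O(log n) costs. Letters earlier in the alphabet stand for older jobs.

```c
#include <string.h>

#define NR_ENTITIES 5
#define MAX_JOBS 10

/* Each entity's job queue, oldest first, taken from the diagram's columns. */
static const char *const queues[NR_ENTITIES] = { "H", "FG", "ABCD", "E", "IJ" };

/* Round-robin: visit entities cyclically, popping one job per non-empty
 * queue per visit. out must hold MAX_JOBS + 1 chars. */
static void pick_rr(char *out)
{
	size_t head[NR_ENTITIES] = { 0 };
	size_t n = 0;
	int progress = 1;

	while (progress) {
		progress = 0;
		for (int e = 0; e < NR_ENTITIES; e++) {
			if (queues[e][head[e]] != '\0') {
				out[n++] = queues[e][head[e]++];
				progress = 1;
			}
		}
	}
	out[n] = '\0';
}

/* FIFO: always pop from the entity whose head job is the oldest. */
static void pick_fifo(char *out)
{
	size_t head[NR_ENTITIES] = { 0 };
	size_t n = 0;

	for (;;) {
		int best = -1;

		for (int e = 0; e < NR_ENTITIES; e++) {
			char c = queues[e][head[e]];

			if (c != '\0' && (best < 0 || c < queues[best][head[best]]))
				best = e;
		}
		if (best < 0)
			break;
		out[n++] = queues[best][head[best]++];
	}
	out[n] = '\0';
}
```

Round-robin yields `HFAEIGBJCD` and FIFO yields `ABCDEFGHIJ`, matching the two orders given in the commit message.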
Add a new function to update job dependencies from a resv obj. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Not used any more. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
This was buggy because when we had to wait for entities which were also killed, we would just deadlock. Instead, move all the dependency handling into the callbacks so that it all happens asynchronously. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
This now matches much better what this is doing. Signed-off-by: Christian König <[email protected]> Reviewed-by: Luben Tuikov <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
This reverts commit e6c6338. This feature basically re-submits one job after another to figure out which one was the one causing a hang. This is obviously incompatible with gang-submit which requires that multiple jobs run at the same time. It's also absolutely not helpful to crash the hardware multiple times if a clean recovery is desired. For testing and debugging environments we should rather disable recovery altogether to be able to inspect the state with a hw debugger. In addition to that, the sw implementation is clearly buggy and causes reference count issues for the hardware fence. Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
Call drm_gem_prime_handle_to_fd() and drm_gem_prime_fd_to_handle() by default if no PRIME import/export helpers have been set. Both functions are the default for almost all drivers. DRM drivers implement struct drm_driver.gem_prime_import_sg_table to import dma-buf objects from other drivers. Having drm_gem_prime_fd_to_handle() set by default allows each driver to import dma-buf objects to itself, even without support for other drivers. For drm_gem_prime_handle_to_fd() it is similar: using it by default allows each driver to export to itself, even without support for other drivers. This functionality enables userspace to share per-driver buffers across process boundaries via PRIME (e.g., wlroots requires this functionality). The patch generalizes a pattern that has previously been implemented by GEM VRAM helpers [1] to work with any driver. For example, gma500 can now run the wlroots-based sway compositor. v2: * clean up docs and TODO comments (Simon, Zack) * clean up style in drm_getcap() Signed-off-by: Thomas Zimmermann <[email protected]> Link: https://lore.kernel.org/dri-devel/[email protected]/ # 1 Reviewed-by: Simon Ser <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Jeffrey Hugo <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
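The "call the helper by default if no callback is set" pattern can be sketched as an optional hook with a fallback. Everything here is hypothetical (`struct driver_ops`, `prime_fd_to_handle()` and the stand-in helper are illustrative, not the DRM core's code); the stand-in default simply derives the handle from the fd so the dispatch is observable.

```c
#include <stddef.h>

/* Hypothetical driver ops: a NULL hook means "use the generic helper". */
struct driver_ops {
	int (*fd_to_handle)(int fd, unsigned int *handle);
};

/* Stand-in for the generic helper; just derives a handle from the fd. */
static int generic_fd_to_handle(int fd, unsigned int *handle)
{
	*handle = (unsigned int)fd;
	return 0;
}

/* Ioctl-side dispatch: prefer the driver's hook, else the default. */
static int prime_fd_to_handle(const struct driver_ops *ops, int fd,
			      unsigned int *handle)
{
	if (ops->fd_to_handle)
		return ops->fd_to_handle(fd, handle);
	return generic_fd_to_handle(fd, handle);
}

/* Example driver override, e.g. for a driver like vmwgfx with special needs. */
static int custom_fd_to_handle(int fd, unsigned int *handle)
{
	*handle = (unsigned int)fd + 100;
	return 0;
}
```

Because the fallback lives in the caller, drivers that used to assign the generic helper explicitly can simply drop that assignment.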
Clear all assignments of struct drm_driver's fd/handle callbacks to drm_gem_prime_fd_to_handle() and drm_gem_prime_handle_to_fd(). These functions are called by default. Add a TODO item to convert vmwgfx to the defaults as well. v2: * remove TODO item (Zack) * also update amdgpu's amdgpu_partition_driver Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Simon Ser <[email protected]> Acked-by: Alex Deucher <[email protected]> Acked-by: Jeffrey Hugo <[email protected]> # qaic Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Mali-G610 normally takes 2 regulators, but the devfreq implementation can only deal with one. Let's add a regulator coupler as done for MT8183.
On architectures that support the preservation of memblock metadata after __init, allow drivers to call memblock_free() to free a reservation made by early arch code. This is a hack to support the freeing of bootsplash reservations passed to Linux by the bootloader. (This should be reworked in future versions of Android; do not cherry-pick this patch forward.) Bug: 139653858 Bug: 174620135 Change-Id: I32c0ee70c33c94deff70aa548896caa9978396fb Signed-off-by: Alistair Delva <[email protected]>
DKMS modules use the same Makefile, and random warnings triggered by version jumps or toolchain changes cause DKMS modules to fail to build.
…troyed Some users need to release resources attached to the vm_bo object when it's destroyed. In Panthor's case, we need to release the pin ref so BO pages can be returned to the system when all GPU mappings are gone. This could be done through a custom drm_gpuvm::vm_bo_free() hook, but this has all sorts of locking implications that would force us to expose a drm_gem_shmem_unpin_locked() helper, not to mention the fact that having a ::vm_bo_free() implementation without a ::vm_bo_alloc() one seems odd. So let's keep things simple, and extend drm_gpuvm_bo_put() to report when the object is destroyed. Signed-off-by: Boris Brezillon <[email protected]>
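A put() that reports destruction lets the caller release whatever it attached, such as the pin, right after the last reference goes away, without a custom free hook. The sketch below is a hypothetical, single-threaded model of that API shape; the kernel uses `struct kref` with an atomic counter.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted vm_bo stand-in. */
struct vm_bo_model {
	unsigned int refcount;
};

static struct vm_bo_model *vm_bo_model_get(struct vm_bo_model *b)
{
	b->refcount++;
	return b;
}

/* Drop a reference; returns true if this call destroyed the object. */
static bool vm_bo_model_put(struct vm_bo_model *b)
{
	if (!b)
		return false;
	if (--b->refcount)
		return false;
	free(b);
	return true;
}
```

A caller would then write something like `if (vm_bo_model_put(vm_bo)) unpin(obj);`, where `unpin()` is likewise a placeholder for the driver's own cleanup.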
Panthor follows the lead of other recently submitted drivers with ioctls allowing us to support modern Vulkan features, like sparse memory binding: - Pretty standard GEM management ioctls (BO_CREATE and BO_MMAP_OFFSET), with the 'exclusive-VM' bit to speed-up BO reservation on job submission - VM management ioctls (VM_CREATE, VM_DESTROY and VM_BIND). The VM_BIND ioctl is loosely based on the Xe model, and can handle both asynchronous and synchronous requests - GPU execution context creation/destruction, tiler heap context creation and job submission. Those ioctls reflect how the hardware/scheduler works and are thus driver-specific. We also have a way to expose IO regions, such that the usermode driver can directly access specific/well-isolated registers, like the LATEST_FLUSH register used to implement cache-flush reduction. This uAPI intentionally keeps usermode queues out of the scope, which explains why doorbell registers and command stream ring-buffers are not directly exposed to userspace. v5: - Fix typo - Add Liviu's R-b v4: - Add a VM_GET_STATE ioctl - Fix doc - Expose the CORE_FEATURES register so we can deal with variants in the UMD - Add Steve's R-b v3: - Add the concept of sync-only VM operation - Fix support for 32-bit userspace - Rework drm_panthor_vm_create to pass the user VA size instead of the kernel VA size (suggested by Robin Murphy) - Typo fixes - Explicitly cast enums with top bit set to avoid compiler warnings in -pedantic mode. - Drop property core_group_count as it can be easily calculated by the number of bits set in l2_present. Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]>
Those are the registers directly accessible through the MMIO range. FW registers are exposed in panthor_fw.h. v4: - Add the CORE_FEATURES register (needed for GPU variants) - Add Steve's R-b v3: - Add macros to extract GPU ID info - Formatting changes - Remove AS_TRANSCFG_ADRMODE_LEGACY - it doesn't exist post-CSF - Remove CSF_GPU_LATEST_FLUSH_ID_DEFAULT - Add GPU_L2_FEATURES_LINE_SIZE for extracting the GPU cache line size Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora Reviewed-by: Steven Price <[email protected]>
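"Add macros to extract GPU ID info" and "GPU_L2_FEATURES_LINE_SIZE" refer to bit-field extraction from register values. The sketch below shows the technique only. The field layout is an assumption, not taken from panthor_regs.h: arch major [31:28], arch minor [27:24], arch rev [23:20], product major [19:16], version major [15:12], version minor [11:4], version status [3:0], and L2_FEATURES[7:0] holding log2 of the cache line size. All macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed GPU_ID field layout (hypothetical names). */
#define ID_ARCH_MAJOR(id) (((id) >> 28) & 0xf)
#define ID_ARCH_MINOR(id) (((id) >> 24) & 0xf)
#define ID_ARCH_REV(id)   (((id) >> 20) & 0xf)
#define ID_PROD_MAJOR(id) (((id) >> 16) & 0xf)
#define ID_VER_MAJOR(id)  (((id) >> 12) & 0xf)
#define ID_VER_MINOR(id)  (((id) >> 4) & 0xff)
#define ID_VER_STATUS(id) ((id) & 0xf)

/* Assumed: L2_FEATURES[7:0] is log2 of the line size in bytes. */
#define L2_LINE_SIZE(f)   (1u << ((f) & 0xff))

struct gpu_id_info {
	unsigned int arch_major, arch_minor, arch_rev, prod_major;
	unsigned int ver_major, ver_minor, ver_status;
};

static struct gpu_id_info gpu_id_decode(uint32_t id)
{
	struct gpu_id_info info = {
		.arch_major = ID_ARCH_MAJOR(id),
		.arch_minor = ID_ARCH_MINOR(id),
		.arch_rev = ID_ARCH_REV(id),
		.prod_major = ID_PROD_MAJOR(id),
		.ver_major = ID_VER_MAJOR(id),
		.ver_minor = ID_VER_MINOR(id),
		.ver_status = ID_VER_STATUS(id),
	};
	return info;
}
```

The test value `0xa8670005` used below is synthetic and is not claimed to match any real GPU.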
The panthor driver is designed in a modular way, where each logical block is dealing with a specific HW-block or software feature. In order for those blocks to communicate with each other, we need a central panthor_device collecting all the blocks, and exposing some common features, like interrupt handling, power management, reset, ... This is what this panthor_device logical block is about. v5: - Suspend the MMU/GPU blocks if panthor_fw_resume() fails in panthor_device_resume() - Move the pm_runtime_use_autosuspend() call before drm_dev_register() - Add Liviu's R-b v4: - Check drmm_mutex_init() return code - Fix panthor_device_reset_work() out path - Fix the race in the unplug logic - Fix typos - Unplug blocks when something fails in panthor_device_init() - Add Steve's R-b v3: - Add acks for the MIT+GPL2 relicensing - Fix 32-bit support - Shorten the sections protected by panthor_device::pm::mmio_lock to fix lock ordering issues. - Rename panthor_device::pm::lock into panthor_device::pm::mmio_lock to better reflect what this lock is protecting - Use dev_err_probe() - Make sure we call drm_dev_exit() when something fails half-way in panthor_device_reset_work() - Replace CSF_GPU_LATEST_FLUSH_ID_DEFAULT with a constant '1' and a comment to explain. Also remove setting the dummy flush ID on suspend. - Remove drm_WARN_ON() in panthor_exception_name() - Check pirq->suspended in panthor_xxx_irq_raw_handler() Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]>
Handles everything that's not related to the FW, the MMU or the scheduler. This is the block dealing with the GPU property retrieval, the GPU block power on/off logic, and some global operations, like global cache flushing. v5: - Fix GPU_MODEL() kernel doc - Fix test in panthor_gpu_block_power_off() - Add Steve's R-b v4: - Expose CORE_FEATURES through DEV_QUERY v3: - Add acks for the MIT/GPL2 relicensing - Use macros to extract GPU ID info - Make sure we clear pending_reqs bits when wait_event_timeout() times out but the corresponding bit is cleared in GPU_INT_RAWSTAT (can happen if the IRQ is masked or HW takes too long to call the IRQ handler) - GPU_MODEL now takes separate arch and product majors to be more readable. - Drop GPU_IRQ_MCU_STATUS_CHANGED from interrupt mask. - Handle GPU_IRQ_PROTM_FAULT correctly (don't output registers that are not updated for protected interrupts). - Minor code tidy ups Cc: Alexey Sheplyakov <[email protected]> # MIT+GPL2 relicensing Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora Reviewed-by: Steven Price <[email protected]>
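The v3 note about wait_event_timeout() describes a subtle case: the wait timed out, but the hardware did finish and only the IRQ was masked or late. A hypothetical model of resolving that case follows. It assumes that a set RAWSTAT bit means the request completed; the function name and register semantics are illustrative, not panthor's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Resolve a timed-out wait on request bit(s) `req`.
 * pending: request bits the IRQ handler has not acknowledged yet.
 * rawstat: raw interrupt status read after the timeout (assumed: a set bit
 *          means the hardware completed the request).
 * Returns true only for a genuine timeout; otherwise the stale pending bit
 * is cleared and the wait counts as successful. */
static bool resolve_req_timeout(uint32_t *pending, uint32_t req, uint32_t rawstat)
{
	if ((*pending & req) && !(rawstat & req))
		return true;  /* hardware really has not completed */
	*pending &= ~req;     /* completed, but the IRQ was masked or late */
	return false;
}
```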
Anything relating to GEM object management is placed here. Nothing particularly interesting here, given the implementation is based on drm_gem_shmem_object, which is doing most of the work. v5: - Add Liviu's and Steve's R-b v4: - Force kernel BOs to be GPU mapped - Make panthor_kernel_bo_destroy() robust against ERR/NULL BO pointers to simplify the call sites v3: - Add acks for the MIT/GPL2 relicensing - Provide a panthor_kernel_bo abstraction for buffer objects managed by the kernel (will replace panthor_fw_mem and be used everywhere we were using panthor_gem_create_and_map() before) - Adjust things to match drm_gpuvm changes - Change return of panthor_gem_create_with_handle() to int Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora Reviewed-by: Liviu Dudau <[email protected]> Reviewed-by: Steven Price <[email protected]>
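The v4 change making `panthor_kernel_bo_destroy()` robust against ERR/NULL pointers follows the kernel's error-pointer convention. Below is a user-space re-creation of that convention: `err_ptr()`, `ptr_err()` and `is_err_or_null()` mimic `ERR_PTR()`, `PTR_ERR()` and `IS_ERR_OR_NULL()`. `struct kbo` and `kbo_destroy()` are hypothetical stand-ins for the kernel BO type.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* The top MAX_ERRNO addresses encode negative errno values. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)error; }
static inline long ptr_err(const void *ptr) { return (long)ptr; }
static inline bool is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical kernel BO; the real object wraps a GEM object and a VA node. */
struct kbo { int dummy; };

static int kbo_destroyed; /* counts real frees, for the usage example */

static void kbo_destroy(struct kbo *bo)
{
	/* Callers may pass the result of a failed create straight through. */
	if (is_err_or_null(bo))
		return;
	kbo_destroyed++;
	free(bo);
}
```

With this, an error-unwind path can call the destroy helper on every BO field unconditionally, whether or not the corresponding create succeeded.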
Everything related to devfreq is placed in panthor_devfreq.c, and helpers that can be called by other logical blocks are exposed through panthor_devfreq.h. This implementation is loosely based on the panfrost implementation, the only difference being that we don't count device users, because the idle/active state will be managed by the scheduler logic. v4: - Add Clément's A-b for the relicensing v3: - Add acks for the MIT/GPL2 relicensing v2: - Added in v2 Cc: Clément Péron <[email protected]> # MIT+GPL2 relicensing Reviewed-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora Acked-by: Clément Péron <[email protected]> # MIT+GPL2 relicensing
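Having the scheduler report idle/active transitions, instead of counting device users, boils down to busy/idle time accounting. The following is a hypothetical sketch of that accounting. `struct util_tracker` and the `util_*` functions are invented names, and timestamps are passed in explicitly so the model stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical busy/idle accounting driven by scheduler notifications. */
struct util_tracker {
	uint64_t last_ns; /* timestamp of the last transition or status read */
	uint64_t busy_ns; /* busy time accumulated in the current window */
	uint64_t idle_ns; /* idle time accumulated in the current window */
	bool busy;        /* current state */
};

static void util_update(struct util_tracker *t, uint64_t now_ns)
{
	uint64_t delta = now_ns - t->last_ns;

	if (t->busy)
		t->busy_ns += delta;
	else
		t->idle_ns += delta;
	t->last_ns = now_ns;
}

static void util_record_busy(struct util_tracker *t, uint64_t now_ns)
{
	util_update(t, now_ns);
	t->busy = true;
}

static void util_record_idle(struct util_tracker *t, uint64_t now_ns)
{
	util_update(t, now_ns);
	t->busy = false;
}

/* What a devfreq status callback would report for the window (busy and
 * total time); the window then restarts. */
static void util_get_status(struct util_tracker *t, uint64_t now_ns,
			    uint64_t *busy_ns, uint64_t *total_ns)
{
	util_update(t, now_ns);
	*busy_ns = t->busy_ns;
	*total_ns = t->busy_ns + t->idle_ns;
	t->busy_ns = 0;
	t->idle_ns = 0;
}
```

The governor then derives the load from `busy_ns / total_ns`. For example, a GPU busy from t=100 to t=400 in a 1000 ns window reports 300 busy out of 1000 total.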
MMU and VM management is related and placed in the same source file. Page table updates are delegated to the io-pgtable-arm driver that's in the iommu subsystem. The VM management logic is based on drm_gpuva_mgr, and assumes the VA space is mostly managed by the usermode driver, except for a reserved portion of this VA-space that's used for kernel objects (like the heap contexts/chunks). Both asynchronous and synchronous VM operations are supported, and internal helpers are exposed to allow other logical blocks to map their buffers in the GPU VA space. There's one VM_BIND queue per-VM (meaning the Vulkan driver can only expose one sparse-binding queue), and this bind queue is managed with a 1:1 drm_sched_entity:drm_gpu_scheduler, such that each VM gets its own independent execution queue, avoiding VM operation serialization at the device level (things are still serialized at the VM level). The rest is just implementation details that are hopefully well explained in the documentation. 
v5: - Fix a double panthor_vm_cleanup_op_ctx() call - Fix a race between panthor_vm_prepare_map_op_ctx() and panthor_vm_bo_put() - Fix panthor_vm_pool_destroy_vm() kernel doc - Fix paddr adjustment in panthor_vm_map_pages() - Fix bo_offset calculation in panthor_vm_get_bo_for_va() v4: - Add a helper to return the VM state - Check drmm_mutex_init() return code - Remove the VM from the AS reclaim list when panthor_vm_active() is called - Count the number of active VM users instead of considering there's at most one user (several scheduling groups can point to the same VM) - Pre-allocate a VMA object for unmap operations (unmaps can trigger a sm_step_remap() call) - Check vm->root_page_table instead of vm->pgtbl_ops to detect if the io-pgtable is trying to allocate the root page table - Don't memset() the va_node in panthor_vm_alloc_va(), make it a caller requirement - Fix the kernel doc in a few places - Drop the panthor_vm::base offset constraint and modify panthor_vm_put() to explicitly check for a NULL value - Fix unbalanced vm_bo refcount in panthor_gpuva_sm_step_remap() - Drop stale comments about the shared_bos list - Patch mmu_features::va_bits on 32-bit builds to reflect the io_pgtable limitation and let the UMD know about it v3: - Add acks for the MIT/GPL2 relicensing - Propagate MMU faults to the scheduler - Move pages pinning/unpinning out of the dma_signalling path - Fix 32-bit support - Rework the user/kernel VA range calculation - Make the auto-VA range explicit (auto-VA range doesn't cover the full kernel-VA range on the MCU VM) - Let callers of panthor_vm_alloc_va() allocate the drm_mm_node (embedded in panthor_kernel_bo now) - Adjust things to match the latest drm_gpuvm changes (extobj tracking, resv prep and more) - Drop the per-AS lock and use slots_lock (fixes a race on vm->as.id) - Set as.id to -1 when reusing an address space from the LRU list - Drop misleading comment about page faults - Remove check for irq being assigned in panthor_mmu_unplug() Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora
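Letting userspace manage most of the VA space while reserving one window for kernel objects means each user map request has to be validated against that window. The check below is a hypothetical sketch. `struct vm_layout`, `user_map_range_ok()`, the 4 KiB page size and the example layout are assumptions, not panthor's actual layout or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ull /* assumed page granularity */

/* Hypothetical VM layout: userspace owns the VA space except one
 * kernel-reserved window. */
struct vm_layout {
	uint32_t va_bits;      /* usable VA bits reported by the MMU */
	uint64_t kernel_start; /* start of the kernel-reserved range */
	uint64_t kernel_size;  /* size of the kernel-reserved range */
};

/* True if [va, va + size) is a legal target for a userspace map op. */
static bool user_map_range_ok(const struct vm_layout *l, uint64_t va, uint64_t size)
{
	uint64_t va_limit = l->va_bits >= 64 ? UINT64_MAX : (1ull << l->va_bits);
	uint64_t end;

	if (!size || ((va | size) & (MODEL_PAGE_SIZE - 1)))
		return false; /* empty or not page-aligned */
	if (va > UINT64_MAX - size)
		return false; /* range wraps around */
	end = va + size;
	if (end > va_limit)
		return false; /* beyond what the MMU can translate */
	/* Reject any overlap with the kernel-reserved window. */
	return end <= l->kernel_start ||
	       va >= l->kernel_start + l->kernel_size;
}
```

For example, with 48 VA bits and a 4 GiB kernel window starting at 2^47, a map straddling 2^47 is rejected, while one placed just above the window is accepted.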
Contains everything that's FW-related, which includes the code dealing with the microcontroller unit (MCU) running the FW, and anything related to allocating memory shared between the FW and the CPU. A few global FW events are processed in the IRQ handler, the rest is forwarded to the scheduler, since scheduling is the primary reason for the FW's existence, and also the main source of FW <-> kernel interactions. v5: - Fix typo in GLB_PERFCNT_SAMPLE definition - Fix unbalanced panthor_vm_idle/active() calls - Fall back to a slow reset when the fast reset fails - Add extra information when reporting a FW boot failure v4: - Add a MODULE_FIRMWARE() entry for gen 10.8 - Fix a wrong return ERR_PTR() in panthor_fw_load_section_entry() - Fix typos - Add Steve's R-b v3: - Make the FW path more future-proof (Liviu) - Use one waitqueue for all FW events - Simplify propagation of FW events to the scheduler logic - Drop the panthor_fw_mem abstraction and use panthor_kernel_bo instead - Account for the panthor_vm changes - Replace the magic number 0x7fffffff with ~0 to better signify that it's the maximum permitted value. - More accurate rounding when computing the firmware timeout. - Add a 'sub iterator' helper function. This also adds a check that a firmware entry doesn't overflow the firmware image. - Drop __packed from FW structures, natural alignment is good enough. - Other minor code improvements. Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]>
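The "sub iterator" item, with its check that a firmware entry doesn't overflow the firmware image, can be sketched as a bounds-checked cursor. `struct fw_iter`, `fw_iter_read()`, `fw_iter_sub()` and the one-byte-length entry format used in the usage example are all hypothetical; the real firmware entry headers differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a firmware image; invariant: offset <= size. */
struct fw_iter {
	const uint8_t *data;
	size_t size;
	size_t offset;
};

/* Copy 'len' bytes out of the iterator, refusing reads past its end. */
static int fw_iter_read(struct fw_iter *it, void *out, size_t len)
{
	if (len > it->size - it->offset)
		return -EINVAL;
	memcpy(out, it->data + it->offset, len);
	it->offset += len;
	return 0;
}

/* Carve a child iterator of 'len' bytes out of the parent and advance the
 * parent past it. An entry claiming to extend beyond the image is rejected
 * here, so parsers of the child can never read outside the image. */
static int fw_iter_sub(struct fw_iter *parent, struct fw_iter *child, size_t len)
{
	if (len > parent->size - parent->offset)
		return -EINVAL;
	child->data = parent->data + parent->offset;
	child->size = len;
	child->offset = 0;
	parent->offset += len;
	return 0;
}
```

Because every nested parser works on a child iterator, a single length check at carve-out time protects all the reads below it.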
Tiler heap growing requires some kernel driver involvement: when the tiler runs out of heap memory, it will raise an exception which is either directly handled by the firmware if some free heap chunks are available in the heap context, or passed back to the kernel otherwise. The heap helpers will be used by the scheduler logic to allocate more heap chunks to a heap context, when such a situation happens. Heap context creation is explicitly requested by userspace (using the TILER_HEAP_CREATE ioctl), and the returned context is attached to a queue through some command stream instruction. All the kernel does is keep the list of heap chunks allocated to a context, so they can be freed when TILER_HEAP_DESTROY is called, or extended when the FW requests a new chunk. v5: - Fix FIXME comment - Add Steve's R-b v4: - Rework locking to allow concurrent calls to panthor_heap_grow() - Add a helper to return a heap chunk if we couldn't pass it to the FW because the group was scheduled out v3: - Add a FIXME for the heap OOM deadlock - Use the panthor_kernel_bo abstraction for the heap context and heap chunks - Drop the panthor_heap_gpu_ctx struct as it is opaque to the driver - Ensure that the heap context is aligned to the GPU cache line size - Minor code tidy ups Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]>
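The kernel's role described above is bookkeeping: track the chunks owned by a heap context, hand out a new one on request, and free them all on destroy. The following is a hypothetical, single-threaded sketch of that role. `struct heap_ctx`, `heap_grow()`, `heap_destroy()`, the `-ENOMEM` cap behaviour and the toy VA allocator are invented. Real chunks are GPU buffer objects, and the real code needs locking for concurrent grows.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap-context bookkeeping; names are illustrative. */
struct heap_chunk {
	struct heap_chunk *next;
	uint64_t gpu_va; /* where the chunk would be mapped on the GPU */
};

struct heap_ctx {
	struct heap_chunk *chunks; /* singly linked, newest first */
	uint32_t chunk_count;
	uint32_t max_chunks;       /* cap chosen at creation time */
	uint64_t chunk_size;
	uint64_t next_va;          /* toy VA allocator for the example */
};

/* Called when the FW reports a tiler OOM it could not resolve itself. */
static int heap_grow(struct heap_ctx *ctx, uint64_t *new_chunk_va)
{
	struct heap_chunk *c;

	if (ctx->chunk_count >= ctx->max_chunks)
		return -ENOMEM; /* cap reached: caller must handle the OOM otherwise */
	c = malloc(sizeof(*c));
	if (!c)
		return -ENOMEM;
	c->gpu_va = ctx->next_va;
	ctx->next_va += ctx->chunk_size;
	c->next = ctx->chunks;
	ctx->chunks = c;
	ctx->chunk_count++;
	*new_chunk_va = c->gpu_va;
	return 0;
}

/* TILER_HEAP_DESTROY-style teardown: release every chunk of the context. */
static void heap_destroy(struct heap_ctx *ctx)
{
	while (ctx->chunks) {
		struct heap_chunk *c = ctx->chunks;

		ctx->chunks = c->next;
		free(c);
	}
	ctx->chunk_count = 0;
}
```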
This is the piece of software interacting with the FW scheduler, and taking care of some scheduling aspects when the FW runs short of scheduling slots. Indeed, the FW only exposes a few slots, and the kernel has to give all submission contexts a chance to execute their jobs. The kernel-side scheduler is timeslice-based, with a round-robin queue per priority level. Job submission is handled with a 1:1 drm_sched_entity:drm_gpu_scheduler, allowing us to delegate the dependency tracking to the core. All the gory details should be documented inline. v5: - Fix typos - Call panthor_kernel_bo_destroy(group->syncobjs) unconditionally - Don't move the group to the waiting list tail when it was already waiting for a different syncobj - Fix fatal_queues flagging in the tiler OOM path - Don't warn when more than one job times out on a group - Add a warning message when we fail to allocate a heap chunk - Add Steve's R-b v4: - Check drmm_mutex_init() return code - s/drm_gem_vmap_unlocked/drm_gem_vunmap_unlocked/ in panthor_queue_put_syncwait_obj() - Drop unneeded WARN_ON() in cs_slot_sync_queue_state_locked() - Use atomic_xchg() instead of atomic_fetch_and(0) - Fix typos - Let panthor_kernel_bo_destroy() check for IS_ERR_OR_NULL() BOs - Defer TILER_OOM event handling to a separate workqueue to prevent deadlocks when the heap chunk allocation is blocked on mem-reclaim. 
This is just a temporary solution, until we add support for non-blocking/failable allocations - Pass the scheduler workqueue to drm_sched instead of instantiating a separate one (no longer needed now that heap chunk allocation happens on a dedicated wq) - Set WQ_MEM_RECLAIM on the scheduler workqueue, so we can handle job timeouts when the system is under mem pressure, and hopefully free up some memory retained by these jobs v3: - Rework the FW event handling logic to avoid races - Make sure MMU faults kill the group immediately - Use the panthor_kernel_bo abstraction for group/queue buffers - Make in_progress an atomic_t, so we can check it without the reset lock held - Don't limit the number of groups per context to the FW scheduler capacity. Fix the limit to 128 for now. - Add a panthor_job_vm() helper - Account for panthor_vm changes - Add our job fence as DMA_RESV_USAGE_WRITE to all external objects (was previously DMA_RESV_USAGE_BOOKKEEP). I don't get why, given we're supposed to be fully-explicit, but other drivers do that, so there must be a good reason - Account for drm_sched changes - Provide a panthor_queue_put_syncwait_obj() - Unconditionally return groups to their idle list in panthor_sched_suspend() - Condition of sched_queue_{,delayed_}work fixed to be only when a reset isn't pending or in progress. - Several typos in comments fixed. Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]>
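A timeslice-based scheduler with one round-robin queue per priority level, feeding a small number of FW slots, can be modeled as below. This is a hypothetical sketch: `struct runqueue`, `pick_groups()`, `NUM_PRIOS` and integer group ids are invented, and the real scheduler also has to handle idle groups, syncobj waits and slot eviction.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PRIOS 3  /* hypothetical: index 0 is the highest priority */
#define MAX_GROUPS 8

/* One run queue per priority; a group is just an integer id here. */
struct runqueue {
	int groups[MAX_GROUPS];
	size_t count;
};

/* Move the head of the queue to its tail. */
static void rq_rotate(struct runqueue *rq)
{
	if (rq->count < 2)
		return;
	int first = rq->groups[0];
	for (size_t i = 1; i < rq->count; i++)
		rq->groups[i - 1] = rq->groups[i];
	rq->groups[rq->count - 1] = first;
}

/* One timeslice tick: fill up to 'nslots' FW slots, highest priority first.
 * The groups picked from each level are rotated to the back of their queue
 * so the next tick gives other groups of that level a turn. */
static size_t pick_groups(struct runqueue rqs[NUM_PRIOS], size_t nslots, int *out)
{
	size_t picked = 0;

	for (int prio = 0; prio < NUM_PRIOS && picked < nslots; prio++) {
		struct runqueue *rq = &rqs[prio];
		size_t n = 0;

		while (n < rq->count && picked < nslots)
			out[picked++] = rq->groups[n++];
		for (size_t i = 0; i < n; i++)
			rq_rotate(rq);
	}
	return picked;
}
```

With three slots, one high-priority group and three medium-priority groups, the high-priority group runs every tick while the medium-priority ones take turns for the remaining two slots. Lower levels only get slots when higher levels leave some free.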
This is the last piece missing to expose the driver to the outside world. This is basically a wrapper between the ioctls and the other logical blocks. v5: - Account for the drm_exec_init() prototype change - Include platform_device.h v4: - Add an ioctl to let the UMD query the VM state - Fix kernel doc - Let panthor_device_init() call panthor_device_init() - Fix cleanup ordering in the panthor_init() error path - Add Steve's and Liviu's R-b v3: - Add acks for the MIT/GPL2 relicensing - Fix 32-bit support - Account for panthor_vm and panthor_sched changes - Simplify the resv preparation/update logic - Use a linked list rather than xarray for list of signals. - Simplify panthor_get_uobj_array by returning the newly allocated array. - Drop the "DOC" for job submission helpers and move the relevant comments to panthor_ioctl_group_submit(). - Add helpers sync_op_is_signal()/sync_op_is_wait(). - Simplify return type of panthor_submit_ctx_add_sync_signal() and panthor_submit_ctx_get_sync_signal(). - Drop WARN_ON from panthor_submit_ctx_add_job(). - Fix typos in comments. Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]>
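Helpers like panthor_get_uobj_array() copy arrays of extensible structs from userspace, where the per-element size (stride) is chosen by userspace. The sketch below models that copy in user space using the rules of the kernel's copy_struct_from_user(): a smaller user size is zero-extended, and a larger one is accepted only if the extra bytes are zero. `copy_struct_model()` and `get_obj_array()` are hypothetical names, and the real helper's exact checks may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Model of copy_struct_from_user() semantics on plain memory. */
static int copy_struct_model(void *dst, size_t ksize, const void *src, size_t usize)
{
	const uint8_t *u = src;

	if (usize > ksize) {
		for (size_t i = ksize; i < usize; i++)
			if (u[i])
				return -E2BIG; /* unknown, non-zero trailing fields */
		usize = ksize;
	}
	memcpy(dst, src, usize);
	memset((uint8_t *)dst + usize, 0, ksize - usize);
	return 0;
}

/* Return a newly allocated array of 'count' elements of 'ksize' bytes,
 * or NULL with *err set. A real kernel version must also bound
 * count * stride against the user buffer. */
static void *get_obj_array(const void *uarray, size_t count, size_t stride,
			   size_t ksize, int *err)
{
	uint8_t *out;

	*err = 0;
	if (!count)
		return NULL;
	out = calloc(count, ksize);
	if (!out) {
		*err = -ENOMEM;
		return NULL;
	}
	for (size_t i = 0; i < count; i++) {
		*err = copy_struct_model(out + i * ksize, ksize,
					 (const uint8_t *)uarray + i * stride, stride);
		if (*err) {
			free(out);
			return NULL;
		}
	}
	return out;
}
```

This scheme lets older userspace keep passing shorter structs after the uAPI grows new trailing fields, which simply read as zero in the kernel.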
Now that all blocks are available, we can add/update Kconfig/Makefile files to allow compilation. v4: - Add Steve's R-b v3: - Add a dep on DRM_GPUVM - Fix dependencies in Kconfig - Expand help text to (hopefully) describe which GPUs are to be supported by this driver and which are for panfrost. Co-developed-by: Steven Price <[email protected]> Signed-off-by: Steven Price <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Acked-by: Steven Price <[email protected]> # MIT+GPL2 relicensing,Arm Acked-by: Grant Likely <[email protected]> # MIT+GPL2 relicensing,Linaro Acked-by: Boris Brezillon <[email protected]> # MIT+GPL2 relicensing,Collabora Reviewed-by: Steven Price <[email protected]>
In cases where the number of objects is known ahead of time, it is silly to do the table resize dance. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Christian König <[email protected]> Patchwork: https://patchwork.freedesktop.org/patch/568338/
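The "table resize dance" is the repeated grow-and-copy of a dynamic array. Passing an initial object count lets the table be sized once. The sketch below is hypothetical and is not the drm_exec code: `struct obj_table`, its functions and the fallback capacity of 2 are invented to make the difference visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fallback capacity when the caller gives no hint. */
#define OBJ_TABLE_DEFAULT_CAP 2u

struct obj_table {
	void **objs;
	unsigned int count;
	unsigned int capacity;
	unsigned int resizes; /* number of grow operations performed */
};

static int obj_table_init(struct obj_table *t, unsigned int nr_hint)
{
	t->capacity = nr_hint ? nr_hint : OBJ_TABLE_DEFAULT_CAP;
	t->count = 0;
	t->resizes = 0;
	t->objs = malloc(t->capacity * sizeof(*t->objs));
	return t->objs ? 0 : -1;
}

/* Append an object, doubling the table when it is full. */
static int obj_table_add(struct obj_table *t, void *obj)
{
	if (t->count == t->capacity) {
		unsigned int new_cap = t->capacity * 2;
		void **grown = realloc(t->objs, new_cap * sizeof(*grown));

		if (!grown)
			return -1;
		t->objs = grown;
		t->capacity = new_cap;
		t->resizes++;
	}
	t->objs[t->count++] = obj;
	return 0;
}

static void obj_table_fini(struct obj_table *t)
{
	free(t->objs);
	t->objs = NULL;
	t->count = t->capacity = 0;
}
```

Adding eight objects to a table created without a hint grows it twice (2 to 4 to 8), while a table created with a hint of 8 never reallocates.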
Joshua-Riek pushed a commit that referenced this pull request Jul 9, 2024
commit 1a5352a upstream. The ath11k active pdevs are protected by RCU but the temperature event handling code calling ath11k_mac_get_ar_by_pdev_id() was not marked as a read-side critical section as reported by RCU lockdep: ============================= WARNING: suspicious RCU usage 6.6.0-rc6 #7 Not tainted ----------------------------- drivers/net/wireless/ath/ath11k/mac.c:638 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by swapper/0/0. ... Call trace: ... lockdep_rcu_suspicious+0x16c/0x22c ath11k_mac_get_ar_by_pdev_id+0x194/0x1b0 [ath11k] ath11k_wmi_tlv_op_rx+0xa84/0x2c1c [ath11k] ath11k_htc_rx_completion_handler+0x388/0x510 [ath11k] Mark the code in question as an RCU read-side critical section to avoid any potential use-after-free issues. Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23 Fixes: a41d103 ("ath11k: add thermal sensor device support") Cc: [email protected] # 5.7 Signed-off-by: Johan Hovold <[email protected]> Acked-by: Jeff Johnson <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Joshua-Riek pushed a commit that referenced this pull request Jul 9, 2024
commit b684c09 upstream. ppc_save_regs() skips one stack frame while saving the CPU register states. Instead of saving current R1, it pulls the previous stack frame pointer. When vmcores caused by a direct panic call (such as `echo c > /proc/sysrq-trigger`) are debugged with gdb, gdb fails to show the backtrace correctly. Further analysis found that this was caused by a mismatch between r1 and NIP. GDB uses NIP to get current function symbol and uses corresponding debug info of that function to unwind previous frames, but due to the mismatching r1 and NIP, the unwinding does not work, and it fails to unwind to the 2nd frame and hence does not show the backtrace. GDB backtrace with vmcore of kernel without this patch: --------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=<optimized out>, newregs=0xc000000004f8f8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=<optimized out>) at kernel/kexec_core.c:974 #2 0x0000000000000063 in ?? () #3 0xc000000003579320 in ?? () --------- Further analysis revealed that the mismatch occurred because "ppc_save_regs" was saving the previous stack's SP instead of the current r1. This patch fixes this by storing current r1 in the saved pt_regs. 
GDB backtrace with vmcore of patched kernel: -------- (gdb) bt #0 0xc0000000002a53e8 in crash_setup_regs (oldregs=0x0, newregs=0xc00000000670b8d8) at ./arch/powerpc/include/asm/kexec.h:69 #1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:974 #2 0xc000000000168918 in panic (fmt=fmt@entry=0xc000000001654a60 "sysrq triggered crash\n") at kernel/panic.c:358 #3 0xc000000000b735f8 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:155 #4 0xc000000000b742cc in __handle_sysrq (key=key@entry=99, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:602 #5 0xc000000000b7506c in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1163 #6 0xc00000000069a7bc in pde_write (ppos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>, pde=0xc00000000362cb40) at fs/proc/inode.c:340 #7 proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:352 #8 0xc0000000005b3bbc in vfs_write (file=file@entry=0xc000000006aa6b00, buf=buf@entry=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=count@entry=2, pos=pos@entry=0xc00000000670bda0) at fs/read_write.c:582 #9 0xc0000000005b4264 in ksys_write (fd=<optimized out>, buf=0x61f498b4f60 <error: Cannot access memory at address 0x61f498b4f60>, count=2) at fs/read_write.c:637 #10 0xc00000000002ea2c in system_call_exception (regs=0xc00000000670be80, r0=<optimized out>) at arch/powerpc/kernel/syscall.c:171 #11 0xc00000000000c270 in system_call_vectored_common () at arch/powerpc/kernel/interrupt_64.S:192 -------- Nick adds: So this now saves regs as though it was an interrupt taken in the caller, at the instruction after the call to ppc_save_regs, whereas previously the NIP was there, but R1 came from the caller's caller and that mismatch is what causes gdb's dwarf unwinder to go haywire. 
Signed-off-by: Aditya Gupta <[email protected]> Fixes: d16a58f ("powerpc: Improve ppc_save_regs()") Reviewed-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected] Cc: [email protected] Signed-off-by: Aditya Gupta <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Joshua-Riek pushed a commit that referenced this pull request Sep 1, 2024
When l2tp tunnels use a socket provided by userspace, we can hit lockdep splats like the one below when data is transmitted through another (unrelated) userspace socket which then gets routed over l2tp. This issue was previously discussed here: https://lore.kernel.org/netdev/[email protected]/ The solution is to have lockdep treat socket locks of l2tp tunnel sockets separately from those of standard INET sockets. To do so, use a different lockdep subclass where lock nesting is possible. ============================================ WARNING: possible recursive locking detected 6.10.0+ #34 Not tainted -------------------------------------------- iperf3/771 is trying to acquire lock: ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0 but task is already holding lock: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_INET/1); lock(slock-AF_INET/1); *** DEADLOCK *** May be due to missing lock nesting notation 10 locks held by iperf3/771: #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40 #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0 #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260 #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450 #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450 stack backtrace: CPU: 2 UID: 0 PID: 
771 Comm: iperf3 Not tainted 6.10.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x69/0xa0 dump_stack+0xc/0x20 __lock_acquire+0x135d/0x2600 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc4/0x2a0 ? l2tp_xmit_skb+0x243/0x9d0 ? __skb_checksum+0xa3/0x540 _raw_spin_lock_nested+0x35/0x50 ? l2tp_xmit_skb+0x243/0x9d0 l2tp_xmit_skb+0x243/0x9d0 l2tp_eth_dev_xmit+0x3c/0xc0 dev_hard_start_xmit+0x11e/0x420 sch_direct_xmit+0xc3/0x640 __dev_queue_xmit+0x61c/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 __tcp_send_ack+0x1b8/0x340 tcp_send_ack+0x23/0x30 __tcp_ack_snd_check+0xa8/0x530 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_rcv_established+0x412/0xd70 tcp_v4_do_rcv+0x299/0x420 tcp_v4_rcv+0x1991/0x1e10 ip_protocol_deliver_rcu+0x50/0x220 ip_local_deliver_finish+0x158/0x260 ip_local_deliver+0xc8/0xe0 ip_rcv+0xe5/0x1d0 ? __pfx_ip_rcv+0x10/0x10 __netif_receive_skb_one_core+0xce/0xe0 ? process_backlog+0x28b/0x9f0 __netif_receive_skb+0x34/0xd0 ? process_backlog+0x28b/0x9f0 process_backlog+0x2cb/0x9f0 __napi_poll.constprop.0+0x61/0x280 net_rx_action+0x332/0x670 ? srso_alias_return_thunk+0x5/0xfbef5 ? find_held_lock+0x2b/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 handle_softirqs+0xda/0x480 ? __dev_queue_xmit+0xa2c/0x1450 do_softirq+0xa1/0xd0 </IRQ> <TASK> __local_bh_enable_ip+0xc8/0xe0 ? __dev_queue_xmit+0xa2c/0x1450 __dev_queue_xmit+0xa48/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? 
srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 tcp_write_xmit+0x766/0x2fb0 ? __entry_text_end+0x102ba9/0x102bad ? srso_alias_return_thunk+0x5/0xfbef5 ? __might_fault+0x74/0xc0 ? srso_alias_return_thunk+0x5/0xfbef5 __tcp_push_pending_frames+0x56/0x190 tcp_push+0x117/0x310 tcp_sendmsg_locked+0x14c1/0x1740 tcp_sendmsg+0x28/0x40 inet_sendmsg+0x5d/0x90 sock_write_iter+0x242/0x2b0 vfs_write+0x68d/0x800 ? __pfx_sock_write_iter+0x10/0x10 ksys_write+0xc8/0xf0 __x64_sys_write+0x3d/0x50 x64_sys_call+0xfaf/0x1f50 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4d143af992 Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0 RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992 RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005 RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0 </TASK> Fixes: 0b2c597 ("l2tp: close all race conditions in l2tp_tunnel_register()") Suggested-by: Eric Dumazet <[email protected]> Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4 CC: [email protected] CC: [email protected] Signed-off-by: James Chapman <[email protected]> Signed-off-by: Tom Parkin <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
Joshua-Riek pushed a commit that referenced this pull request Sep 16, 2024
Ethtool callbacks can be executed while reset is in progress and try to access deleted resources, e.g. getting coalesce settings can result in the NULL pointer dereference seen below. Reproduction steps: Once the driver is fully initialized, trigger reset: # echo 1 > /sys/class/net/<interface>/device/reset When reset is in progress, try to get coalesce settings using ethtool: # ethtool -c <interface> BUG: kernel NULL pointer dereference, address: 0000000000000020 PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 11 PID: 19713 Comm: ethtool Tainted: G S 6.10.0-rc7+ #7 RIP: 0010:ice_get_q_coalesce+0x2e/0xa0 [ice] RSP: 0018:ffffbab1e9bcf6a8 EFLAGS: 00010206 RAX: 000000000000000c RBX: ffff94512305b028 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff9451c3f2e588 RDI: ffff9451c3f2e588 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ffff9451c3f2e580 R11: 000000000000001f R12: ffff945121fa9000 R13: ffffbab1e9bcf760 R14: 0000000000000013 R15: ffffffff9e65dd40 FS: 00007faee5fbe740(0000) GS:ffff94546fd80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 0000000106c2e005 CR4: 00000000001706f0 Call Trace: <TASK> ice_get_coalesce+0x17/0x30 [ice] coalesce_prepare_data+0x61/0x80 ethnl_default_doit+0xde/0x340 genl_family_rcv_msg_doit+0xf2/0x150 genl_rcv_msg+0x1b3/0x2c0 netlink_rcv_skb+0x5b/0x110 genl_rcv+0x28/0x40 netlink_unicast+0x19c/0x290 netlink_sendmsg+0x222/0x490 __sys_sendto+0x1df/0x1f0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x82/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7faee60d8e27 Calling netif_device_detach() before reset makes the net core skip calling into the driver when an ethtool command is issued. An attempt to execute an ethtool command during reset then results in the following message: netlink error: No such device instead of a NULL pointer dereference. 
Once reset is done and ice_rebuild() is executing, netif_device_attach() is called to allow ethtool operations to occur again in a safe manner. Fixes: fcea6f3 ("ice: Add stats and ethtool support") Suggested-by: Jakub Kicinski <[email protected]> Reviewed-by: Igor Bagnucki <[email protected]> Signed-off-by: Dawid Osuchowski <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Reviewed-by: Michal Schmidt <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
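The ordering in this fix can be shown with a single-threaded model. The device is detached before driver state is torn down and re-attached only after it is rebuilt. The core checks presence before calling into the driver, as the commit message describes. `struct model_netdev` and all functions below are hypothetical stand-ins that mirror, but are not, netif_device_detach()/netif_device_attach() and the ethtool path. The model ignores locking entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical net device: a presence flag plus driver state that is
 * freed and rebuilt across a reset. */
struct model_netdev {
	bool present;
	int *coalesce_usecs;
};

static int model_coalesce_storage = 50;

/* Driver callback: dereferences state that disappears during reset. */
static int drv_get_coalesce(struct model_netdev *nd, int *out)
{
	*out = *nd->coalesce_usecs;
	return 0;
}

/* Core entry point for an ethtool-like request. */
static int core_get_coalesce(struct model_netdev *nd, int *out)
{
	if (!nd->present)
		return -ENODEV; /* surfaces as "No such device" */
	return drv_get_coalesce(nd, out);
}

static void drv_reset_begin(struct model_netdev *nd)
{
	nd->present = false;       /* detach first ... */
	nd->coalesce_usecs = NULL; /* ... then tear down driver state */
}

static void drv_rebuild(struct model_netdev *nd)
{
	nd->coalesce_usecs = &model_coalesce_storage; /* restore state ... */
	nd->present = true;                           /* ... then attach */
}
```

Reversing either ordering, for example attaching before the state is restored, would reopen the window in which the core calls into the driver while its data is gone.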
WIP, please do not merge