Unknown segmentation fault thrown in the Intel driver #2027

Closed
Uniformbuffer3 opened this issue Oct 6, 2021 · 2 comments
Labels: api: vulkan (Issues with Vulkan); external: driver-bug (A driver is causing the bug, though we may still want to work around it)


Description
Hi, I'm using wgpu 0.9 in a project to write a Wayland compositor. I have designed various parts, including the windowing library and the engine. I have written a task for my engine, simply called screen_task, that should fill the role of the compositor from a graphics point of view. Unfortunately, running it causes a segmentation fault, which is unexpected since I use no unsafe code in any of my libraries.

Running the code under rust-lldb and capturing the backtrace gives this:

* thread #1, name = 'rectangle_task', stop reason = signal SIGSEGV: invalid address (fault address: 0x38)
  * frame #0: 0x00007fffe7473e1d libvulkan_intel.so`___lldb_unnamed_symbol478$$libvulkan_intel.so + 45
    frame #1: 0x00007fffe74ac7a6 libvulkan_intel.so`___lldb_unnamed_symbol824$$libvulkan_intel.so + 1046
    frame #2: 0x00007fffe74c5b2d libvulkan_intel.so`___lldb_unnamed_symbol927$$libvulkan_intel.so + 141
    frame #3: 0x00007fffe74c9b49 libvulkan_intel.so`___lldb_unnamed_symbol941$$libvulkan_intel.so + 553
    frame #4: 0x00007fffe74cbb8b libvulkan_intel.so`___lldb_unnamed_symbol942$$libvulkan_intel.so + 139
    frame #5: 0x0000555555a22a23 rectangle_task`wgpu_core::command::render::_$LT$impl$u20$wgpu_core..hub..Global$LT$G$GT$$GT$::command_encoder_run_render_pass_impl::h24fa80f167a23915 (.llvm.5012922920467819889) + 5315
    frame #6: 0x00005555559c0041 rectangle_task`_$LT$wgpu..backend..direct..Context$u20$as$u20$wgpu..Context$GT$::command_encoder_end_render_pass::h9e7b4e3be3926c46 + 225
    frame #7: 0x0000555555791303 rectangle_task`wgpu_engine::common::resources::builders::CommandBuilder::build::h340d10d019ff1d86 + 755
    frame #8: 0x0000555555791863 rectangle_task`wgpu_engine::common::resources::builders::CommandBufferBuilder::build::heff13e1bb7c9c8e9 + 99
    frame #9: 0x000055555578b01c rectangle_task`wgpu_engine::common::resources::builders::ResourceBuilder::build::ha52a4bf1818a33a0 + 604
    frame #10: 0x00005555557875c7 rectangle_task`wgpu_engine::engine::resource_manager::ResourceManager::commit_resources::hff5445769bb63bea + 1047
    frame #11: 0x0000555555744426 rectangle_task`wgpu_engine::engine::task_processing::_$LT$impl$u20$wgpu_engine..engine..WGpuEngine$GT$::dispatch_tasks::h5145c575fc6ccfd8 + 246
    frame #12: 0x0000555555623840 rectangle_task`wgpu_engine::utils::quick_run::h265c3b110dd7d3cc + 1520
    frame #13: 0x00005555556867d1 rectangle_task`rectangle_task::main::h9b943e79d4953af4 + 113
    frame #14: 0x00005555556adee3 rectangle_task`std::sys_common::backtrace::__rust_begin_short_backtrace::h5f958e0f75848c85 + 3
    frame #15: 0x000055555561ec09 rectangle_task`std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h75a7efbeb0c40dbe (.llvm.8166823623572094016) + 9
    frame #16: 0x0000555555c8a1ca rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::hab43eb8ec758341e at function.rs:259:13
    frame #17: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] std::panicking::try::do_call::hb667fefd650e964d at panicking.rs:403
    frame #18: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] std::panicking::try::hc279775c9409768d at panicking.rs:367
    frame #19: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] std::panic::catch_unwind::hbe8b6ecc84f73073 at panic.rs:129
    frame #20: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] std::rt::lang_start_internal::_$u7b$$u7b$closure$u7d$$u7d$::hb04fdaef496a162b at rt.rs:45
    frame #21: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] std::panicking::try::do_call::habca359de051411d at panicking.rs:403
    frame #22: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] std::panicking::try::h24c556a6948e0937 at panicking.rs:367
    frame #23: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 [inlined] std::panic::catch_unwind::h58075d47f5a308d8 at panic.rs:129
    frame #24: 0x0000555555c8a1c3 rectangle_task`std::rt::lang_start_internal::h3d20dc1537d76758 at rt.rs:45
    frame #25: 0x0000555555686802 rectangle_task`main + 34
    frame #26: 0x00007ffff7bda565 libc.so.6`__libc_start_main + 213
    frame #27: 0x000055555560a14e rectangle_task`_start + 46

This error seems to affect only the Intel GPU; the Nvidia one is not affected (it still does not work because it is a work in progress, but it does not cause a segmentation fault).

Repro steps
To reproduce, clone https://github.com/Uniformbuffer3/screen_task and run it with:
RUST_LOG=info cargo run --release --no-default-features --features wgpu_standard_backend

If you run it as a debug build, the crashing component will be a validation layer, while a release build produces the error above.
The selected feature configures the whole project to use the standard wgpu 0.9 rather than my fork.
Also, when RUST_LOG is set to at least the info level, the engine prints some debug information that may be helpful, though I don't know how much, since I still have to write the documentation for the whole project, and I'm very sorry about that. I have started writing it, so in the meantime please ask if anything is unclear.

Expected vs observed behavior
screen_task, run as it is, has no surface to draw yet, so it should simply show a black window (or crash on something not yet implemented), but I expect it not to cause a segmentation fault.

Platform
OS: Pop!_OS 21.04 x86_64
Kernel: 5.13.0-7614-generic
CPU: Intel i5-4200M (4) @ 3.100GHz
GPU: Nvidia GTX850m (not running for this test)

Please ask if you need any additional data.
Thanks for listening, have a good day.

kvark added the external: driver-bug, help required, and type: bug labels on Oct 6, 2021

Uniformbuffer3 commented Oct 13, 2021

Hi, after some debugging I got the program working on the Nvidia GPU. I then tried it again with the Intel driver, and it started to work there too.
I ran various tests to understand what was going wrong, and disabling the fragment shader in the render pipeline made it work (well, not really, but at least it didn't crash).
My fragment shader has an unsized texture array among its bindings; Naga does not support this yet, so its validation fails. For this reason I had to disable Naga shader validation.
Among the fixes I made, I resolved a case where the bind group layout was declared with just a sampler as its only entry, while the fragment shader required a sampler and an unsized texture array.
My guess is that this mismatch was the cause of the segmentation fault.
I suppose no one has encountered this problem because shader validation is generally enabled, so an error would be raised before the commands actually run on the GPU.
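
To illustrate the kind of mismatch described above, here is a minimal standalone sketch of a check that compares what a shader requires against what a bind group layout declares. It is not wgpu or Naga code, and it is not taken from the project: the types `BindingKind` and `Mismatch`, the function `find_mismatches`, and the binding indices 0 and 1 are all assumptions made up for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified resource kinds; these are NOT wgpu types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BindingKind {
    Sampler,
    TextureArray,
}

/// A problem found when comparing the shader's requirements with the layout.
#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch {
    /// The shader uses a binding that the layout does not declare.
    Missing { binding: u32, expected: BindingKind },
    /// The layout declares the binding, but with a different kind.
    WrongKind {
        binding: u32,
        expected: BindingKind,
        found: BindingKind,
    },
}

/// Compare the bindings a shader requires (binding index -> kind) against
/// the entries of a bind group layout, and report every disagreement.
/// Extra layout entries that the shader never uses are not reported.
pub fn find_mismatches(
    shader: &BTreeMap<u32, BindingKind>,
    layout: &BTreeMap<u32, BindingKind>,
) -> Vec<Mismatch> {
    let mut mismatches = Vec::new();
    for (&binding, &expected) in shader {
        match layout.get(&binding) {
            None => mismatches.push(Mismatch::Missing { binding, expected }),
            Some(&found) if found != expected => mismatches.push(Mismatch::WrongKind {
                binding,
                expected,
                found,
            }),
            Some(_) => {}
        }
    }
    mismatches
}
```

With a shader map holding a sampler at binding 0 and a texture array at binding 1 (indices assumed), and a layout map holding only the sampler, `find_mismatches` reports a single `Mismatch::Missing` for binding 1. With a check of this kind disabled, nothing stops such a pipeline from reaching the driver.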

This also means that shader validation is working great!
I will try to develop a small separate program to make this error reproducible, and if successful I will report it here (and on the Mesa Git).

teoxoy added the api: vulkan label on Feb 24, 2023
teoxoy commented Feb 24, 2023

Thanks for the update, it sounds like there isn't anything to do on our side, closing.

teoxoy closed this as not planned on Feb 24, 2023
teoxoy removed the type: bug and help required labels on Nov 17, 2023