Segfault on wgpu::Adapter::request_device but *only* in debug mode (release mode is okay) #2996

Closed
petersn opened this issue Aug 28, 2022 · 3 comments

petersn commented Aug 28, 2022

wgpu applications segfault for me in debug builds, but not in release builds. This happens with Bevy's breakout game example, with some other wgpu applications, and with the minimal example I give below.

Repro steps
Set up the following:

Cargo.toml:

[package]
name = "wgpu-segfault"
version = "0.1.0"
edition = "2021"
[dependencies]
pollster = "0.2.5"
wgpu = "0.13.1"
winit = "0.27.2"

src/main.rs:

async fn run() {
    let event_loop = winit::event_loop::EventLoop::new();
    let window = winit::window::WindowBuilder::new()
        .build(&event_loop)
        .unwrap();
    let size = window.inner_size();
    let instance = wgpu::Instance::new(wgpu::Backends::all());
    let surface = unsafe { instance.create_surface(&window) };
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::default(),
            force_fallback_adapter: false,
            // Request an adapter which can render to our surface
            compatible_surface: Some(&surface),
        })
        .await
        .expect("Failed to find an appropriate adapter");
    println!("Got adapter: {:?}", adapter.get_info());
    let (device, queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: None,
                features: wgpu::Features::empty(),
                limits: wgpu::Limits::downlevel_webgl2_defaults()
                    .using_resolution(adapter.limits()),
            },
            None,
        )
        .await
        .expect("Failed to create device");
    println!("Got device: {:?}", device);
}

fn main() {
    pollster::block_on(run());
}

When I run cargo run I get:

    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/wgpu-segfault`
Got adapter: AdapterInfo { name: "NVIDIA GeForce RTX 3090", vendor: 4318, device: 8708, device_type: DiscreteGpu, backend: Vulkan }
Segmentation fault (core dumped)

When I run cargo run --release I get:

    Finished release [optimized] target(s) in 0.04s
     Running `target/release/wgpu-segfault`
Got adapter: AdapterInfo { name: "NVIDIA GeForce RTX 3090", vendor: 4318, device: 8708, device_type: DiscreteGpu, backend: Vulkan }
Got device: Device { context: Context { type: "Native" }, id: Device { id: (0, 1, Vulkan), error_sink: Mutex { data: ErrorSink }, features: (empty) } }

Expected vs observed behavior
I do not expect the debug build to segfault on adapter.request_device.

gdb logs
Here's more info on the segfault:

$ gdb ./target/debug/wgpu-segfault 
GNU gdb (Ubuntu 12.0.90-0ubuntu1) 12.0.90
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./target/debug/wgpu-segfault...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/snp/tmp/wgpu-segfault/target/debug/wgpu-segfault.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) run
Starting program: /home/snp/tmp/wgpu-segfault/target/debug/wgpu-segfault 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Got adapter: AdapterInfo { name: "NVIDIA GeForce RTX 3090", vendor: 4318, device: 8708, device_type: DiscreteGpu, backend: Vulkan }
[New Thread 0x7fffe5dff640 (LWP 999683)]
[New Thread 0x7fffe55fe640 (LWP 999684)]
[New Thread 0x7fffe4bfd640 (LWP 999685)]

Thread 1 "wgpu-segfault" received signal SIGSEGV, Segmentation fault.
0x00007fffdf1ac73f in ?? () from /lib/librenderdoc.so
(gdb) bt
#0  0x00007fffdf1ac73f in ?? () from /lib/librenderdoc.so
#1  0x00007fffdeb3ec3c in ?? () from /lib/librenderdoc.so
#2  0x00007fffdeb45682 in ?? () from /lib/librenderdoc.so
#3  0x00007ffff7fc947e in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdfb8, env=env@entry=0x7fffffffdfc8) at ./elf/dl-init.c:70
#4  0x00007ffff7fc9568 in call_init (env=0x7fffffffdfc8, argv=0x7fffffffdfb8, argc=1, l=<optimized out>) at ./elf/dl-init.c:33
#5  _dl_init (main_map=0x5555568acd60, argc=1, argv=0x7fffffffdfb8, env=0x7fffffffdfc8) at ./elf/dl-init.c:117
#6  0x00007ffff7c9ec85 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:182
#7  0x00007ffff7fd0ff6 in dl_open_worker (a=0x7ffffffd0560) at ./elf/dl-open.c:808
#8  dl_open_worker (a=a@entry=0x7ffffffd0560) at ./elf/dl-open.c:771
#9  0x00007ffff7c9ec28 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#10 0x00007ffff7fd134e in _dl_open (file=<optimized out>, mode=-2147483647, caller_dlopen=0x5555558b5fcc <libloading::os::unix::{impl#2}::open::{closure#1}<&str>+236>, nsid=-2, argc=1, argv=<optimized out>, env=0x7fffffffdfc8) at ./elf/dl-open.c:883
#11 0x00007ffff7bba6bc in dlopen_doit (a=a@entry=0x7ffffffd07d0) at ./dlfcn/dlopen.c:56
#12 0x00007ffff7c9ec28 in __GI__dl_catch_exception (exception=exception@entry=0x7ffffffd0730, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:208
#13 0x00007ffff7c9ecf3 in __GI__dl_catch_error (objname=0x7ffffffd0788, errstring=0x7ffffffd0790, mallocedp=0x7ffffffd0787, operate=<optimized out>, args=<optimized out>) at ./elf/dl-error-skeleton.c:227
#14 0x00007ffff7bba1ae in _dlerror_run (operate=operate@entry=0x7ffff7bba660 <dlopen_doit>, args=args@entry=0x7ffffffd07d0) at ./dlfcn/dlerror.c:138
#15 0x00007ffff7bba748 in dlopen_implementation (dl_caller=<optimized out>, mode=<optimized out>, file=<optimized out>) at ./dlfcn/dlopen.c:71
#16 ___dlopen (file=<optimized out>, mode=<optimized out>) at ./dlfcn/dlopen.c:81
#17 0x00005555558b5fcc in libloading::os::unix::{impl#2}::open::{closure#1}<&str> () at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/libloading-0.7.3/src/os/unix/mod.rs:173
#18 0x00005555558b5981 in libloading::os::unix::with_dlerror<libloading::os::unix::Library, libloading::os::unix::{impl#2}::open::{closure_env#1}<&str>> (wrap=0x5555558cfbe0 <core::ops::function::FnOnce::call_once<libloading::os::unix::{impl#2}::open::{closure_env#0}<&str>, (libloading::error::DlDescription)>>, closure=<error reading variable: Cannot access memory at address 0xf>)
    at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/libloading-0.7.3/src/os/unix/mod.rs:54
#19 0x00005555558b5e85 in libloading::os::unix::Library::open<&str> (filename=..., flags=1) at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/libloading-0.7.3/src/os/unix/mod.rs:172
#20 0x00005555558b5c24 in libloading::os::unix::Library::new<&str> (filename=...) at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/libloading-0.7.3/src/os/unix/mod.rs:121
#21 0x00005555558d1d53 in libloading::safe::Library::new<&str> (filename=...) at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/libloading-0.7.3/src/safe.rs:84
#22 0x0000555555877848 in wgpu_hal::auxil::renderdoc::RenderDoc::new () at src/auxil/renderdoc.rs:42
#23 0x0000555555877f1d in wgpu_hal::auxil::renderdoc::{impl#3}::default () at src/auxil/renderdoc.rs:90
#24 0x0000555555824dd7 in wgpu_hal::vulkan::Adapter::device_from_raw (self=0x5555568a7278, raw_device=..., handle_is_owned=true, enabled_extensions=..., features=..., uab_types=..., family_index=0, queue_index=0) at src/vulkan/adapter.rs:1396
#25 0x0000555555825575 in wgpu_hal::vulkan::adapter::{impl#9}::open (self=0x5555568a7278, features=..., limits=0x7fffffffca88) at src/vulkan/adapter.rs:1442
#26 0x0000555555778645 in wgpu_core::instance::Adapter<wgpu_hal::vulkan::Api>::create_device<wgpu_hal::vulkan::Api> (self=0x5555568a7278, self_id=..., desc=0x7fffffffca60, trace_path=...) at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.13.2/src/instance.rs:335
#27 0x00005555557288d0 in wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>::adapter_request_device<wgpu_core::hub::IdentityManagerFactory, wgpu_hal::vulkan::Api> (self=0x5555566cd530, adapter_id=..., desc=0x7fffffffca60, trace_path=..., id_in=...) at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.13.2/src/instance.rs:958
#28 0x00005555556c1a6b in wgpu::backend::direct::{impl#3}::adapter_request_device (self=0x5555566cd530, adapter=0x7fffffffdb88, desc=0x7fffffffdb98, trace_dir=...) at src/backend/direct.rs:891
#29 0x00005555556f2ca1 in wgpu::Adapter::request_device (self=0x7fffffffdb80, desc=0x7fffffffdb98, trace_path=...) at src/lib.rs:1871
#30 0x000055555566196c in wgpu_segfault::run::{async_fn#0} () at src/main.rs:19
#31 0x000055555566ddec in core::future::from_generator::{impl#1}::poll<wgpu_segfault::run::{async_fn_env#0}> (self=..., cx=0x7fffffffd848) at /rustc/878aef79dcdf59d19bb8482202dc55e58ceb62ff/library/core/src/future/mod.rs:91
#32 0x000055555564f20a in pollster::block_on<core::future::from_generator::GenFuture<wgpu_segfault::run::{async_fn_env#0}>> (fut=...) at /home/snp/.cargo/registry/src/github.com-1ecc6299db9ec823/pollster-0.2.5/src/lib.rs:125
#33 0x0000555555672567 in wgpu_segfault::main () at src/main.rs:35

Here are the versions of dynamic libs being linked in:

$ ldd target/debug/wgpu-segfault
	linux-vdso.so.1 (0x00007fffbe708000)
	libfreetype.so.6 => /lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007f1eda712000)
	libfontconfig.so.1 => /lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007f1eda6c8000)
	libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f1eda697000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f1eda5b0000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1eda590000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1eda366000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1edb2a5000)
	libpng16.so.16 => /lib/x86_64-linux-gnu/libpng16.so.16 (0x00007f1eda32b000)
	libz.so.1 => /usr/local/lib/libz.so.1 (0x00007f1eda2fe000)
	libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f1eda2f0000)
	libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007f1eda2e7000)
	libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f1eda2c2000)

Platform
I'm on Ubuntu 22.04 LTS with an RTX 3090, running the NVIDIA 510.47.03 driver (with CUDA 11.6). Other GPU workloads are fine: CUDA runs, accelerated OpenGL works, etc.

petersn commented Aug 28, 2022

Okay, after playing with this some more, I think this might not be wgpu's fault at all. I noticed that the crash happens while librenderdoc.so is being dlopen'd. I also note that:

$ python
>>> import ctypes
>>> d = ctypes.CDLL("librenderdoc.so")
Segmentation fault (core dumped)

so the issue is most likely with librenderdoc.so itself. I'll investigate further and update, but unless I find a reason to believe this is a wgpu issue, I suspect this issue should be closed.

petersn commented Aug 28, 2022

Okay, I uninstalled librenderdoc. Now wgpu can't find it to dlopen it, and everything seems to be fine. I'm going to close this issue, as this seems pretty dang dispositive that it's not wgpu's fault.

If folks are coming here in the future, getting a segfault from wgpu only in debug mode, but not in release mode, I'd encourage them to:

  1. Try: python -c '__import__("ctypes").CDLL("librenderdoc.so")'. If it segfaults then maybe they have this issue.
  2. Try uninstalling/moving librenderdoc.so out of the way. On Ubuntu: sudo apt-get remove librenderdoc
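
For anyone who wants to script step 1, here is a minimal sketch, assuming Python 3 on Linux. It runs the same ctypes load in a child process, so a crash while the library loads kills only the child and shows up as a negative return code. The names `probe_library` and `classify` are made up for illustration; they are not part of wgpu or RenderDoc.

```python
import subprocess
import sys

# One-liner executed in the child: try to load the library named in argv[1].
PROBE = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"


def classify(returncode):
    """Map the child's exit status to a rough verdict (hypothetical helper)."""
    if returncode == 0:
        return "loaded"
    if returncode < 0:
        # On POSIX, subprocess reports a child killed by signal N as -N,
        # so a segfault while loading shows up as a negative value.
        return "crashed"
    # The child raised (typically OSError: library missing or unloadable).
    return "not loadable"


def probe_library(name="librenderdoc.so"):
    """Load `name` in a separate interpreter so a crash cannot take us down."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify(result.returncode)


if __name__ == "__main__":
    print(probe_library())
```

If this prints "crashed", step 2 is probably worth trying. "not loadable" most likely just means the library isn't installed, in which case this particular issue shouldn't apply.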

petersn closed this as not planned on Aug 28, 2022

cwfitzgerald (Member) commented Aug 29, 2022

Oh, this is #2793. It's fixed on master, just not on wgpu 0.13.
