ComputePipeline is never freed #4073

Closed · eadwu opened this issue Aug 17, 2023 · 7 comments · Fixed by #5971
Labels: type: bug (Something isn't working)

eadwu commented Aug 17, 2023

Description
Memory leak: a ComputePipeline never frees its host memory.

CommandEncoder.begin_compute_pass does not fix this, queue.submit() does not fix this, and neither does device.poll(wgpu::Maintain::Wait).

Whether or not I actually use the pipeline, its memory is not freed until the program exits.

dhat leads me to believe there is some internal storage (lots of Vecs?); unnecessary caching or something along those lines may be unintentionally extending its lifetime.
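
For reference, a minimal heap-profiling setup with the dhat crate looks like this (a sketch assuming the dhat 0.3 API; the exact setup used here may differ):

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Track every heap allocation while `_profiler` is alive; a
    // dhat-heap.json report is written when it is dropped.
    let _profiler = dhat::Profiler::new_heap();
    // ... run the repro loop below ...
}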

Repro steps
Essentially:

for _ in 1..100000000 {
    // The pipeline is dropped at the end of each iteration.
    let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: None,
        layout: None,
        module: &compiled_shader,
        entry_point: "main",
    });
}

Expected vs observed behavior
Memory usage climbs steadily instead of remaining stable.

Platform
wgpu: 0.17.0


eadwu commented Aug 17, 2023

use std::borrow::Cow;

#[tokio::main]
async fn main() {
    // Instantiates instance of WebGPU
    let instance = wgpu::Instance::default();

    // `request_adapter` instantiates the general connection to the GPU
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            ..wgpu::RequestAdapterOptions::default()
        })
        .await.unwrap();

    // `request_device` instantiates the feature specific connection to the GPU, defining some parameters,
    //  `features` being the available features.
    let (device, queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: None,
                features: wgpu::Features::empty(),
                limits: wgpu::Limits::downlevel_defaults(),
            },
            None,
        )
        .await
        .unwrap();

    let compiled_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed("@compute @workgroup_size(1, 1, 1) fn main() {}")),
    });

    for _ in 1..100000 {
        // The pipeline is dropped at the end of each iteration.
        let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: None,
            layout: None,
            module: &compiled_shader,
            entry_point: "main",
        });
    }
}


eadwu commented Aug 17, 2023

The leak does not appear with the iGPU; it is a problem when using an external dGPU (in this case a TB3 NVIDIA GPU) or the fallback adapter (llvmpipe).

Without the loop: 200 KB
With LowPower: 500 KB
With HighPerformance: 3 GB
With force_fallback_adapter: 3 GB

cwfitzgerald commented Aug 22, 2023

This is expected behavior. We do not clear any dropped resources until the device is maintained by a call to submit or to device.poll. All of these resources need to wait for the GPU to finish using them, and creating resources just to never use them is not a use case we really expect, so we don't eagerly destroy resources that the GPU hasn't used.
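
A sketch of that lifecycle, reusing the device, queue, and compiled_shader from the repro above:

    {
        let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: None,
            layout: None,
            module: &compiled_shader,
            entry_point: "main",
        });
    } // `_pipeline` is dropped here, but only queued for destruction.

    // Dropped resources are reclaimed when the device is next maintained:
    queue.submit(std::iter::empty()); // an empty submit maintains the device,
    device.poll(wgpu::Maintain::Wait); // or block until the GPU is idle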

cwfitzgerald closed this as not planned Aug 22, 2023

eadwu commented Aug 22, 2023

This is not about GPU memory but host memory. I have already tried device.poll, but here's the example with it (I let it balloon to >10 GB before manually killing it):

        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 11599556
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
#[tokio::main]
async fn main() {
    // Instantiates instance of WebGPU
    let instance = wgpu::Instance::default();

    // `request_adapter` instantiates the general connection to the GPU
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            ..wgpu::RequestAdapterOptions::default()
        })
        .await.unwrap();

    // `request_device` instantiates the feature specific connection to the GPU, defining some parameters,
    //  `features` being the available features.
    let (device, queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: None,
                features: wgpu::Features::empty(),
                limits: wgpu::Limits::downlevel_defaults(),
            },
            None,
        )
        .await
        .unwrap();

    use std::borrow::Cow;
    let compiled_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed("@compute @workgroup_size(1, 1, 1) fn main() {}")),
    });

    for _ in 1..900000 {
        let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: None,
            layout: None,
            module: &compiled_shader,
            entry_point: "main",
        });

        // Maintain the device every iteration; dropped pipelines should be
        // reclaimed here, yet host memory still grows.
        device.poll(wgpu::Maintain::Wait);
    }
}


eadwu commented Aug 22, 2023

In contrast, LowPower gives

        Maximum resident set size (kbytes): 2824168

while the fallback gives

        Maximum resident set size (kbytes): 3323276

Either way, this is a lot of memory to spend maintaining pipelines, whether they are used or not.

Ideally it would not keep taking more memory. For comparison, changing the loop to allocate a large Vec instead:

    for _ in 1..4 {
        let x = (1..100000000).map(|x| x as u64).collect::<Vec<_>>();
        println!("{}", std::mem::size_of_val(&*x) / 1024);
    }

gives

781249
781249
781249
...
Maximum resident set size (kbytes): 875220

which is a lot more reasonable.


cwfitzgerald commented Aug 22, 2023

Alright, if this is still leaking with a poll, this is definitely a bug.

cwfitzgerald reopened this Aug 22, 2023
Wumpf added the type: bug label Sep 5, 2023

teoxoy commented Jul 17, 2024

I think this is a duplicate of #5029.
