ComputePipeline is never freed #4073

Closed · eadwu opened this issue Aug 17, 2023 · 7 comments · Fixed by #5971
Labels: type: bug (Something isn't working)

eadwu commented Aug 17, 2023

Description
Memory leak: a ComputePipeline never frees its host memory.

CommandEncoder.begin_compute_pass does not fix this, queue.submit() does not fix this, and neither does device.poll(wgpu::Maintain::Wait).

Whether or not I actually use the pipeline, its memory is not freed until the program exits.

dhat leads me to believe there is some internal storage (lots of Vecs?); unnecessary caching or something along those lines may be unintentionally extending its lifetime.
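
For reference, a minimal heap-profiling setup with the dhat crate looks like this (a sketch assuming the dhat 0.3 API; the exact setup used here may differ):

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

fn main() {
    // Track every heap allocation while `_profiler` is alive; a
    // dhat-heap.json report is written when it is dropped.
    let _profiler = dhat::Profiler::new_heap();
    // ... run the repro loop below ...
}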

Repro steps
Essentially:

for _ in 1..100000000 {
    // The pipeline is dropped at the end of each iteration.
    let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: None,
        layout: None,
        module: &compiled_shader,
        entry_point: "main",
    });
}

Expected vs observed behavior
Memory usage climbs steadily instead of remaining stable.

Platform
wgpu: 0.17.0


eadwu commented Aug 17, 2023

use std::borrow::Cow;

#[tokio::main]
async fn main() {
    // Instantiates instance of WebGPU
    let instance = wgpu::Instance::default();

    // `request_adapter` instantiates the general connection to the GPU
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            ..wgpu::RequestAdapterOptions::default()
        })
        .await.unwrap();

    // `request_device` instantiates the feature specific connection to the GPU, defining some parameters,
    //  `features` being the available features.
    let (device, queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: None,
                features: wgpu::Features::empty(),
                limits: wgpu::Limits::downlevel_defaults(),
            },
            None,
        )
        .await
        .unwrap();

    let compiled_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed("@compute @workgroup_size(1, 1, 1) fn main() {}")),
    });

    for _ in 1..100000 {
        // The pipeline is dropped at the end of each iteration.
        let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: None,
            layout: None,
            module: &compiled_shader,
            entry_point: "main",
        });
    }
}


eadwu commented Aug 17, 2023

The leak does not appear with the iGPU; it is a problem when using an external dGPU (in this case a TB3 NVIDIA GPU) or the fallback adapter (llvmpipe).

Without the loop: 200 KB
With LowPower: 500 KB
With HighPerformance: 3 GB
With force_fallback_adapter: 3 GB

cwfitzgerald commented Aug 22, 2023

This is expected behavior. We do not clear any dropped resources until the device is maintained by a call to submit or to device.poll. All of these resources need to wait for the GPU to finish using them, and creating resources just to never use them is not a use case we really expect, so we don't eagerly destroy resources that the GPU hasn't used.
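
A sketch of that lifecycle, reusing the device, queue, and compiled_shader from the repro above:

    {
        let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: None,
            layout: None,
            module: &compiled_shader,
            entry_point: "main",
        });
    } // `_pipeline` is dropped here, but only queued for destruction.

    // Dropped resources are reclaimed when the device is next maintained:
    queue.submit(std::iter::empty()); // an empty submit maintains the device,
    device.poll(wgpu::Maintain::Wait); // or block until the GPU is idle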

cwfitzgerald closed this as not planned Aug 22, 2023

eadwu commented Aug 22, 2023

This is not about GPU memory but host memory. I have already tried device.poll, but here's the example with it (I let it balloon to >10 GB before manually killing it):

        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 11599556
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
#[tokio::main]
async fn main() {
    // Instantiates instance of WebGPU
    let instance = wgpu::Instance::default();

    // `request_adapter` instantiates the general connection to the GPU
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: wgpu::PowerPreference::HighPerformance,
            force_fallback_adapter: false,
            ..wgpu::RequestAdapterOptions::default()
        })
        .await.unwrap();

    // `request_device` instantiates the feature specific connection to the GPU, defining some parameters,
    //  `features` being the available features.
    let (device, queue) = adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: None,
                features: wgpu::Features::empty(),
                limits: wgpu::Limits::downlevel_defaults(),
            },
            None,
        )
        .await
        .unwrap();

    use std::borrow::Cow;
    let compiled_shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
        label: None,
        source: wgpu::ShaderSource::Wgsl(Cow::Borrowed("@compute @workgroup_size(1, 1, 1) fn main() {}")),
    });

    for _ in 1..900000 {
        let _pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
            label: None,
            layout: None,
            module: &compiled_shader,
            entry_point: "main",
        });

        // Maintain the device every iteration; dropped pipelines should be
        // reclaimed here, yet host memory still grows.
        device.poll(wgpu::Maintain::Wait);
    }
}


eadwu commented Aug 22, 2023

In contrast, LowPower gives

        Maximum resident set size (kbytes): 2824168

while the fallback gives

        Maximum resident set size (kbytes): 3323276

Either way, this is a lot of memory to spend maintaining pipelines, whether they are used or not.

Ideally it would not keep taking more memory. For comparison, changing the loop to allocate a large Vec instead:

    for _ in 1..4 {
        let x = (1..100000000).map(|x| x as u64).collect::<Vec<_>>();
        println!("{}", std::mem::size_of_val(&*x) / 1024);
    }

gives

781249
781249
781249
...
Maximum resident set size (kbytes): 875220

which is a lot more reasonable.


cwfitzgerald commented Aug 22, 2023

Alright, if this is still leaking with a poll, this is definitely a bug.

cwfitzgerald reopened this Aug 22, 2023
Wumpf added the type: bug label Sep 5, 2023

teoxoy commented Jul 17, 2024

I think this is a duplicate of #5029.
