feat(GPUManager): check nvidia container toolkit capabilities #1825

Draft
wants to merge 1 commit into
base: main
Conversation

axel7083
Contributor

@axel7083 axel7083 commented Oct 1, 2024

What does this PR do?

Add utility functions to the GPUManager to check whether nvidia-ctk is installed on the machine or on the native host (Linux) for NVIDIA cards.

ℹ️ Nothing is exposed to the user currently.
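
For illustration, a check like this could come down to running `nvidia-ctk --version` and parsing what it prints. The following is a minimal sketch, not the code in this PR: `parseNvidiaCtkVersion` is a hypothetical helper, and the output format (a line containing `version x.y.z`) is an assumption that may differ between toolkit releases.

```typescript
// Hypothetical helper: extracts a version string from the text printed by
// `nvidia-ctk --version`. The expected format is an assumption, e.g.
// "NVIDIA Container Toolkit CLI version 1.16.1".
function parseNvidiaCtkVersion(stdout: string): string | undefined {
  // Capture "major.minor.patch" plus any suffix such as "-rc.1"
  const match = /version\s+(\d+\.\d+\.\d+\S*)/i.exec(stdout);
  return match?.[1];
}
```

Returning `undefined` rather than throwing lets the caller treat "not installed" and "unrecognised output" the same way.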

Screenshot / video of UI

N/A

What issues does this PR fix or reference?

Fixes #1824
Required for #1591
Part of #1708

How to test this PR?

  • unit tests have been provided

Manually

You can check manually by adding the following code to studio.ts:

    // Temporary debug snippet: runs once, 10 seconds after being scheduled
    setTimeout(() => {
      this.#gpuManager
        ?.collectGPUs()
        .then(gpus => {
          console.log('collectGPUs', gpus);

          // Query the GPU CDI only for connections that are currently started
          const connections = this.#podmanConnection?.getContainerProviderConnections() ?? [];
          Promise.all(
            connections
              .filter(connection => connection.status() === 'started')
              .map(connection => this.#gpuManager?.getGPUContainerDeviceInterface(connection)),
          )
            .then(result => {
              console.log('promise all get GPU CDI', result);
            })
            .catch((err: unknown) => {
              console.error('promise all get GPU CDI', err);
            });
        })
        .catch((err: unknown) => {
          console.error('collectGPUs', err);
        });
    }, 10000);

@axel7083 axel7083 requested review from benoitf, jeffmaury and a team as code owners October 1, 2024 10:35
@axel7083 axel7083 requested review from cdrage and gastoner October 1, 2024 10:35
@gastoner
Contributor

gastoner commented Oct 1, 2024

I don't have an NVIDIA GPU to test this 😢

Contributor

@jeffmaury jeffmaury left a comment

The Linux code seems to be based on local Podman. Is there a way to check? What if the user is using a Podman machine?


// check nvidia-ctk version
const { version } = await this.getNvidiaContainerToolKitVersion(connection);
console.log('nvidia-ctk version', version);
Contributor

Should we keep it?

Contributor Author

Yeah, in a follow-up PR I want to use telemetry, but I want that to be a separate issue, as it will probably need some discussion :)

@axel7083
Contributor Author

axel7083 commented Oct 2, 2024

The Linux code seems to be based on local Podman. Is there a way to check? What if the user is using a Podman machine?

@jeffmaury if the vmType is undefined on Linux, it is a native connection.
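
The rule described above can be sketched as a small predicate. This is only an illustration under stated assumptions: `ConnectionLike` is a hypothetical, reduced stand-in for the real connection type, and `isNativeLinuxConnection` is not a function from this PR. The platform string follows Node's `process.platform` values (`'linux'`, `'darwin'`, `'win32'`).

```typescript
// Hypothetical minimal shape; the real connection type carries more fields.
interface ConnectionLike {
  name: string;
  vmType?: string;
}

// On Linux, a connection without a vmType is treated as native
// (no Podman machine in between). Elsewhere a VM is always assumed.
function isNativeLinuxConnection(connection: ConnectionLike, platform: string): boolean {
  return platform === 'linux' && connection.vmType === undefined;
}
```

A caller would pass `process.platform` as the second argument; taking it as a parameter keeps the check easy to unit test.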

@axel7083
Contributor Author

axel7083 commented Oct 7, 2024

Moving to draft

Development

Successfully merging this pull request may close these issues.

GPU Manager should detect the nvidia-ctk
3 participants