Add option in toolkit container to enable CDI in runtime #838

Now that I think about it again, we don't need any operator changes with this in place. As you said in a prior comment:

> So by default, if a user sets `cdi.enabled=true` in the operator, we enable CDI in the runtime (e.g. containerd) while allowing them to opt out of this behavior by manually configuring the `RUNTIME_ENABLE_CDI` environment variable in the toolkit.

So `cdi.default` becomes a no-op?

No. This PR does not change the semantics of the `cdi.default` field. When `cdi.default=true`, the "default" `nvidia` runtime class is configured in "cdi" mode, meaning that any GPUs injected by the `nvidia` runtime class will have been injected via CDI.

In the GPU Operator, `cdi.enabled=true` triggers the creation of an additional NVIDIA runtime class, named `nvidia-cdi`, and configures additional envvars in the toolkit / device-plugin so that CDI specs get generated (amongst other things). If users want to leverage CDI for device injection, they can use the `nvidia-cdi` runtime class in their pod spec. If users want CDI to be used by default, then they would set `cdi.default=true`.

This PR makes it so that setting `cdi.enabled=true` in the operator also enables CDI in the runtime (e.g. containerd) without requiring any additional operator changes.
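
To make the distinction between the two fields concrete, here is a minimal Go sketch of how `cdi.enabled` and `cdi.default` map to runtime classes and to runtime-level CDI, as described in the reply above. The `CDIConfig` and `RuntimeClass` types and both helper functions are hypothetical, not the GPU Operator's actual API, and the sketch deliberately skips any validation of field combinations.

```go
package main

import "fmt"

// CDIConfig mirrors the two GPU Operator fields discussed in this thread.
// Hypothetical type: only the semantics of cdi.enabled / cdi.default come
// from the discussion.
type CDIConfig struct {
	Enabled bool // cdi.enabled
	Default bool // cdi.default
}

// RuntimeClass is a simplified, hypothetical view of an NVIDIA runtime class.
type RuntimeClass struct {
	Name    string
	CDIMode bool // true if GPUs injected through this class go via CDI
}

// runtimeClasses returns the runtime classes implied by the config:
// "nvidia" always exists and is in CDI mode only when cdi.default=true;
// "nvidia-cdi" is added (always in CDI mode) when cdi.enabled=true.
func runtimeClasses(c CDIConfig) []RuntimeClass {
	classes := []RuntimeClass{{Name: "nvidia", CDIMode: c.Default}}
	if c.Enabled {
		classes = append(classes, RuntimeClass{Name: "nvidia-cdi", CDIMode: true})
	}
	return classes
}

// runtimeCDIEnabled reports whether CDI is enabled in the container runtime
// (e.g. containerd) by default; with this PR it simply follows cdi.enabled.
func runtimeCDIEnabled(c CDIConfig) bool {
	return c.Enabled
}

func main() {
	cfg := CDIConfig{Enabled: true, Default: false}
	for _, rc := range runtimeClasses(cfg) {
		fmt.Printf("runtime class %q, CDI mode: %v\n", rc.Name, rc.CDIMode)
	}
	fmt.Println("CDI enabled in runtime:", runtimeCDIEnabled(cfg))
}
```

In this model `cdi.default` still has an effect (it switches the `nvidia` class to CDI mode), which is why it does not become a no-op.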

Yes, the default behaviour is to enable CDI in the runtime if CDI is enabled. This can be opted out of by setting the `RUNTIME_ENABLE_CDI` envvar, or opted into if the toolkit option is not set. The Operator API is a separate concern.
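
The default-with-override behaviour can be sketched in Go as follows. This reads the comment above as: an explicitly set `RUNTIME_ENABLE_CDI` value wins in either direction, and an unset or empty value falls back to whether CDI is enabled. The `runtimeEnableCDI` function and its use of `strconv.ParseBool` are illustrative assumptions, not the toolkit's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// runtimeEnableCDI decides whether the toolkit should enable CDI in the
// container runtime config. Hypothetical helper: an explicit
// RUNTIME_ENABLE_CDI value (opt-in or opt-out) takes precedence; otherwise
// the decision follows whether CDI is enabled.
func runtimeEnableCDI(cdiEnabled bool, lookup func(string) (string, bool)) (bool, error) {
	v, ok := lookup("RUNTIME_ENABLE_CDI")
	if !ok || v == "" {
		return cdiEnabled, nil // default: follow CDI enablement
	}
	return strconv.ParseBool(v) // explicit user choice
}

func main() {
	enabled, err := runtimeEnableCDI(true, os.LookupEnv)
	if err != nil {
		fmt.Println("invalid RUNTIME_ENABLE_CDI:", err)
		os.Exit(1)
	}
	fmt.Println("enable CDI in runtime:", enabled)
}
```

Passing the lookup function as a parameter keeps the decision logic testable without mutating the process environment.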