Support the Heterogenous(different type of) Intel GPU cards in the same OCP cluster #216

Open
uMartinXu opened this issue Mar 4, 2024 · 2 comments
Labels
enhancement New feature or request gpu Intel GPU

Comments

Contributor

uMartinXu commented Mar 4, 2024

Summary

Support heterogeneous (different) Intel GPU cards in the same OCP cluster.

Detail

When different Intel GPU cards such as Max-1100, Flex-140, and Flex-170 are provisioned in the same cluster, users need a mechanism to select the GPU card they want their workloads to run on.
To align with the taints/tolerations mechanism used by the Red Hat OpenShift AI accelerator profile, we will use the same taints/tolerations mechanism for this feature.
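
As a rough illustration of how taints and tolerations would separate workloads by GPU type, here is a minimal Python sketch of the matching rule. It is a simplified model, not the scheduler's implementation. The taint key `example.com/intel-gpu-product`, the taint values, and the node names are hypothetical; the issue does not specify them.

```python
from dataclasses import dataclass

# Hypothetical taint key; the real key is not defined in this issue.
GPU_TAINT_KEY = "example.com/intel-gpu-product"


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str = "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty string matches any effect


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Simplified matching rule: effect, then key, then operator/value."""
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True
    return toleration.value == taint.value


def schedulable_nodes(nodes: dict[str, list[Taint]],
                      tolerations: list[Toleration]) -> list[str]:
    """Return the nodes whose every taint is tolerated by the pod."""
    return [
        name
        for name, taints in nodes.items()
        if all(any(tolerates(tol, taint) for tol in tolerations)
               for taint in taints)
    ]


# Hypothetical cluster: one tainted node per GPU product.
nodes = {
    "node-max-1100": [Taint(GPU_TAINT_KEY, "max-1100")],
    "node-flex-140": [Taint(GPU_TAINT_KEY, "flex-140")],
    "node-flex-170": [Taint(GPU_TAINT_KEY, "flex-170")],
}

# A workload that should only run on Flex-170 nodes.
pod_tolerations = [
    Toleration(key=GPU_TAINT_KEY, value="flex-170", effect="NoSchedule")
]

print(schedulable_nodes(nodes, pod_tolerations))
```

Note that taints only keep non-tolerating pods away from a node. A pod with the Flex-170 toleration could still be scheduled on an untainted node, so a node selector or affinity rule may also be needed in practice.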

To label (taint) the nodes in the cluster automatically, we will rely on the NFD node tainting feature.

This feature therefore depends on issue openshift/cluster-nfd-operator#356.
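
For illustration only, the NFD-based tainting described above could take the form of a `NodeFeatureRule` that matches each GPU product by PCI device ID and applies a per-product taint. The sketch below builds such a rule as a Python dictionary, following the upstream NFD rule layout (`taints` and `matchFeatures` on `pci.device`). The taint key, rule names, and device ID placeholders are assumptions, and the API group used by the OpenShift NFD operator may differ from the upstream one shown here.

```python
import json

# Hypothetical taint key; not defined in this issue.
GPU_TAINT_KEY = "example.com/intel-gpu-product"


def gpu_taint_rule(product: str, device_ids: list[str]) -> dict:
    """Build one rule that taints nodes carrying the given Intel GPU product."""
    return {
        "name": f"intel.gpu.{product}",  # hypothetical rule name
        "taints": [
            {"key": GPU_TAINT_KEY, "value": product, "effect": "NoSchedule"}
        ],
        "matchFeatures": [
            {
                "feature": "pci.device",
                "matchExpressions": {
                    # 8086 is the Intel PCI vendor ID.
                    "vendor": {"op": "In", "value": ["8086"]},
                    "device": {"op": "In", "value": list(device_ids)},
                },
            }
        ],
    }


# Device IDs are left as placeholders; look them up for the actual cards.
node_feature_rule = {
    "apiVersion": "nfd.k8s-sigs.io/v1alpha1",  # upstream group; an assumption for OCP
    "kind": "NodeFeatureRule",
    "metadata": {"name": "intel-gpu-taints"},  # hypothetical name
    "spec": {
        "rules": [
            gpu_taint_rule("max-1100", ["<max-1100-device-id>"]),
            gpu_taint_rule("flex-140", ["<flex-140-device-id>"]),
            gpu_taint_rule("flex-170", ["<flex-170-device-id>"]),
        ]
    },
}

print(json.dumps(node_feature_rule, indent=2))
```

In upstream NFD, tainting through rules is an opt-in feature of the NFD master, which is presumably why this work depends on the linked operator issue.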

Note

This feature covers heterogeneous (different) Intel GPU cards in the same OCP cluster.
The different Intel dGPU cards in the same node are not supported.

uMartinXu changed the title from "upport the Heterogenous(different type of) Intel dGPU product" to "Support the Heterogenous(different type of) Intel dGPU products in the same cluster" Mar 4, 2024
uMartinXu changed the title from "Support the Heterogenous(different type of) Intel dGPU products in the same cluster" to "Support the Heterogenous(different type of) Intel GPU cards in the same OCP cluster" Mar 4, 2024
uMartinXu added the enhancement (New feature or request) and gpu (Intel GPU) labels Mar 4, 2024


mythi commented Mar 5, 2024

/cc @tkatila


brgavino commented Mar 8, 2024

> The different Intel dGPU cards in the same node are not supported.

This is only because of GAS support, won't rely on NFD labelling taints/tolerations, correct?

How does this align with future resource requests via DRA? It does seem divergent at first glance.
