NO-ISSUE: Add amd gpu operator #7222

Open
wants to merge 2 commits into
base: master
Choose a base branch
from


jhernand (Contributor)

This patch adds the AMD GPU operator as a dependency of the OpenShift AI operator.

List all the issues related to this PR

  • New Feature
  • Enhancement
  • Bug fix
  • Tests
  • Documentation
  • CI/CD

What environments does this code impact?

  • Automation (CI, tools, etc)
  • Cloud
  • Operator Managed Deployments
  • None

How was this code tested?

  • assisted-test-infra environment
  • dev-scripts environment
  • Reviewer's test appreciated
  • Waiting for CI to do a full test run
  • Manual (Elaborate on how it was tested)
  • No tests needed

Checklist

  • Title and description added to both the commit and the PR.
  • Relevant issues have been associated (see CONTRIBUTING guide)
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit tests? (Note that code changes require unit tests.)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Currently, when the OpenShift AI operator is enabled, the NVIDIA GPU
operator is also enabled by default, even if there are no NVIDIA GPUs in
the hosts. This patch changes that so that the NVIDIA GPU operator is
only added when at least one NVIDIA GPU is present.

This is in preparation for adding support for other GPU operators, in
particular the AMD GPU operator: we don't want to always enable all the
GPU operators.

Signed-off-by: Juan Hernandez <[email protected]>
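A minimal sketch of the rule described in this commit message, assuming a simplified inventory in which each GPU reports its PCI vendor ID (0x10de for NVIDIA). The GPU and Host types, the helper names and the "nvidia-gpu" operator name are hypothetical illustrations, not the assisted-service code.

```go
package main

import (
	"fmt"
	"strings"
)

// GPU and Host are hypothetical, simplified stand-ins for the host
// inventory data; they are not the assisted-service models.
type GPU struct {
	VendorID string // PCI vendor ID as a hex string, e.g. "10de"
	Name     string
}

type Host struct {
	GPUs []GPU
}

// nvidiaVendorID is the PCI vendor ID assigned to NVIDIA.
const nvidiaVendorID = "10de"

// hasGPUFromVendor reports whether any host has at least one GPU whose
// PCI vendor ID matches vendorID (compared case-insensitively).
func hasGPUFromVendor(hosts []Host, vendorID string) bool {
	for _, host := range hosts {
		for _, gpu := range host.GPUs {
			if strings.EqualFold(gpu.VendorID, vendorID) {
				return true
			}
		}
	}
	return false
}

// openShiftAIGPUDependencies returns the GPU operators that the OpenShift AI
// operator would pull in. The NVIDIA GPU operator is only added when an
// NVIDIA GPU is present, instead of unconditionally. The operator name is
// illustrative.
func openShiftAIGPUDependencies(hosts []Host) []string {
	var deps []string
	if hasGPUFromVendor(hosts, nvidiaVendorID) {
		deps = append(deps, "nvidia-gpu")
	}
	return deps
}

func main() {
	withNVIDIA := []Host{{GPUs: []GPU{{VendorID: "10DE", Name: "example GPU"}}}}
	withoutGPU := []Host{{}}

	fmt.Println(openShiftAIGPUDependencies(withNVIDIA)) // [nvidia-gpu]
	fmt.Println(openShiftAIGPUDependencies(withoutGPU)) // []
}
```

Checking the vendor ID rather than a model name keeps the rule independent of how individual GPU models are labelled; the same helper could be called with AMD's PCI vendor ID (0x1002) once other GPU operators are added.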
openshift-ci bot added the size/XXL label on Jan 23, 2025
openshift-ci bot requested review from danmanor and rccrdpccl on January 23, 2025 17:22
openshift-ci bot added the api-review label on Jan 23, 2025

openshift-ci bot commented Jan 23, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhernand

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label on Jan 23, 2025
jhernand force-pushed the add_amd_gpu_operator branch from 9cf6504 to dade96c on January 23, 2025 17:22
jhernand changed the title from "Add amd gpu operator" to "NO-ISSUE: Add amd gpu operator" on Jan 23, 2025
openshift-ci-robot added the jira/valid-reference label on Jan 23, 2025
openshift-ci-robot commented

@jhernand: This pull request explicitly references no jira issue.


jhernand commented (Contributor, Author)

/hold

Needs to be rebased after #7218 is merged.

openshift-ci bot added the do-not-merge/hold label on Jan 23, 2025

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 43.41317% with 189 lines in your changes missing coverage. Please review.

Project coverage is 67.73%. Comparing base (9df3014) to head (9bd831b).
Report is 3 commits behind head on master.

Files with missing lines                               Patch %   Lines
internal/operators/kmm/kmm_operator.go                 9.52%     74 Missing and 2 partials ⚠️
internal/operators/amdgpu/amd_gpu_operator.go          32.25%    59 Missing and 4 partials ⚠️
internal/operators/kmm/kmm_manifests.go                55.81%    13 Missing and 6 partials ⚠️
internal/operators/amdgpu/amd_gpu_manifests.go         69.76%    7 Missing and 6 partials ⚠️
internal/featuresupport/features_olm_operators.go      69.23%    8 Missing and 4 partials ⚠️
internal/operators/amdgpu/amd_gpu_templates.go         60.00%    1 Missing and 1 partial ⚠️
internal/operators/kmm/kmm_templates.go                60.00%    1 Missing and 1 partial ⚠️
...nal/operators/openshiftai/openshift_ai_operator.go  81.81%    1 Missing and 1 partial ⚠️
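The 189 uncovered lines quoted in the report are the sum of the missing and partial counts in the table. Assuming Codecov counts partially covered lines as not covered when it computes patch coverage, the reported percentage also fixes how many changed lines were tracked in total; the values h = 145 and h + m = 334 below are derived from the report, not stated in it:

```latex
\begin{aligned}
m &= (74+2) + (59+4) + (13+6) + (7+6) + (8+4) + (1+1) + (1+1) + (1+1) = 189 \\
\text{patch coverage} &= \frac{h}{h+m} = 0.4341317
  \;\Longrightarrow\; h = 145,\quad h + m = 334 \\
\frac{145}{334} &\approx 0.4341317 = 43.41317\%
\end{aligned}
```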
Additional details and impacted files


@@            Coverage Diff             @@
##           master    #7222      +/-   ##
==========================================
- Coverage   67.89%   67.73%   -0.17%     
==========================================
  Files         298      304       +6     
  Lines       40666    41036     +370     
==========================================
+ Hits        27609    27794     +185     
- Misses      10587    10739     +152     
- Partials     2470     2503      +33     
Files with missing lines                               Coverage Δ
internal/cluster/statemachine.go                       99.65% <100.00%> (+<0.01%) ⬆️
internal/cluster/validation_id.go                      92.30% <100.00%> (ø)
internal/featuresupport/feature_support_level.go       96.49% <ø> (ø)
internal/host/statemachine.go                          100.00% <100.00%> (ø)
internal/host/validation_id.go                         90.90% <100.00%> (ø)
internal/operators/builder.go                          100.00% <100.00%> (ø)
...nternal/operators/nvidiagpu/nvidia_gpu_operator.go  32.58% <100.00%> (-0.75%) ⬇️
internal/operators/amdgpu/amd_gpu_templates.go         60.00% <60.00%> (ø)
internal/operators/kmm/kmm_templates.go                60.00% <60.00%> (ø)
...nal/operators/openshiftai/openshift_ai_operator.go  57.06% <81.81%> (+7.61%) ⬆️
... and 5 more

... and 6 files with indirect coverage changes

jhernand force-pushed the add_amd_gpu_operator branch from dade96c to 48ab28f on January 24, 2025 11:30
This patch adds the AMD GPU operator as a dependency of the OpenShift AI
operator.

Signed-off-by: Juan Hernandez <[email protected]>
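This commit makes the AMD GPU operator a dependency of the OpenShift AI operator. The coverage report also lists new files under internal/operators/kmm, which suggests the patch adds the Kernel Module Management (KMM) operator as well; AMD's GPU operator uses KMM to manage the amdgpu kernel module, so the sketch below assumes KMM is pulled in as a dependency of the AMD GPU operator. It also assumes, following the first commit, that each GPU operator is only added when a matching GPU is present. The operator names, the Inventory type and the resolve function are hypothetical and are not the assisted-service API.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical operator names, used for illustration only.
const (
	openShiftAI = "openshift-ai"
	amdGPU      = "amd-gpu"
	nvidiaGPU   = "nvidia-gpu"
	kmm         = "kmm"
)

// Inventory is a hypothetical summary of which GPU vendors were detected
// on the cluster hosts.
type Inventory struct {
	HasAMD    bool
	HasNVIDIA bool
}

// directDependencies returns the operators that op needs directly.
func directDependencies(op string, inv Inventory) []string {
	switch op {
	case openShiftAI:
		// GPU operators are only added when a matching GPU is present.
		var deps []string
		if inv.HasAMD {
			deps = append(deps, amdGPU)
		}
		if inv.HasNVIDIA {
			deps = append(deps, nvidiaGPU)
		}
		return deps
	case amdGPU:
		// Assumption: the AMD GPU operator needs KMM for its kernel module.
		return []string{kmm}
	default:
		return nil
	}
}

// resolve returns op together with all of its transitive dependencies,
// visiting each operator once and sorting the result by name.
func resolve(op string, inv Inventory) []string {
	seen := map[string]bool{}
	var visit func(name string)
	visit = func(name string) {
		if seen[name] {
			return
		}
		seen[name] = true
		for _, dep := range directDependencies(name, inv) {
			visit(dep)
		}
	}
	visit(op)

	result := make([]string, 0, len(seen))
	for name := range seen {
		result = append(result, name)
	}
	sort.Strings(result)
	return result
}

func main() {
	fmt.Println(resolve(openShiftAI, Inventory{HasAMD: true}))    // [amd-gpu kmm openshift-ai]
	fmt.Println(resolve(openShiftAI, Inventory{HasNVIDIA: true})) // [nvidia-gpu openshift-ai]
	fmt.Println(resolve(openShiftAI, Inventory{}))                // [openshift-ai]
}
```

The depth-first walk with a seen set adds each operator only once, even when several operators share a dependency; sorting the result only makes the output deterministic.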
jhernand force-pushed the add_amd_gpu_operator branch from 48ab28f to 9bd831b on January 24, 2025 11:31

openshift-ci bot commented Jan 24, 2025

@jhernand: The following tests failed (say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests):

Test name                                       Commit   Required  Rerun command
ci/prow/edge-subsystem-kubeapi-aws              9bd831b  true      /test edge-subsystem-kubeapi-aws
ci/prow/edge-e2e-metal-assisted-mtv-4-17        9bd831b  true      /test edge-e2e-metal-assisted-mtv-4-17
ci/prow/edge-e2e-ai-operator-disconnected-capi  9bd831b  false     /test edge-e2e-ai-operator-disconnected-capi
ci/prow/edge-subsystem-aws                      9bd831b  true      /test edge-subsystem-aws
ci/prow/okd-scos-e2e-aws-ovn                    9bd831b  false     /test okd-scos-e2e-aws-ovn



Labels

  • api-review: Categorizes an issue or PR as actively needing an API review.
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
2 participants