Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helm: Add Nvidia GPU support for ChatQnA #225

Merged

Conversation

PeterYang12
Copy link
Collaborator

@PeterYang12 PeterYang12 commented Jul 25, 2024

Description

Add Nvidia GPU support for ChatQnA in helm-charts.

Issues

N/A

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

N/A

Tests

Describe the tests that you ran to verify your changes.

@daisy-ycguo daisy-ycguo requested a review from lianhao July 25, 2024 02:02
@daisy-ycguo
Copy link
Collaborator

daisy-ycguo commented Jul 25, 2024

I don't think we have test bed for Nvidia in CI infrastructure. So it is ok that no tests cover this feature.
Please add tests after we get test bed.

Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also adjust the manifest/update_manifests.sh to generate the common manifests for NV too, and include those generated manifests in this PR

@PeterYang12 PeterYang12 force-pushed the chatqa-gpu-support-helm-charts branch from 9c92d5a to 8b04a6d Compare July 25, 2024 05:19
@PeterYang12
Copy link
Collaborator Author

Please also adjust the manifest/update_manifests.sh to generate the common manifests for NV too, and include those generated manifests in this PR

Done.

@PeterYang12
Copy link
Collaborator Author

I don't think we have test bed for Nvidia in CI infrastructure. So it is ok that no tests cover this feature. Please add tests after we get test bed.

Got it. Thank you.

@PeterYang12 PeterYang12 force-pushed the chatqa-gpu-support-helm-charts branch from 8b04a6d to 542d94d Compare July 25, 2024 05:50
helm-charts/chatqna/nv-values.yaml Outdated Show resolved Hide resolved
@PeterYang12 PeterYang12 force-pushed the chatqa-gpu-support-helm-charts branch from 542d94d to 00c2743 Compare July 25, 2024 06:03
Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though there is an CI error for ChatQnA, but we're 100% sure this error is not related to this PR. We'll tackle this kind of ephemeral failure in other places. We've seen this kind of ephemeral failures several times, especially when there is quite some tests from different repo on the same xeon node.

@zhlsunshine
Copy link
Collaborator

How about to re-trigger the failed test task? It should be okay if there is no any nv test resource in CI.

@PeterYang12
Copy link
Collaborator Author

How about to re-trigger the failed test task? It should be okay if there is no any nv test resource in CI.

Do you know how to re-trigger the tests? Should I update My PR?

@KfreeZ
Copy link
Collaborator

KfreeZ commented Jul 29, 2024

How about to re-trigger the failed test task? It should be okay if there is no any nv test resource in CI.

Do you know how to re-trigger the tests? Should I update My PR?

you can re-trigger it by re-run the job, click the details and there is a "re-run jobs" button

@PeterYang12
Copy link
Collaborator Author

How about to re-trigger the failed test task? It should be okay if there is no any nv test resource in CI.

Do you know how to re-trigger the tests? Should I update My PR?

you can re-trigger it by re-run the job, click the details and there is a "re-run jobs" button

I don't have the access to do this. @daisy-ycguo Can you help me?

@KfreeZ
Copy link
Collaborator

KfreeZ commented Jul 29, 2024

How about to re-trigger the failed test task? It should be okay if there is no any nv test resource in CI.

Do you know how to re-trigger the tests? Should I update My PR?

you can re-trigger it by re-run the job, click the details and there is a "re-run jobs" button

I don't have the access to do this. @daisy-ycguo Can you help me?

I've re-run it

@PeterYang12
Copy link
Collaborator Author

How about to re-trigger the failed test task? It should be okay if there is no any nv test resource in CI.

Do you know how to re-trigger the tests? Should I update My PR?

you can re-trigger it by re-run the job, click the details and there is a "re-run jobs" button

I don't have the access to do this. @daisy-ycguo Can you help me?

I've re-run it

TY.

@daisy-ycguo
Copy link
Collaborator

daisy-ycguo commented Jul 30, 2024

Our CI infrastructure doesn't support Nvidia test. nv-values is not covered by CI tests and will not cause any CI test failures. I'm fine to merge this PR if @PeterYang12 has tested OK locally.

@PeterYang12 please rebase this PR, then we can merge this one.

1. Add Helm-charts support
2. Add manifests support

Signed-off-by: PeterYang12 <[email protected]>
@daisy-ycguo daisy-ycguo force-pushed the chatqa-gpu-support-helm-charts branch from 35be7a1 to 71daaf1 Compare July 30, 2024 04:03
Copy link
Collaborator

@daisy-ycguo daisy-ycguo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@daisy-ycguo daisy-ycguo merged commit 868103b into opea-project:main Jul 30, 2024
15 checks passed
lianhao added a commit to lianhao/GenAIInfra that referenced this pull request Aug 7, 2024
Automatically popluate the default endpoint into no_proxy if proxy is
set.

If users specify the endpoints with their own value, it won't be
populated into no_proxy, because we can't assume the user specified
endpoints should be accessed through proxy.

Fix issue opea-project#225.

Signed-off-by: Lianhao Lu <[email protected]>
lianhao added a commit to lianhao/GenAIInfra that referenced this pull request Aug 7, 2024
Automatically popluate the default endpoint into no_proxy if proxy is
set.

If users specify the endpoints with their own value, it won't be
populated into no_proxy, because we can't assume the user specified
endpoints should be accessed through proxy.

Fix issue opea-project#225.

Signed-off-by: Lianhao Lu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants