GPU Health API improvements #70

Merged
merged 1 commit into from
Jul 24, 2024

nvvfedorov
Collaborator

The go-dcgm GPU health API has been modified to expose the following DCGM API methods:

  • HealthSet - Enables the DCGM health check system for the specified systems;

  • HealthGet - Retrieves the current state of the DCGM health check system;

  • HealthCheck - Checks the configured watches for any errors, failures, or warnings.
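The order in which the three methods listed above would typically be used (set the watches, read them back, then check) can be sketched as follows. This is only an illustration under stated assumptions: the `HealthAPI` interface, the `HealthSystem` bit values, the `Incident` type, and the in-memory `fakeHealth` are all hypothetical and are not the go-dcgm signatures or the DCGM constants. The fake exists so the sketch runs without a GPU or a DCGM host engine.

```go
package main

import (
	"errors"
	"fmt"
)

// HealthSystem is a bitmask of subsystems to watch.
// Hypothetical values, NOT the constants defined by DCGM or go-dcgm.
type HealthSystem uint32

const (
	WatchPCIe HealthSystem = 1 << iota
	WatchMemory
	WatchThermal
)

// Incident describes one problem found by a health check (hypothetical shape).
type Incident struct {
	System  HealthSystem
	Message string
}

// HealthAPI mirrors the three operations named in the description.
// The signatures are assumptions made for this sketch.
type HealthAPI interface {
	HealthSet(group int, systems HealthSystem) error
	HealthGet(group int) (HealthSystem, error)
	HealthCheck(group int) ([]Incident, error)
}

// fakeHealth is an in-memory stand-in for a real DCGM connection.
type fakeHealth struct {
	watches map[int]HealthSystem // enabled watches per group
	pending map[int][]Incident   // simulated problems per group
}

func newFakeHealth() *fakeHealth {
	return &fakeHealth{
		watches: make(map[int]HealthSystem),
		pending: make(map[int][]Incident),
	}
}

// HealthSet enables the given watches for a group.
func (f *fakeHealth) HealthSet(group int, systems HealthSystem) error {
	f.watches[group] = systems
	return nil
}

// HealthGet returns the watches currently enabled for a group.
func (f *fakeHealth) HealthGet(group int) (HealthSystem, error) {
	s, ok := f.watches[group]
	if !ok {
		return 0, errors.New("no health watches configured for group")
	}
	return s, nil
}

// HealthCheck reports only the incidents that belong to an enabled watch.
func (f *fakeHealth) HealthCheck(group int) ([]Incident, error) {
	enabled, err := f.HealthGet(group)
	if err != nil {
		return nil, err
	}
	var out []Incident
	for _, inc := range f.pending[group] {
		if inc.System&enabled != 0 {
			out = append(out, inc)
		}
	}
	return out, nil
}

func main() {
	fake := newFakeHealth()
	fake.pending[0] = []Incident{
		{System: WatchThermal, Message: "simulated thermal warning"},
		{System: WatchPCIe, Message: "simulated PCIe replay errors"},
	}

	var api HealthAPI = fake
	if err := api.HealthSet(0, WatchMemory|WatchThermal); err != nil {
		fmt.Println("HealthSet failed:", err)
		return
	}
	enabled, err := api.HealthGet(0)
	if err != nil {
		fmt.Println("HealthGet failed:", err)
		return
	}
	fmt.Printf("enabled watch mask: %03b\n", enabled)

	incidents, err := api.HealthCheck(0)
	if err != nil {
		fmt.Println("HealthCheck failed:", err)
		return
	}
	for _, inc := range incidents {
		fmt.Println("incident:", inc.Message)
	}
}
```

In this sketch the PCIe incident is filtered out because the PCIe watch was never enabled, which shows why the set step comes before the check step. The real types and return values should be taken from pkg/dcgm/health.go in this PR.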

glowkey
glowkey previously approved these changes Jun 24, 2024
Collaborator

@glowkey glowkey left a comment

LGTM

Collaborator

@rohit-arora-dev rohit-arora-dev left a comment

Overall looking good, just a few minor suggestions.

pkg/dcgm/gpu_group.go (outdated, resolved)
pkg/dcgm/health.go (outdated, resolved)
pkg/dcgm/health.go (outdated, resolved)
pkg/dcgm/health.go (outdated, resolved)
Signed-off-by: Vadym Fedorov <[email protected]>
@nvvfedorov nvvfedorov force-pushed the health-watch-api-improvements branch from 8d21804 to 784931e Compare July 2, 2024 21:35
@nvvfedorov nvvfedorov self-assigned this Jul 18, 2024
@nvvfedorov nvvfedorov merged commit f83cdef into main Jul 24, 2024
1 check passed
@nvvfedorov nvvfedorov deleted the health-watch-api-improvements branch July 24, 2024 14:47
3 participants