Add new API ListenForPolicyViolations to replace Policy #54

Merged

2 commits merged into NVIDIA:main on Jan 11, 2024


@dran-dev (Contributor) commented Jan 8, 2024

Summary of Changes

This pull request updates the Policy API in the go-dcgm bindings for the NVIDIA Data Center GPU Manager (DCGM) library. It deprecates the existing Policy API and introduces the ListenForPolicyViolations API. The new API sets policies for all GPUs collectively, so individual GPUs no longer need to be configured separately. ListenForPolicyViolations also lets users register for and monitor policy violations across all GPUs at once, which removes the usability constraints of the per-GPU API and makes it more efficient to use.

  1. Deprecation of Policy API:

The existing Policy API has been deprecated due to usability limitations in managing policies for multiple GPUs individually.

  2. Introduction of ListenForPolicyViolations API:

The new API provides a more user-friendly interface for setting policies across all GPUs with a single call, streamlining the configuration process.
Policy callbacks can now be registered once during the program's lifetime, simplifying the integration of policy violation monitoring into applications.
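To make the usability constraint concrete: assuming the deprecated per-GPU API yields one violation channel per GPU, a program that wants to watch every GPU has to open one subscription per GPU and merge the resulting channels itself. The Go sketch below models that fan-in. `Violation`, `mergePerGPU`, and `fakeGPU` are hypothetical names; the code does not call go-dcgm, and `Violation` is an illustrative stand-in rather than the library's own violation type.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Violation is a hypothetical stand-in for a policy-violation event;
// it is NOT go-dcgm's violation type.
type Violation struct {
	GPU       uint
	Condition string
}

// mergePerGPU fans in one channel per GPU into a single channel: the
// bookkeeping a caller needs when subscriptions are per GPU.
// The returned channel closes once every input channel has closed.
func mergePerGPU(perGPU []<-chan Violation) <-chan Violation {
	out := make(chan Violation)
	var wg sync.WaitGroup
	wg.Add(len(perGPU))
	for _, ch := range perGPU {
		go func(ch <-chan Violation) {
			defer wg.Done()
			for v := range ch {
				out <- v
			}
		}(ch)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fakeGPU (hypothetical) emits the given conditions for one GPU, then closes.
func fakeGPU(id uint, conds ...string) <-chan Violation {
	ch := make(chan Violation, len(conds))
	for _, c := range conds {
		ch <- Violation{GPU: id, Condition: c}
	}
	close(ch)
	return ch
}

func main() {
	merged := mergePerGPU([]<-chan Violation{
		fakeGPU(0, "xid"),
		fakeGPU(1, "dbe", "thermal"),
	})
	var got []string
	for v := range merged {
		got = append(got, fmt.Sprintf("gpu%d:%s", v.GPU, v.Condition))
	}
	sort.Strings(got) // arrival order across goroutines is not deterministic
	fmt.Println(got)  // prints [gpu0:xid gpu1:dbe gpu1:thermal]
}
```

With ListenForPolicyViolations, this merging step is meant to go away: one registration covers every GPU and delivers violations on a single channel.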

Context and Rationale

The decision to deprecate the Policy API and introduce ListenForPolicyViolations stems from usability constraints and from the recognition that monitoring policy violations for one GPU at a time is rarely useful. The changes aim to make registering policy callbacks with the DCGM library simpler and more efficient.

Deprecation Notice

Developers are advised to migrate from the deprecated Policy API to the new ListenForPolicyViolations API for improved functionality and to ensure compatibility with future releases.

@nvvfedorov nvvfedorov self-requested a review January 8, 2024 20:25
pkg/dcgm/api.go: two review comments (outdated, resolved)
@nvvfedorov (Collaborator) left a comment


LGTM

pkg/dcgm/policy_test.go: review comment (resolved)
@dran-dev dran-dev requested a review from nikkon-dev January 10, 2024 01:12
@dran-dev force-pushed the main branch 2 times, most recently from 8587c10 to 84f6454 January 10, 2024 02:17
This adds a new API, ListenForPolicyViolations, for setting policy
for all GPUs instead of individual GPUs.

The primary modifications are the deprecation of the existing
Policy API and the introduction of the ListenForPolicyViolations API.
The Policy API is deprecated due to usability constraints. Moreover,
listening to policy violations for one GPU at a time is not very useful.
The new API enables users to set policies for all GPUs collectively,
eliminating the need to configure individual GPUs separately.
Additionally, the ListenForPolicyViolations API allows users to register
and monitor policy violations across all GPUs concurrently. Policy
callbacks need to be registered only once during the lifetime
of the program.

Signed-off-by: dran <[email protected]>
@nvvfedorov nvvfedorov merged commit 26fbf85 into NVIDIA:main Jan 11, 2024
1 check passed

3 participants