[FEA] Use Retry Framework in GpuCoalesceBatches for RetryOOM only #7777

Closed
abellina opened this issue Feb 15, 2023 · 1 comment · Fixed by #7852
Labels
reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

@abellina
Collaborator

We would like to use the retry framework on GpuCoalesceBatches.

The scope is simple: if a coalesce would fail with RetryOOM today, we now want to retry it. For goals that are not splittable, we will only retry for now and will not attempt to split. For the key-batching goal, we will also just retry without splitting.
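As a rough illustration of the "retry only, never split" behavior described above, here is a minimal Python sketch. It is not the plugin's code (spark-rapids is written in Scala); `RetryOOM`, `with_retry_no_split`, and `make_flaky_coalesce` are hypothetical stand-ins, and the real framework would spill or wait for memory between attempts rather than retrying immediately.

```python
class RetryOOM(Exception):
    """Hypothetical stand-in for the plugin's RetryOOM signal."""


def with_retry_no_split(attempt, max_attempts=3):
    """Re-run `attempt` when it raises RetryOOM; the input is never split."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return attempt()
        except RetryOOM as e:
            # A real retry framework would free or spill memory here first.
            last_error = e
    raise last_error


def make_flaky_coalesce(batches, failures):
    """Hypothetical coalesce that raises RetryOOM `failures` times, then succeeds."""
    state = {"left": failures}

    def coalesce():
        if state["left"] > 0:
            state["left"] -= 1
            raise RetryOOM("simulated GPU OOM")
        # Concatenate all input batches into one output batch.
        return [row for batch in batches for row in batch]

    return coalesce


result = with_retry_no_split(make_flaky_coalesce([[1, 2], [3]], failures=1))
print(result)  # [1, 2, 3]
```

The same inputs are reused on every attempt, which is why this approach works for goals such as a single-batch or key-batching goal where splitting the output is not allowed.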

I believe figuring out how to split for a TargetSize goal should be fairly simple and doable within this issue.
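One possible way to split for a size target is sketched below in Python. The assumption is that a size target acts as an upper bound, so producing two smaller output batches still satisfies it. `SplitAndRetryOOM`, `coalesce_with_split`, and `limited_coalesce` are hypothetical names for illustration only, and this sketch splits at input-batch granularity, which is just one choice (splitting by rows would be another).

```python
class SplitAndRetryOOM(Exception):
    """Hypothetical signal: retrying at the same size will not help, so split."""


def coalesce_with_split(batches, coalesce):
    """Coalesce `batches`; on SplitAndRetryOOM, halve the input list and recurse."""
    try:
        return [coalesce(batches)]
    except SplitAndRetryOOM:
        if len(batches) < 2:
            raise  # a single input batch cannot be split further at this level
        mid = len(batches) // 2
        return (coalesce_with_split(batches[:mid], coalesce)
                + coalesce_with_split(batches[mid:], coalesce))


def limited_coalesce(batches, max_rows=3):
    """Hypothetical coalesce that 'runs out of memory' above max_rows rows."""
    rows = [row for batch in batches for row in batch]
    if len(rows) > max_rows:
        raise SplitAndRetryOOM(f"{len(rows)} rows exceed the limit of {max_rows}")
    return rows


outputs = coalesce_with_split([[1, 2], [3], [4, 5], [6]], limited_coalesce)
print(outputs)  # [[1, 2, 3], [4, 5, 6]]
```

Each recursive call retries a smaller piece of the input, so the result is several output batches, each within the size limit, instead of one batch that fails.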

@abellina abellina added feature request New feature or request ? - Needs Triage Need team to review and classify reliability Features to improve reliability or bugs that severely impact the reliability of the plugin labels Feb 15, 2023
@abellina abellina changed the title [FEA] Use Retry Framework in GpuCoalesceBatches [FEA] Use Retry Framework in GpuCoalesceBatches for RetryOOM only Feb 15, 2023
@abellina
Collaborator Author

This is the follow-on issue: #7778

3 participants