Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the test checking for cooperative kernels in conditional nodes. #9869

Closed

Conversation

galv
Copy link
Collaborator

@galv galv commented Jul 24, 2024

Now we conditionally xfail only when a cuda driver version less than 12.6 is installed. CUDA 12.6 fixes this issue. Before it, cooperative kernels could not be used within the body of a conditional node.

We also provide a better error message for users to know that the fix is to upgrade to CUDA 12.6.

@github-actions github-actions bot added core Changes to NeMo Core ASR labels Jul 24, 2024
@galv galv added Run CICD and removed Run CICD labels Jul 24, 2024
@galv galv requested a review from nithinraok July 25, 2024 06:34
Now we conditionally xfail only when a cuda driver version less than
12.6 is installed. CUDA 12.6 fixes this issue. Before it, cooperative
kernels could not be used within the body of a conditional node.

We also provide a better error message for users to know that the fix
is to upgrade to CUDA 12.6.

Signed-off-by: Daniel Galvez <[email protected]>
@galv galv force-pushed the galv/better-cooperative-kernel-error-message branch from 87ee9ed to ce7477f Compare July 25, 2024 06:36
@galv galv added Run CICD and removed Run CICD labels Jul 25, 2024
except RuntimeError as err:
if "CUDA error: invalid argument" in str(err):
raise RuntimeError(
"CUDA Graph capture failed. It is likely that you are calling a cooperative kernel in your RNN-T or TDT prediction network. Cooperative kernels are not allowed inside the bodies of CUDA Graph conditional nodes until CUDA 12.6. Please update to CUDA 12.6. File an issue if that still does not work."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only support CUDA 12.5 with published pytorch containers see here: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html.

Make sure to support running without cuda_graphs decoding by default and for later version we can make cuda_graphs on by default when containers support it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update

__CUDA_PYTHON_MINIMUM_VERSION_CUDA_GRAPH_CONDITIONAL_NODES_SUPPORTED__ = (12, 3) # 12030
the minimum version to 12.6

Copy link
Collaborator

@artbataev artbataev Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think @galv wants to preserve 12.3 as a requirement, but fail in rare cases when cooperative kernels are selected.

I would suggest in this case applying fallback behavior like this

if self.cuda_graphs_mode is self.CudaGraphsMode.FULL_GRAPH:

(also, the same code in tdt_loop_labels_computer.py)

if self.cuda_graphs_mode is self.CudaGraphsMode.FULL_GRAPH:
    try:
        self._full_graph_compile()
    except RuntimeError:
       # fallback to graphs without while loops
       self.cuda_graphs_mode = self.CudaGraphsMode.NO_WHILE_LOOPS
       self._partial_graphs_compile()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's an interesting possibility.

One of the big challenges is that the error returned by torch is not very precise. It's just a RuntimeError corresponding to "invalid argument", or cudaErrorInvalidValue, which is not a precise enough error for us to tell that the problem specifically is that the code is using a cooperative kernel within a conditional node's body graph. And unfortunately we cannot check whether this is the case because conditional node API does not expose a way to get the body graph(s) of a conditional node, right now...

Anyway, I suppose if the error was not because of a cooperative kernel, but because of something else, then there is a good chance the error will get thrown by the partial graphs implementation. But it's still not a guarantee!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO sounds like it's worth a shot to be able to move forward here

@@ -630,7 +631,7 @@ def _partial_graphs_compile(self):
with (
torch.cuda.stream(stream_for_graph),
torch.inference_mode(),
torch.cuda.graph(
checked_graph(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need checked_graph for partial graphs (without conditional nodes)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point. No, we don't.

But I am considering doing your suggestion anyway for a fallback, in which case we wouldn't use checked_graph.

Copy link
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Aug 23, 2024
Copy link
Contributor

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants