Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply the cuFile error work around to data_sink as well #15335

Merged

Conversation

vuule
Copy link
Contributor

@vuule vuule commented Mar 19, 2024

Description

Issue #14140

Follow-up on #15293

Moving the cudaFree(0) call to a function called both by file datasource and data_sink.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vuule vuule added bug Something isn't working non-breaking Non-breaking change labels Mar 19, 2024
@vuule vuule self-assigned this Mar 19, 2024
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 19, 2024
@vuule vuule changed the title move ctx init to a global to cover all cases Move the cuFile error work around to a global to cover all cases Mar 19, 2024
Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CUDA API call in a global instance may be a CUDA violation.

cpp/src/io/utilities/file_io_utilities.cpp Outdated Show resolved Hide resolved
@vuule vuule marked this pull request as ready for review March 19, 2024 22:08
@vuule vuule requested a review from a team as a code owner March 19, 2024 22:08
Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for doing this.

@vuule vuule changed the base branch from branch-24.04 to branch-24.06 March 21, 2024 18:46
@vuule vuule changed the title Move the cuFile error work around to a global to cover all cases Apply the cuFile error work around to data_sink as well Mar 21, 2024
// Workaround for https://github.com/rapidsai/cudf/issues/14140, where cuFileDriverOpen errors
// out if no CUDA calls have been made before it. This is a no-op if the CUDA context is already
// initialized
cudaFree(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we hard code 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we hard code 0?

This initializes the context without actually doing anything.
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ga042655cbbf3408f01061652a075e094

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing. Maybe the CUDA doc is not accurate? If the input parameter is a pointer, we should pass in nullptr instead of 0.

Copy link
Contributor Author

@vuule vuule Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, cudaFree(0) is the idiomatic way to do this. nullptr would be a better option in general, but it looks like cudaFree(0) is recognized as the idiom. I haven't seen a single reference where cudaFree(nullptr) is used.
Edit: google search for "cudaFree(nullptr)": 121 hits vs cudaFree(0) 1220 hits 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the idiom that I am familiar with as well.

@vuule
Copy link
Contributor Author

vuule commented Apr 1, 2024

/merge

@rapids-bot rapids-bot bot merged commit 0a8807e into rapidsai:branch-24.06 Apr 1, 2024
73 checks passed
@vuule vuule deleted the bug-cuda-ctx-always-global-init branch April 1, 2024 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants