cuda.core v0.1.1 final doc touch #301

Draft
wants to merge 4 commits into
base: main
2 changes: 1 addition & 1 deletion cuda_core/docs/source/release/0.1.0-notes.md
@@ -1,4 +1,4 @@
# `cuda.core` Release notes
# `cuda.core` v0.1.0 Release notes

Released on Nov 8, 2024

17 changes: 10 additions & 7 deletions cuda_core/docs/source/release/0.1.1-notes.md
@@ -1,4 +1,4 @@
# `cuda.core` Release notes
# `cuda.core` v0.1.1 Release notes

Released on Dec XX, 2024

@@ -7,19 +7,22 @@ Released on Dec XX, 2024
- Add `StridedMemoryView` and `@args_viewable_as_strided_memory` that provide a concrete
implementation of DLPack & CUDA Array Interface support.
- Add `Linker` that can link one or multiple `ObjectCode` instances generated by `Program`s. Under
the hood, it uses either the nvJitLink or cuLink APIs depending on the CUDA version detected
in the current environment.
- Add a `cuda.core.experimental.system` module for querying system- or process-wide information.
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools
the hood, it uses either the nvJitLink or driver (`cuLink*`) APIs depending on the CUDA version
detected in the current environment.
- Support `pip install cuda-core`. Please see the Installation Guide for further details.
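
The backend choice described for `Linker` above can be pictured with a small sketch. This is not the code `cuda.core` uses: `pick_linker_backend` and its arguments are hypothetical names introduced only for illustration. It assumes the decision depends on the CUDA major version (nvJitLink first shipped with CUDA 12.0) and on whether the nvJitLink library can be loaded.

```python
def pick_linker_backend(cuda_version, nvjitlink_available):
    """Return the linking backend name for a (major, minor) CUDA version.

    Hypothetical helper for illustration only; not part of cuda.core.
    """
    major, _minor = cuda_version
    # nvJitLink first shipped with CUDA 12.0, so older environments can only
    # fall back to the driver's cuLink* entry points.
    if major >= 12 and nvjitlink_available:
        return "nvJitLink"
    return "driver"


# Example calls under the assumptions above.
modern = pick_linker_backend((12, 6), nvjitlink_available=True)
legacy = pick_linker_backend((11, 8), nvjitlink_available=True)
```

The actual detection logic inside `Linker` may differ; the sketch only shows the shape of a version-dependent fallback.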

## New features

- Add a `cuda.core.experimental.system` module for querying system- or process-wide information.
- Add `LaunchConfig.cluster` to support thread block clusters on Hopper GPUs.
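
To make the thread block cluster item more concrete, here is a minimal sketch of the shape constraints that CUDA places on cluster launches. It does not call `cuda.core`; `validate_cluster` is a hypothetical helper, and it assumes the grid is expressed in thread blocks, as in CUDA's own launch API. The units and validation used by `LaunchConfig` may differ.

```python
from math import prod

# Portable upper bound on blocks per cluster, per the CUDA programming guide.
MAX_PORTABLE_CLUSTER_SIZE = 8


def validate_cluster(grid, cluster):
    """Check an (x, y, z) grid of thread blocks against an (x, y, z) cluster shape.

    Returns the number of clusters in the grid. Hypothetical helper, not a
    cuda.core API.
    """
    if len(grid) != 3 or len(cluster) != 3:
        raise ValueError("grid and cluster must both have three dimensions")
    if any(c < 1 for c in cluster):
        raise ValueError("cluster dimensions must be positive")
    if prod(cluster) > MAX_PORTABLE_CLUSTER_SIZE:
        raise ValueError("cluster exceeds the portable size of 8 blocks")
    # Each grid dimension must be a whole multiple of the cluster dimension.
    for axis, g, c in zip("xyz", grid, cluster):
        if g % c != 0:
            raise ValueError(f"grid.{axis}={g} is not a multiple of cluster.{axis}={c}")
    return prod(grid) // prod(cluster)


num_clusters = validate_cluster((8, 4, 1), (2, 2, 1))
```

Checking the shape on the host before launching tends to give clearer errors than a failed kernel launch.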

## Enhancements

- Ensure "ltoir" is a valid code type to `ObjectCode`.
- Improve test coverage.
- The internal handle held by `ObjectCode` is now lazily initialized upon first touch.
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools.
- Ensure `"ltoir"` is a valid code type to `ObjectCode`.
- Document the `__cuda_stream__` protocol.
- Improve test coverage & documentation cross-references.
- Enforce code formatting.
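
The lazy initialization mentioned for `ObjectCode` in the list above follows a common pattern, sketched here in plain Python. `LazyHandle` and its `loader` argument are hypothetical and are not how `ObjectCode` is implemented; the sketch only assumes that creating the handle is expensive and can be deferred until first access.

```python
class LazyHandle:
    """Hypothetical stand-in for an object that owns an expensive handle."""

    def __init__(self, loader):
        self._loader = loader  # callable that creates the real handle
        self._handle = None    # nothing is created at construction time

    @property
    def handle(self):
        # Create the handle on first touch, then reuse it on later accesses.
        if self._handle is None:
            self._handle = self._loader()
        return self._handle


load_calls = []


def fake_loader():
    # Records each invocation so the deferred creation can be observed.
    load_calls.append("loaded")
    return object()


code = LazyHandle(fake_loader)   # no load has happened yet
first = code.handle              # triggers the single load
second = code.handle             # reuses the cached handle
```

With this pattern, objects that are constructed but never used do not pay the loading cost.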

## Bug fixes