
Interruptible execution #433

Merged · 40 commits · Feb 8, 2022
Conversation

achirkin
Contributor

@achirkin achirkin commented Dec 22, 2021

Cooperative-style interruptible C++ threads.

This proposal introduces raft::interruptible, which provides three functions:

```cpp
static void synchronize(rmm::cuda_stream_view stream);
static void yield();
static void cancel(std::thread::id thread_id);
```

synchronize and yield serve as cancellation points for the executing CPU thread. cancel throws an asynchronous exception in a target CPU thread; the exception is observed at the nearest cancellation point. Together, these make it possible to cancel a long-running job without killing the OS process.

The key to making this work is a simple observation: the CPU spends most of its time waiting on cudaStreamSynchronize. By replacing that with interruptible::synchronize, we introduce cancellation points at all critical places in the code. If that is not enough in some edge cases (the cancellation points are too far apart), a developer can call yield to ensure that a cancellation request is observed sooner rather than later.
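To illustrate the mechanics, here is a minimal Python analogy of the cooperative-cancellation idea. This is not raft's implementation (which keeps an std::atomic_flag in C++ thread-local storage); `yield_`, `cancel`, and `Interrupted` below are stand-ins for raft::interruptible::yield, raft::interruptible::cancel, and raft::interrupted_exception.

```python
import threading

# Per-thread "cancel requested" flags; a stand-in for raft's
# thread-local std::atomic_flag.
_flags = {}
_flags_lock = threading.Lock()


class Interrupted(RuntimeError):
    """Stand-in for raft::interrupted_exception."""


def _flag(tid):
    with _flags_lock:
        return _flags.setdefault(tid, threading.Event())


def yield_():
    """Cancellation point: raise if someone has cancelled this thread."""
    flag = _flag(threading.get_ident())
    if flag.is_set():
        flag.clear()  # reset to the non-cancelled state, like raft's yield
        raise Interrupted("cancelled")


def cancel(tid):
    """Ask thread `tid` to stop at its next cancellation point."""
    _flag(tid).set()


def worker(out):
    try:
        while True:  # a "long-running job" that hits cancellation points often
            yield_()
    except Interrupted:
        out.append("cancelled")


results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
cancel(t.ident)  # cancel from another (here: the main) thread
t.join()
print(results)   # -> ['cancelled']
```

The essential property is the same as in the proposal: cancellation is cooperative, so the target thread only observes the request when it reaches a cancellation point.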

Implementation

C++

raft::interruptible keeps an std::atomic_flag in thread-local storage for each thread, which tells whether the thread can continue executing (i.e. is in the non-cancelled state). cancel clears this flag, and yield checks it and resets it to the signalled state (throwing a raft::interrupted_exception if it was cleared). synchronize implements a spin lock that queries the state of the stream and yields on each iteration. I also add a sync_stream overload to the raft handle type, to make it easier to modify the behavior of all synchronization calls in raft and cuml.
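The spin-wait described above can be sketched as follows (again a Python analogy, not the actual C++ code; `stream_done` and `check_cancelled` are hypothetical stand-ins for querying the CUDA stream and calling the interruptible yield):

```python
import time


class Interrupted(RuntimeError):
    """Stand-in for raft::interrupted_exception."""


def synchronize(stream_done, check_cancelled, poll_interval=0.001):
    """Spin until the stream is finished, yielding on every iteration.

    `stream_done()` stands in for `cudaStreamQuery(stream) == cudaSuccess`,
    and `check_cancelled()` for yield(): it raises Interrupted if a
    cancellation was requested, so a pending cancel is observed promptly.
    """
    while not stream_done():
        check_cancelled()          # cancellation point on every iteration
        time.sleep(poll_interval)  # brief back-off between stream queries


# A "stream" that completes after ~10 ms, with no cancellation requested:
deadline = time.monotonic() + 0.01
synchronize(lambda: time.monotonic() > deadline, lambda: None)
print("synchronized")
```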

Python

This proposal adds a context manager cuda_interruptible to handle Ctrl+C requests during C++ calls (using POSIX signals). cuda_interruptible simply calls raft::interruptible::cancel on the target C++ thread.
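For illustration only, a pure-Python sketch of what such a signal-based context manager can look like (the real cuda_interruptible lives in python/raft/common/interruptible.pyx and cancels the C++ thread via raft::interruptible::cancel; `cancel` below is a hypothetical callback):

```python
import signal
from contextlib import contextmanager


@contextmanager
def interruptible(cancel):
    """On Ctrl+C (SIGINT), invoke `cancel` before the previous handler runs."""
    previous = signal.getsignal(signal.SIGINT)

    def handler(signum, frame):
        cancel()  # e.g. forward the request to the computation thread
        if callable(previous):
            previous(signum, frame)  # keep the usual Ctrl+C behaviour too

    signal.signal(signal.SIGINT, handler)
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, previous)  # always restore the old handler
```

Note the `finally`: the previous handler must be restored even when the wrapped call throws, which mirrors what a real implementation has to guarantee.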

Motivation

See rapidsai/cuml#4463

Resolves rapidsai/cuml#4384

@achirkin achirkin requested review from a team as code owners December 22, 2021 07:15
@achirkin achirkin marked this pull request as draft December 22, 2021 07:15
@achirkin achirkin changed the title Fea interruptible → [POC] Fea interruptible Dec 22, 2021
@achirkin achirkin changed the title [POC] Fea interruptible → [POC] Interruptible execution Dec 22, 2021
@achirkin achirkin added feature request New feature or request non-breaking Non-breaking change labels Dec 22, 2021
Member

@cjnolet cjnolet left a comment


Overall I'm really excited about having this feature in RAFT and I think this is going to find use in many different RAPIDS projects. Here's my feedback so far.

python/raft/common/interruptible.pyx
python/raft/common/interruptible.pyx (outdated)
cpp/include/raft/interruptible.hpp (outdated)
cpp/include/raft/interruptible.hpp (outdated)
@achirkin achirkin requested review from a team as code owners February 3, 2022 06:40
@achirkin
Contributor Author

achirkin commented Feb 3, 2022

rerun tests

@cjnolet cjnolet removed the request for review from a team February 3, 2022 20:13
cjnolet
cjnolet previously approved these changes Feb 3, 2022
Member

@cjnolet cjnolet left a comment


I think this looks good. We can continue to iterate on it and fix additional issues as they may arise.

@cjnolet cjnolet dismissed their stale review February 3, 2022 20:14

Using new raft codeowner approval

Member

@cjnolet cjnolet left a comment


I think this looks good. We can continue to iterate on it and fix additional issues as they may arise.

@cjnolet
Member

cjnolet commented Feb 3, 2022

@achirkin, this is ready to merge but RAFT's CPU builds were just enabled and aren't yet running successfully (hence the CI failure here). This will be merged very soon.

Member

@cjnolet cjnolet left a comment


Had a hiccup in permissions. LGTM.

@achirkin
Contributor Author

achirkin commented Feb 4, 2022

rerun tests

@cjnolet
Member

cjnolet commented Feb 5, 2022

rerun tests

Contributor

@tfeher tfeher left a comment


Thanks @achirkin for this work! It looks good to me. I have just a small suggestion for improving the documentation, pre-approving.

* in code from outside of the thread. In particular, it provides an interruptible version of the
* blocking CUDA synchronization function, that allows dropping a long-running GPU work.
*
*

Consider copying parts of the PR description here:

Interruptible execution is facilitated using the following three functions:

```cpp
static void synchronize(rmm::cuda_stream_view stream);
static void yield();
static void cancel(std::thread::id thread_id);
```

synchronize and yield serve as cancellation points for the executing CPU thread. cancel throws an asynchronous exception in a target CPU thread; the exception is observed at the nearest cancellation point. Together, these make it possible to cancel a long-running job without killing the OS process.

The key to making this work is a simple observation: the CPU spends most of its time waiting on cudaStreamSynchronize. By replacing that with interruptible::synchronize, we introduce cancellation points at all critical places in the code. If that is not enough in some edge cases (the cancellation points are too far apart), a developer can call yield to ensure that a cancellation request is observed sooner rather than later.

cpp/include/raft/interruptible.hpp
@achirkin
Contributor Author

achirkin commented Feb 7, 2022

rerun tests

@cjnolet
Member

cjnolet commented Feb 8, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 1a49fc1 into rapidsai:branch-22.04 Feb 8, 2022
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Feb 8, 2022
### Cooperative-style interruptible C++ threads.

This proposal attempts to make the cuml experience more responsive by providing an easier way to interrupt/cancel long-running cuml tasks. It replaces calls to `cudaStreamSynchronize` with `raft::interruptible::synchronize`, which serve as cancellation points in the algorithms. With a small extra hook on the Python side, Ctrl+C requests can now interrupt the execution (almost) immediately. At the moment, I have adapted just a few models as a proof of concept.

Example:
```python
import sklearn.datasets
import cuml.svm

X, y = sklearn.datasets.fetch_olivetti_faces(return_X_y=True)
model = cuml.svm.SVC()
print("Data loaded; fitting... (try Ctrl+C now)")
try:
    model.fit(X, y)
    print("Done! Score:", model.score(X, y))
except Exception as e:
    print("Canceled!")
    print(e)
```
#### Implementation details
rapidsai/raft#433

#### Adoption costs
From the changeset in this PR you can see that I introduce two types of changes:
  1. Change `cudaStreamSynchronize` to either `handle.sync_thread` or `raft::interruptible::synchronize`
  2. Wrap the cython calls with  [`cuda_interruptible`](https://github.com/rapidsai/raft/blob/36e8de5f73e9ec7e604b38a4290ac82bc35be4b7/python/raft/common/interruptible.pyx#L28) and `nogil`

Change (1) is straightforward and can mostly be automated.

Change (2) is a bit more involved. You definitely have to wrap a C++ call with `interruptibleCpp` to make `Ctrl+C` work, but that is also rather simple. The tricky part is adding `nogil`, because you have to make sure there are no Python objects within the `with nogil` block. However, `nogil` does not seem to be strictly required for the signal handler to successfully interrupt the C++ thread; it worked in my tests without `nogil` as well. Still, I chose to add `nogil` where possible, because in theory it should reduce the interrupt latency and enable more multithreading.

#### Motivation
In general, this proposal makes executing threads (and thus algos/models) more controllable. The main use cases I see:

  1. Being able to Ctrl+C the running model using signal handlers.
  2. Stopping the thread programmatically; e.g. we can create tests of the sort "if running for more than n seconds, stop and fail".
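Use case 2 can be sketched with a watchdog timer. This is a hedged, standalone Python sketch of the "stop and fail after n seconds" pattern; in raft/cuml the cancellation would go through raft::interruptible::cancel rather than the threading.Event used here, and `run_with_timeout`, `check`, and `Interrupted` are hypothetical names.

```python
import threading


class Interrupted(RuntimeError):
    """Raised at a cancellation point after the watchdog fires."""


def run_with_timeout(job, seconds):
    """Run `job(check)`; if it runs longer than `seconds`, the next call
    to `check()` inside the job raises Interrupted ("stop and fail")."""
    cancelled = threading.Event()

    def check():  # the cancellation point the job is expected to call
        if cancelled.is_set():
            raise Interrupted(f"job exceeded {seconds}s")

    watchdog = threading.Timer(seconds, cancelled.set)
    watchdog.start()
    try:
        return job(check)
    finally:
        watchdog.cancel()  # don't leave the timer running on normal exit


def long_job(check):
    while True:  # never finishes on its own; relies on cancellation
        check()


try:
    run_with_timeout(long_job, 0.05)
except Interrupted as e:
    print("stopped:", e)
```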

Resolves #4384

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4463
@achirkin achirkin deleted the fea-interruptible branch March 31, 2022 06:09
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023

Successfully merging this pull request may close these issues.

[FEA] Interruptible execution
4 participants