Interruptible execution #433

achirkin · 2021-12-22T07:15:20Z

Cooperative-style interruptible C++ threads.

This proposal introduces raft::interruptible, which provides three functions:

static void synchronize(rmm::cuda_stream_view stream);
static void yield();
static void cancel(std::thread::id thread_id);

synchronize and yield serve as cancellation points for the executing CPU thread. cancel raises an asynchronous exception in a target CPU thread, which is observed at the nearest cancellation point. Together, these make it possible to cancel a long-running job without killing the OS process.

The key observation is that the CPU spends most of its time waiting on cudaStreamSynchronize. Replacing that call with interruptible::synchronize introduces cancellation points in all critical places in the code. If that is not enough in some edge cases (the cancellation points are too far apart), a developer can call yield to ensure that a cancellation request is received sooner.

Implementation

C++

raft::interruptible keeps a std::atomic_flag in the thread-local storage of each thread, which tells whether the thread may continue executing (that is, whether it is in the non-cancelled state). cancel clears this flag; yield checks it and resets it to the signalled state, throwing a raft::interrupted_exception if the flag had been cleared. synchronize implements a spin-wait loop that queries the state of the stream and yields on each iteration. I also add a sync_stream overload to the raft handle type, to make it easier to modify the behavior of all synchronization calls in raft and cuml.

Python

This proposal adds a context manager, cuda_interruptible, to handle Ctrl+C requests during C++ calls (using POSIX signals). cuda_interruptible simply calls raft::interruptible::cancel on the target C++ thread.
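The general pattern of a signal-routing context manager can be sketched in pure Python. This is not how cuda_interruptible itself is written: the names `interruptible` and `run_interruptibly` are hypothetical, and a `threading.Event` stands in for the call to raft::interruptible::cancel. Because a Python-level signal handler only runs in the main thread between bytecodes, the sketch assumes the work runs on a worker thread while the main thread waits.

```python
import signal
import threading
from contextlib import contextmanager


@contextmanager
def interruptible(on_interrupt, signum=signal.SIGINT):
    """Call on_interrupt() when signum arrives inside the block.

    Must be entered from the main thread, because signal.signal() only works there.
    """
    def handler(received_signum, frame):
        on_interrupt()

    previous = signal.signal(signum, handler)
    try:
        yield
    finally:
        signal.signal(signum, previous)  # always restore the previous handler


def run_interruptibly(job, stop_event):
    """Run job(stop_event) on a worker thread; Ctrl+C sets stop_event."""
    worker = threading.Thread(target=job, args=(stop_event,))
    with interruptible(stop_event.set):
        worker.start()
        while worker.is_alive():
            worker.join(timeout=0.05)  # short joins let the handler run promptly
```

The job is expected to poll `stop_event` at its own cancellation points, which mirrors the cooperative design of the proposal: the handler only requests a stop, and the worker decides where it is safe to honor it.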

Motivation

See rapidsai/cuml#4463

Resolves rapidsai/cuml#4384

cpp/include/raft/interruptible.hpp

cjnolet

Overall I'm really excited about having this feature in RAFT and I think this is going to find use in many different RAPIDS projects. Here's my feedback so far.

python/raft/common/interruptible.pyx

cpp/include/raft/interruptible.hpp

achirkin · 2022-02-03T14:08:24Z

rerun tests

cjnolet

I think this looks good. We can continue to iterate on it and fix additional issues as they may arise.

Using new raft codeowner approval

cjnolet · 2022-02-03T21:03:18Z

@achirkin, this is ready to merge but RAFT's CPU builds were just enabled and aren't yet running successfully (hence the CI failure here). This will be merged very soon.

cjnolet

Had a hiccup in permissions. LGTM.

achirkin · 2022-02-04T06:18:51Z

rerun tests

cjnolet · 2022-02-05T13:11:43Z

rerun tests

tfeher

Thanks @achirkin for this work! It looks good to me. I have just a small suggestion for improving the documentation, pre-approving.

tfeher · 2022-02-07T13:57:15Z

cpp/include/raft/interruptible.hpp

+ * in code from outside of the thread. In particular, it provides an interruptible version of the
+ * blocking CUDA synchronization function, that allows dropping a long-running GPU work.
+ *
+ *


Consider copying parts of the PR description here:

Interruptible execution is facilitated using the following three functions:

static void synchronize(rmm::cuda_stream_view stream);
static void yield();
static void cancel(std::thread::id thread_id);

synchronize and yield serve as cancellation points for the executing CPU thread. cancel allows to throw an async exception in a target CPU thread, which is observed in the nearest cancellation point. Altogether, these allow to cancel a long-running job without killing the OS process.

The key to make this work is an obvious observation that the CPU spends most of the time waiting on cudaStreamSynchronize. By replacing that with interruptible::synchronize, we introduce cancellation points in all critical places in code. If that is not enough in some edge cases (the cancellation points are too far apart), a developer can use yield to ensure that a cancellation request is received sooner rather than later.

cpp/include/raft/interruptible.hpp

achirkin · 2022-02-07T18:16:24Z

rerun tests

cjnolet · 2022-02-08T12:25:39Z

@gpucibot merge

### Cooperative-style interruptible C++ threads.

This proposal attempts to make the cuml experience more responsive by providing an easier way to interrupt/cancel long-running cuml tasks. It replaces calls to `cudaStreamSynchronize` with `raft::interruptible::synchronize`, which serves as a cancellation point in the algorithms. With a small extra hook on the Python side, Ctrl+C requests can now interrupt the execution (almost) immediately. At this moment, I have adapted just a few models as a proof of concept.

Example:
```python
import sklearn.datasets
import cuml.svm

X, y = sklearn.datasets.fetch_olivetti_faces(return_X_y=True)
model = cuml.svm.SVC()
print("Data loaded; fitting... (try Ctrl+C now)")
try:
    model.fit(X, y)
    print("Done! Score:", model.score(X, y))
except Exception as e:
    # Reached when fit() is interrupted (or fails for another reason).
    print("Canceled!")
    print(e)
```

#### Implementation details
rapidsai/raft#433

#### Adoption costs
From the changeset in this PR you can see that I introduce two types of changes:
1. Change `cudaStreamSynchronize` to either `handle.sync_stream` or `raft::interruptible::synchronize`
2. Wrap the cython calls with [`cuda_interruptible`](https://github.com/rapidsai/raft/blob/36e8de5f73e9ec7e604b38a4290ac82bc35be4b7/python/raft/common/interruptible.pyx#L28) and `nogil`

Change (1) is straightforward and can mostly be automated. Change (2) is a bit more involved. You definitely have to wrap a C++ call with `cuda_interruptible` to make `Ctrl+C` work, but that is also rather simple. The tricky part is adding `nogil`, because you have to make sure there are no Python objects within the `with nogil` block. However, `nogil` does not seem to be strictly required for the signal handler to successfully interrupt the C++ thread. It worked in my tests without `nogil` as well. Yet, I chose to add `nogil` in the code where possible, because in theory it should reduce the interrupt latency and enable more multithreading.

#### Motivation
In general, this proposal makes executing threads (and thus algos/models) more controllable. The main use cases I see:
1. Being able to Ctrl+C the running model using signal handlers.
2. Stopping the thread programmatically, e.g. we can create tests of the sort "if running for more than n seconds, stop and fail".

Resolves #4384

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #4463
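The second use case in the commit message above ("if running for more than n seconds, stop and fail") can be sketched as a small deadline helper. This is a pure-Python illustration under stated assumptions, not part of raft or cuml: `run_with_deadline`, `Cancelled` and the `check` callback are hypothetical names, and `check` plays the role that a cancellation point such as `raft::interruptible::yield` plays in C++.

```python
import threading
import time


class Cancelled(Exception):
    """Raised at a cancellation point after a stop request."""


def run_with_deadline(job, seconds):
    """Run job(check) on a worker thread and request a stop after `seconds`.

    Returns True if the job finished on its own, False if it was cancelled.
    """
    stop = threading.Event()
    outcome = {}

    def check():
        # Cancellation point that the job is expected to call regularly.
        if stop.is_set():
            raise Cancelled()

    def target():
        try:
            job(check)
            outcome["finished"] = True
        except Cancelled:
            outcome["finished"] = False
        except BaseException as exc:  # hand unexpected errors back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join(timeout=seconds)
    if worker.is_alive():
        stop.set()     # deadline passed: ask the worker to stop
        worker.join()  # returns once the worker reaches a cancellation point
    if "error" in outcome:
        raise outcome["error"]
    return outcome["finished"]
```

A test could then assert that `run_with_deadline(fit_model, 10.0)` returns True. How quickly a cancelled job stops depends on how far apart its cancellation points are.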


achirkin added 5 commits December 16, 2021 08:26

First take

c18cab1

Merge branch 'branch-22.02' into fea-interruptible

34d4023

Some refactoring and yield function

e1a0c3a

Fix a typo

f6222dc

Added a python Ctrl+C handler wrapper

ee99523

achirkin requested review from a team as code owners December 22, 2021 07:15

github-actions bot added cpp python labels Dec 22, 2021

achirkin marked this pull request as draft December 22, 2021 07:15

achirkin changed the title ~~[POCFea interruptible~~ [POC] Fea interruptible Dec 22, 2021

achirkin changed the title ~~[POC] Fea interruptible~~ [POC] Interruptible execution Dec 22, 2021

achirkin mentioned this pull request Dec 22, 2021

Interruptible execution rapidsai/cuml#4463

Merged

achirkin added feature request New feature or request non-breaking Non-breaking change labels Dec 22, 2021

achirkin added 2 commits December 22, 2021 08:56

Fix linter

a07edae

Fix linter

b3119bb

achirkin mentioned this pull request Dec 22, 2021

[FEA] Interruptible execution rapidsai/cuml#4384

Closed

jrhemstad reviewed Jan 4, 2022

cpp/include/raft/interruptible.hpp (outdated)

achirkin added 2 commits January 10, 2022 16:57

Initialize cuda primitives lazily and add a mutex-free non-static cancel()

54a0599

Fix relative import

db5adfd

cjnolet requested changes Jan 10, 2022

python/raft/common/interruptible.pyx

python/raft/common/interruptible.pyx (outdated)

cpp/include/raft/interruptible.hpp (outdated)

cpp/include/raft/interruptible.hpp (outdated)

achirkin added 6 commits January 11, 2022 10:09

Fix deallocation issue with shared_ptr + unordered_map

5539984

Refactor names

4b95859

Merge branch 'branch-22.02' of https://github.com/rapidsai/raft into fea-interruptible

36e8de5

Make comms sync_stream interruptible

a2610d1

Enable OpenMP in raft

53155e9

Add gtests

396beda

achirkin requested review from a team as code owners February 3, 2022 06:40

Update docs

3e67ec0

cjnolet removed the request for review from a team February 3, 2022 20:13

cjnolet previously approved these changes Feb 3, 2022

cjnolet approved these changes Feb 3, 2022

cjnolet and others added 3 commits February 4, 2022 15:12

Merge branch 'branch-22.04' into fea-interruptible

fc81823

Merge branch 'branch-22.04' into fea-interruptible

c1a7070

Add 'cudart' to cython libs

98c9035

Don't use __nanosleep on older archs

dbcdcf0

tfeher approved these changes Feb 7, 2022

Add a comment about using thread-local storage.

853b5c3

achirkin added 2 commits February 8, 2022 08:20

Merge remote-tracking branch 'rapidsai/branch-22.04' into fea-interruptible

d32f4df

Replace more cudaStreamSynchronize with handle.sync_stream

e8b7b54

rapids-bot bot merged commit 1a49fc1 into rapidsai:branch-22.04 Feb 8, 2022

achirkin deleted the fea-interruptible branch March 31, 2022 06:09
