Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge/replace with_${python_obj} context managers with use of allow_threads #11722

Closed
stuhood opened this issue Mar 17, 2021 · 6 comments · Fixed by #13617
Closed

Merge/replace with_${python_obj} context managers with use of allow_threads #11722

stuhood opened this issue Mar 17, 2021 · 6 comments · Fixed by #13617

Comments

@stuhood
Copy link
Member

stuhood commented Mar 17, 2021

The with_$x context managers add a bunch of noise, and are much less central than calling allow_threads, which we should try to do in nearly all cases.

Not calling allow_threads before calling various methods of the Scheduler/Session (particularly anything to do with the Graph) can cause deadlocks... not to mention meaning that we're holding the GIL over native code that might not need it. For example: #11723

It's possible that these methods could be merged, replaced, or enhanced to also release the GIL when called. But a challenge is that each of the with_$x methods cannot call allow_threads, because each of them need the GIL to dereference the pointer.


One strawdesign that might work would be something like:

fn with_unwrapped<U, F, T>(py: Python, unwrappable: U, f: F) -> T
where
  U: Unwrappable,
  F: FnOnce(U::Output) -> T,
{
  let unwrapped = unwrappable.unwrapped();
  py.allow_threads(|| {
    f(unwrapped)
  })
}

We'd then implement the hyptothetical Unwrappable trait for tuples of 1, 2, 3-ish python objects (PyScheduler, PySession, etc) allowing you to make a single call to unwrap all of them at once. The single call to the method would allow it to call allow_threads, since you would not need to nest them.

@Eric-Arellano
Copy link
Contributor

Will this need to be done all-or-nothing, or can be incremental?

I'd love to work on this one, I'm enjoying learning more about FFI and the GIL.

@stuhood
Copy link
Member Author

stuhood commented Mar 17, 2021

Will this need to be done all-or-nothing, or can be incremental?

It could be incremental: we could add function(s) next to the with_$x methods, and move code to them.

I've updated the description to call out one of the challenges and a potential design.

@stuhood
Copy link
Member Author

stuhood commented Mar 17, 2021

OH. 😮 .. Alternatively, if we're able to implement FromPyObject for &Scheduler and etc, I think that we could change the signature of e.g. garbage_collect_store to:

fn garbage_collect_store(py: Python, scheduler: &Scheduler, target_size_bytes: usize) -> PyUnitResult {
    py.allow_threads(|| {
      scheduler
        .core
        .store()
        .garbage_collect(target_size_bytes, store::ShrinkBehavior::Fast)
    })
    .map_err(|e| PyErr::new::<exc::Exception, _>(py, (e,)))
    .map(|()| None)
}

...and get rid of with_$x! Doesn't help with allow_threads, but.

EDIT: Filed dgrunwald/rust-cpython#260 about a potential way to remove allow_threads in some cases.

stuhood added a commit that referenced this issue Mar 17, 2021
### Problem

By chance, a garbage collection thread started up and called `lease_files_in_graph` while I happened to be running Pants in the foreground. This caused a deadlock because `lease_files_in_graph` was not releasing the GIL before touching the `Graph` (via the `Scheduler`)... and foreground interactions with the `Graph` frequently need to acquire the GIL to check for equality.

### Solution

Move the problematic call to `scheduler.all_digests()` under `allow_threads` and add a note. Additionally, move one other potentially-problematic method call under `allow_threads`.

### Result

A rare (hopefully?) deadlock is prevented. See #11722 for how we might make this less likely in the future.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Mar 17, 2021
…sbuild#11723)

### Problem

By chance, a garbage collection thread started up and called `lease_files_in_graph` while I happened to be running Pants in the foreground. This caused a deadlock because `lease_files_in_graph` was not releasing the GIL before touching the `Graph` (via the `Scheduler`)... and foreground interactions with the `Graph` frequently need to acquire the GIL to check for equality.

### Solution

Move the problematic call to `scheduler.all_digests()` under `allow_threads` and add a note. Additionally, move one other potentially-problematic method call under `allow_threads`.

### Result

A rare (hopefully?) deadlock is prevented. See pantsbuild#11722 for how we might make this less likely in the future.

[ci skip-build-wheels]
stuhood added a commit to stuhood/pants that referenced this issue Mar 17, 2021
…sbuild#11723)

### Problem

By chance, a garbage collection thread started up and called `lease_files_in_graph` while I happened to be running Pants in the foreground. This caused a deadlock because `lease_files_in_graph` was not releasing the GIL before touching the `Graph` (via the `Scheduler`)... and foreground interactions with the `Graph` frequently need to acquire the GIL to check for equality.

### Solution

Move the problematic call to `scheduler.all_digests()` under `allow_threads` and add a note. Additionally, move one other potentially-problematic method call under `allow_threads`.

### Result

A rare (hopefully?) deadlock is prevented. See pantsbuild#11722 for how we might make this less likely in the future.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Mar 17, 2021
…errypick of #11723)

### Problem

By chance, a garbage collection thread started up and called `lease_files_in_graph` while I happened to be running Pants in the foreground. This caused a deadlock because `lease_files_in_graph` was not releasing the GIL before touching the `Graph` (via the `Scheduler`)... and foreground interactions with the `Graph` frequently need to acquire the GIL to check for equality.

### Solution

Move the problematic call to `scheduler.all_digests()` under `allow_threads` and add a note. Additionally, move one other potentially-problematic method call under `allow_threads`.

### Result

A rare (hopefully?) deadlock is prevented. See #11722 for how we might make this less likely in the future.

[ci skip-build-wheels]
stuhood added a commit that referenced this issue Mar 17, 2021
…errypick of #11723)

### Problem

By chance, a garbage collection thread started up and called `lease_files_in_graph` while I happened to be running Pants in the foreground. This caused a deadlock because `lease_files_in_graph` was not releasing the GIL before touching the `Graph` (via the `Scheduler`)... and foreground interactions with the `Graph` frequently need to acquire the GIL to check for equality.

### Solution

Move the problematic call to `scheduler.all_digests()` under `allow_threads` and add a note. Additionally, move one other potentially-problematic method call under `allow_threads`.

### Result

A rare (hopefully?) deadlock is prevented. See #11722 for how we might make this less likely in the future.

[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Jun 9, 2021
We no longer use the `with_executor` mechanism from #11722, as we can directly store an `Executor` on the `PyNailgunClient` cheaply. This allows us to use `Python.allow_threads()` for better parallelism.

It's awkward that our Python code now has `PyExecutor` from Rust-Cpython and PyO3, and that those aren't compatible. But this is less offensive than the risk of one big change rather than an incremental migration.

We can't yet migrate `PyNailgunSession` because `PyCancellationLatch` is used by `PySession`.

--

This also improves our file organization. We use a new pattern of having a `register()` function in each `interface/` file to register that file's functions and classes on the single native extension module, which makes `interface.rs` less bloated.
@benjyw benjyw removed the tech-debt label Sep 9, 2021
stuhood added a commit to stuhood/pants that referenced this issue Oct 6, 2021
…sbuild#11723)

### Problem

By chance, a garbage collection thread started up and called `lease_files_in_graph` while I happened to be running Pants in the foreground. This caused a deadlock because `lease_files_in_graph` was not releasing the GIL before touching the `Graph` (via the `Scheduler`)... and foreground interactions with the `Graph` frequently need to acquire the GIL to check for equality.

### Solution

Move the problematic call to `scheduler.all_digests()` under `allow_threads` and add a note. Additionally, move one other potentially-problematic method call under `allow_threads`.

### Result

A rare (hopefully?) deadlock is prevented. See pantsbuild#11722 for how we might make this less likely in the future.

[ci skip-build-wheels]# Delete this line to force CI to run the JVM tests.
[ci skip-jvm-tests]
stuhood added a commit that referenced this issue Oct 7, 2021
…errypick of #11723) (#13149)

### Problem

By chance, a garbage collection thread started up and called `lease_files_in_graph` while I happened to be running Pants in the foreground. This caused a deadlock because `lease_files_in_graph` was not releasing the GIL before touching the `Graph` (via the `Scheduler`)... and foreground interactions with the `Graph` frequently need to acquire the GIL to check for equality.

### Solution

Move the problematic call to `scheduler.all_digests()` under `allow_threads` and add a note. Additionally, move one other potentially-problematic method call under `allow_threads`.

### Result

A rare (hopefully?) deadlock is prevented. See #11722 for how we might make this less likely in the future.

[ci skip-build-wheels]
[ci skip-jvm-tests]
@Eric-Arellano
Copy link
Contributor

@stuhood I'm confused on this one. The pyclasses already implement FromPyObject. I'm wanting to close this issue out as prework for PyO3, starting with this because it's entirely self-contained fwict:

fn nailgun_server_await_shutdown(py: Python, nailgun_server_ptr: PyNailgunServer) -> PyUnitResult {
with_nailgun_server(py, nailgun_server_ptr, |nailgun_server, executor| {
let executor = executor.clone();
let shutdown_result = if let Some(server) = nailgun_server.borrow_mut().take() {
py.allow_threads(|| executor.block_on(server.shutdown()))
} else {
Ok(())
};
shutdown_result.map_err(|e| PyErr::new::<exc::Exception, _>(py, (e,)))?;
Ok(None)
})
}

@stuhood
Copy link
Member Author

stuhood commented Nov 8, 2021

The pyclasses already implement FromPyObject.

They implement FromPyObject for themselves, but not for their inner type: the change to the method that you linked would be to directly take a nailgun::Server or a reference to a nailgun::Server. That one is a bit more challenging because it has multiple fields.

fn nailgun_server_await_shutdown(py: Python, nailgun_server: &nailgun::Server) -> PyUnitResult {

@Eric-Arellano
Copy link
Contributor

I think I'm going to try to stick to the current implementation while porting to PyO3. Once that lands, it should be easier to make this change.

Eric-Arellano added a commit that referenced this issue Nov 15, 2021
Closes #11722. There's less indirection now.

This only refactors. It does not yet add more `py.allow_threads(||)` like #11722 suggested we should do. It does call `.clone()` fewer times than before though.

[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants