Re-enable concurrent runs for pantsd in v2 #7654
### Problem

Pants' runtime overhead is influenced by multiple factors:

1. packaging overhead in virtualenvs and pexes
2. import time for python code
3. time to walk the filesystem and (re)fingerprint inputs
4. time to run `@rule`s with unchanged inputs

Pantsd helps to address all of these issues, and has just (re-)reached maturity.

### Solution

Prepare to enable `pantsd` by default by deprecating not setting the `enable_pantsd` flag. Also, set the flag for pantsbuild/pants.

### Result

Noop time when running in a virtualenv (such as in [github.com/pantsbuild/example-python](https://github.com/pantsbuild/example-python)) drops from `~2.4s` to `~1.6s`. Follow-up work (moving fingerprinting to the server, porting the client to rust) is expected to push this below `500ms`.

There is a collection of known issues that might impact users, tracked in https://github.com/pantsbuild/pants/projects/5, some of which we will address via dogfooding. Others are matters of expectation: "I wouldn't expect that to work". One of the most significant issues, though, is #7654: we should consider making a push before `1.29.0` to remove the use of global mutable state to allow for concurrent pants runs with pantsd under v2.

Fixes #4438.
### Problem

The `PantsDaemon` class was playing double duty as both the entrypoint for the pantsd server, and as a launcher for the client. We will soon need to make significant changes there in order to support #8200 and #7654.

### Solution

Extract a `PantsDaemonProcessManager` base class for process metadata reads/writes and a `PantsDaemonClient` subclass to consume the metadata to decide whether to launch the server. This is 100% code moves... no logic changed.

### Result

The `PantsDaemon` class is now exclusively a server, and `PantsDaemonClient` is now exclusively a client.

[ci skip-rust-tests]
[ci skip-jvm-tests]
### Problem

Currently we restart pantsd for most configuration changes, and exclude a small set of bootstrap options (by marking them `daemon=False`) that should not trigger a restart. But in most cases, restarting is heavy-handed. We'd like to be able to keep more and more of our state alive over time as we continue to remove global mutable state (in order to allow us to tackle #7654, among other things).

Additionally, the pantsd client currently implements the fingerprinting that decides when the server should restart, which blocks moving the pantsd client to rust. We'd like the client to only need to interact with a small set of options to simplify its implementation.

### Solution

Move the nailgun server out of the `PantsService` model and directly into the `PantsDaemon` class. Share a `PantsDaemonCore` between the daemon and `DaemonPantsRunner` that encapsulates the current `Scheduler` and all live services. Finally, have the `PantsDaemonCore` implement fingerprinting to decide when to reinitialize/recreate the `Scheduler` (without restarting) and trim down the options that trigger a restart (`daemon=True`) to only those that are used to start the daemon itself (rather than to create the `Scheduler`).

### Result

`pantsd` will stay up through the vast majority of options changes (with the exception of a handful of "micro-bootstrap" options), and will instead reinitialize the `Scheduler` for bootstrap options changes with some useful output when it does so. Example:

```
$ ./pants help
23:26:22 [INFO] initializing pantsd...
23:26:24 [INFO] pantsd initialized.
Pants 1.30.0.dev0 https://pypi.org/pypi/pantsbuild.pants/1.30.0.dev0
Usage:
<snip>

$ ./pants --no-v1 help
23:26:31 [INFO] initialization options changed: reinitializing pantsd...
23:26:32 [INFO] pantsd initialized.
Pants 1.30.0.dev0 https://pypi.org/pypi/pantsbuild.pants/1.30.0.dev0
Usage:
<snip>
```

This prepares to port the client to rust, and unblocks a fix for #8200, by having the `PantsDaemon` class tear down the nailgun server cleanly in the foreground if any services exit.

Fixes #6114, fixes #7573, and fixes #10041.
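The restart-versus-reinitialize decision described above can be illustrated with a small Python sketch. It assumes a hypothetical `ToyDaemonCore` (not Pants' actual `PantsDaemonCore`) that hashes the options a scheduler was built from and rebuilds the scheduler only when that hash changes; the `make_scheduler` factory is likewise a placeholder, and full restarts for "micro-bootstrap" options are not modeled.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


class ToyDaemonCore:
    """Hypothetical sketch: cache a scheduler and rebuild it only when its options change."""

    def __init__(self, make_scheduler: Callable[[Dict[str, Any]], Any]) -> None:
        self._make_scheduler = make_scheduler  # placeholder factory, not a Pants API
        self._fingerprint: Optional[str] = None
        self._scheduler: Any = None
        self.reinitializations = 0

    @staticmethod
    def fingerprint(options: Dict[str, Any]) -> str:
        # sort_keys makes equal option sets serialize (and therefore hash) identically.
        encoded = json.dumps(options, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def prepare(self, options: Dict[str, Any]) -> Any:
        fp = self.fingerprint(options)
        if fp != self._fingerprint:
            # The options the scheduler was built from changed: rebuild it in place,
            # without restarting the surrounding daemon process.
            self._scheduler = self._make_scheduler(options)
            self._fingerprint = fp
            self.reinitializations += 1
        return self._scheduler
```

Calling `prepare({"v1": True})` twice returns the same scheduler object, while `prepare({"v1": False})` triggers a rebuild, analogous to the `--no-v1` example above. Keeping the fingerprinting on the server side is what lets the client stay small.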
Got another request for this. Possibly worth tackling pre-2.0 in case there are API changes necessary? Unclear.
…k in future versions (via pantsbuild#7654) that implementation will not require nesting to validate: only concurrent runs.
### Problem

Since #8265, we have been running `PantsRunIntegrationTest`s from loose sources in the repository, which are included via `src/python/pants/testutil:int-test`'s dependency on `src/python/pants/bin:pants_local_binary`. That change thus removed the need for the `pants.pex` from our wrapper scripts.

### Solution

Remove the "`pants.pex` for integration tests" mechanism. Post #8625, pants is run from the PYTHONPATH of the test target, which will automatically include either loose sources or a `pants_requirement` target, depending on whether the test is run in or out of the pantsbuild/pants repo.

Additionally, remove one set of tests that was attempting to test that nested runs of pants do not deadlock. #7654 will eventually allow that issue to be resolved by allowing the concurrent run rather than via any sort of automatic disabling of `pantsd`, and that is much easier to test via concurrent runs _without_ nesting.

[ci skip-rust-tests]
…#10320)

### Problem

To prepare for #7654, we need to remove dependence on signal handling between the pantsd client and server. Signals do not provide enough information to decide which run to abort, and in general are not particularly subtle.

### Solution

Rather than signals, we will instead rely on nailgun connection liveness via the "Heartbeat" extension to the nailgun protocol (implemented in [nails 0.6.0](stuhood/nails@fb72699...070fe03)), which allows for requiring regular heartbeats from the client, and aborting a run if they do not arrive.

In addition to heartbeats, `nails` now also makes guarantees about how a `Nail` will be notified that the connection has closed (namely that the input stream will remain open until the connection dies).

### Result

Follow-up changes (including #10004) will be able to take advantage of these new signals to cancel ongoing work without killing the pantsd server.
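The heartbeat idea can be sketched on the server side in Python: each run records when the last heartbeat arrived, and a watchdog thread marks the run as cancelled once the client has been silent for longer than a timeout. This is not the `nails` implementation (which is Rust and handles heartbeats at the protocol level); `HeartbeatWatchdog`, its `timeout`, and the polling interval are illustrative assumptions.

```python
import threading
import time


class HeartbeatWatchdog:
    """Hypothetical sketch: cancel one run if its client stops sending heartbeats."""

    def __init__(self, timeout: float) -> None:
        self._timeout = timeout
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()
        self.cancelled = threading.Event()  # per-run flag that work can observe
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def beat(self) -> None:
        # Called whenever a heartbeat arrives from this run's client.
        with self._lock:
            self._last_beat = time.monotonic()

    def stop(self) -> None:
        # Called when the run completes normally.
        self._stopped.set()
        self._thread.join()

    def _watch(self) -> None:
        # Event.wait returns True once stop() is called, ending the loop.
        while not self._stopped.wait(self._timeout / 4):
            with self._lock:
                silent_for = time.monotonic() - self._last_beat
            if silent_for > self._timeout:
                self.cancelled.set()
                return
```

Because the `cancelled` flag belongs to a single run, only the run whose client went away is aborted; that is the per-run targeting that signals cannot provide.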
### Problem

#11536 moves from using POSIX-level replacement of the `stdio` file descriptors `{0, 1, 2}`, to replacing `sys.std*` with a thread-local implementation. Unfortunately, Python `subprocess`/`Popen` APIs hardcode `{0, 1, 2}` rather than actually inspecting `sys.std*.fileno()`, and so usages of those APIs that use `std*=None` (i.e. "inherit" mode) will inherit intentionally dead/closed file handles under `pantsd`.

PEX uses inherit mode when spawning `pip` (in order to pass through `stderr` to the parent process), and this runs afoul of the above behavior.

Pants has used PEX as a library for a very long time, but 95% of use cases have now migrated to using PEX as a binary, with wrappers exposed via the `@rule` API. The `PluginResolver` that Pants uses to resolve its own code is the last usage of PEX as a library.

### Solution

In a series of commits, introduce a "bootstrap" `Scheduler` that is able to resolve plugins after an `OptionsBootstrapper` has been created, but before creating the `BuildConfiguration` (which contains plugin information).

### Result

Although Pants has some stray references to PEX APIs, it no longer uses PEX as a library to resolve dependencies. #11536 and #7654 are unblocked.

In the future, if the options required to construct a minimal scheduler can be further pruned, the bootstrap `Scheduler` might also be able to be used to create the `OptionsBootstrapper`, which would allow for addressing #10360.
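The `subprocess` pitfall described above can be demonstrated, along with one general workaround for code you control: capture the child's output through pipes and re-write it to whatever object `sys.stdout`/`sys.stderr` currently is. This is a sketch with a hypothetical `run_forwarding_output` helper; it is not how Pants resolved the issue (that was the bootstrap `Scheduler`), and buffering the whole output until the child exits is a simplification.

```python
import contextlib
import io
import subprocess
import sys
from typing import Sequence


def run_forwarding_output(argv: Sequence[str]) -> int:
    """Run a child and forward its output to the *current* sys.stdout/sys.stderr.

    With stdout=None ("inherit"), the child would receive raw file descriptors
    1 and 2 and bypass any replacement object installed as sys.stdout.
    """
    result = subprocess.run(
        list(argv),
        stdin=subprocess.DEVNULL,  # don't inherit fd 0 either
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode


def demo() -> str:
    # Stand-in for a thread-local replacement: redirect sys.stdout to a buffer.
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        run_forwarding_output([sys.executable, "-c", "print('hello')"])
    return buf.getvalue()
```

With `stdout=None` instead of `subprocess.PIPE`, `demo()` would return an empty string, because the child's output would go to fd 1 rather than to the redirected `sys.stdout`.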
Hey folks! In the home stretch on #11536, with two test failures and a console-width-reporting issue to look into. Very optimistic that it will land next week. Once it does, I'm optimistic that the final PR for this change will be significantly smaller. Thanks for your patience.
### Problem

In order to support concurrent `pantsd` clients, work spawned by each client (even into background threads) must interact with the relevant `stdio` file handles. And when a client disconnects, spawned work should be able to continue to log to the `pantsd.log`.

### Solution

Extract the thread/task-local aspects of our `logging` crate into a `stdio` crate that provides:

1. a `Console`-aware `logging` destination
2. exclusive access to a `Console` while a UI or `InteractiveProcess` is running
3. Python-level `sys.std*` replacements

### Result

No user-visible impact, but one of the largest remaining blockers for #7654 is removed.

[ci skip-build-wheels]
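A pure-Python analogue of the "Python-level `sys.std*` replacements" item can be sketched as a proxy object installed as `sys.stdout` that forwards each write to a per-thread destination, falling back to a default (for example, the daemon's log) for threads with no client attached. The PR implements this in a Rust `stdio` crate; `ThreadLocalStdout` and `demo` here are hypothetical and only show the routing idea.

```python
import io
import sys
import threading
from typing import Dict, TextIO


class ThreadLocalStdout(io.TextIOBase):
    """Hypothetical sys.stdout replacement routing writes to a per-thread destination."""

    def __init__(self, default: TextIO) -> None:
        super().__init__()
        self._default = default
        self._local = threading.local()

    def set_destination(self, dest: TextIO) -> None:
        self._local.dest = dest

    def clear_destination(self) -> None:
        if hasattr(self._local, "dest"):
            del self._local.dest

    def _current(self) -> TextIO:
        # Threads without a registered destination fall back to the default.
        return getattr(self._local, "dest", self._default)

    def writable(self) -> bool:
        return True

    def write(self, s: str) -> int:
        return self._current().write(s)

    def flush(self) -> None:
        self._current().flush()


def demo() -> Dict[str, str]:
    log = io.StringIO()  # stands in for pantsd.log
    proxy = ThreadLocalStdout(default=log)
    outputs: Dict[str, str] = {}

    def client_run(name: str) -> None:
        console = io.StringIO()  # stands in for this client's console
        proxy.set_destination(console)
        print(f"hello from {name}")
        proxy.clear_destination()  # the client "disconnected"
        print(f"{name} still running")  # now goes to the log
        outputs[name] = console.getvalue()

    original = sys.stdout
    sys.stdout = proxy
    try:
        threads = [threading.Thread(target=client_run, args=(n,)) for n in ("a", "b")]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        sys.stdout = original
    outputs["log"] = log.getvalue()
    return outputs
```

Each thread's `print` calls hit the same global `sys.stdout`, yet output lands in that thread's own buffer; after "disconnecting", further output falls through to the shared log.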
#11536 landed yesterday, and appears stable so far. I've drafted the change to enable concurrent runs, and so far it looks good! Very optimistic that this will be able to land before the end of the week.
### Problem

In order to allow concurrent runs of Pants in #11639 (part of #7654), all global mutable singletons must be either 1) thread-local, or 2) replaced with explicitly passed values. `os.environ` and `sys.argv` are two of the last cases we're concerned with.

### Solution

Although we could hypothetically redefine `os.environ` and `os.getenv` as thread-local, they seem to be a better fit to be explicitly passed. So this change:

1. Renames (and moves) `PantsEnvironment` to `CompleteEnvironment`.
2. Adds `Environment` and `EnvironmentRequest`, which filter the `CompleteEnvironment` to a smaller subset, and should be used more frequently by `@rules`.
3. Fixes a few cases of implicit `os.environ` usage in the `PythonSetup` subsystem by explicitly selecting interpreter-selection env vars (`HOME`/`PATH`/`PYENV_ROOT`).

Finally, we empty the environment for `pantsd` runs to enforce explicit usage of `CompleteEnvironment`/`Environment`.

### Result

The last (?) global mutable variable blocking #7654 is removed.

[ci skip-build-wheels]
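The explicit-passing approach can be sketched with two toy types: a snapshot of one client's full environment, and a filtered subset that code receives instead of reading `os.environ`. `CompleteEnv`, `FilteredEnv`, and `interpreter_search_path` are hypothetical stand-ins, not Pants' `CompleteEnvironment`/`Environment` APIs. The point is that a filtered, hashable value can be passed per run (and used as a cache key), whereas `os.environ` is one process-wide global shared by every concurrent run.

```python
import os
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FilteredEnv:
    """Hypothetical: an immutable, hashable subset of one client's environment."""

    items: Tuple[Tuple[str, str], ...]

    def get(self, name: str, default: Optional[str] = None) -> Optional[str]:
        return dict(self.items).get(name, default)


class CompleteEnv:
    """Hypothetical: a snapshot of the full environment sent by one client."""

    def __init__(self, values: Dict[str, str]) -> None:
        self._values = dict(values)  # copy, so later mutation of the source has no effect

    def subset(self, names: Iterable[str]) -> FilteredEnv:
        # Keep only requested variables that are set; sort so equal subsets compare equal.
        picked = {n: self._values[n] for n in names if n in self._values}
        return FilteredEnv(tuple(sorted(picked.items())))


def interpreter_search_path(env: FilteredEnv) -> List[str]:
    # Reads PATH from the explicitly passed value, never from os.environ.
    return [p for p in (env.get("PATH") or "").split(os.pathsep) if p]
```

Requesting `["HOME", "PATH", "PYENV_ROOT"]` yields only the variables that are actually set, and two requests for the same names produce equal, equally hashed values regardless of request order.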
So, unfortunately, #10827 was a bit too optimistic about being able to support concurrent sessions with But I think that I have a new solution that would replace So, in short: this ticket is blocked on #11269. I have a few days progress on #11269, and I'm optimistic that it will be simpler than our existing rule graph construction, as well as unlocking things like this.
In order for the
Is the end goal for this issue solely that simultaneous invocations of
Hey @Moortiii: yes, that is the goal. It would not be expected to impact the performance of uncontended runs.
This would be great for use case 1 as discussed in #20642 (comment). We currently set
`pantsd` currently prevents concurrent runs because of global mutable singletons that were pervasive in v1. In v2, there are none of these left, and so only concurrent access to stdio needs to be managed.

We would like to allow for concurrent runs under `pantsd`, without the use of the `PANTS_CONCURRENT=True` flag (i.e., that flag would become a noop and then be deprecated).

A sketch of what this will likely involve, in at least four PRs:
- Move to the rust nailgun client, to avoid needing to implement any client-side logic twice.
  - Fixed in Re-land the port of Pants' nailgun client to Rust #11147.
- Cancellation needs to move to heartbeat-based.
  - Add a "canceled" bool/sync primitive (probably a `watch`) to `Session`, and propagate it in from the read half of the connection closing.
  - Client closes the write half of the socket, then waits for the server to close the other half (might require additional support in `nails`).
  - Cancellation bool consumed in all relevant places:
    - `InteractiveProcess`: should move to spawning the process and then asynchronously waiting for completion.
    - `Graph`: should cancel ongoing work if the client/`Session` that started it goes away (and let any existing clients restart it).
    - `pantsd` lifecycle somewhere.
- Remaining singletons need to be located and fixed.
  - All access to stdio in both the rust code and the python code should be replaced with access to `Session`-specific files, and the `sys.std*` file handles should be closed, poisoned, or replaced with synthetic thread-local files (à la `input` method to Console #11398).
  - All remaining usages of `Subsystem.global_instance` should be removed.
- Allow concurrent runs in `DaemonPantsRunner`.
  - `Scheduler`s (rather than the current singular `Scheduler`) to allow for concurrent access with different options. Also, consider porting the concurrent `Scheduler` management to Rust?
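The `InteractiveProcess` item in the plan (spawn the process, then wait for completion while watching a per-`Session` "canceled" flag) can be sketched in Python, with a `threading.Event` standing in for the proposed `watch`-style primitive and polling standing in for asynchronous waiting. `run_cancellable` and `demo` are hypothetical helpers, not Pants code.

```python
import subprocess
import sys
import threading
from typing import Optional, Sequence, Tuple


def run_cancellable(
    argv: Sequence[str], cancelled: threading.Event, poll_interval: float = 0.05
) -> Optional[int]:
    """Spawn a process and wait for it; kill it if `cancelled` is set first.

    Returns the exit code, or None if the run was cancelled.
    """
    proc = subprocess.Popen(list(argv), stdin=subprocess.DEVNULL)  # stdio wiring omitted
    while True:
        code = proc.poll()
        if code is not None:
            return code
        # Event.wait doubles as an interruptible sleep between polls.
        if cancelled.wait(poll_interval):
            proc.kill()
            proc.wait()  # reap the child
            return None


def demo() -> Tuple[Optional[int], Optional[int]]:
    finished = run_cancellable([sys.executable, "-c", "pass"], threading.Event())
    session_cancelled = threading.Event()
    # Simulate the client disconnecting 0.2s into a long-running process.
    threading.Timer(0.2, session_cancelled.set).start()
    aborted = run_cancellable(
        [sys.executable, "-c", "import time; time.sleep(30)"], session_cancelled
    )
    return finished, aborted
```

Because each run waits on its own session's flag, a disconnecting client only kills the processes it spawned, while other concurrent runs continue.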