Run LSP handlers consecutively by default #361
Conversation
Force-pushed from 70a53ef to 096e23e
    },
    Err(err) => log::error!("Can't retrieve console inputs: {err:?}"),
}
self.send_lsp(LspEvent::RefreshAllDiagnostics());
I am noting that this has disappeared, but presumably it's added back elsewhere in some other way?
Looks like it has been absorbed into this:
pub(crate) fn did_change_console_inputs(
    inputs: ConsoleInputs,
    state: &mut WorldState,
) -> anyhow::Result<()> {
    state.console_scopes = inputs.console_scopes;
    state.installed_packages = inputs.installed_packages;
    // We currently rely on global console scopes for diagnostics, in particular
    // during package development in conjunction with `devtools::load_all()`.
    // Ideally diagnostics would not rely on these though, and we wouldn't need
    // to refresh from here.
    lsp::spawn_diagnostics_refresh_all(state.clone());
    Ok(())
}
});
}
// Wait for response from main loop
response_rx.recv().await.unwrap()
IIUC, this await allows tower-lsp to switch away and start another handler, but all that can really do is "relay" an additional event to the main loop, which is in charge of keeping the requests and their responses in order.
That's exactly right.
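For illustration, a minimal sketch of that relay pattern. The names (Event, main_loop_tx, handle_request) are hypothetical, and a tokio oneshot channel stands in for the response channel awaited in the excerpt above:

use tokio::sync::{mpsc, oneshot};

// A message relayed from a handler to the main loop. `Event` and the `String`
// payload are placeholders, not the actual types in this PR.
enum Event {
    Request {
        // The main loop sends the result back on this channel once it has
        // processed everything queued before this request.
        response_tx: oneshot::Sender<String>,
    },
}

async fn handle_request(main_loop_tx: mpsc::UnboundedSender<Event>) -> String {
    let (response_tx, response_rx) = oneshot::channel();

    // Relaying the event is synchronous and never blocks (unbounded channel).
    main_loop_tx.send(Event::Request { response_tx }).unwrap();

    // The only await point: tower-lsp may switch to another handler here, but
    // that handler can only relay its own event, and the main loop processes
    // events in the order they were sent.
    response_rx.await.unwrap()
}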
fn new_jsonrpc_error(message: String) -> jsonrpc::Error {
    jsonrpc::Error {
        code: jsonrpc::ErrorCode::ServerError(-1),
Do you know where these errors end up getting logged?
I would imagine they end up in VS Code's LSP client, where they are probably turned into notifications or maybe console messages.
lsp::spawn_blocking(|| {
    indexer::start(folders);
Adding a reminder here about doing the diagnostic refresh after each indexer finishes.
I ended up doing the refresh on did_open, just like we do in did_update.
Note that there's a race condition between the initial indexer and our state handlers. There was already one for did_update, and now there is one for did_open too. Nothing bad can happen AFAICS, but it's possible we get the indexer in a weird state in rare cases. I expect that we'll completely review the way this works for RC though (move the index to the world state and compute it on demand with salsa caching).
I've confirmed that the failure case you brought up in the last PR is now fixed.
// The global instance of the auxiliary event channel, used for sending log
// messages or spawning threads from free functions. Since this is an unbounded
// channel, sending a log message is not async nor blocking. Tokio senders are
// Send and Sync so this global variable can be safely shared across threads.
static mut AUXILIARY_EVENT_TX: std::cell::OnceCell<TokioUnboundedSender<AuxiliaryEvent>> =
    std::cell::OnceCell::new();
IIUC, it would be possible, but annoying, to instead have auxiliary_event_tx and auxiliary_event_rx live in GlobalState as well, rather than being a global object?
Like, every call to log_error!() would have to pass through auxiliary_event_tx, which would get passed through to main_loop::log() so that it could then send the message along. And to be able to do that you'd also have to pass auxiliary_event_tx through any handlers that log.
As annoying as that is, it might be worth considering it if it removes this global state. This is the only thing left that I'm a little uncomfortable with.
This would force all function calls that log an event to have a state parameter, which would add a lot of friction. I think this is the sort of thing for which the ergonomic gain from a global context is worth it.
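For illustration, a minimal sketch of the global-channel approach being discussed. This is not the actual implementation (the code above uses static mut with std::cell::OnceCell); it uses the thread-safe std::sync::OnceLock (Rust 1.70+) and simplified names purely to show why free functions can log without threading a state parameter through their callers:

use std::sync::OnceLock;
use tokio::sync::mpsc::UnboundedSender;

// Placeholder event type; the real enum carries more variants.
enum AuxiliaryEvent {
    Log(String),
}

static AUXILIARY_EVENT_TX: OnceLock<UnboundedSender<AuxiliaryEvent>> = OnceLock::new();

// Called once at startup, after the auxiliary loop has been created.
fn init_auxiliary_tx(tx: UnboundedSender<AuxiliaryEvent>) {
    AUXILIARY_EVENT_TX.set(tx).ok();
}

// Any free function can log without a `state` parameter; sending on an
// unbounded channel is neither async nor blocking.
fn log_error(message: String) {
    if let Some(tx) = AUXILIARY_EVENT_TX.get() {
        tx.send(AuxiliaryEvent::Log(message)).ok();
    }
}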
Force-pushed from fba54fa to f01c670
To avoid write handlers blocking for too long when there are shared refs (e.g. snapshots on a background thread)
Co-authored-by: Davis Vaughan <[email protected]>
Branched from #360 (Follow this link to see preparations for this PR)
Addresses posit-dev/positron#2692.
Closes posit-dev/positron#2999.
Supersedes and closes #340.
This PR refactors our LSP server to solve persistent issues with message ordering and corruption of internal state that have caused Ark to crash periodically (posit-dev/positron#271, posit-dev/positron#340). We've implemented a number of workarounds over the last year (ffd8b27, #45), but we still observe crashes caused by stale cursors and ranges (posit-dev/positron#2692). This has also been brought up by beta testers.
@DavisVaughan found out that the ruff project recently switched from tower-lsp to lsp-server (from the rust-analyzer project) for reasons relevant to us here: astral-sh/ruff#10158. See also this discussion on the tower-lsp repo: ebkalderon/tower-lsp#284. The gist is that while tower-lsp does call our message handlers in a single task (and thus on a single thread) in the correct order, any await point within a handler will cause a transfer of control to the next handler. In particular, we used to send log messages on handler entry, and this was an await point, so the ordering of our handlers was doomed from the get-go.

My first attempt to fix this was #340, which takes the approach of synchronising the handlers with a read-write lock. This lock allowed concurrent read handlers but forced write handlers that update the state to wait for the read handlers to finish before running. Any incoming requests or notifications at that point would also be queued by the RwLock until the write handler was done. However, I ended up deciding against this approach for these reasons:

1. To allow Ark to scale to more complex code analysis, we need to preserve the ability to spawn longer-running tasks. And we can't wait for these to complete every time we get a write request on our lock, as that would overly reduce our throughput. (This reason is no longer relevant now that WorldState is safely clonable though.)

2. I think it's safer to turn off all concurrency between handlers by default and enable it on a case-by-case basis after giving appropriate consideration. While the LSP protocol allows for out-of-order responses, it vaguely stipulates that doing so should not affect the correctness of the responses. I interpret this as meaning that requests responding with text edits (formatting, refactoring, but also some special completions) should not be reordered. Handling messages sequentially by default is a safer stance, and the new setup is easier to reason about as a result.
In this PR, we now handle each message in turn. The handlers can still be async (though almost all of them are now synchronous), but they must resolve completely before the next handler can run. This is supported by a "main loop" to which we relay messages from the client (and the Jupyter kernel, see #359) via a channel. Handlers must return an anyhow::Result, and errors are automatically logged (and propagated as a jsonrpc error response). The loop is very close to the one running in rust-analyzer: https://github.com/rust-lang/rust-analyzer/blob/83ba42043166948db91fcfcfe30e0b7eac10b3d5/crates/rust-analyzer/src/main_loop.rs#L155-L164. This loop owns the world state (see discussion in #358) and dispatches it to handlers.

The second part of fixing the integrity issues is that the world state has become a pure value. All synchronisation and interior mutability (e.g. through the dash map of documents) has been removed. This means that we can clone the state to create a snapshot, and for handlers running on long blocking tasks. If a document update arrives concurrently, it will not affect the integrity of these background tasks.
Long-running handlers on spawned tasks will respond to the client in arbitrary order. In the future we could synchronise these responses if needed, for all tasks or a subset of them.
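A sketch of how such a long-running handler can work on a snapshot now that the state is a pure value. It reuses the same hypothetical WorldState shape as above; spawn_completions and expensive_completions are made-up names:

use tokio::{sync::oneshot, task};

#[derive(Clone, Default)]
struct WorldState {
    documents: std::collections::HashMap<String, String>,
}

fn spawn_completions(
    state: &WorldState,
    uri: String,
    response_tx: oneshot::Sender<anyhow::Result<Vec<String>>>,
) {
    // The state is a plain value, so the task works on its own snapshot: a
    // document update arriving while the task runs cannot corrupt its inputs.
    let snapshot = state.clone();
    let _handle = task::spawn_blocking(move || {
        let result = expensive_completions(&snapshot, &uri);
        // Responses from spawned tasks may reach the client in arbitrary order.
        let _ = response_tx.send(result);
    });
    // In the real setup, the join handle would be relayed to the auxiliary loop
    // so that errors and panics are surfaced in the logs.
}

fn expensive_completions(state: &WorldState, uri: &str) -> anyhow::Result<Vec<String>> {
    Ok(state
        .documents
        .get(uri)
        .map(|text| text.split_whitespace().map(String::from).collect())
        .unwrap_or_default())
}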
There is also a separate auxiliary loop for latency-sensitive tasks, in particular logging, but also things like diagnostics publication. The main loop is not appropriate for these because each tick might take milliseconds. We don't want log messages to be queued there, as the latency would make it harder to understand the causality of events. This loop is also in charge of joining background tasks so that any errors or panics are logged immediately.
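A sketch of such an auxiliary loop, again with hypothetical names. It assumes that spawned task handles are joined with FuturesUnordered from the futures crate, and logs to stderr purely as a placeholder; the real implementation may differ:

use futures::stream::{FuturesUnordered, StreamExt};
use tokio::{sync::mpsc, task::JoinHandle};

enum AuxiliaryEvent {
    Log(String),
    SpawnedTask(JoinHandle<anyhow::Result<()>>),
}

async fn auxiliary_loop(mut events_rx: mpsc::UnboundedReceiver<AuxiliaryEvent>) {
    // Background tasks currently being joined.
    let mut tasks: FuturesUnordered<JoinHandle<anyhow::Result<()>>> = FuturesUnordered::new();

    loop {
        tokio::select! {
            event = events_rx.recv() => {
                match event {
                    // Log messages are handled here rather than in the main loop
                    // so they are not queued behind potentially slow handlers.
                    Some(AuxiliaryEvent::Log(message)) => eprintln!("{message}"),
                    Some(AuxiliaryEvent::SpawnedTask(handle)) => tasks.push(handle),
                    None => break,
                }
            }
            // Join finished tasks promptly so errors and panics are logged as
            // soon as they occur.
            Some(joined) = tasks.next(), if !tasks.is_empty() => {
                match joined {
                    Ok(Ok(())) => {}
                    Ok(Err(err)) => eprintln!("Task failed: {err:?}"),
                    Err(join_err) => eprintln!("Task panicked: {join_err:?}"),
                }
            }
        }
    }
}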
Logging is no longer async or blocking, and no longer requires a reference to the backend. I've added new macros such as lsp::log_error!() that can now be used anywhere in the LSP, including in synchronous contexts. I propose that we now consistently use these to log messages from the LSP. This will unclutter the Jupyter kernel log and allow us to see the messages in their context (logged requests).

I've also added some utils to the lsp:: module, like lsp::spawn_blocking() or lsp::publish_diagnostics(). All of these are intended to be called with the lsp:: prefix.
I've removed the workarounds we implemented in Document::on_did_update(). These should no longer be necessary, and they were also making the incorrect assumption that document versions increase consecutively, whereas the LSP protocol allows clients to skip versions. The only guarantee is that the versions are monotonically increasing. We still check for this invariant and panic if it doesn't hold. I think there is no way to keep running with out-of-sync state. If this panic comes up in practice and is not the result of a synchronisation bug, we could replace the panic with an orderly shutdown to start over.

Orderly shutdowns should be easy to implement, as both async loops, and all their associated tasks and state, are automatically dropped when the tower-lsp backend is dropped.
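A sketch of that invariant check; this is not the exact code in Document::on_did_update(), just an illustration of the assumption described above:

struct Document {
    version: Option<i32>,
    // ...other fields elided
}

impl Document {
    fn on_did_update(&mut self, new_version: Option<i32>) {
        if let (Some(old), Some(new)) = (self.version, new_version) {
            // Clients may skip versions, so we must not assume `new == old + 1`.
            // The only invariant is that versions keep increasing.
            if new <= old {
                panic!("Document versions must be monotonically increasing, got {new} after {old}");
            }
        }
        self.version = new_version;
    }
}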
I've organised files in the following way:

- main_loop.rs: Implements the main and auxiliary loops.
- state.rs: Defines WorldState, the source of inputs for LSP handlers.
- state_handlers.rs: Implements handlers for state-altering notifications. These require an exclusive reference to the world state.
- handlers.rs: Implements read-only handlers. These take LSP inputs and prepare them before calling other entry points of the LSP.