-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Language server concurrency and functionality upgrades #979
Language server concurrency and functionality upgrades #979
Conversation
bd1bbd5
to
9ecdf02
Compare
6357849
to
2b43c77
Compare
fc74841
to
fcdd0d9
Compare
bd01409
to
e31c0be
Compare
Notes from review with @sbillig and @Y-Nak:
|
This is not correct. What #[salsa::tracked]
// Sender needs to implement `Clone` and `Hash` to be an argument of
// a salsa-tracked function, but I ignore the fact for simplicity.
fn rename(db: &dyn Db, rename_event: Event, tx: Sender ) {
// Perform Reneming.
// ...
// Send an event to the main thread, and the main thread will try to mutate the database.
// This might cause a deadlock.
tx.send(Event::SourceTextChanged)
// ...
} Another possibility for the deadlock is not related to mutability thing, i.e., the deadlock situation might happen even if we only use Please refer to the below links for more information. |
1c2dbb2
to
d6e07fb
Compare
fc567e1
to
a34114e
Compare
TODO: figure out how to load the standard lib into the salsa db and how to include it in diagnostics
f4ad0bf
to
9e296ce
Compare
Result<Option<lsp_types::Hover>, tower_lsp::jsonrpc::Error>, | ||
>, | ||
) { | ||
let db = self.db.snapshot(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason to use a snapshot here, right? hover_handler(&self.db, ...
should suffice (unless this is changed so that the task is spawned off to run separately)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this is also vestigial, I'd been handling hover in a worker but took it out and forgot to remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through the PR. I'll probably add more comments later.
@@ -889,7 +889,7 @@ enum DefKind { | |||
Adt(AdtDef), | |||
Trait(TraitDef), | |||
ImplTrait(Implementor), | |||
Impl(HirImpl, TyId), | |||
Impl(HirImpl, #[allow(dead_code)] TyId), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Impl(HirImpl, #[allow(dead_code)] TyId), | |
Impl(HirImpl, TyId), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I upgraded rust with rustup
clippy complained about TyId
being unused. I added this temporarily but wasn't sure how to proceed.
@@ -24,6 +24,7 @@ pub trait DiagnosticVoucher: Send { | |||
fn error_code(&self) -> GlobalErrorCode; | |||
/// Makes a [`CompleteDiagnostic`]. | |||
fn to_complete(&self, db: &dyn SpannedHirDb) -> CompleteDiagnostic; | |||
fn clone_box(&self) -> Box<dyn DiagnosticVoucher>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need clone_box
? It seems you don't use this method in the LSP implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it's the correct way of doing it, but this allowed me to to implement snapshotting (the ParallelDatabase
trait) without the compiler complaining about not being able to clone stuff.
} | ||
|
||
impl salsa::Database for LanguageServerDatabase { | ||
fn salsa_event(&self, _: salsa::Event) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we could get tracing logs when salsa events happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logger has been showing some salsa events already even without implementing this, I guess from the other crates? I actually disabled them temporarily here in order to focus on messages from the language-server crate.
It could be cool to have separate outputs for these or some other way of toggling.
} | ||
} | ||
|
||
pub fn run_server() -> Result<()> { | ||
let (connection, io_threads) = Connection::stdio(); | ||
#[language_server_macros::message_channels] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why we need as many channels as LSP request kinds.
It feels that this macro is an abuse of proc macro, decreases readability and maintainability, and also the generated structs already contain unused channels.
Is there any specific reason that a message enum doesn't work in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed the part;
Proof of concept stream-based LSP event handling on the backend side
Intended for dealing with LSP event handler execution order, cancellation, and debouncing of expensive event handlers
Simple use case: aggregating document updates and handling them with a single on_change handler
See ebkalderon/tower-lsp#284 for examples of more complex scenarios that could arise
But the macro does not seem so good...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, I don't think we need one-to-one correspondence between channels and request kinds.
- I would like to avoid
async
in our implementation. LSPs are mostly
computationally bound rather than I/O bound, andasync
adds a lot of
complexity to the API, while also making harder to reason about
execution order. This leads into the second reason, which is...- Any handlers that mutate state should be blocking and run in the
event loop, and the state should be lock-free. This is the approach that
rust-analyzer
uses (also with thelsp-server
/lsp-types
crates as a
framework), and it gives us assurances about data mutation and execution
order.tower-lsp
doesn't support this, which has caused some
issues around data
races and out-of-order handler execution.- In general, I think it makes sense to have tight control over
scheduling and the specifics of our implementation, in exchange for a
slightly higher up-front cost of writing it ourselves. We'll be able to
fine-tune it to our needs and support future LSP features without
depending on an upstream maintainer.
This is the reasoning about why ruff chose lsp-server
instead of tower-lsp
.
It seems reasonable to me.
astral-sh/ruff#10158
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 I see how the macro isn't exactly a simple solution, and the motivation for it was also messy (getting around those tower-lsp LanguageServer
trait limitations). My intention was to reduce the surface area of boilerplate when creating these channels to bridge to the Backend
, but the tradeoff is opacity and complexity. Plus, even with this macro there's still a lot of boilerplate in setting up all the stream handlers.
As fun as it was to write I agree that it'd be good not to have it if possible.
As you mentioned on discord one possibility could be to use the support-mutable-methods branch, at least until they figure out a more permanent solution. I like that suggestion a lot, it'd seriously simplify things. That way there'd be no requirement to make a channel/stream just to handle an LSP event directly. We could still get the flexibility and control of the async Stream
ecosystem as needed and the lock-free state but without needing to set up all those channels/streams by default. I also appreciate that the tower-lsp
maintainers seem to be taking this issue seriously. Another possibility is this async-lsp crate which seems quite solid, even though the API is less readable.
As for the decision of lsp-server
vs tower-lsp
: I really like tower-lsp
's high level API and the built-in task management inherent to rust async. Using lsp-server
would require us to build some kind of task manager regardless, which to me seems like an inherently complex thing to reason about and maintain.
The thing that keeps bothering me is how LSP is an async protocol by design (1, 2) in nuanced ways that I'm not convinced can be adequately dealt with just by faithfully processing events in the order they're received... LSP events don't always have a well defined total order. Processing some events in order is essential (e.g. didChange
events), but there are situations when a single user action can trigger multiple loosely coupled LSP events more or less concurrently. For example, renaming a file can trigger:
willRename
(a request that the client waits on a response for before renaming)
Followed by:
didRename
(after the client renames)didChangeWatchedFiles
(from the filesystem watcher with two changes, a create and delete, seems to be delayed by ~1 second on my computer)didOpen
(when the file with the new name is opened)didClose
(when the file of the previous name is closed)
Great care is needed to ensure that the handlers for each of these does exactly what it needs to do without overlap or conflict, especially in cases where state gets mutated.
I'm anticipating edge cases that will be tricky to reason about imperatively, where we may want to condition event handling on the presence or absence of several other recent or pending events, perhaps over a given debounce period. Ideally we won't need to do this, but it doesn't seem unlikely, especially when we get into code actions or refactoring features.
The idea of trying to manage these complexities imperatively/synchronously with a custom task manager makes me nervous, regardless of how faithfully we adhere to the order these events arrive in.
In spite of the current rigidity of tower-lsp
and the potential complexity of choosing rust async, I do think the async ecosystem (especially streams) could bring a lot of flexibility in managing execution order and in making sense of concurrent events, if we use it carefully.
A few more notes about it before letting that proc macro go (RIP 🪦):
- It only makes channels for trait methods that are implemented, so only for the those that are used
- I thought about using an enum or doing some kind of "multiplexing" over a single channel but my the amount of boilerplate code it would take to define the enums and join the data and split the channel separate streams seemed comparable to the amount it would take to just have a single channel per LSP event type. Which either way felt like too much.
- The macro was inspired by the one used to generate the routes in
tower-lsp
(here) but I admit that it got quite a bit more complex, a work of art
tokio::select! { | ||
// setup logging | ||
_ = handle_log_messages(rx, server.client.clone()) => {}, | ||
// start the server | ||
_ = tower_lsp::Server::new(stdin, stdout, socket) | ||
.serve(service) => {} | ||
// backend | ||
_ = functionality::streams::setup_streams(&mut backend, message_receivers) => {} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you need tokio::select
here.
Is there any reason that you can't do the one below?
// Prepare channels for `Server` <-> `Backend` communication.
let (tx, rx) = make_channels(...);
let backend = Backend::new(rx, ...);
// Run backend in a child thread.
tokio::spawn(backend.run());
// Make `Server`, `Backend` and the thread for logging is managed by
// this server via channels for e.g., graceful shutdown.
let server = ...
// Run server.
server.serve(service).await;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a review for the implementation of goto.
An additional note for future reference:
For optimization purposes, it would be beneficial to narrow the possible range before collecting actual paths when we know the passed cursor is included in ItemKind::Body
.
This is because:
- The Body contains numerous paths (even local variables are defined as paths in HIR).
Hir
maintainsExpr <=> Span
andStmt <=> Span
mapping
Therefore, it would be more efficient to narrow down the possible range before collecting paths. Implementing find_enclosing_stmt
and find_enclosing_expr
would be required for this((I think these are necessary in the near future either way).
This narrowing-down-range feature should probably be generalized, e.g.,
/// Find the closest HIR element that includes the cursor,
/// by traversing the elem-span hierarchy.
fn find_closest_hir_elem<T: HirElem>(db: &dyn LanguageServerDb, cursor: Cursor) -> Option<T> {
...
}
fn visit_ident( | ||
&mut self, | ||
ctxt: &mut VisitorCtxt<'_, hir::visitor::prelude::LazySpanAtom>, | ||
ident: hir::hir_def::IdentId, | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PathSpanCollector::visit_item
is called; this means all idents are collected as a path segment even if the ident is not a segment(e.g. if the item is Func,
then func name is also collected).
It's necessary to collect segments manually by iterating them in visit_path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I see. That will be more straight to the point.
pub(super) async fn handle_deleted( | ||
&mut self, | ||
params: lsp_types::FileEvent, | ||
tx_needs_diagnostics: tokio::sync::mpsc::UnboundedSender<String>, | ||
) { | ||
let path = params.uri.to_file_path().unwrap(); | ||
info!("file deleted: {:?}", path); | ||
let path = path.to_str().unwrap(); | ||
let _ = self | ||
.workspace | ||
.remove_input_for_file_path(&mut self.db, path); | ||
let _ = tx_needs_diagnostics.send(path.to_string()); | ||
} | ||
|
||
pub(super) async fn handle_change( | ||
&mut self, | ||
doc: TextDocumentItem, | ||
tx_needs_diagnostics: tokio::sync::mpsc::UnboundedSender<String>, | ||
) { | ||
info!("change detected: {:?}", doc.uri); | ||
let path_buf = doc.uri.to_file_path().unwrap(); | ||
let path = path_buf.to_str().unwrap(); | ||
let contents = Some(doc.text); | ||
if let Some(contents) = contents { | ||
let input = self | ||
.workspace | ||
.touch_input_for_file_path(&mut self.db, path) | ||
.unwrap(); | ||
let _ = input.sync_from_text(&mut self.db, contents); | ||
} | ||
let _ = tx_needs_diagnostics.send(path.to_string()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Associated change for the removed of the stream forking)
pub(super) async fn handle_deleted( | |
&mut self, | |
params: lsp_types::FileEvent, | |
tx_needs_diagnostics: tokio::sync::mpsc::UnboundedSender<String>, | |
) { | |
let path = params.uri.to_file_path().unwrap(); | |
info!("file deleted: {:?}", path); | |
let path = path.to_str().unwrap(); | |
let _ = self | |
.workspace | |
.remove_input_for_file_path(&mut self.db, path); | |
let _ = tx_needs_diagnostics.send(path.to_string()); | |
} | |
pub(super) async fn handle_change( | |
&mut self, | |
doc: TextDocumentItem, | |
tx_needs_diagnostics: tokio::sync::mpsc::UnboundedSender<String>, | |
) { | |
info!("change detected: {:?}", doc.uri); | |
let path_buf = doc.uri.to_file_path().unwrap(); | |
let path = path_buf.to_str().unwrap(); | |
let contents = Some(doc.text); | |
if let Some(contents) = contents { | |
let input = self | |
.workspace | |
.touch_input_for_file_path(&mut self.db, path) | |
.unwrap(); | |
let _ = input.sync_from_text(&mut self.db, contents); | |
} | |
let _ = tx_needs_diagnostics.send(path.to_string()); | |
} | |
pub(super) async fn handle_change( | |
&mut self, | |
change: FileChange, | |
tx_needs_diagnostics: tokio::sync::mpsc::UnboundedSender<String>, | |
) { | |
let path = change.uri.to_string(); | |
match change.kind { | |
ChangeKind::Open(contents) => { | |
info!("file opened: {:?}", &path); | |
self.update_input_file_text(&path, contents); | |
} | |
ChangeKind::Create => { | |
info!("file created: {:?}", &path); | |
let contents = tokio::fs::read_to_string(&path).await.unwrap(); | |
self.update_input_file_text(&path, contents) | |
} | |
ChangeKind::Edit(contents) => { | |
info!("file edited: {:?}", &path); | |
let contents = if let Some(text) = contents { | |
text | |
} else { | |
tokio::fs::read_to_string(&path).await.unwrap() | |
}; | |
self.update_input_file_text(&path, contents); | |
} | |
ChangeKind::Delete => { | |
info!("file deleted: {:?}", path); | |
self.workspace | |
.remove_input_for_file_path(&mut self.db, &path) | |
.unwrap(); | |
} | |
} | |
tx_needs_diagnostics.send(path).unwrap(); | |
} | |
fn update_input_file_text(&mut self, path: &str, contents: String) { | |
let input = self | |
.workspace | |
.touch_input_for_file_path(&mut self.db, path) | |
.unwrap(); | |
let _ = input.sync_from_text(&mut self.db, contents); | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used tokio::fs::read_to_string
here, but that might introduce the possibility of changes being handled out of order.
I share Yoshi's skepticism of using async because of these tricky situations, but I remain open minded. If we did drop async, I think the "streamy" essence of the code could remain, we'd just eg explicitly spawn a thread (or "actor") that sequentially handles all change events that arrive on a channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tried it yet but this looks great. Can you help me understand how these changes might get handled out of order? My thought is that by merging all those change events into a single stream we will enforce that they're processed serially in the order they were merged, at least relative to one another. The tokio::fs::read_to_string.await
will yield to other non-change-stream events but block further change handling until finished.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(By the way, I shared my complex thoughts on the async stuff here: #979 (comment))
Co-authored-by: Yoshitomo Nakanishi <[email protected]> Co-authored-by: Sean Billig <[email protected]>
Closing in favor of #1022 |
This PR introduces significant changes to the language server, focusing on improving concurrency and LSP functionality. The language server has been rewritten based on
tower-lsp
to support concurrent execution of tasks.Here's a detailed overview of the key components and their interconnections:
server
moduleServer
: This struct handles the I/O of the Language Server Protocol (LSP). It receives requests and notifications from the client and sends responses and notifications back to the client. TheServer
struct uses thetower-lsp
crate, which provides a clean and easy-to-use API for implementing LSP servers. Because state is omitted from theServer
struct, it functions essentially as an async task manager for LSP events.MessageSenders
andMessageReceivers
: These structs are generated by a procedural macro inlanguage-server-macro
using information from thetower_lsp::LanguageServer
trait implementation. They are used for communication between theServer
andBackend
via tokio MPSC and oneshot channels.backend
moduleBackend
: This struct has exclusive ownership of language server state, which is stored inLanguageServerDatabase
andWorkspace
members, and mediates all state modifications. Struct methods defined onBackend
are used to handle stream events set up in thestreams
module.Workspace
: This struct has been refined to avoid excessive reliance on mutable references. It represents the current state of the user's workspace, including the open files and their contents. It functions as an index of salsa inputs to be used withLanguageServerDatabase
.LanguageServerDatabase
: This struct provides access to functionality in compiler crates. It implements Salsa'sParallelDatabase
trait to enable parallel salsa queries via theSnapshot
mechanism.functionality
modulefunctionality::streams
: This module sets up streams for handling LSP events on the backend side. It uses thetokio::select!
macro to concurrently listen to multiple streams and handle events as they arrive. Streams allow for declarative control over the order in which LSP events are processed and concise expression of additional reactive behavior. I'm especially happy with the chunked+debounced diagnostics stream and its handler, to give an example.functionality::handlers
: This module contains functions for handling different LSP events. Each function corresponds to a specific LSP event and contains the logic for processing that event and producing an appropriate response.functionality::handlers::Backend::handle_diagnostics
functionality
support implementations of various LSP functionality, including improved go-to definition, hover info and diagnostics functionalityThis architecture is designed to handle concurrent execution of tasks efficiently and to provide a clean separation between the I/O handling (
Server
) and the actual processing of LSP events (Backend
). This separation makes concurrency easier to reason about and allows for parallel execution of expensive tasks. The use of tokio channels and streams allows for efficient handling of concurrent tasks and provides control over the order of task execution.Changes and features
tower-lsp
crateServer
) and language server state (Backend
) via tokio channelson_change
handlerMore changes, following the initial review
select!
loop for mutating handlers and multiple spawns for read-only handlerstracing
for logging to ensure logs are correctly orderedconsole_subscriber
for tokio console supportBackend
exclusive ownership of workspace and database state; no locks neededFunctionality upgrades
Not urgent/maybe
Initial impressions (previously)
Update: the uncertainties below have been addressed by a channel/stream augmentation to the tower-lsp implementation
Managing request handler execution order
tower-lsp
doesn't really provide a way of managing the order of handler execution. Neither doeslsp-server
for that matter. What exactly should this execution dependency graph look like in various cases? It's hard to foresee what this will look like as more and more LSP functionality gets implemented, but it's clear that we need a way to control the order of task execution somehow or another.As a simple example, if I were to rename a file in vscode it could trigger multiple handlers (
did_open
,did_close
,watched_files_did_change
) concurrently.How can we ensure that this pair of LSP events gets handled appropriately? We're not just opening any file, we're opening a renamed file and we need to ensure that the workspace cache is updated to reflect this before executing diagnostics. The
watched_files_did_change
handler ends up being redundant in this case if it gets executed after thedid_open
handler, but in the case where a file gets directly renamed outside of the LSP client, it's still important.In this example, it's not a big deal to check for deleted (renamed) files in both handlers, since those checks are relatively short lived given how infrequently they'd be executed. But it'll be important to ensure that e.g. diagnostics are run carefully, that the salsa inputs are set up correctly before running diagnostics or other complex tasks. It's also important to ensure that expensive tasks aren't triggered redundantly in parallel.
Shared state and deadlocks
Related to the issue of concurrency... How do we manage sharing of the salsa database and inputs cache in a concurrent environment?
In this prototype the salsa db and workspace cache are currently shared via
std::sync::Mutex
locks and the language server client is shared viatokio::sync::Mutex
locks. This works just fine but it requires care not to cause deadlocks. The tokio shared state docs were very helpful for me in understanding this better.Useful info
tower-lsp
andlsp-server
tower-lsp