-
Notifications
You must be signed in to change notification settings - Fork 255
Conversation
ba80d25
to
8d33e48
Compare
Thread sleep is inelegant, but is it a big problem in tests? You wait a bit for what you expect and fail if it never arrives, or arrives wrongly. The absolute worst case is the tests can take a while to fail, and the current timeout seems a little high. But I'm not sure it shows that we lack good control of concurrency in RLS. Here I'd be tempted to define "good" as sufficient. Shouldn't we add this ability only when we have a production code need for it? The other issue is I think we'd generally want to move the feature tests out into integration, or whatever you want to call it, style tests. Tests that start rls as a process and talk to it like a lsp client. If that's still true then in-process hooks won't be of any use to the tests. |
It certainly is a problem: in #918 (comment), quite some cycles were spent on figuring out that test fails on CI but not locally due to the timing issue, and not because this is some windows specific behavior. In other projects I've worked on unrestricted concurrency also posed challenges to writing robust and fast tests. In general, I strongly believe that logical clocks and causality should be used instead of timeouts if possible, and that APIs should be designed in such a way that using logical clocks is possible (i.e, if something spawns a concurrent thread of execution, it should return a handle which can be used to wait for that thread to finish). I don't have concrete strong arguments here except that it,... feels right I guess? :)
I agree that we should add tests that literally spawn the RLS process and communicate with it via standard streams: currently this layer seems untested? However, while I in general a strong believer in integration tests, I personally don't think that moving all feature tests to integration testing will be a good idea, for two reasons:
Long-term, I think we should split RLS into the "rls-library" layer, which is more or less oblivious of the details of the LSP and "rls-lsp binary", which wraps the API of the library into a specific protocol. That way, we'll have a smallish number of integration tests for binary, which spawn process and check that output indeed conforms to LSP, and most feature tests would be done on the library level. |
Well I do agree that eliminating the messaging timeout issue in the tests would be really nice. I'm just not sure keeping track of the jobs with this much extra prod-code is worth it. For me the test code is there to make the prod code better, this feels like the other way around. On the other hand perhaps it will serve the greater good by allowing better testing, and I do feel rls test code isn't really good enough at the moment. |
Agree that this brings-in a significant amount of "clever" code with tricks like uninhabited enums and drop-bombs, which is a big drawback. A simpler alternative is possible: instead of keeping track of jobs, we can just drain all the work queues explicitly. That is, in I think that explicit tracking of all concurrent tasks is better, long term, than tracking of potential sources of concurrency, although it's more code:
|
@nrc curious what do you think about the problem in general and this specific approach? Is it something reasonable, or should we just stick with |
Sorry for the delay. My initial opinions (I haven't looked at the code yet):
I do agree with this sentiment too - there is something to be said for 'if it ain't broke, don't fix it'. |
src/actions/notifications.rs
Outdated
@@ -73,7 +74,7 @@ impl BlockingNotificationAction for DidOpenTextDocument { | |||
} | |||
|
|||
impl BlockingNotificationAction for DidChangeTextDocument { | |||
fn handle<O: Output>(params: Self::Params, ctx: &mut InitActionContext, out: O) -> Result<(), ()> { | |||
fn handle<O: Output>(params: Self::Params, jobs: &mut Jobs, ctx: &mut InitActionContext, out: O) -> Result<(), ()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could jobs
be part of the context here, rather than adding a variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved jobs to the Context. Initially I've avoided that because Context
is clone, and so Arc<Mutex>
is required. And having explicit &mut Jobs
argument and -> ConcurrentJob
return types gives better idea about when concurrency happens. However, that is indeed a lot of extra typing compared to just stuffing jobs into the ctx.
src/test/lens.rs
Outdated
expect_messages( | ||
results.clone(), | ||
&[ | ||
ExpectedMessage::new(Some(0)).expect_contains(r#""codeLensProvider":{"resolveProvider":false}"#), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably check for this message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! Restored other messages as well to try to debug #925
src/concurrency.rs
Outdated
|
||
impl Drop for ConcurrentJob { | ||
fn drop(&mut self) { | ||
if self.is_abandoned || self.is_completed() || thread::panicking() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we panic on abandoned jobs? Seems like a bad thing if Jobs
is dropped without completing the jobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, indeed, for normal termination process it is reasonable to require that all the jobs are awaited for!
Still not sure what's the best choice for abnormal termination (some thoughts in af2534f), but probably just abandoning jobs is fine, as things have gone south anyway.
8d33e48
to
ad66472
Compare
I think we should do this. I'm pretty excited to improve our flaky tests. Once we land it I'll publish a new version and that should show up any bugs pretty soon. @matklad could you rebase and address the review comments, then I'll merge. Thanks! |
@nrc rebased before your comment :) Fixing the comments now! |
ee1b647
to
4962bb7
Compare
I think I've addressed all comments. |
CI should be fixed by #944 |
4962bb7
to
dd1f805
Compare
Rebased! |
bors try We've tried to setup bors here, let's see if is working... |
🔒 Permission denied Existing reviewers: click here to make matklad a reviewer |
I get a bors try cc @nrc |
🔒 Permission denied Existing reviewers: click here to make Xanewok a reviewer |
Sorry about that. I updated my bors-ng instance's code and synced reviewer+members with people on this repo who have push permission. Try again? |
bors try (I don't need the @?) |
tryBuild succeeded |
@nrc yep, this the another bors, it does’t need an @. And you need bors r+ to do the merge. I’ve tried try just to check if bors is listening :-) |
bors r+ |
923: Managed concurrency r=nrc a=matklad Hi! In tests, we currently do synchronization via `thread::sleep`. It is rather problematic by itself, but it also shows that we don't have a very good control of concurrency inside RLS :( I think this is a big problem: RLS is going to be a highly concurrent application, so it's better to establish reliable concurrency practices from the start. I don't have too much experience with designing concurrent though, so I am not sure if the proposed approach of dealing with the problem is not insane :D In my experience, a lot of problems with managing concurrency stem from "fire an forget" approach: if a function schedules some work to be executed outside the dynamic extent of the function call itself (for example, by spawning a thread, or by submitting the work to some queue), and doesn't return a handle to the yet-to-be-computed result, then such function will be impossible to test without resorting to busy-waiting. Current PR proposes to use a future-like `ConcurrentJob` object to keep track of tasks. Any background operation returns this object, which can be used to wait for that operation to finish. Dropping `ConcurrentJob` without waiting for the result in general results in a panic, to protect from accidentally forgetting about a background operation. The end result is that `LsService` now has a `wait_for_background_jobs` which can be used to make sure that all background ops are finished. Co-authored-by: Aleksey Kladov <[email protected]>
Build failed |
Ok, one more round of clippy bumping! |
RLS is going to be a highly concurrent application, with three main concurrency sources: * user input/requests from the client editor * changes from the file system * internal parallelisation of CPU-heavy tasks Because developing, debugging and testing concurrent applications is hard, it makes sense to try to establish good concurrency patterns from the start. One concurrency anti-pattern is "fire & forget" spawning of background tasks. This is problematic: without the ability to check if the task is finished, you have to resort to busy-waiting in tests. `concurrency` module introduces a `ConcurrentJob` object, which is a handle to background job, which can be used to check if the job is running and to wait for it to finish. The idea is that any function, which schedules some job off the current thread, should return a `ConcurrentJob`. Jobs are stored inside the `Jobs` table, which gives a nice overview of all currently active tasks and can be used to wait for all active tasks to finish. All `ConcurrentJob`s must be stored in a `Jobs` table, it's and error to drop a job on the floor.
dd1f805
to
833a2a9
Compare
Resolved conflict, squashing changes to a single commit in the process. |
Nice! Didn’t mean to add you extra work with that 2018 edition commit... 😅 |
Hi!
In tests, we currently do synchronization via
thread::sleep
. It is rather problematic by itself, but it also shows that we don't have a very good control of concurrency inside RLS :(I think this is a big problem: RLS is going to be a highly concurrent application, so it's better to establish reliable concurrency practices from the start. I don't have too much experience with designing concurrent though, so I am not sure if the proposed approach of dealing with the problem is not insane :D
In my experience, a lot of problems with managing concurrency stem from "fire an forget" approach: if a function schedules some work to be executed outside the dynamic extent of the function call itself (for example, by spawning a thread, or by submitting the work to some queue), and doesn't return a handle to the yet-to-be-computed result, then such function will be impossible to test without resorting to busy-waiting.
Current PR proposes to use a future-like
ConcurrentJob
object to keep track of tasks. Any background operation returns this object, which can be used to wait for that operation to finish. DroppingConcurrentJob
without waiting for the result in general results in a panic, to protect from accidentally forgetting about a background operation.The end result is that
LsService
now has await_for_background_jobs
which can be used to make sure that all background ops are finished.