CI failure after upgrading actix #3925
Comments
Oh dear.
That's ... some C++ code? |
@bowenwang1996 could you link a specific build where this happens? I've downloaded a couple of recent build logs, but didn't find the error |
@matklad sorry I somehow forgot to include the link, but here it is https://buildkite.com/nearprotocol/nearcore/builds/5548#c1917194-dff1-423d-ba38-930896e2046c |
Might be facebook/rocksdb#649? Local repro: |
Ok, I think I've diagnosed the root cause: we didn't properly wait for
diff --git a/chain/network/src/peer_manager.rs b/chain/network/src/peer_manager.rs
index efb35cd7..2fb16197 100644
--- a/chain/network/src/peer_manager.rs
+++ b/chain/network/src/peer_manager.rs
@@ -1233,6 +1233,12 @@ impl Actor for PeerManagerActor {
         Running::Stop
     }
+
+    fn stopped(&mut self, ctx: &mut Self::Context) {
+        eprintln!("WILL STOP PEER MANAGER");
+        std::thread::sleep_ms(250);
+        eprintln!("DID STOP PEER MANAGER");
+    }
 }
 impl Handler<NetworkRequests> for PeerManagerActor {
and run
I get the following output:
That is, at the end of the test
Why does it manifest as that pure virtual function message? That's because
Why does this only manifest with a new actix? It doesn't actually! I managed to
while hammering my CPU with Rust compilation to create more erratic behavior. It
The problem here, I think, is slightly bigger than just a weird message in tests:
|
I've looked into fixing this by joining all the threads, and it seems like this is
To recap, the problem is that the Actix runtime leaks threads. Any time
Here's a short demo:
I believe this is going to be hard to fix, as the actix actor framework isn't
Note that the
When I tried to write a simple fix here, manually joining the Arbiters, I got
I am not sure how to proceed further, as it feels like this should touch the
It might also be that everything is fine, and I am just overly sensitive to |
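To make the "leaked thread" idea concrete, here is a minimal sketch that uses only `std::thread`, not actix. It is not the demo from the comment above, and the function names (`spawn_worker`, `finished_without_join`, `finished_with_join`) are made up for illustration. It shows that a thread whose handle is dropped may still be in the middle of its cleanup when the caller moves on, while a joined thread is guaranteed to have finished.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawns a worker that simulates slow shutdown work, then sets a flag.
/// Returns the join handle together with the flag. (Hypothetical helper.)
fn spawn_worker(cleanup: Duration) -> (thread::JoinHandle<()>, Arc<AtomicBool>) {
    let done = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&done);
    let handle = thread::spawn(move || {
        // Stand-in for whatever the thread does while shutting down.
        thread::sleep(cleanup);
        flag.store(true, Ordering::SeqCst);
    });
    (handle, done)
}

/// "Leaky" variant: the handle is dropped, so the thread is detached and
/// nobody waits for it. Reports whether cleanup had finished on return.
fn finished_without_join(cleanup: Duration) -> bool {
    let (handle, done) = spawn_worker(cleanup);
    drop(handle); // detaches the thread; it keeps running in the background
    done.load(Ordering::SeqCst)
}

/// Joined variant: the caller blocks until the worker has fully exited.
fn finished_with_join(cleanup: Duration) -> bool {
    let (handle, done) = spawn_worker(cleanup);
    handle.join().expect("worker thread panicked");
    done.load(Ordering::SeqCst)
}
```

If the process exits right after the leaky variant returns, the detached thread can still be running while process-level teardown happens. That is the general shape of the race discussed in this thread, under the assumption that the leaked threads still touch shared state at that point.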
I think a practical concern for us is that this causes CI to fail randomly from time to time, which is fairly annoying. Do you think it could be fixed inside actix, @matklad? Should we submit an issue there? |
Submitted a "fix" in #3958 and opened an issue against the actix repository. My current thinking is that the actix actor framework (not the actix-web http framework) isn't really needed, and we are better off using something like https://ryhl.io/blog/actors-with-tokio/ |
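For readers who have not seen the pattern, a hand-rolled actor is roughly "a task that owns its state plus a channel, wrapped in a handle". The linked post builds this on tokio; the sketch below swaps in `std::thread` and `std::sync::mpsc` so it has no dependencies, and it is only an illustration, not code from nearcore or from that post. `CounterHandle` and `Msg` are hypothetical names. The point relevant to this issue is `shutdown`, which closes the channel and joins the thread, so nothing is left running afterwards.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages understood by the (hypothetical) counter actor.
enum Msg {
    Add(u64),
    /// Ask for the current total; the reply is sent on the enclosed channel.
    Get(Sender<u64>),
}

/// Handle owning both the sending side of the mailbox and the actor thread.
struct CounterHandle {
    tx: Sender<Msg>,
    worker: JoinHandle<u64>,
}

impl CounterHandle {
    fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Msg>();
        let worker = thread::spawn(move || {
            let mut total = 0u64;
            // recv() returns Err once every Sender is dropped, ending the loop.
            while let Ok(msg) = rx.recv() {
                match msg {
                    Msg::Add(n) => total += n,
                    Msg::Get(reply) => {
                        // The requester may have gone away; ignore that case.
                        let _ = reply.send(total);
                    }
                }
            }
            total
        });
        CounterHandle { tx, worker }
    }

    fn add(&self, n: u64) {
        self.tx.send(Msg::Add(n)).expect("actor thread is gone");
    }

    fn get(&self) -> u64 {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(Msg::Get(reply_tx)).expect("actor thread is gone");
        reply_rx.recv().expect("actor dropped the reply channel")
    }

    /// Closes the mailbox and waits for the actor thread to exit,
    /// returning its final state.
    fn shutdown(self) -> u64 {
        drop(self.tx);
        self.worker.join().expect("actor thread panicked")
    }
}
```

Typical use would be `let c = CounterHandle::spawn(); c.add(2); c.shutdown();`. Because the handle owns the `JoinHandle`, a test can call `shutdown` at the end and know the thread has exited before the process tears down.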
Co-authored-by: Vlad Frolov <[email protected]>
We occasionally see CI failures like the following after #3869:
It looks like, even though the tests passed, something went wrong afterwards.