`fast_forward` sometimes randomly hangs forever #266
Comments
That's quite weird indeed. There is a slowdown that can happen when it tries to cross an epoch boundary, but it shouldn't hang. I played with it, but I'm not able to reproduce this when I changed the following But that's a pretty simple Was this only hanging with a single test/node running? Or was this in combination with a bunch of other tests being run as well?
Please check out https://github.com/andrew-sol/workspaces-reproduction for a reproduction. It hangs pretty often, though it may take several minutes until the test reaches the `fast_forward` loop.
If it does not hang at this point (after skipping 3 epochs), it will continue executing, and a restart is needed to try again. It hung 4 times out of 10 when I tested it today.
I have been able to reproduce the problem using the code from this repo: https://github.com/andrew-sol/workspaces-reproduction. It seems that the program hangs while near-jsonrpc-client is sending a request. A possible solution that I would like to ask about is setting NEAR_RPC_TIMEOUT_SECS, which should at least make the call fail with an error instead of hanging: https://github.com/near/workspaces-rs/blob/main/README.md?plain=1#L340
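To illustrate what "at least throw an error" could look like from the test side, here is a minimal sketch that bounds a blocking call with a deadline using only the standard library. The helper name `run_with_deadline` is hypothetical and is not part of workspaces-rs or near-jsonrpc-client; it only shows the general idea of turning an indefinite hang into a visible failure.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `op` on a helper thread and waits at most
/// `limit` for its result. Returns `None` if the deadline passes first.
/// The helper thread is not cancelled; it is simply abandoned.
fn run_with_deadline<T, F>(limit: Duration, op: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if we timed out; ignore that error.
        let _ = tx.send(op());
    });
    // `recv_timeout` returns an error on timeout or if the sender was dropped
    // (for example because `op` panicked); both cases map to `None`.
    rx.recv_timeout(limit).ok()
}
```

A test could wrap the suspect call in such a helper and fail when it gets `None`, which at least pinpoints the call that never returned.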
@Catmanpooh Have you tried setting that timeout? A client hanging while waiting for a response should definitely have a timeout, but we also need to investigate why the server does not respond and whether we should just retry sending the request. Please identify which call the client hangs on, and what the state of the server is (sandbox
@frol How do I check the sandbox neard process, and how would I set the NEAR_RPC_TIMEOUT_SECS variable? I am assuming that these lines are the retry for the call: https://github.com/near/workspaces-rs/blob/ea65434c1e8b5424acda4643539b158c1a149aab/workspaces/src/rpc/client.rs#LL402C1-L412C2
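For readers unfamiliar with the idea of retrying a call, the following is a generic sketch of a bounded retry loop. It is not the code at the linked lines of `client.rs`; the function name `retry` and its parameters are assumptions made for illustration only.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: calls `op` up to `max_attempts` times, sleeping
/// `delay` between failed attempts. `op` receives the 1-based attempt number.
/// Returns the first success, or the last error once attempts run out.
fn retry<T, E, F>(max_attempts: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

Note that a retry loop like this only helps when the call returns an error. A call that never returns needs a timeout first, which is why the two questions come up together in this thread.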
@Catmanpooh workspaces-rs spawns If you are still stuck, please schedule a call at https://calendly.com/frol-at-near/30min
Hi @frol, as discussed on Reddit, please assign this to me.
@andrew-sol Thanks for the reproducible example! It does hang, and inspecting the /status page of the near-sandbox (neard / nearcore) while the workspaces-rs test hangs, I see that the block height gets stuck (at block height 1998) 🤔 I suspect that it is an issue on the nearcore side.
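The manual check described above (watching whether the block height keeps advancing) could be automated along these lines. This is only a sketch: the type `StallDetector` is hypothetical, and how the heights are obtained (for example by polling the sandbox /status page) is deliberately left out.

```rust
/// Hypothetical helper: tracks sampled block heights and reports when the
/// height has stopped advancing.
struct StallDetector {
    last_height: Option<u64>,
    unchanged_samples: u32,
    threshold: u32,
}

impl StallDetector {
    /// `threshold` is how many consecutive unchanged samples count as a stall.
    fn new(threshold: u32) -> Self {
        Self { last_height: None, unchanged_samples: 0, threshold }
    }

    /// Records one sample. Returns true once the height has repeated
    /// `threshold` times in a row after it was first seen.
    fn observe(&mut self, height: u64) -> bool {
        if self.last_height == Some(height) {
            self.unchanged_samples += 1;
        } else {
            // Height changed (or first sample): restart the count.
            self.last_height = Some(height);
            self.unchanged_samples = 0;
        }
        self.unchanged_samples >= self.threshold
    }
}
```

Feeding it one sample per polling interval would flag a situation like the one reported here, where the height stays at 1998, without anyone having to watch the page by hand.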
@frol The first time I ran the repro example from @andrew-sol, it did not hang (see my screenshot), but after several more attempts it really did hang. In my case, it also hangs at block height 1998. It's quite weird.
I feel this might be related to near/nearcore#8328 (mentioned in #253). The reproduction script hits the issue on my M1 MacBook consistently, but on an Ubuntu x86 server (@thaodt is also on Arch Linux) it hangs only occasionally.
The occasional hanging in the example happens when Edit: added issue to nearcore |
Hi, when will this issue be resolved?
There is a chance the next `fast_forward` call will hang forever. It's not reproducible every time; it's just random.
I call `fast_forward(100)` in a loop and it happens quite often. Also, I was able to make it hang every single time after 15 iterations by just adding a few "view" calls to a contract long before the `fast_forward` loop starts executing, which is weird.
Reproduces on both Mac (M1) and Linux (Intel).
Workspaces version: 0.7.0