Quickly completed commands sometimes stall #344
Comments
Hi there, Thanks for the interest. It sounds like the client is doing connection retries after authentication failed rather than the command stalling. Retries can be disabled with
SSH servers also have limits on number of sessions they allow, as well as number of startups they allow at one time. See |
Hmm, it's not the connection retries, I've tried running with enable_debug_logger() and it only shows the single connection.
The stall happens between |
Thank you for the debug output. The library does not do anything between In your authentication/SSH server logs, you should see lines like (this will vary depending on the system)
PAM logs will have similar lines for the opening of a PTY and there are limits on those as well. I expect running the same command with To be closed unless an issue specific to the library can be reproduced. |
Closing looks good. I'm currently thinking this is an issue with the native client somehow tickling a driver issue. I've tried again with the other client and haven't been able to hit the stall, but when running with the native client I see log messages related to a driver we have some known issues with, although since it has nothing to do with networking I'm not sure how. |
Do you mean |
With pssh.clients.ssh.SSHClient and no other changes. |
Thanks for the feedback. That is a difference in behaviour between the two clients, which I consider a bug. The underlying libraries are different, but the purpose of the clients is to normalise that behaviour. I have a good idea of the cause, and it should be possible to handle it in the native client as well. The two libraries handle the stdout/stderr streams differently, and it looks like that can cause a race condition in the native client when the streams are combined, as they are when a PTY is used. Thanks for reporting. |
I believe I am encountering the same issue; however, I am not using a PTY, and I read stdout and stderr separately. It seems related to setting any kind of timeout in the SSHClient constructor. If timeout is not specified, I do not observe any stalling. However, I require a timeout, because in my use case the target can stop responding completely. I dug into the source and tried hacking my way into setting a timeout without triggering the issue, as unfortunately I do not have the time to create a proper patch that fixes the actual issue. My "solution" is to keep specifying parallel-ssh/pssh/clients/base/single.py Line 549 in 5cea5c1
For example, with GTimeout(seconds=15): .
This successfully works around the stalls for me, while still preserving the general timeout in case the ssh server stops responding. Hope this helps in some way. If it is actually a different problem, I can open a new issue. |
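The outer-deadline idea in the workaround above can be sketched with the standard library alone. This is a minimal illustration, not parallel-ssh code: it assumes `GTimeout` refers to gevent's `Timeout`, and it substitutes a worker thread for a greenlet. The helper name `run_with_deadline` is hypothetical.

```python
import concurrent.futures
import time


def run_with_deadline(func, seconds, *args, **kwargs):
    """Run func in a worker thread and wait at most `seconds` for its result.

    Hypothetical helper for illustration only. Raises
    concurrent.futures.TimeoutError if the deadline passes first.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    finally:
        # Do not block on the worker; it keeps running in the background.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    # A fast call returns its value normally.
    print(run_with_deadline(lambda: "done", 1.0))
    # A slow call is abandoned once the deadline passes.
    try:
        run_with_deadline(time.sleep, 0.05, 0.3)
    except concurrent.futures.TimeoutError:
        print("deadline exceeded")
```

One difference worth noting: a gevent timeout interrupts the greenlet it guards, whereas this thread-based stand-in only stops waiting and leaves the worker to finish on its own.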
That's very helpful in trying to reproduce this, sounds like it's the same issue, thank you. |
After playing around a bit with this (really nice library btw :) ) I think this issue comes from this line: Changing this to On a side note, if you are doing really short calls, like touching a file on a local network, the 100ms read sleeps are very high. I get a much higher requests-per-second throughput by setting this line |
Yes, you are right, it is a CPU/latency tradeoff. I have a branch with performance enhancements I am experimenting with, and the above is one of the changes it makes. There are some very promising results, but I want to do a lot more testing with real-world environments before merging those changes. Watch out for updates soon. On this issue, thank you for the investigation; I have been able to replicate it. |
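A rough back-of-the-envelope model of that tradeoff, as a sketch only: it assumes commands run one after another on a single connection and that each command pays one full read sleep before its output is seen. The function name and the 5ms command time are illustrative assumptions, not measurements from the library.

```python
def max_commands_per_second(command_seconds, poll_sleep_seconds):
    """Upper bound on sequential throughput under the stated assumption.

    Assumes every command costs its own run time plus one full poll sleep.
    """
    return 1.0 / (command_seconds + poll_sleep_seconds)


if __name__ == "__main__":
    # Hypothetical 5ms command with the 100ms sleep mentioned above,
    # then with a shorter 10ms sleep for comparison.
    for sleep in (0.1, 0.01):
        rate = max_commands_per_second(0.005, sleep)
        print(f"poll sleep {sleep * 1000:.0f}ms: at most {rate:.1f} commands/s")
```

Under this model the sleep dominates for very short commands, which is consistent with the throughput gain reported above; a shorter sleep buys latency at the cost of more frequent wakeups.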
…is used and running short lived commands. Resolves #344.
Thanks for the investigation @SvanT |
I've done some work to convert from paramiko to parallel-ssh, but have hit an issue where I'm sometimes seeing very short commands stall and take several minutes to complete. In this case, we're running cat on a small fio config file. I've been able to reproduce this with a simple script, getting results like this:
I'm running this on an Ubuntu 20.04 system with the target also being an Ubuntu 20.04 system. I have not seen this issue with commands that take longer to run.
Script:
Contents of the red-bdev-rand-rw.fio file: