-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added port argument for ssh #4117
Conversation
deepspeed/launcher/runner.py
Outdated
@@ -564,6 +566,7 @@ def main(args=None): | |||
else: | |||
cmd = runner.get_cmd(env, active_resources) | |||
|
|||
env["PDSH_SSH_ARGS_APPEND"] += f" -p {args.port} " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be moved to this part of the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree w. this suggestion from @loadams. The deepspeed launcher supports different multi-node backends, pdsh
is the default and most common but it would be good to move the PDSH_SSH_ARGS_APPEND
variable into the PDSHRunner
class inside multinode_runner.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback. I've moved this part of the code into the PDSHRunner class as specified, and made the necessary changes.
deepspeed/launcher/runner.py
Outdated
@@ -432,7 +434,7 @@ def main(args=None): | |||
if multi_node_exec and not args.no_ssh_check: | |||
first_host = list(active_resources.keys())[0] | |||
try: | |||
subprocess.check_call(f'ssh -o PasswordAuthentication=no {first_host} hostname', | |||
subprocess.check_call(f'ssh -o PasswordAuthentication=no -p {args.port} {first_host} hostname', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we only want to pass -p {args.port}
iff --ssh_port is set. In many cases we have users setting up a ssh config that sets specific port/connection parameters. If we force passing -p 22
as the default then this will break a ssh config host that may have a port that is not 22.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback. I've added an if statement to make sure the -p
is only added on when args.ssh_port
is specified.
@microsoft-github-policy-service agree |
Hi @Hiromasa-H - FYI, you'll have to update test_pdsh_runner as well since it needs the changes from here. |
deepspeed/launcher/runner.py
Outdated
@@ -193,6 +193,8 @@ def parse_args(args=None): | |||
"numbers and range. i.e. 1,3-5,7 => [1,3,4,5,7]. When not " | |||
"specified, all cores on system would be used rank binding") | |||
|
|||
parser.add_argument("--ssh_port", type=int, default=22, help="SSH port to use for remote connections") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the later checks for if args.ssh_port is not None
will never trigger since the default here is 22
. Can we set the default to None
?
Hi @Hiromasa-H - apologies for the slow review, wanted to test one thing on our side first, but will review this today. Thanks for your patience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid copying code with subprocess.check_call
(see my suggestion).
Otherwise, LGTM
@loadams Sorry, I did not intend to rush you. Thank you for your taking your time with a thorough review. |
This is the PR for issue #4116.