Upgrading 1.10.0 to 1.11.1 breaks connecting to new hosts. #362

Closed
xavierholt opened this issue Jul 21, 2016 · 11 comments

@xavierholt
Contributor

Upgraded SSHKit this morning and found that I could no longer connect to hosts that weren't in my ~/.ssh/known_hosts file. If I downgraded and connected first, then upgraded again, connecting worked just fine.

[kevin.burk@KBURK-02 argus]$ bundle update | grep sshkit
Using sshkit 1.11.1
[kevin.burk@KBURK-02 argus]$ bundle exec rake k8sdev uruk run cmd="uptime"
INFO   [28c16682] Running /usr/bin/env uptime on 10.52.186.21
DEBUG  [28c16682] Command: /usr/bin/env uptime
INFO   [e4ed48aa] Running /usr/bin/env uptime on 10.52.186.22
DEBUG  [e4ed48aa] Command: /usr/bin/env uptime
INFO   [a8d5f190] Running /usr/bin/env uptime on 10.52.186.23
DEBUG  [a8d5f190] Command: /usr/bin/env uptime
rake aborted!
Net::SSH::HostKeyMismatch: fingerprint 41:31:33:63:66:9d:34:a9:de:35:f2:4d:03:ee:4d:b4 does not match for "10.52.186.21,127.0.0.1"
/Users/kevin.burk/Code/argus/lib/tasks/helpers.rb:40:in `block (3 levels) in <top (required)>'
Tasks: TOP => run
(See full trace by running task with --trace)
[kevin.burk@KBURK-02 argus]$ vim Gemfile
[kevin.burk@KBURK-02 argus]$ bundle update | grep sshkit
Using sshkit 1.10.0 (was 1.11.1)
[kevin.burk@KBURK-02 argus]$ bundle exec rake k8sdev uruk run cmd="uptime"
INFO   [7178e030] Running /usr/bin/env uptime on 10.52.186.23
DEBUG  [7178e030] Command: /usr/bin/env uptime
INFO   [5dfe4842] Running /usr/bin/env uptime on 10.52.186.22
DEBUG  [5dfe4842] Command: /usr/bin/env uptime
INFO   [c8d074b0] Running /usr/bin/env uptime on 10.52.186.21
DEBUG  [c8d074b0] Command: /usr/bin/env uptime
DEBUG  [7178e030]    19:57:56 up  1:37,  0 users,  load average: 0.02, 0.05, 0.05
INFO   [7178e030] Finished in 0.508 seconds with exit status 0 (successful).
DEBUG  [5dfe4842]    19:57:56 up  1:37,  0 users,  load average: 0.08, 0.08, 0.06
INFO   [5dfe4842] Finished in 0.561 seconds with exit status 0 (successful).
DEBUG  [c8d074b0]    19:57:56 up  1:37,  0 users,  load average: 0.00, 0.01, 0.05
INFO   [c8d074b0] Finished in 0.591 seconds with exit status 0 (successful).
[kevin.burk@KBURK-02 argus]$ vim Gemfile
[kevin.burk@KBURK-02 argus]$ bundle update | grep sshkit
Using sshkit 1.11.1 (was 1.10.0)
[kevin.burk@KBURK-02 argus]$ bundle exec rake k8sdev uruk run cmd="uptime"
INFO   [2296d57f] Running /usr/bin/env uptime on 10.52.186.21
DEBUG  [2296d57f] Command: /usr/bin/env uptime
INFO   [597c3be7] Running /usr/bin/env uptime on 10.52.186.22
DEBUG  [597c3be7] Command: /usr/bin/env uptime
INFO   [15b8bf36] Running /usr/bin/env uptime on 10.52.186.23
DEBUG  [15b8bf36] Command: /usr/bin/env uptime
DEBUG  [597c3be7]    19:58:14 up  1:37,  0 users,  load average: 0.12, 0.09, 0.06
INFO   [597c3be7] Finished in 0.476 seconds with exit status 0 (successful).
DEBUG  [2296d57f]    19:58:14 up  1:37,  0 users,  load average: 0.00, 0.01, 0.05
INFO   [2296d57f] Finished in 0.524 seconds with exit status 0 (successful).
DEBUG  [15b8bf36]    19:58:14 up  1:37,  0 users,  load average: 0.01, 0.04, 0.05
INFO   [15b8bf36] Finished in 0.518 seconds with exit status 0 (successful).
@xavierholt
Contributor Author

Seems to be related to #330 - turning on known hosts caching in 1.10.0 (and removing those servers from my known hosts) recreated the problem.

@mattbrictson
Member

@byroot Do you mind taking a look at this bug, since it seems to be related to the known hosts caching you added in #330? Thanks!

@byroot
Contributor

byroot commented Jul 23, 2016

So I wrote this reproduction code: https://gist.github.com/byroot/37d40fd9e2883c40aef00bfc40c0931d

So far things don't add up. Both 1.10.0 and 1.11.1 with known_hosts caching enabled are perfectly able to add a new key to the known_hosts file. I even tried making the file read-only, and net-ssh didn't care: it just set it back to writable.

Also, Net::SSH::HostKeyMismatch indicates that the fingerprint the server sent doesn't match the one in the known_hosts file, which strongly suggests you already had a fingerprint recorded for that server.

So maybe I'm missing some info here:

  • What is your net-ssh version?
  • Any chance you could drop your known_hosts file in a gist?
  • Could we also see your ~/.ssh/config? Or at least tell us whether you have any relevant options enabled; I'm thinking of HashKnownHosts in particular.

@xavierholt
Contributor Author

It's starting to look like a peculiarity of my setup. Corporate network weirdness demands that I go through multiple SSH tunnel hops to get to the machines I care about. The hosts I route purely with ProxyCommand directives work fine. But I've also got Proxifier set up on my Mac, and the connections whose last hop is routed by Proxifier are the ones that break.

The interesting bit is that when SSHKit and Proxifier interact, "hostnames" (IPs?) are reported weirdly. Say I'm trying to get to IP 10.52.186.21 - leaving DNS out of this to avoid complication. I start from a blank known hosts file each time, and "SSH" means standard command-line ssh:

  • SSH + ProxyCommand: Host 10.52.186.21 (ECDSA) is added to known hosts.
  • SSHKit 1.10.0 + ProxyCommand: Host 10.52.186.21 (SSH RSA) is added to known hosts.
  • SSHKit 1.11.1 + ProxyCommand: Host 10.52.186.21 (SSH RSA) is added to known hosts.
  • SSH + Proxifier: Host 10.52.186.21 (ECDSA) is added to known hosts.
  • SSHKit 1.10.0 + Proxifier: Host 10.52.186.21,127.0.0.1 (SSH RSA) is added to known hosts.
  • SSHKit 1.11.1 + Proxifier: Host 10.52.186.21,127.0.0.1 (SSH RSA) is added to known hosts.

Actually, that last one surprised me by working - that's the one that always broke for me when I wasn't starting from a blank known hosts file. And sure enough, leaving that entry in known hosts and then trying to SSHKit + Proxifier into the next host in that series (10.52.186.22) recreated the crash.

So it seems like the new code is getting its first hostname match on the 127.0.0.1 part, returning a different server's key, and failing with a key mismatch. Is there a spec on what to do with comma-separated names in ~/.ssh/known_hosts?

And the info for byroot, in case it's still relevant: I'm using Net::SSH 3.2.0. I'm starting from a blank known hosts file. My SSH config looks like this when using Proxifier (over port 39999); to use pure SSH proxying I uncomment those last two lines and turn Proxifier off:

ControlMaster auto
ControlPath ~/.ssh/tmp/%r@%h:%p

Host hop-1
  HostName [redacted]
  ForwardAgent yes

Host hop-2
  HostName [redacted]
  DynamicForward 0.0.0.0:39999
  ProxyCommand ssh -T hop-1 nc %h %p
  ForwardAgent yes

# Host 10.52.186.21
#   ProxyCommand ssh -T hop-2 nc %h %p

@leehambley
Member

Xavier, Jean - I just wanted to congratulate you both on some excellent debugging and open source peership, 🎩-off to both of you.

@xavierholt
Contributor Author

I believe this is the logic that makes this work in the original Net::SSH implementation:

found = entries.all? { |entry| hostlist.include?(entry) } ||
            known_host_hash?(hostlist, entries, scanner)
next unless found

entries are the individual comma-separated components of the string passed in as host when searching for a host key. hostlist is the individual comma-separated components of the host name on any single line of a known_hosts file. In Net::SSH, hostlist has to be a superset of entries for the line to match; I think dropping that requirement is the breaking change.
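To make the superset rule concrete with the addresses from this thread, here's a small illustrative sketch in plain Ruby (not net-ssh's actual lookup code; variable names follow the snippet above):

```ruby
# Names on an existing known_hosts line, written after connecting to the
# first host through Proxifier (the key recorded is the one for .21).
hostlist = "10.52.186.21,127.0.0.1".split(",")

# Names presented when connecting to the NEXT host through Proxifier.
entries = "10.52.186.22,127.0.0.1".split(",")

# Original Net::SSH rule: the line matches only if it names every entry.
found = entries.all? { |entry| hostlist.include?(entry) }
puts found  # false: .22 is not on the line, so .21's key is not returned

# A naive "any shared name" rule instead matches on 127.0.0.1, hands back
# the key recorded for .21, and comparing it against what .22 actually
# sends raises Net::SSH::HostKeyMismatch.
naive = entries.any? { |entry| hostlist.include?(entry) }
puts naive  # true
```

Under the superset rule the .22 connection sees no cached key and the host is simply added as new, which matches the behavior observed with 1.10.0.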

@xavierholt
Contributor Author

Thanks @leehambley! You guys make some awesome tools - glad I can give back.

@byroot
Contributor

byroot commented Jul 27, 2016

Sorry I didn't have much time to tackle this in the last few days. I just submitted #364 which I think solves the issue.

@xavierholt
Contributor Author

Thanks @byroot! Busy right now, but I should be able to give it a try this afternoon. Tomorrow at the latest.

@xavierholt
Contributor Author

So I lied about how much free time I'd have... But I was able to test the patch this morning, and it was a success. Worked great, with or without Proxifier. Thanks to everyone for helping me out!

@mattbrictson
Member

Fixed via #364.
