tsh ssh when authenticating eats the first two keypresses #11709

Closed
alexmv opened this issue Apr 4, 2022 · 9 comments · Fixed by #11781
Labels
bug tsh tsh - Teleport's command line tool for logging into nodes running Teleport.

Comments

@alexmv

alexmv commented Apr 4, 2022

Description

What happened:

When not currently authenticated, either via explicit tsh logout, or auth having expired, the first tsh ssh connection successfully re-auths, but consumes the first two keypresses of input to the host. Here's me connecting, pasting in my password, pressing my Yubikey, and then typing "12345" once I see the remote host's prompt appear:

$ tsh logout
Logged out all users from all proxies.

$ tsh ssh --proxy teleport.example.net --user alexmv -l username hostname.example.net
Enter password for Teleport user alexmv:
Tap any security key or enter a code from a OTP device:
username@hostname:~$ 345

It's not just visible characters; if I wait until the username@hostname:~$ prompt appears, I have to press Ctrl-D three times before it closes the connection.

What you expected to happen: Once the remote prompt appears, all input should get sent to the remote server.

Reproduction Steps


  1. tsh logout
  2. tsh ssh --proxy teleport.example.net --user alexmv -l username --debug hostname.example.net

Server Details

  • Teleport version (run teleport version): Teleport v9.0.2 git:v9.0.2-0-g354b8c037 go1.17.7
  • Server OS (e.g. from /etc/os-release): Ubuntu 20.04.4 LTS (Focal Fossa)
  • Where are you running Teleport? (e.g. AWS, GCP, Dedicated Hardware): AWS

Client Details

  • Tsh version (tsh version): Teleport v9.0.3 git: go1.18
  • Computer OS (e.g. Linux, macOS, Windows): macOS
  • Installed via (e.g. apt, yum, brew, website download): brew
@alexmv alexmv added the bug label Apr 4, 2022
@zmb3 zmb3 added the tsh tsh - Teleport's command line tool for logging into nodes running Teleport. label Apr 4, 2022
@zmb3
Collaborator

zmb3 commented Apr 4, 2022

@codingllama I wonder if any of your recent changes to the OTP/MFA prompts resolve this?

@codingllama
Contributor

@zmb3, it's probably worth a shot, but this looks like a lingering symptom of the stdin hijacking, which still has to happen. My guess is that the ContextReader thread/goroutine lingers a bit and "eats" a couple of keypresses before the system shuts it down.

Relevant PRs: #11436 and #11346. I didn't backport them to v9: #11436 would be a clean cherry-pick, but #11346 would not.
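A minimal sketch of the failure mode described above, assuming a simplified model in which keystrokes arrive on a Go channel instead of a real terminal. `keystrokes`, `abandonedOTPRead` and `runSession` are hypothetical names, not Teleport APIs. The orphaned read blocks until input arrives rather than timing out, which would also fit the later observation that waiting longer before typing does not help. This sketch loses one keystroke; why v9 loses two is not settled in the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// keystrokes models a terminal in raw mode, where each keypress arrives as a
// separate read. This is a simplification for illustration, not tsh code.
type keystrokes chan string

// abandonedOTPRead is a hypothetical stand-in for the background stdin read
// started by the MFA prompt. Nobody waits for its result, yet it stays blocked
// until the next keypress arrives and then consumes it. There is no timeout,
// so waiting seconds or minutes before typing changes nothing.
func abandonedOTPRead(keys keystrokes, dropped chan<- string) {
	go func() {
		k := <-keys
		dropped <- k // in the real prompt this value would simply be discarded
	}()
}

// runSession replays the reported scenario and returns the keys lost to the
// abandoned read and the keys that reached the remote shell.
func runSession(typed string) (lost, received string) {
	chars := strings.Split(typed, "")
	if len(chars) == 0 {
		return "", ""
	}
	keys := make(keystrokes) // unbuffered: each key is handed to exactly one reader
	dropped := make(chan string, 1)

	// 1. The prompt starts a read for an OTP code...
	abandonedOTPRead(keys, dropped)
	// 2. ...but the user taps a security key, so the prompt returns without it.

	// 3. The remote prompt appears and the user types. The abandoned goroutine
	// is the only reader at this point, so it receives the first key.
	keys <- chars[0]
	lost = <-dropped // the demo waits here only to keep the ordering deterministic

	// 4. The session's own reader only ever sees the remaining keys.
	done := make(chan string)
	go func() {
		var sb strings.Builder
		for k := range keys {
			sb.WriteString(k)
		}
		done <- sb.String()
	}()
	for _, c := range chars[1:] {
		keys <- c
	}
	close(keys)
	return lost, <-done
}

func main() {
	lost, received := runSession("12345")
	fmt.Printf("lost %q, remote shell received %q\n", lost, received)
}
```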

@alexmv
Copy link
Author

alexmv commented Apr 5, 2022

Worth noting that this doesn't seem to be a time-based race condition -- it doesn't matter if I wait seconds or minutes after the prompt appears.

Let me know if there's anything else I can do to help debug! The output from --debug isn't particularly interesting, but here it is for completeness (mildly redacted):

DEBU [TSH]       Web proxy port was not set. Attempting to detect port number to use. tsh/tsh.go:2172
DEBU [TSH]       Resolving default proxy port (insecure: false) tsh/resolve_default_addr.go:121
DEBU [TSH]       Trying teleport.example.net:3080... tsh/resolve_default_addr.go:109
DEBU [TSH]       Trying teleport.example.net:443... tsh/resolve_default_addr.go:109
DEBU [TSH]       Address teleport.example.net:443 succeeded. Selected as canonical proxy address tsh/resolve_default_addr.go:195
DEBU [TSH]       Waiting for all in-flight racers to finish tsh/resolve_default_addr.go:144
DEBU [TSH]       Race request failed error:[Get "https://teleport.example.net:3080/webapi/ping": context canceled] tsh/resolve_default_addr.go:83
INFO [CLIENT]    [KEY AGENT] Connected to the system agent: "/private/tmp/com.apple.launchd.xDIuesmivR/Listeners" client/api.go:3113
DEBU [CLIENT]    Activating relogin on no SSH auth methods loaded, are you logged in?. client/api.go:547
DEBU [CLIENT]    not using loopback pool for remote proxy addr: teleport.example.net:443 client/api.go:3074
DEBU             Attempting GET teleport.example.net:443/webapi/ping webclient/webclient.go:105
Enter password for Teleport user alexmv:
DEBU [CLIENT]    not using loopback pool for remote proxy addr: teleport.example.net:443 client/api.go:3074
DEBU [CLIENT]    HTTPS client init(proxyAddr=teleport.example.net:443, insecure=false) client/weblogin.go:223
DEBU [CLIENT]    WebAuthn: prompting U2F devices with origin "https://teleport.example.net:443" client/mfa.go:122
Tap any security key or enter a code from a OTP device: DEBU             Error interacting with U2F devices error:[
ERROR REPORT:
Original Error: trace.aggregate hid: not permitted
Stack Trace:
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:134 github.com/gravitational/teleport/lib/auth/webauthncli.runOnU2FDevicesOnce
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:76 github.com/gravitational/teleport/lib/auth/webauthncli.RunOnU2FDevices
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/login.go:116 github.com/gravitational/teleport/lib/auth/webauthncli.Login
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/client/mfa.go:123 github.com/gravitational/teleport/lib/client.PromptMFAChallenge.func2
	/usr/local/Cellar/go/1.18/libexec/src/runtime/asm_amd64.s:1571 runtime.goexit
User Message: hid: not permitted] webauthncli/u2f.go:80

DEBU [KEYAGENT]  Adding CA key for teleport.example.net client/keyagent.go:313
DEBU [KEYSTORE]  Adding known host teleport.example.net with proxy teleport.example.net and key: SHA256:7hV1SB+8ndqouW/i/N97XdywlZM0tXDXU2Uv7Q1JI0s client/keystore.go:573
ERRO [KEYSTORE]  open /Users/chmrr/.tsh/keys/teleport.example.net/alexmv: no such file or directory client/keystore.go:269
INFO [KEYAGENT]  Loading SSH key for user "alexmv" and cluster "teleport.example.net". client/keyagent.go:191
INFO [CLIENT]    Connecting to proxy=teleport.example.net:3023 login="root" client/api.go:2322
DEBU             No valid environment variables found. client/proxy.go:116
DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:268
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYAGENT]  "Checking key: [email protected] AAAAredacted\n." client/keyagent.go:365
DEBU [KEYAGENT]  Validated host teleport.example.net:3023. client/keyagent.go:371
INFO [CLIENT]    Successful auth with proxy teleport.example.net:3023. client/api.go:2327
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [KEYAGENT]  Adding CA key for teleport.example.net client/keyagent.go:313
DEBU [KEYSTORE]  Adding known host teleport.example.net with proxy teleport.example.net and key: SHA256:7hV1SB+8ndqouW/i/N97XdywlZM0tXDXU2Uv7Q1JI0s client/keystore.go:573
INFO [CLIENT]    Connecting to proxy=teleport.example.net:3023 login="root" client/api.go:2322
DEBU             No valid environment variables found. client/proxy.go:116
DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:268
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYAGENT]  "Checking key: [email protected] AAAAredacted\n." client/keyagent.go:365
DEBU [KEYAGENT]  Validated host teleport.example.net:3023. client/keyagent.go:371
INFO [CLIENT]    Successful auth with proxy teleport.example.net:3023. client/api.go:2327
DEBU [CLIENT]    Found clusters: [{"name":"teleport.example.net","lastconnected":"2022-04-05T17:19:11.548880912Z","status":"online"}] client/client.go:127
INFO [CLIENT]    Client= connecting to node=hostname.example.net on cluster teleport.example.net client/client.go:1074
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYSTORE]  Reading certificates from path "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-ssh/teleport.example.net-cert.pub". client/keystore.go:330
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [CLIENT]    MFA not required for access. client/client.go:377
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYAGENT]  "Checking key: [email protected] AAAAredacted\n." client/keyagent.go:365
DEBU [KEYAGENT]  Validated host hostname.example.net:0@[email protected]. client/keyagent.go:371
DEBU [CLIENT]    Found clusters: [{"name":"teleport.example.net","lastconnected":"2022-04-05T17:19:12.986670295Z","status":"online"}] client/client.go:127
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [CLIENT]    No Key Agent selected. client/session.go:269
username@hostname:~$ 345

@codingllama
Contributor

> Worth noting that this doesn't seem to be a time-based race condition -- it doesn't matter if I wait seconds or minutes after the prompt appears.

This is interesting info, thanks for sharing.

I'll try to get a quick repro just to see if the commits mentioned above have an effect on it - if they do, then the path is clear, otherwise we may need to get someone to take a closer look.

@codingllama
Contributor

I can confirm that this reproduces on Teleport 8 and 9. I haven't tested older versions, but any version that has U2F and ContextReader/prompt.Stdin has this issue.

The issue is the stdin hijack from PromptMFAChallenge, which only happens if the user has both OTP and U2F/WebAuthn devices registered. It's also made worse by the PRs above, because of the hijack via term.ReadPassword (I lose about every other keystroke forever).

A simple mitigation, like asking the user to press enter, works for master/v10 because we can then "reclaim" the lost read (it has to be explicitly coded in, it won't just work otherwise). This works on master because ContextReader doesn't loop eagerly anymore. It's not clear to me why earlier versions only eat two chars, but it is an interesting detail.

Broadly, I'd say we are looking at two possible "buckets" of fixes:

  1. UX changes to the tsh login ceremony
  2. Some clever hack that "unblocks" the stdin read

In terms of UX changes, a reasonable option would be to ask the user to choose between OTP and a security key beforehand. This makes it easy to ensure we won't fire a read that may be ignored. A possible optimization for v10/master is to default to the security key if we detect a device capable of performing the ceremony right away (this could be nice regardless, but doesn't eliminate the issue).

As for clever hacks, I haven't found anything good yet. A possibility is running login in a separate process, but that feels like a big bump in complexity for tsh.

@alexmv
Author

alexmv commented Apr 5, 2022

> It's not clear to me why earlier versions only eat two chars, but it is an interesting detail.

I generally use the U2F key, but if I use the OTP entry, it only eats one keypress.

Probably unrelated, but I notice that if I use the OTP, the light on the Yubikey keeps blinking. And with --debug, it keeps spewing endlessly, even once connected:

DEBU             Error interacting with U2F devices error:[
ERROR REPORT:
Original Error: trace.aggregate hid: not permitted
Stack Trace:
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:134 github.com/gravitational/teleport/lib/auth/webauthncli.runOnU2FDevicesOnce
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:76 github.com/gravitational/teleport/lib/auth/webauthncli.RunOnU2FDevices
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/login.go:116 github.com/gravitational/teleport/lib/auth/webauthncli.Login
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/client/mfa.go:123 github.com/gravitational/teleport/lib/client.PromptMFAChallenge.func2
	/usr/local/Cellar/go/1.18/libexec/src/runtime/asm_amd64.s:1571 runtime.goexit
User Message: hid: not permitted] webauthncli/u2f.go:80

@codingllama
Contributor

Yes, the blinking is something else. On master, the lights stop blinking after a short while for me, even when compiled with the legacy U2F code.

The "hid: not permitted" errors are red herrings; they just happen sometimes (but ideally not after you authenticate). I may take a look at those, but I'll focus on the main issue first.

@codingllama codingllama self-assigned this Apr 5, 2022
codingllama added a commit that referenced this issue Apr 8, 2022
Fixes a potential stdin hijacking bug by making relogin attempts default to a
single MFA method (the strongest available).

The problematic scenario is as follows:

1. User has both OTP and security keys registered
2. "Relogin" is triggered via a tsh command (say,
   `tsh logout; tsh ssh --proxy=example.com llama@myserver`)
3. User is prompted to pick either OTP or security key ("Tap any security key or
   enter a code from a OTP device")
4. An stdin read is fired in the background to read the OTP code (via
   prompt.Stdin)
5. User taps a security key instead, thus the stdin read is "abandoned"

In most cases this is fine, as the program ends right after. The issue is when a
relogin is triggered by a long-lived tsh invocation (again, `tsh ssh ...`): in
this case the stdin hijack causes input to be swallowed.

Forcing a single MFA option avoids the potential stdin hijack, fixing the
problem for all relogin invocations. `tsh login` behavior remains the same.

Note that we have to default to the cluster's most secure method _without_ checking
the user devices. The user is not logged in yet, thus the backend cannot reveal
any information about that user.

Fixes #11709.

* Add UseStrongestAuth flag to PromptMFAChallenge
* Add TeleportClient.UseStrongestAuth and set it true for relogin
* Proper testing
* Address review comments
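The approach in this commit can be sketched as follows, assuming a simplified challenge type. `challenge`, `promptOptions` and `planPrompt` are hypothetical names; only the idea of a `UseStrongestAuth` flag comes from the commit. The decision uses only what the cluster allows, never the user's registered devices, in line with the note that the backend cannot reveal anything about a user who is not logged in yet.

```go
package main

import "fmt"

// challenge is a hypothetical, simplified view of an MFA challenge: which
// methods the cluster allows, not which devices the user has registered.
type challenge struct {
	AllowOTP      bool
	AllowWebAuthn bool
}

// promptOptions is a hypothetical description of what the prompt will do.
type promptOptions struct {
	ReadOTPFromStdin bool // starts a (possibly abandoned) stdin read
	WaitForKeyTap    bool
}

// planPrompt mirrors the idea behind UseStrongestAuth: during relogin, offer
// a single method (the strongest the cluster allows) so that no stdin read
// is started speculatively. Without the flag, both methods are offered.
func planPrompt(c challenge, useStrongestAuth bool) promptOptions {
	if useStrongestAuth && c.AllowWebAuthn {
		return promptOptions{WaitForKeyTap: true}
	}
	return promptOptions{ReadOTPFromStdin: c.AllowOTP, WaitForKeyTap: c.AllowWebAuthn}
}

func main() {
	both := challenge{AllowOTP: true, AllowWebAuthn: true}
	fmt.Printf("tsh login: %+v\n", planPrompt(both, false))
	fmt.Printf("relogin:   %+v\n", planPrompt(both, true))
}
```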
codingllama added a commit that referenced this issue Apr 8, 2022
codingllama added a commit that referenced this issue Apr 8, 2022

Issue #11709.
codingllama added a commit that referenced this issue Apr 11, 2022
codingllama added a commit that referenced this issue Apr 11, 2022
…1848)

Issue #11709.

* Make relogin attempts use the strongest auth method (#11781)
* Fix conflicts for v8
codingllama added a commit that referenced this issue Apr 11, 2022
codingllama added a commit that referenced this issue Apr 11, 2022
…1847)

Issue #11709.

* Make relogin attempts use the strongest auth method (#11781)
* Fix conflicts for v9
@codingllama
Contributor

Hey @alexmv, this is now fixed on master and backported to v8 and v9. The fix should be included in the next release.

The fix is essentially a UX change: we avoid the bug on tsh ssh by defaulting to the strongest auth method available. For finer control over authentication you can still use tsh login. I hope this makes it better for you.

@alexmv
Author

alexmv commented Apr 11, 2022

Thanks for the update, and the quick fix! That seems like a totally fine compromise to me -- looking forward to 9.0.5.
