`tsh ssh` when authenticating eats the first two keypresses #11709

alexmv · 2022-04-04T17:34:10Z

Description

What happened:

When not currently authenticated, either via explicit tsh logout, or auth having expired, the first tsh ssh connection successfully re-auths, but consumes the first two keypresses of input to the host. Here's me connecting, pasting in my password, pressing my Yubikey, and then typing "12345" once I see the remote host's prompt appear:

$ tsh logout
Logged out all users from all proxies.

$ tsh ssh --proxy teleport.example.net --user alexmv -l username hostname.example.net
Enter password for Teleport user alexmv:
Tap any security key or enter a code from a OTP device:
username@hostname:~$ 345

It's not just visible characters; if I wait until the username@hostname:~$ prompt appears, I have to press Ctrl-D three times before it closes the connection.

What you expected to happen: Once the remote prompt appears, all input should get sent to the remote server.

Reproduction Steps

tsh logout
tsh ssh --proxy teleport.example.net --user alexmv -l username --debug hostname.example.net

Server Details

Teleport version (run teleport version): Teleport v9.0.2 git:v9.0.2-0-g354b8c037 go1.17.7
Server OS (e.g. from /etc/os-release): Ubuntu 20.04.4 LTS (Focal Fossa)
Where are you running Teleport? (e.g. AWS, GCP, Dedicated Hardware): AWS

Client Details

Tsh version (tsh version): Teleport v9.0.3 git: go1.18
Computer OS (e.g. Linux, macOS, Windows): macOS
Installed via (e.g. apt, yum, brew, website download) brew

zmb3 · 2022-04-04T22:56:31Z

@codingllama I wonder if any of your recent changes to the OTP/MFA prompts resolve this?

codingllama · 2022-04-05T17:11:17Z

@zmb3, it's probably worth a shot, but this looks like a lingering symptom of the stdin hijacking, which still has to happen. My guess is that the ContextReader thread/goroutine lingers a bit and "eats" a couple key presses before the system shuts it down.

Relevant PRs: #11436 and #11346. I didn't backport them to v9. #11436 would be a clean cherry pick, but #11346 not so much.

alexmv · 2022-04-05T17:27:13Z

Worth noting that this doesn't seem to be a time-based race condition -- it doesn't matter if I wait seconds or minutes after the prompt appears.

Let me know if there's anything else I can do to help debug! The output from --debug isn't particularly interesting, but here it is for completeness (mildly redacted):

DEBU [TSH]       Web proxy port was not set. Attempting to detect port number to use. tsh/tsh.go:2172
DEBU [TSH]       Resolving default proxy port (insecure: false) tsh/resolve_default_addr.go:121
DEBU [TSH]       Trying teleport.example.net:3080... tsh/resolve_default_addr.go:109
DEBU [TSH]       Trying teleport.example.net:443... tsh/resolve_default_addr.go:109
DEBU [TSH]       Address teleport.example.net:443 succeeded. Selected as canonical proxy address tsh/resolve_default_addr.go:195
DEBU [TSH]       Waiting for all in-flight racers to finish tsh/resolve_default_addr.go:144
DEBU [TSH]       Race request failed error:[Get "https://teleport.example.net:3080/webapi/ping": context canceled] tsh/resolve_default_addr.go:83
INFO [CLIENT]    [KEY AGENT] Connected to the system agent: "/private/tmp/com.apple.launchd.xDIuesmivR/Listeners" client/api.go:3113
DEBU [CLIENT]    Activating relogin on no SSH auth methods loaded, are you logged in?. client/api.go:547
DEBU [CLIENT]    not using loopback pool for remote proxy addr: teleport.example.net:443 client/api.go:3074
DEBU             Attempting GET teleport.example.net:443/webapi/ping webclient/webclient.go:105
Enter password for Teleport user alexmv:
DEBU [CLIENT]    not using loopback pool for remote proxy addr: teleport.example.net:443 client/api.go:3074
DEBU [CLIENT]    HTTPS client init(proxyAddr=teleport.example.net:443, insecure=false) client/weblogin.go:223
DEBU [CLIENT]    WebAuthn: prompting U2F devices with origin "https://teleport.example.net:443" client/mfa.go:122
Tap any security key or enter a code from a OTP device: DEBU             Error interacting with U2F devices error:[
ERROR REPORT:
Original Error: trace.aggregate hid: not permitted
Stack Trace:
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:134 github.com/gravitational/teleport/lib/auth/webauthncli.runOnU2FDevicesOnce
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:76 github.com/gravitational/teleport/lib/auth/webauthncli.RunOnU2FDevices
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/login.go:116 github.com/gravitational/teleport/lib/auth/webauthncli.Login
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/client/mfa.go:123 github.com/gravitational/teleport/lib/client.PromptMFAChallenge.func2
	/usr/local/Cellar/go/1.18/libexec/src/runtime/asm_amd64.s:1571 runtime.goexit
User Message: hid: not permitted] webauthncli/u2f.go:80

DEBU [KEYAGENT]  Adding CA key for teleport.example.net client/keyagent.go:313
DEBU [KEYSTORE]  Adding known host teleport.example.net with proxy teleport.example.net and key: SHA256:7hV1SB+8ndqouW/i/N97XdywlZM0tXDXU2Uv7Q1JI0s client/keystore.go:573
ERRO [KEYSTORE]  open /Users/chmrr/.tsh/keys/teleport.example.net/alexmv: no such file or directory client/keystore.go:269
INFO [KEYAGENT]  Loading SSH key for user "alexmv" and cluster "teleport.example.net". client/keyagent.go:191
INFO [CLIENT]    Connecting to proxy=teleport.example.net:3023 login="root" client/api.go:2322
DEBU             No valid environment variables found. client/proxy.go:116
DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:268
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYAGENT]  "Checking key: [email protected] AAAAredacted\n." client/keyagent.go:365
DEBU [KEYAGENT]  Validated host teleport.example.net:3023. client/keyagent.go:371
INFO [CLIENT]    Successful auth with proxy teleport.example.net:3023. client/api.go:2327
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [KEYAGENT]  Adding CA key for teleport.example.net client/keyagent.go:313
DEBU [KEYSTORE]  Adding known host teleport.example.net with proxy teleport.example.net and key: SHA256:7hV1SB+8ndqouW/i/N97XdywlZM0tXDXU2Uv7Q1JI0s client/keystore.go:573
INFO [CLIENT]    Connecting to proxy=teleport.example.net:3023 login="root" client/api.go:2322
DEBU             No valid environment variables found. client/proxy.go:116
DEBU [HTTP:PROX] No proxy set in environment, returning direct dialer. proxy/proxy.go:268
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYAGENT]  "Checking key: [email protected] AAAAredacted\n." client/keyagent.go:365
DEBU [KEYAGENT]  Validated host teleport.example.net:3023. client/keyagent.go:371
INFO [CLIENT]    Successful auth with proxy teleport.example.net:3023. client/api.go:2327
DEBU [CLIENT]    Found clusters: [{"name":"teleport.example.net","lastconnected":"2022-04-05T17:19:11.548880912Z","status":"online"}] client/client.go:127
INFO [CLIENT]    Client= connecting to node=hostname.example.net on cluster teleport.example.net client/client.go:1074
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYSTORE]  Reading certificates from path "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-ssh/teleport.example.net-cert.pub". client/keystore.go:330
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [CLIENT]    MFA not required for access. client/client.go:377
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [KEYAGENT]  "Checking key: [email protected] AAAAredacted\n." client/keyagent.go:365
DEBU [KEYAGENT]  Validated host hostname.example.net:0@[email protected]. client/keyagent.go:371
DEBU [CLIENT]    Found clusters: [{"name":"teleport.example.net","lastconnected":"2022-04-05T17:19:12.986670295Z","status":"online"}] client/client.go:127
DEBU [KEYSTORE]  Returning Teleport TLS certificate "/Users/chmrr/.tsh/keys/teleport.example.net/alexmv-x509.pem" valid until "2022-04-06 05:19:09 +0000 UTC". client/keystore.go:307
DEBU [CLIENT]    Client  is connecting to auth server on cluster "teleport.example.net". client/client.go:969
DEBU [CLIENT]    No Key Agent selected. client/session.go:269
username@hostname:~$ 345

codingllama · 2022-04-05T18:11:28Z

> Worth noting that this doesn't seem to be a time-based race condition -- it doesn't matter if I wait seconds or minutes after the prompt appears.

This is interesting info, thanks for sharing.

I'll try to get a quick repro just to see if the commits mentioned above have an effect on it - if they do, then the path is clear, otherwise we may need to get someone to take a closer look.

codingllama · 2022-04-05T21:17:14Z

I can confirm repros for Teleport 8 and 9. I haven't gone further down, but any version that has U2F and ContextReader/prompt.Stdin has this issue.

The issue is the stdin hijack from PromptMFAChallenge, which only happens if the user has both OTP and U2F/WebAuthn devices registered. It's also made worse by the PRs above, because of the hijack via term.ReadPassword (I lose about every other keystroke forever).
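
To illustrate the hijack described above, here is a minimal sketch rather than Teleport's actual code. It assumes an `io.Pipe` standing in for the terminal, and the names `startAbandonedRead` and `simulate` are hypothetical. A background read is started for the OTP prompt, nobody waits for it once the security key is tapped, and it still consumes the first byte the user types afterwards.

```go
package main

import (
	"fmt"
	"io"
)

// startAbandonedRead simulates the OTP prompt: it fires a one-byte read on
// the shared input in the background. If the user taps a security key
// instead, nobody waits for the result, but the read itself stays blocked on
// the input until a byte arrives.
func startAbandonedRead(in io.Reader) <-chan string {
	stolen := make(chan string, 1)
	go func() {
		buf := make([]byte, 1)
		n, _ := in.Read(buf)
		stolen <- string(buf[:n])
	}()
	return stolen
}

// simulate types keys into a pipe that stands in for the terminal and
// reports what the stale read swallowed versus what the session received.
func simulate(keys string) (swallowed, session string) {
	pr, pw := io.Pipe()
	stolen := startAbandonedRead(pr)

	// The user starts typing only once the remote prompt is visible.
	go func() {
		pw.Write([]byte(keys))
		pw.Close()
	}()

	// To keep the demo deterministic, the session starts reading only after
	// the stale read has completed. On a real terminal the stale read is
	// simply first in line.
	swallowed = <-stolen
	rest, _ := io.ReadAll(pr)
	return swallowed, string(rest)
}

func main() {
	swallowed, session := simulate("12345")
	fmt.Printf("swallowed=%q session=%q\n", swallowed, session)
}
```

In this simplified model exactly one byte is lost. The thread reports one or two lost keypresses depending on the device used, which this sketch does not try to reproduce.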

A simple mitigation, like asking the user to press enter, works for master/v10 because we can then "reclaim" the lost read (it has to be explicitly coded in, it won't just work otherwise). This works on master because ContextReader doesn't loop eagerly anymore. It's not clear to me why earlier versions only eat two chars, but it is an interesting detail.
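
The idea of "reclaiming" a lost read can be sketched as follows. This is not Teleport's `ContextReader`. `reclaimingReader`, `ReadContext` and `demoReclaim` are hypothetical names, and the sketch assumes a single caller at a time. The point is that a read abandoned by one caller stays in flight and its data is handed to the next caller instead of being dropped.

```go
package main

import (
	"context"
	"fmt"
	"io"
)

type readResult struct {
	data []byte
	err  error
}

// reclaimingReader is a hypothetical wrapper. It keeps at most one background
// read in flight and hands its result to whichever ReadContext call comes
// next. It is meant to be used by one caller at a time.
type reclaimingReader struct {
	src      io.Reader
	inflight chan readResult // non-nil while a background read is pending
}

// ReadContext returns the next chunk of input, or ctx.Err() if ctx is done
// first. In the latter case the pending read is kept, not discarded.
func (r *reclaimingReader) ReadContext(ctx context.Context) ([]byte, error) {
	if r.inflight == nil {
		ch := make(chan readResult, 1)
		r.inflight = ch
		go func() {
			buf := make([]byte, 4096)
			n, err := r.src.Read(buf)
			ch <- readResult{buf[:n], err}
		}()
	}
	select {
	case <-ctx.Done():
		// The read stays in flight; its data is kept for the next caller.
		return nil, ctx.Err()
	case res := <-r.inflight:
		r.inflight = nil
		return res.data, res.err
	}
}

// demoReclaim abandons one read, then shows that a later read still
// receives the input that the abandoned read was waiting for.
func demoReclaim() (abandonedErr error, reclaimed string) {
	pr, pw := io.Pipe()
	r := &reclaimingReader{src: pr}

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // the prompt gives up before any input arrives
	_, abandonedErr = r.ReadContext(ctx)

	go pw.Write([]byte("123"))
	data, _ := r.ReadContext(context.Background())
	return abandonedErr, string(data)
}

func main() {
	err, got := demoReclaim()
	fmt.Printf("abandoned read: %v, reclaimed input: %q\n", err, got)
}
```

The catch, as noted above, is that every later consumer of stdin has to go through the same wrapper. Code that reads from the terminal directly would still race with the pending read.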

Broadly, I'd say we are looking at two possible "buckets" of fixes:

* UX changes to the tsh login ceremony
* Some clever hack that "unblocks" the stdin read

In terms of UX changes, a reasonable change would be to ask the user to choose between OTP and a security key (U2F/WebAuthn) beforehand. This makes it easy to ensure we won't fire a read that may be ignored. A possible optimization for v10/master is to default to the security key if we detect a device capable of doing the ceremony right away (this could be nice regardless, but doesn't eliminate the issue).

As for clever hacks, I haven't found anything good yet. A possibility is running login in a separate process, but that feels like a big bump in complexity for tsh.

alexmv · 2022-04-05T21:56:53Z

> It's not clear to me why earlier versions only eat two chars, but it is an interesting detail.

I generally use the U2F key, but if I use the OTP entry, it only eats one keypress.

Probably unrelated, but I notice that if I use the OTP, the light on the Yubikey keeps blinking. And with --debug, it keeps spewing endlessly, even once connected:

DEBU             Error interacting with U2F devices error:[
ERROR REPORT:
Original Error: trace.aggregate hid: not permitted
Stack Trace:
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:134 github.com/gravitational/teleport/lib/auth/webauthncli.runOnU2FDevicesOnce
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/u2f.go:76 github.com/gravitational/teleport/lib/auth/webauthncli.RunOnU2FDevices
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/auth/webauthncli/login.go:116 github.com/gravitational/teleport/lib/auth/webauthncli.Login
	/private/tmp/teleport-20220403-80235-10rntqq/teleport-9.0.3/lib/client/mfa.go:123 github.com/gravitational/teleport/lib/client.PromptMFAChallenge.func2
	/usr/local/Cellar/go/1.18/libexec/src/runtime/asm_amd64.s:1571 runtime.goexit
User Message: hid: not permitted] webauthncli/u2f.go:80

codingllama · 2022-04-05T22:31:17Z

Yes, the blinking is something else. Using master they stop blinking after a short while for me, even when compiled with the legacy U2F code.

The "hid: not permitted" errors are red herrings, they just happen sometimes (but ideally not after you authenticate). I may take a look at those, but I'll focus on the main issue first.

Fixes a potential stdin hijacking bug by making relogin attempts default to a single MFA method (the strongest available).

The problematic scenario is as follows:

1. User has both OTP and security keys registered
2. "Relogin" is triggered via a tsh command (say, `tsh logout; tsh ssh --proxy=example.com llama@myserver`)
3. User is prompted to pick either OTP or security key ("Tap any security key or enter a code from a OTP device")
4. An stdin read is fired in the background to read the OTP code (via prompt.Stdin)
5. User picks the security method, thus the stdin read is "abandoned"

In most cases this is fine, as the program ends right after. The issue is when a relogin is triggered by a long living tsh invocation (again, `tsh ssh ...`): in this case the stdin hijack causes input to be swallowed.

Forcing a single MFA option avoids the potential stdin hijack, fixing the problem for all relogin invocations. `tsh login` behavior remains the same.

Note that we have to default to cluster's most secure method _without_ checking the user devices. The user is not logged in yet, thus the backend cannot reveal any information about that user.

Fixes #11709.

* Add UseStrongestAuth flag to PromptMFAChallenge
* Add TeleportClient.UseStrongestAuth and set it true for relogin
* Proper testing
* Address review comments
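
The selection logic in the commit message above can be sketched roughly as below. This is an illustration under assumptions, not the merged code. Only the `UseStrongestAuth` name comes from the PR; `clusterAuth`, its fields and `selectPrompts` are hypothetical, and the sketch assumes a security key ranks as stronger than OTP. Note that the choice depends only on what the cluster allows, never on the user's registered devices.

```go
package main

import "fmt"

// clusterAuth lists what the cluster allows. The type and its field names
// are hypothetical.
type clusterAuth struct {
	AllowWebauthn bool
	AllowOTP      bool
}

// selectPrompts returns the MFA prompts to run, strongest first. With
// useStrongestAuth set (the relogin case), only the strongest allowed method
// is kept, so the OTP stdin read is never started alongside a security key
// prompt.
func selectPrompts(c clusterAuth, useStrongestAuth bool) []string {
	var prompts []string
	if c.AllowWebauthn {
		prompts = append(prompts, "security-key")
	}
	if c.AllowOTP {
		prompts = append(prompts, "otp") // the only prompt that reads stdin
	}
	if useStrongestAuth && len(prompts) > 1 {
		prompts = prompts[:1]
	}
	return prompts
}

func main() {
	cluster := clusterAuth{AllowWebauthn: true, AllowOTP: true}
	fmt.Println("tsh login:", selectPrompts(cluster, false))
	fmt.Println("relogin:  ", selectPrompts(cluster, true))
}
```

With both methods allowed, the relogin path in this sketch prompts for the security key only, while the `tsh login` path still offers both.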

codingllama · 2022-04-11T18:22:51Z

Hey @alexmv, this is now fixed on master and backported to v8 and v9 - whenever we get a new release it should include the fix.

The fix is essentially a UX change: we avoid the bug on tsh ssh by defaulting to the strongest auth method available. For finer control over authentication you can still use tsh login. I hope this makes it better for you.

alexmv · 2022-04-11T20:40:10Z

Thanks for the update, and the quick fix! That seems like a totally fine compromise to me -- looking forward to 9.0.5.

alexmv added the bug label Apr 4, 2022

zmb3 added the tsh (Teleport's command line tool for logging into nodes running Teleport) label Apr 4, 2022

codingllama self-assigned this Apr 5, 2022

codingllama mentioned this issue Apr 6, 2022

Make relogin attempts use the strongest auth method #11781

Merged

codingllama closed this as completed in #11781 Apr 8, 2022

codingllama mentioned this issue Apr 8, 2022

[v9] Make relogin attempts use the strongest auth method (#11781) #11847

Merged

codingllama mentioned this issue Apr 8, 2022

[v8] Make relogin attempts use the strongest auth method (#11781) #11848

Merged

This was referenced May 16, 2022

tsh kube credentials exec plugin fails when not logged in #6630

Closed

UseStrongestAuth "relogin" problematic on remote environments #12675

Closed

gabrielcossette mentioned this issue May 30, 2022

Per-session MFA eats 1-2 keypresses (depending on devices) #13021

Closed
