Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFD 63: Add tsh daemon for Teleport Terminal #10286

Merged
merged 4 commits into from
Apr 1, 2022
Merged

RFD 63: Add tsh daemon for Teleport Terminal #10286

merged 4 commits into from
Apr 1, 2022

Conversation

alex-kovoy
Copy link
Contributor

@alex-kovoy alex-kovoy commented Feb 10, 2022

This PR contains a current state of the Teleport Terminal backend implementation (alpha). Here is the rendered version of rfd to get familiar with the design choices.

Currently only a portion of that rfd is implemented and work is still in progress. Nevertheless it's still worth checking it in to master as the size of the changes is growing. It should safe us time that we spend constantly addressing breaking changes to makefiles (like $ make grpc) :)

The good part is that all changes are scoped to teleterm package, this should make it safe to merge. This PR does not effect the size of tsh binary.

I am currently adding unit-tests which I will update this PR with later.

@github-actions github-actions bot requested review from kimlisa and rudream February 10, 2022 22:11
@github-actions github-actions bot added rfd Request for Discussion tsh tsh - Teleport's command line tool for logging into nodes running Teleport. labels Feb 10, 2022
if err := c.ReissueDBCerts(ctx, params.TargetUser, db); err != nil {
return nil, trace.Wrap(err)
}

Copy link
Contributor

@smallinsky smallinsky Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReissueDBCerts support only cert generation when MFA flow is disabled. Wonder if MFA Db access flow should be also supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DB MFA should be supported but currently this is not implemented. It will be done in a separate PR.

}

if req.Local != nil {
if err := cluster.LocalLogin(ctx, req.Local.User, req.Local.Password, req.Local.Token); err != nil {
Copy link
Contributor

@smallinsky smallinsky Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default the MFA LocalLogin-> localMFALogin->SSHAgentMFALogin->PromptMFAChallenge flow will try to use os.Stdin pipe to obtain Token or generate MFA dev challange resp. Thus os.Stdin pipe belongs to demon process there is no way to enter challange response and only supported MFA method right now is OTP token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the TOTP is also handled by client.SSHAgentLogin where the prompt is skipped if token is part of the request. This is what teleterm uses to handle TOTPs.

In case of webauthn or utf, it uses SSHAgentMFALogin where PromptMFAChallenge is called that awaits for a device to be activated. Both methods do not require CLI input.

}

// CreateGateway creates a gateway to given targetURI
func (s *Service) CreateGateway(ctx context.Context, params CreateGatewayParams) (*Gateway, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does FE distinguish if teleport proxy is running with TLSRouting mode before creating Database Gateway and starting local.alpnProxy ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently teleterm gateways only work with proxies with TLSRouting mode enabled. The expectation is that eventually it becomes the new defaults for Teleport. If this is not the case then teleterm would need to account for it.

gw, err := gateway.New(gateway.Config{
LocalPort: params.LocalPort,
TargetURI: params.TargetURI,
TargetUser: params.TargetUser,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what is the purpose of TargetUser field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TargetUser is part of the request to create a localproxy to MongoDB. It's not required for alpn proxy itself but it's part of the Gateway abstraction that describes the gateway to the DB. TargetUser will be used later when we need to regenerate the certs.

Copy link
Contributor

@smallinsky smallinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First iteration, I have left some general comments though I need to take a closer look.

}

if req.Sso != nil {
if err := cluster.SSOLogin(ctx, req.Sso.ProviderType, req.Sso.ProviderName); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if some SSO Login steps should be placed in teleterm demon process for example SSOLogin triggers exec and opens a browser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the first iteration, we tried to do SSO in place (Electron) but Electron Renderer processes runs against localHost that required us to look into additional proxy layer. So to keep things simple, both processes: tsh and teleterm use the same flow that require a new browser window to complete the auth flow.

@ravicious
Copy link
Member

It looks like make grpc doesn't work on arm64 because:

  1. Node is hardcoded to x64 version (but that's an easy fix with ${BUILDARCH}.
  2. The grpc-tools JS package doesn't have prebuilt grpc_node_plugin & protoc binaries for arm64. Which requires us to pull in cmake into the buildbox… But I'll see what I can do to make it go through.

Comment on lines +69 to +75

ifeq ("$(RUNTIME_ARCH)","arm64")
GRPC_NODE_PLUGIN_BINARY_TYPE := compiled
else
GRPC_NODE_PLUGIN_BINARY_TYPE := prebuilt
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Dockerfile-teleterm we need to conditionally execute a bunch of stuff. Instead of prepending each RUN with a conditional, I grouped those instructions into separate images and then at the end choose an appropriate image based on the build arg.

FROM base as grpc_node_plugin_binary_prebuilt
ONBUILD RUN (npm install --global [email protected])
FROM base as grpc_node_plugin_binary_compiled
ONBUILD RUN apt-get update -y && \
apt-get install -q -y --no-install-recommends build-essential cmake jq && \
apt-get clean -y && \
rm -rf /var/lib/apt/lists/*
ONBUILD RUN (npm install --global --ignore-scripts [email protected])
ONBUILD COPY teleterm_linux_arm64.toolchain.cmake ./linux_arm64.toolchain.cmake
ONBUILD RUN git clone --depth=1 [email protected] https://github.com/grpc/grpc-node.git && \
mv linux_arm64.toolchain.cmake grpc-node/packages/grpc-tools/. && \
cd grpc-node && \
git submodule update --init --recursive && \
cd packages/grpc-tools && \
cmake -DCMAKE_TOOLCHAIN_FILE=linux_arm64.toolchain.cmake . && \
cmake --build . --target clean && cmake --build . --target grpc_node_plugin -- -j 12 && \
cp grpc_node_plugin $(npm root -g)/grpc-tools/bin/. && \
# grpc-tools requires both protoc and grpc_node_plugin, but protoc is already installed by
# the buildbox image.
ln -s $(which protoc) $(npm root -g)/grpc-tools/bin/protoc && \
cd ../../.. && \
rm -rf grpc-node
# Choose an appropriate image and run ONBUILD instructions from it.
FROM grpc_node_plugin_binary_${GRPC_NODE_PLUGIN_BINARY_TYPE}

AFAIK Docker ARG and ENV instructions can't really include any complex logic, so I couldn't encapsulate all of this within the Dockefile. Hence this ifeq here.

Comment on lines +211 to +213
buildbox-teleterm: buildbox
@if [[ $${DRONE} == "true" ]] && ! docker inspect --type=image $(BUILDBOX_TELETERM) 2>&1 >/dev/null; then docker pull $(BUILDBOX_TELETERM) || true; fi;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the other buildbox targets have an additional conditional:

buildbox:
if [[ "$(BUILDBOX_NAME)" == "$(BUILDBOX)" ]]; then \

I'm not 100% sure if buildbox-teleterm needs one as well, but since for now it's going to be executed only on our dev machines, I think I can skip it.

@ravicious
Copy link
Member

@smallinsky @r0mant Could you take another look and tell me if this PR needs any more changes?

I'd like to get it merged to master so that we can avoid having to deal with conflicts on this branch.

4. Command Palette for quick access and navigation.

### tsh daemon
`tsh daemon` is a `tsh` tool that runs as a service. A hidden flag launches `tsh` in a background process that exposes gRPC API over unix-socket (similarly to docker daemon).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're especially sensitive to tsh latency, so let's make sure that all this extra code we add to tsh doesn't have a material impact on startup time or other common (non-teleterm) operations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we go about doing this? Is there a benchmark suite I could run to test this? I see some stuff in /examples/bench and /lib/benchmark, but I'm not sure if there's something more suited for what we want to achieve here.

As far as I understood what Alexey told me during onboarding, tsh daemon start is the only entrypoint through which the Teleterm code can be executed, so I believe it shouldn't have impact on startup time for other commands. But yeah if we have some benchmark suites available then I definitely want to run them.

teleport/tool/tsh/tsh.go

Lines 772 to 773 in c681dd5

case daemonStart.FullCommand():
err = onDaemonStart(&cf)

// onDaemonStart implements "tsh daemon start" command.
func onDaemonStart(cf *CLIConf) error {
homeDir := profile.FullProfilePath(cf.HomePath)
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
err := teleterm.Start(ctx, teleterm.Config{
HomeDir: homeDir,
Addr: cf.DaemonAddr,
InsecureSkipVerify: cf.InsecureSkipVerify,
ShutdownSignals: []os.Signal{os.Interrupt, syscall.SIGTERM},
})
if err != nil {
return trace.Wrap(err)
}
return nil
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being a major issue (at least not yet). I just wanted to raise it early so that we keep thinking about it as we expand the amount of tsh code needed for terminal to function. It just seemed like an interesting design choice to ship extra code to all users in tsh that is only needed for the terminal use case.

Out of curiosity, what's the size of the tsh binary before and after adding the teleterm GRPC server?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 65.062MiB on master (b996135, the base of this branch at the moment)
  • 65.577MiB on HEAD of teleterm (5f1b0e7)

So Teleterm code adds 527.297KiB to the size of tsh. Built with make build/tsh.

(Also TIL that in Go usually there's no "release" build other than striping debug symbols.)


It just seemed like an interesting design choice to ship extra code to all users in tsh that is only needed for the terminal use case.

Yeah, I'm not sure if long term this is a good strategy either.

We didn't discuss an update strategy, the preview release won't have an update feature anyway and the RFD says that "tsh is packaged together with the rest of the application and installed into the "app data" folder". I'll ask the rest of the team if they made any plans in that area or if Teleterm will be tied to the general release schedule.

If Teleterm is always going to be bundled with tsh, then it would make sense to put the daemon code behind a build tag. If it's not going to pose any additional problems during development, then it's an easy way to reduce the attack surface and to just avoid shipping code that's not going to be used anyway.

@ravicious
Copy link
Member

Thanks for the review @zmb3! That's more issues than I anticipated, so I'll probably take care of them after March 15th.

@ravicious ravicious self-assigned this Mar 21, 2022
@ravicious ravicious force-pushed the teleterm branch 2 times, most recently from c681dd5 to 6a37be2 Compare March 24, 2022 11:57
ravicious added a commit to gravitational/webapps that referenced this pull request Mar 24, 2022
This commit removes a bunch of unused protobufs and also updates some of
those that got updated on the teleterm branch in the teleport repo.

The biggest change from all of them is that LoginRequest now has oneof
on Sso and Local. [1] This is because a login request should have either
Sso or Local params, but not both at the same time.

The previous implementation called both `setSso` and `setLogin` on the same
object. This no longer works with the use of `oneof`, because calling `setLogin`
after `setSso` would clear the `Sso` params. [2]

[1] gravitational/teleport#10286 (comment)
[2] https://developers.google.com/protocol-buffers/docs/proto3#oneof_features
ravicious added a commit to gravitational/webapps that referenced this pull request Mar 24, 2022
This commit removes a bunch of unused protobufs and also updates some of
those that got updated on the teleterm branch in the teleport repo.

The biggest change from all of them is that LoginRequest now has oneof
on Sso and Local. [1] This is because a login request should have either
Sso or Local params, but not both at the same time.

The previous implementation called both `setSso` and `setLogin` on the same
object. This no longer works with the use of `oneof`, because calling `setLogin`
after `setSso` would clear the `Sso` params. [2]

[1] gravitational/teleport#10286 (comment)
[2] https://developers.google.com/protocol-buffers/docs/proto3#oneof_features
@ravicious
Copy link
Member

@smallinsky @r0mant What else would you like me to take care of before we can merge this PR? Do you remember what the outcome of your call with Alexey was?

I just addressed the feedback that @zmb3 left, one remaining thing to do is to check if the new code has any impact on overall tsh performance.

I saw above that you discussed features like per session MFA for databases. Generally speaking, at the moment the code in this PR does everything we need it to do for the preview release. We plan to release it by the end of April. There’s a bunch of stuff left to do, but I think there will be no major changes (if any at all) to lib/teleterm.

I’d like to merge this PR as soon as possible because of its size. We should’ve taken care of it a month ago but we’re trying to do our best with what we’ve got. ;)

@ravicious ravicious changed the title Add tsh daemon for Teleport Terminal RFD 63: Add tsh daemon for Teleport Terminal Mar 31, 2022
@ravicious ravicious enabled auto-merge (rebase) March 31, 2022 10:25
@smallinsky
Copy link
Contributor

I have tested db access and found following issues:

  • trusted cluster db access:
    Screenshot 2022-03-31 at 13 55 04

  • Seems that during db access cluster name is taken from proxy address where actually proxy address and cluster name can differ so following case teleterm db connect command tries to use ice-berg.dev where the cluster name is leaf1
    Screenshot 2022-03-31 at 14 04 19

tree /Users/marek/Library/Application\ Support/Electron/tsh/keys/ice-berg.dev/cas
/Users/marek/Library/Application\ Support/Electron/tsh/keys/ice-berg.dev/cas
├── leaf1.pem
└── main.pem
  • Some CLI client doesn't handle correctly the space in in cert args like: /Users/marek/Library/Application Support/Electron/tsh/keys/ice-berg.dev
Failed global initialisation: InvalidSSLConfiguration Unable to load PEM from '/Users/marek/Library/Application': Failed opening file
  • Unknown database protocol in case of ms server access
    Screenshot 2022-03-31 at 10 39 32

@ravicious
Copy link
Member

@smallinsky

  • Trusted cluster DB access: thanks, I just filed an issue for it. I think it might be a bug in the JS code where we don't pass the appropriate cluster before attempting to connect to the DB.
  • Spaces in cert args: I suppose on the frontend we could wrap the path in quotes in case someone wants to pass this path to a CLI tool. On macOS the app will always store its data in Application Support. Created an issue for that too.
  • Unknown database protocol in case of ms server access: yeah, I just reported that to the rest of the team today. It looks like Teleterm doesn't support any kind of database protocol that was added after the work on Teleterm has started.
    • For now only MongoDB, MySQL and Postgres are supported. We're missing support for MS SQL Server, CockroachDB and Redis.
    • That being said, it seems like it's mostly an issue with the frontend code not supporting those protocols. There doesn't seem to be anything specific to MySQL/Postgres under lib/teleterm.

I created/reported those frontend issues and I'm going to merge this PR once I manage to fix the issue with Teleport-UnitTest that I'm currently having.

@ravicious ravicious disabled auto-merge March 31, 2022 13:46
@ravicious ravicious enabled auto-merge (rebase) March 31, 2022 14:10
@ravicious ravicious force-pushed the teleterm branch 2 times, most recently from 7ce459c to 5a8dd67 Compare March 31, 2022 15:01
@ravicious ravicious disabled auto-merge March 31, 2022 15:05
alex-kovoy and others added 4 commits April 1, 2022 11:41
The grpc-tools package is needed to generate gRPC files for JavaScript.
However, at the moment it can't be installed on M1 MacBooks because of
missing prebuilt binaries for arm64. [1]

One of them, protoc, is already installed in our buildbox. We still need
to compile grpc_node_plugin from source though. This adds significant
overhead as we need to pull in cmake, build-essential and then about
300 MB of git repos from protocolbuffers/protobuf.

Initially, those Teleterm gRPC were generated within `make grpc` with other
files. M1 users who don't work on Teleterm would not be happy about incurring
that additional overhead, hence I extracted everything into separate target
and Dockerfile.

Teleterm proto files don't depend on any other proto files. Once grpc-tools
adds support for arm64, we'll be able to essentially almost revert this
commit and generate Teleterm gRPC files within `make grpc`.

[1] grpc/grpc-node#1405
The login is either local or SSO but not both.
The previous version of the code used GetHostId return value for the URI.
That caused problems as a single host can run multiple database servers.
This in turn resulted in stuff like Teleterm not listing all databases.

There's `Database.GetURI` function which I decided not to use, because it's
an URI on its own which might include stuff like port numbers and what not.
I wanted to avoid a situation in which the database URI creates some potential
conflicts with the Teleterm URIs.

I noticed that the Web UI code runs `DeduplicateDatabases` already and it
uses `Database.GetName` underneath, so I deemed it a good candidate to be
a part of a database URI in Teleterm.

Fixes gravitational/webapps.e#127
@ravicious ravicious enabled auto-merge (rebase) April 1, 2022 09:42
@ravicious ravicious merged commit 3999b17 into master Apr 1, 2022
@ravicious ravicious deleted the teleterm branch April 1, 2022 11:02
ravicious added a commit to gravitational/webapps that referenced this pull request Apr 26, 2022
This commit removes a bunch of unused protobufs and also updates some of
those that got updated on the teleterm branch in the teleport repo.

The biggest change from all of them is that LoginRequest now has oneof
on Sso and Local. [1] This is because a login request should have either
Sso or Local params, but not both at the same time.

The previous implementation called both `setSso` and `setLogin` on the same
object. This no longer works with the use of `oneof`, because calling `setLogin`
after `setSso` would clear the `Sso` params. [2]

[1] gravitational/teleport#10286 (comment)
[2] https://developers.google.com/protocol-buffers/docs/proto3#oneof_features
ravicious added a commit to gravitational/webapps that referenced this pull request Apr 27, 2022
This commit removes a bunch of unused protobufs and also updates some of
those that got updated on the teleterm branch in the teleport repo.

The biggest change from all of them is that LoginRequest now has oneof
on Sso and Local. [1] This is because a login request should have either
Sso or Local params, but not both at the same time.

The previous implementation called both `setSso` and `setLogin` on the same
object. This no longer works with the use of `oneof`, because calling `setLogin`
after `setSso` would clear the `Sso` params. [2]

[1] gravitational/teleport#10286 (comment)
[2] https://developers.google.com/protocol-buffers/docs/proto3#oneof_features
ravicious added a commit to gravitational/webapps that referenced this pull request Apr 27, 2022
* Take `localClusterUri` into account in `QuickInput` (#679)

* Display cluster name for each connection

* Automatically try to connect a connection when possible

* Update connection icon

* Always use root cluster URI to obtain `documentsService` in `useServerConnect`

* Don't close the tab on non-zero exit code

* Autocomplete database names for tsh proxy db

* Change placeholder text in `ClusterAdd`

* Show leaf cluster name when possible in Connections list

* Hide kubes and apps

* Force `TopBar` items to take full height

* Change shortcut to open `QuickInput`

* Update electron-builder to 23.0.3

Versions lower than 23.0.2 don't work on newest macOS, see:

* electron-userland/electron-builder#6732
* electron-userland/electron-builder#6606

* remove `Navigator` code (#685)

* Prevent breaking layout on long cluster name (#688)

* Show username when possible in identity list (#687)

* Update command for updating proto files

* Update protobufs for Teleterm (LoginRequest params)

This commit removes a bunch of unused protobufs and also updates some of
those that got updated on the teleterm branch in the teleport repo.

The biggest change from all of them is that LoginRequest now has oneof
on Sso and Local. [1] This is because a login request should have either
Sso or Local params, but not both at the same time.

The previous implementation called both `setSso` and `setLogin` on the same
object. This no longer works with the use of `oneof`, because calling `setLogin`
after `setSso` would clear the `Sso` params. [2]

[1] gravitational/teleport#10286 (comment)
[2] https://developers.google.com/protocol-buffers/docs/proto3#oneof_features

* Render ssh menu item as `NavLink` only when URL is provided

* Use connection dropdown instead of modal for supplying SSH username

* Adjust `Identity` layout, combine `logout` and `clusterRemove` into a single action

* Change command `cluster-remove` to `cluster-logout`

* Apply `Identity` design changes

* Enable `babel-plugin-styled-components` in production and tests (#697)

* Make connections icon bigger

* Properly use `css` prop

Co-authored-by: Grzegorz Zdunek <[email protected]>
Co-authored-by: Grzegorz Zdunek <[email protected]>
hatched pushed a commit to hatched/teleport-merge that referenced this pull request Nov 30, 2022
This commit removes a bunch of unused protobufs and also updates some of
those that got updated on the teleterm branch in the teleport repo.

The biggest change from all of them is that LoginRequest now has oneof
on Sso and Local. [1] This is because a login request should have either
Sso or Local params, but not both at the same time.

The previous implementation called both `setSso` and `setLogin` on the same
object. This no longer works with the use of `oneof`, because calling `setLogin`
after `setSso` would clear the `Sso` params. [2]

[1] gravitational#10286 (comment)
[2] https://developers.google.com/protocol-buffers/docs/proto3#oneof_features
hatched pushed a commit that referenced this pull request Dec 20, 2022
This commit removes a bunch of unused protobufs and also updates some of
those that got updated on the teleterm branch in the teleport repo.

The biggest change from all of them is that LoginRequest now has oneof
on Sso and Local. [1] This is because a login request should have either
Sso or Local params, but not both at the same time.

The previous implementation called both `setSso` and `setLogin` on the same
object. This no longer works with the use of `oneof`, because calling `setLogin`
after `setSso` would clear the `Sso` params. [2]

[1] #10286 (comment)
[2] https://developers.google.com/protocol-buffers/docs/proto3#oneof_features
@ravicious ravicious added the teleport-connect Issues related to Teleport Connect. label Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfd Request for Discussion teleport-connect Issues related to Teleport Connect. tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants