Teleport 11 Test Plan #16951

Closed
Tracked by #16949
r0mant opened this issue Oct 3, 2022 · 21 comments
Labels
test-plan A list of tasks required to ship a successful product release.

r0mant commented Oct 3, 2022

Manual Testing Plan

Below are the items that should be manually tested with each release of Teleport.
These tests should be run on both a fresh install of the version to be released
as well as an upgrade of the previous version of Teleport.

  • Adding nodes to a cluster @EdwardDowling

    • Adding Nodes via Valid Static Token
    • Adding Nodes via Valid Short-lived Tokens
    • Adding Nodes via Invalid Token Fails
    • Revoking Node Invitation
  • Labels @EdwardDowling

    • Static Labels
    • Dynamic Labels
  • Trusted Clusters @lxea

    • Adding Trusted Cluster Valid Static Token
    • Adding Trusted Cluster Valid Short-lived Token
    • Adding Trusted Cluster Invalid Token
    • Removing Trusted Cluster
  • RBAC @atburke

    Make sure that invalid and valid attempts are reflected in audit log.

    • Successfully connect to node with correct role
    • Unsuccessfully connect to a node in a role restricting access by label
    • Unsuccessfully connect to a node in a role restricting access by invalid SSH login
    • Allow/deny role option: SSH agent forwarding
    • Allow/deny role option: Port forwarding
  • Verify that custom PAM environment variables are available as expected. @jakule

  • Users @codingllama

    With every user combination, try to log in and sign up with an invalid
    second factor and an invalid password to see how the system reacts.

    WebAuthn in the release tsh binary is implemented using libfido2 for
    Linux/macOS. Ask for a statically built pre-release binary for realistic
    tests. (tsh fido2 diag should work in our binary.) WebAuthn in the Windows
    build is implemented using webauthn.dll. (tsh webauthnwin diag with
    security key selected in dialog should work.)

    Touch ID requires a signed tsh; ask for a signed pre-release binary so you
    may run the tests.

    Windows WebAuthn requires Windows 10 19H1 and a device capable of Windows
    Hello.

    • Adding Users Password Only

    • Adding Users OTP

    • Adding Users WebAuthn

      • macOS/Linux
      • Windows
    • Adding Users via platform authenticator

      • Touch ID
      • Windows Hello
    • Managing MFA devices

      • Add an OTP device with tsh mfa add
      • Add a WebAuthn device with tsh mfa add
        • macOS/Linux
        • Windows
      • Add platform authenticator device with tsh mfa add
        • Touch ID
        • Windows Hello
      • List MFA devices with tsh mfa ls
      • Remove an OTP device with tsh mfa rm
      • Remove a WebAuthn device with tsh mfa rm
      • Attempt removing the last MFA device on the user
        • with second_factor: on in auth_service, should fail
        • with second_factor: optional in auth_service, should succeed
    • Login Password Only

    • Login with MFA

      • Add an OTP, a WebAuthn and a Touch ID device with tsh mfa add
      • Login via OTP
      • Login via WebAuthn
        • macOS/Linux
        • Windows
      • Login via platform authenticator
        • Touch ID
        • Windows Hello
      • Login via WebAuthn using a U2F device

      U2F devices must be registered in a previous version of Teleport.

      Using Teleport v9, set auth_service.authentication.second_factor = u2f,
      restart the server and then register a U2F device (tsh mfa add). Upgrade
      the install to the current Teleport version (one major at a time) and try to
      log in using the U2F device as your second factor - it should work.

    • Deleting Users

  • SSO @camscale

    • Login OIDC
    • Login SAML
    • Login GitHub
  • Backends @Joerger

    • Teleport runs with etcd
    • Teleport runs with dynamodb
    • Teleport runs with SQLite
    • Teleport runs with Firestore
  • Session Recording @strideynet

    • Session recording can be disabled
    • Sessions can be recorded at the node
      • Sessions in remote clusters are recorded in remote clusters
    • Sessions can be recorded at the proxy
      • Sessions on remote clusters are recorded in the local cluster
      • Enable/disable host key checking.
  • Audit Log @capnspacehook

    • Failed login attempts are recorded

    • Interactive sessions have the correct Server ID

      • Server ID is the ID of the node in "session_recording: node" mode
      • Server ID is the ID of the proxy in "session_recording: proxy" mode

      Node/Proxy ID may be found at /var/lib/teleport/host_uuid in the
      corresponding machine.

      Node IDs may also be queried via tctl nodes ls.

    • Exec commands are recorded

    • scp commands are recorded

    • Subsystem results are recorded

      Subsystem testing may be achieved using both
      Recording Proxy mode
      and
      OpenSSH integration.

      Assuming the proxy is proxy.example.com:3023 and node1 is a node running
      OpenSSH/sshd, you may use the following command to trigger a subsystem audit
      log:

      sftp -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" root@node1
  • Interact with a cluster using tsh @mdwn

    These commands should ideally be tested in both recording and non-recording modes because they are implemented in different ways.

    • tsh ssh <regular-node>
    • tsh ssh <node-remote-cluster>
    • tsh ssh -A <regular-node>
    • tsh ssh -A <node-remote-cluster>
    • tsh ssh <regular-node> ls
    • tsh ssh <node-remote-cluster> ls
    • tsh join <regular-node>
    • tsh join <node-remote-cluster>
    • tsh play <regular-node>
    • tsh play <node-remote-cluster>
    • tsh scp <regular-node>
    • tsh scp <node-remote-cluster>
    • tsh ssh -L <regular-node>
    • tsh ssh -L <node-remote-cluster>
    • tsh ls
    • tsh clusters
  • Interact with a cluster using ssh @tobiaszheller
    Make sure to test both recording and regular proxy modes.

    • ssh <regular-node>
    • ssh <node-remote-cluster>
    • ssh -A <regular-node>
    • ssh -A <node-remote-cluster>
    • ssh <regular-node> ls
    • ssh <node-remote-cluster> ls
    • scp <regular-node>
    • scp <node-remote-cluster>
    • ssh -L <regular-node>
    • ssh -L <node-remote-cluster>
  • Verify proxy jump functionality @Joerger
    Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.

    • tls routing disabled
      • tsh ssh -J <leaf.proxy.example.com:3023>
      • ssh -J <leaf.proxy.example.com:3023>
    • tls routing enabled
      • tsh ssh -J <leaf.proxy.example.com:3080>
      • tsh proxy ssh -J <leaf.proxy.example.com:3080>
  • Interact with a cluster using the Web UI @capnspacehook

    • Connect to a Teleport node
    • Connect to an OpenSSH node
    • Check agent forwarding is correct based on role and proxy mode.

User accounting @jakule

  • Verify that active interactive sessions are tracked in /var/run/utmp on Linux.
  • Verify that interactive sessions are logged in /var/log/wtmp on Linux.
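
The two checks above can be scripted with the standard who and last tools, which read utmp and wtmp respectively and print the login name in the first column. This is a minimal sketch, not part of the Teleport tooling: has_login is a hypothetical helper name, and alice in the usage note is a placeholder login.

```shell
# has_login is a hypothetical helper: it succeeds if the first column of any
# line on stdin equals the given login name, and fails otherwise.
has_login() {
  awk -v user="$1" '$1 == user { found = 1 } END { exit !found }'
}
```

Usage, on the node: run `who /var/run/utmp | has_login alice` while the interactive session is open, and `last -f /var/log/wtmp | has_login alice` after it has ended. Some versions of last truncate long login names, so a short test login is the safest choice.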

Combinations @nklaassen

For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.

  • Connect to an OpenSSH node in a local cluster using OpenSSH.
  • Connect to an OpenSSH node in a local cluster using Teleport.
  • Connect to an OpenSSH node in a local cluster using the Web UI.
  • Connect to a Teleport node in a local cluster using OpenSSH.
  • Connect to a Teleport node in a local cluster using Teleport.
  • Connect to a Teleport node in a local cluster using the Web UI.
  • Connect to an OpenSSH node in a remote cluster using OpenSSH.
  • Connect to an OpenSSH node in a remote cluster using Teleport.
  • Connect to an OpenSSH node in a remote cluster using the Web UI.
  • Connect to a Teleport node in a remote cluster using OpenSSH.
  • Connect to a Teleport node in a remote cluster using Teleport.
  • Connect to a Teleport node in a remote cluster using the Web UI.
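
The 12 combinations are the product of two cluster locations, two node types, and three clients. A small sketch that prints them in the same order as the list above, which may help when building a similar matrix for other features; list_combinations is an illustrative name, not a Teleport command.

```shell
# list_combinations is a hypothetical helper that prints one line per
# interactive-session combination: cluster location x node type x client.
list_combinations() {
  for cluster in local remote; do
    for node in OpenSSH Teleport; do
      for client in OpenSSH Teleport "the Web UI"; do
        echo "$node node, $cluster cluster, via $client"
      done
    done
  done
}

list_combinations
```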

Teleport with EKS/GKE @tigrato

  • Deploy Teleport on a single EKS cluster
  • Deploy Teleport on two EKS clusters and connect them via trusted cluster feature
  • Deploy Teleport Proxy outside of GKE cluster fronting connections to it (use this script to generate a kubeconfig)
  • Deploy Teleport Proxy outside of EKS cluster fronting connections to it (use this script to generate a kubeconfig)

Teleport with multiple Kubernetes clusters @AntonAM

Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.

  • Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy/kubernetes_service inside of a Kubernetes cluster
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy_service outside of the Kubernetes cluster and kubernetes_service inside of a Kubernetes cluster, connected over a reverse tunnel
    • Login with tsh login, check that tsh kube ls has your cluster
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh
    • Verify that the audit log recorded the above request and session
  • Deploy a second kubernetes_service inside of another Kubernetes cluster, connected over a reverse tunnel
    • Login with tsh login, check that tsh kube ls has both clusters
    • Switch to a second cluster using tsh kube login
    • Run kubectl get nodes, kubectl exec -it $SOME_POD -- sh on the new cluster
    • Verify that the audit log recorded the above request and session
  • Deploy combo auth/proxy/kubernetes_service outside of a Kubernetes cluster, using a kubeconfig with multiple clusters in it
    • Login with tsh login, check that tsh kube ls has all clusters
  • Test Kubernetes screen in the web UI (tab is located on left side nav on dashboard):
    • Verify that all kubes registered are shown with correct name and labels
    • Verify that clicking a row's connect button renders a dialog with manual instructions, with the Step 2 login value matching the row's name column
    • Verify searching for name or labels in the search bar works
    • Verify you can sort by name column

Kubernetes auto-discovery @tigrato

  • Test Kubernetes auto-discovery:
    • Verify that Azure AKS clusters are discovered and enrolled for different Azure Auth configs:
      • Local Accounts only
      • Azure AD
      • Azure RBAC
    • Verify that AWS EKS clusters are discovered and enrolled

Kubernetes Secret Storage @tigrato

  • Kubernetes Secret storage for Agent's Identity
    • Install Teleport 11 agent with a short-lived token
    • Upgrade from Teleport 10 agent with storage
      • Validate that the agent identity was read from storage and stored in the secret without generating a new one
      • Validate that Teleport is still running as a StatefulSet resource and that it contains the new ENV variables
    • Upgrade from Teleport 10 agent without storage (Failing due to Helm fails installing T11 hook #17437)
      • Validate that the agent identity is created and stored in the secret using the long-lived token.
      • Validate that the Teleport Kubernetes Deployment was correctly converted into a StatefulSet and that the old Deployment object was removed after a successful upgrade
    • Force cluster CA rotation
  • Test Kubernetes exec via websockets - client

Teleport with FIPS mode @alistanis

  • Perform trusted clusters, Web and SSH sanity check with all teleport components deployed in FIPS mode.

ACME @alistanis

  • Teleport can fetch TLS certificate automatically using ACME protocol.

Migrations @jakule

  • Migrate trusted clusters from 10 to 11
    • Migrate auth server on main cluster, then rest of the servers on main cluster
      SSH should work for both main and old clusters
    • Migrate auth server on remote cluster, then rest of the remote cluster
      SSH should work

Command Templates

When interacting with a cluster, the following command templates are useful:

OpenSSH

# when connecting to the recording proxy, `-o 'ForwardAgent yes'` is required.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# the above command only forwards the agent to the proxy, to forward the agent
# to the target node, `-o 'ForwardAgent yes'` needs to be passed twice.
ssh -o "ForwardAgent yes" \
  -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" \
  node.example.com

# when connecting to a remote cluster using OpenSSH, the subsystem request is
# updated with the name of the remote cluster.
ssh -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p@foo.com" \
  node.foo.com

Teleport

# when connecting to an OpenSSH node, remember `-p 22` needs to be passed.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -p 22 node.example.com

# an agent can be forwarded to the target node with `-A`
tsh --proxy=proxy.example.com --user=<username> --insecure ssh -A -p 22 node.example.com

# the --cluster flag is used to connect to a node in a remote cluster.
tsh --proxy=proxy.example.com --user=<username> --insecure ssh --cluster=foo.com -p 22 node.foo.com

Teleport with SSO Providers

  • G Suite install instructions work @AntonAM
    • G Suite Screenshots are up to date
  • Azure Active Directory (AD) install instructions work @alistanis
    • Azure Active Directory (AD) Screenshots are up to date
  • ActiveDirectory (ADFS) install instructions work @alistanis
    • Active Directory (ADFS) Screenshots are up to date
  • Okta install instructions work @camscale
    • Okta Screenshots are up to date
  • OneLogin install instructions work @hugoShaka
  • GitLab install instructions work @capnspacehook
    • GitLab Screenshots are up to date
  • OIDC install instructions work @camscale
    • OIDC Screenshots are up to date
  • All providers with guides in docs are covered in this test plan

tctl sso family of commands @Tener

For help with setting up sso connectors, check out the Quick GitHub/SAML/OIDC Setup Tips

tctl sso configure helps to construct a valid connector definition:

  • tctl sso configure github ... creates valid connector definitions
  • tctl sso configure oidc ... creates valid connector definitions
  • tctl sso configure saml ... creates valid connector definitions

tctl sso test tests a provided connector definition, which can be loaded from
a file or piped in with tctl sso configure or tctl get --with-secrets. Valid
connectors are accepted; invalid ones are rejected with sensible error messages.

  • Connectors can be tested with tctl sso test.
    • GitHub
    • SAML
    • OIDC
      • Google Workspace
      • Non-Google IdP

Teleport Plugins @hugoShaka

  • Test receiving a message via Teleport Slackbot
  • Test receiving a new Jira Ticket via Teleport Jira

AWS Node Joining @nklaassen

Docs

  • On EC2 instance with ec2:DescribeInstances permissions for local account:
    TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin
  • On EC2 instance with any attached role:
    TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin
  • EC2 Join method in IoT mode with node and auth in different AWS accounts
  • IAM Join method in IoT mode with node and auth in different AWS accounts

Passwordless @codingllama

Passwordless requires tsh compiled with libfido2 for most operations (apart
from Touch ID). Ask for a statically-built tsh binary for realistic tests.

Touch ID requires a properly built and signed tsh binary. Ask for a
pre-release binary, so you may run the tests.

This section complements "Users -> Managing MFA devices". tsh binaries for
each operating system (Linux, macOS and Windows) must be tested separately for
FIDO2 items.

  • Diagnostics

    Commands should pass all tests.

    • tsh fido2 diag (macOS/Linux)
    • tsh touchid diag (macOS only)
    • tsh webauthnwin diag (Windows only)
  • Registration

    • Register a passwordless FIDO2 key (tsh mfa add, choose WEBAUTHN and
      passwordless)
      • macOS/Linux
      • Windows
    • Register a platform authenticator
      • Touch ID credential (tsh mfa add, choose TOUCHID)
      • Windows Hello credential (tsh mfa add, choose WEBAUTHN and
        passwordless)
  • Login

    • Passwordless login using FIDO2 (tsh login --auth=passwordless)
      • macOS/Linux
      • Windows
    • Passwordless login using platform authenticator (tsh login --auth=passwordless)
      • Touch ID
      • Windows Hello
    • tsh login --auth=passwordless --mfa-mode=cross-platform uses FIDO2
      • macOS/Linux
      • Windows
    • tsh login --auth=passwordless --mfa-mode=platform uses platform authenticator
      • Touch ID
      • Windows Hello
    • tsh login --auth=passwordless --mfa-mode=auto prefers platform authenticator
      • Touch ID
      • Windows Hello
    • Passwordless disable switch works
      (auth_service.authentication.passwordless = false)
    • Cluster in passwordless mode defaults to passwordless
      (auth_service.authentication.connector_name = passwordless)
    • Cluster in passwordless mode allows MFA login
      (tsh login --auth=local)
  • Touch ID support commands

    • tsh touchid ls works
    • tsh touchid rm works (careful, may lock you out!)

Hardware Key Support @Joerger

Hardware Key Support is an Enterprise feature and is not available for OSS.

You will need a YubiKey 4.3+ to test this feature.

This feature has additional build requirements, so it should be tested with a pre-release build from Drone (e.g. https://get.gravitational.com/teleport-ent-v11.0.0-alpha.2-linux-amd64-bin.tar.gz).

These tests should be carried out sequentially. tsh tests should be carried out on Linux, macOS, and Windows.

  • tsh login as user with WebAuthn login and no hardware key requirement.
  • Request a role with role.role_options.require_session_mfa: hardware_key - tsh login --request-roles=hardware_key_required
    • Assuming the role should force automatic re-login with YubiKey
    • tsh ssh
      • Requires YubiKey to be connected for re-login
      • Prompts for per-session MFA
  • Request a role with role.role_options.require_session_mfa: hardware_key_touch - tsh login --request-roles=hardware_key_touch_required
    • Assuming the role should force automatic re-login with YubiKey
      • Prompts for touch if not cached (last touch within 15 seconds)
    • tsh ssh
      • Requires YubiKey to be connected for re-login
      • Prompts for touch if not cached
  • tsh logout and tsh login as the user with no hardware key requirement.
  • Upgrade auth settings to auth_service.authentication.require_session_mfa: hardware_key
    • Using the existing login session (tsh ls) should force automatic re-login with YubiKey
    • tsh ssh
      • Requires YubiKey to be connected for re-login
      • Prompts for per-session MFA
  • Upgrade auth settings to auth_service.authentication.require_session_mfa: hardware_key_touch
    • Using the existing login session (tsh ls) should force automatic re-login with YubiKey
      • Prompts for touch if not cached
    • tsh ssh
      • Requires YubiKey to be connected for re-login
      • Prompts for touch if not cached

Performance @rosstimothy @fspmarshall

Perform all tests on the following configurations:

  • With default networking configuration
  • With Proxy Peering Enabled
  • With TLS Routing Enabled
  • Cluster with 10K direct dial nodes:
    • etcd
    • DynamoDB
    • Firestore
  • Cluster with 10K reverse tunnel nodes:
    • etcd
    • DynamoDB
    • Firestore
  • Cluster with 500 trusted clusters:
    • etcd
    • DynamoDB
    • Firestore

Soak Test

Run 30 minute soak test with a mix of interactive/non-interactive sessions for both direct and reverse tunnel nodes:

tsh bench --duration=30m user@direct-dial-node ls
tsh bench -i --duration=30m user@direct-dial-node ps uax

tsh bench --duration=30m user@reverse-tunnel-node ls
tsh bench -i --duration=30m user@reverse-tunnel-node ps uax

Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks.

  • Verify that Prometheus metrics are accurate.
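
One way to watch for leaks during the soak test is to filter the Prometheus text output down to a few series and compare readings over time. The sketch below is an assumption-laden example, not the team's procedure: it assumes Teleport was started with a diagnostics endpoint (for example --diag-addr=127.0.0.1:3000) and that the standard Go and process collector series are exposed. leak_metrics is a hypothetical helper name, and timers are not covered because they have no single standard series.

```shell
# leak_metrics is a hypothetical helper: it reads Prometheus text exposition
# on stdin and keeps only series that commonly reveal leaks
# (goroutines, open file descriptors, resident memory, CPU seconds).
leak_metrics() {
  grep -E '^(go_goroutines|process_open_fds|process_resident_memory_bytes|process_cpu_seconds_total)[ {]'
}
```

Usage: `curl -s http://127.0.0.1:3000/metrics | leak_metrics`, run before, during, and after the 30 minute window. Values that keep climbing after the bench commands finish are worth investigating.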

Concurrent Session Test

  • Cluster with 1k reverse tunnel nodes

Run a concurrent session test that will spawn 5 interactive sessions per node in the cluster:

tsh bench sessions --max=5000 user ls
tsh bench sessions --max=5000 --web user ls 
  • Verify that all 5000 sessions are able to be established.
  • Verify that tsh and the web UI are still functional.
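
The --max value is the node count multiplied by the sessions per node (1k nodes x 5 sessions = 5000). A small sketch for recomputing it when the cluster size differs; bench_max is a hypothetical helper, and the script only prints the two bench commands from this plan rather than running them.

```shell
# bench_max is a hypothetical helper: nodes x sessions-per-node.
bench_max() {
  echo $(( $1 * $2 ))
}

max=$(bench_max 1000 5)

# Print, rather than run, the bench invocations with the derived value.
echo "tsh bench sessions --max=$max user ls"
echo "tsh bench sessions --max=$max --web user ls"
```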

Teleport with Cloud Providers

AWS @hugoShaka

GCP @AntonAM

  • Deploy Teleport to GCP, using Cloud Firestore and Cloud Storage.
  • Deploy Teleport to GKE (Google Kubernetes Engine).
  • Deploy Teleport Enterprise to GCP.

IBM @atburke

  • Deploy Teleport to IBM Cloud, using IBM Database for etcd and IBM Object Store.
  • Deploy Teleport to IBM Cloud Kubernetes.
  • Deploy Teleport Enterprise to IBM Cloud.

Application Access @mdwn

  • Run an application within local cluster.
    • Verify the debug application debug_app: true works.
    • Verify an application can be configured with command line flags.
    • Verify an application can be configured from file configuration.
    • Verify that applications are available at auto-generated addresses name.rootProxyPublicAddr as well as publicAddr.
  • Run an application within a trusted cluster.
    • Verify that applications are available at auto-generated addresses name.rootProxyPublicAddr.
  • Verify Audit Records.
    • app.session.start and app.session.chunk events are created in the Audit Log.
    • app.session.chunk points to a 5 minute session archive with multiple app.session.request events inside.
    • tsh play <chunk-id> can fetch and print a session chunk archive.
  • Verify JWT using verify-jwt.go.
  • Verify RBAC.
  • Verify CLI access with tsh app login.
  • Verify AWS console access.
    • Can log into AWS web console through the web UI.
    • Can interact with AWS using tsh aws commands.
  • Verify dynamic registration.
    • Can register a new app using tctl create.
    • Can update registered app using tctl create -f.
    • Can delete registered app using tctl rm.
  • Test Applications screen in the web UI (tab is located on left side nav on dashboard):
    • Verify that all apps registered are shown
    • Verify that clicking on the app icon takes you to another tab
    • Verify using the bash command produced from Add Application dialogue works (refresh app screen to see it registered)

Database Access @smallinsky + db access team

  • Connect to a database within a local cluster.
  • Connect to a database within a remote cluster via a trusted cluster.
  • Verify audit events. @GavinFrazar
    • db.session.start is emitted when you connect.
    • db.session.end is emitted when you disconnect.
    • db.session.query is emitted when you execute a SQL query.
  • Verify RBAC.
    • tsh db ls shows only databases matching role's db_labels. @gabrielcorado
    • Can only connect as users from db_users. @gabrielcorado
    • (Postgres only) Can only connect to databases from db_names. @gabrielcorado
      • db.session.start is emitted when connection attempt is denied.
    • (MongoDB only) Can only execute commands in databases from db_names. @gabrielcorado
      • db.session.query is emitted when command fails due to permissions.
    • Can configure per-session MFA. @GavinFrazar
      • MFA tap is required on each tsh db connect.
  • Verify dynamic registration. @GavinFrazar
    • Can register a new database using tctl create.
    • Can update registered database using tctl create -f.
    • Can delete registered database using tctl rm.
  • Verify discovery.
    • AWS
      • Can detect and register RDS instances. @GavinFrazar
      • Can detect and register Aurora clusters, and their reader and custom endpoints. @gabrielcorado
      • Can detect and register Redshift clusters. @smallinsky
      • Can detect and register ElastiCache Redis clusters. @greedy52
      • Can detect and register MemoryDB clusters. @greedy52
    • Azure
      • Can detect and register MySQL and Postgres instances. @GavinFrazar
      • Can detect and register Azure Cache for Redis servers. @smallinsky
  • Verify Teleport managed users (password rotation, auto 'auth' on connection, etc.). @greedy52
    • Can detect and manage ElastiCache users
    • Can detect and manage MemoryDB users
  • Test Databases screen in the web UI (tab is located on left side nav on dashboard): @Tener
    • Verify that all dbs registered are shown with correct name, description, type, and labels
    • Verify that clicking a row's connect button renders a dialog with manual instructions, with the Step 2 login value matching the row's name column
    • Verify searching for all columns in the search bar works
    • Verify you can sort by all columns except labels
  • Other
    • MySQL server version reported by Teleport is correct. @smallinsky

TLS Routing @smallinsky

  • Verify that teleport proxy v2 configuration starts only a single listener. @smallinsky
    version: v2
    teleport:
      proxy_service:
        enabled: "yes"
        public_addr: ['root.example.com']
        web_listen_addr: 0.0.0.0:3080
    
  • Run Teleport Proxy in multiplex mode auth_service.proxy_listener_mode: "multiplex" @smallinsky
    • Trusted cluster
      • Setup trusted clusters using single port setup web_proxy_addr == tunnel_addr
      kind: trusted_cluster
      spec:
        ...
        web_proxy_addr: root.example.com:443
        tunnel_addr: root.example.com:443
        ...
      
  • Database Access
  • Application Access @GavinFrazar
    • Verify app access through proxy running in multiplex mode
  • SSH Access @gabrielcorado
    • Connect to an OpenSSH server through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" [email protected]
    • Connect to an OpenSSH server on leaf-cluster through a local ssh proxy: ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" [email protected]
    • Verify tsh ssh access through proxy running in multiplex mode
  • Kubernetes access: @GavinFrazar
    • Verify kubernetes access through proxy running in multiplex mode
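
For the single port trusted cluster setup in this section, the condition web_proxy_addr == tunnel_addr can be checked mechanically before the resource is created. The following is a rough sketch under stated assumptions: same_port_setup is a hypothetical helper, it does a naive line-based match rather than real YAML parsing, and it expects each key on its own line as in the example resource.

```shell
# same_port_setup is a hypothetical helper: it succeeds only if the file
# contains both web_proxy_addr and tunnel_addr and their values are identical.
# This is a line-based check, not a YAML parser.
same_port_setup() {
  awk '
    $1 == "web_proxy_addr:" { web = $2 }
    $1 == "tunnel_addr:"    { tun = $2 }
    END { exit !(web != "" && web == tun) }
  ' "$1"
}
```

Usage: `same_port_setup trusted_cluster.yaml && echo "single port setup"`, where trusted_cluster.yaml is a placeholder filename for the resource definition.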

Desktop Access @ibeckermayer @probakowski @LKozlowski

  • Direct mode (set listen_addr):
    • Can connect to desktop defined in static hosts section.
    • Can connect to desktop discovered via LDAP
  • IoT mode (reverse tunnel through proxy):
    • Can connect to desktop defined in static hosts section.
    • Can connect to desktop discovered via LDAP
  • Connect multiple windows_desktop_services to the same Teleport cluster,
    verify that connections to desktops on different AD domains work. (Attempt to
    connect several times to verify that you are routed to the correct
    windows_desktop_service)
  • Verify user input
    • Download Keyboard Key Info and
      verify all keys are processed correctly in each supported browser. Known
      issues: F11 cannot be captured by the browser without
      special configuration
      on macOS.
    • Left click and right click register as Windows clicks. (Right click on
      the desktop should show a Windows menu, not a browser context menu)
    • Vertical and horizontal scroll work.
      Horizontal Scroll Test
  • Locking
    • Verify that placing a user lock terminates an active desktop session.
    • Verify that placing a desktop lock terminates an active desktop session.
    • Verify that placing a role lock terminates an active desktop session.
  • Labeling
    • Set client_idle_timeout to a small value and verify that idle sessions
      are terminated (the session should end and an audit event will confirm it
      was due to idle connection)
    • All desktops have teleport.dev/origin label.
    • Dynamic desktops have additional teleport.dev labels for OS, OS
      Version, DNS hostname.
    • Regexp-based host labeling applies across all desktops, regardless of
      origin.
  • RBAC
    • RBAC denies access to a Windows desktop due to labels
    • RBAC denies access to a Windows desktop with the wrong OS-login.
  • Clipboard Support
    • When a user has a role with clipboard sharing enabled and is using a chromium based browser
      • Going to a desktop when clipboard permissions are in "Ask" mode (aka "prompt") causes the browser to show a prompt while the UI shows a spinner
      • X-ing out of the prompt (causing the clipboard permission to remain in "Ask" mode) causes the prompt to show up again
      • Denying clipboard permissions brings up a relevant error alert (with "Clipboard Sharing Disabled" in the top bar)
      • Allowing clipboard permissions allows you to see the desktop session, with "Clipboard Sharing Enabled" highlighted in the top bar
      • Copy text from local workstation, paste into remote desktop
      • Copy text from remote desktop, paste into local workstation
    • When a user has a role with clipboard sharing enabled and is not using a chromium based browser
      • The UI shows a relevant alert and "Clipboard Sharing Disabled" is highlighted in the top bar
    • When a user has a role with clipboard sharing disabled and is using either a chromium based or a non-chromium based browser (confirm both)
      • The live session should show disabled in the top bar and copy/paste should not work between your workstation and the remote desktop.
  • Directory Sharing
    • On supported, non-chromium based browsers (Firefox/Safari)
      • Attempting to share directory shows a dismissible "Unsupported Action" dialog
    • On supported, chromium based browsers (Chrome/Edge)
      • Begin sharing works
        • The shared directory icon in the top right of the screen is highlighted when directory sharing is initiated
        • The shared directory appears as a network drive named "<directory_name> on teleport"
        • The share directory menu option disappears from the menu
      • Navigation
        • The folders of the shared directory are navigable (move up and down the directory tree)
      • CRUD
        • A new text file can be created
        • The text file can be written to (saved)
        • The text file can be read (close it, check that it's saved on the local machine, then open it again on the remote)
        • The text file can be deleted
      • File/Folder movement
        • In to out (make at least one of these from a non-top-level-directory)
          • A file from inside the shared directory can be drag-and-dropped outside the shared directory
          • A folder from inside the shared directory can be drag-and-dropped outside the shared directory (and its contents retained)
          • A file from inside the shared directory can be cut-pasted outside the shared directory
          • A folder from inside the shared directory can be cut-pasted outside the shared directory
          • A file from inside the shared directory can be copy-pasted outside the shared directory
          • A folder from inside the shared directory can be copy-pasted outside the shared directory
        • Out to in (make at least one of these overwrite an existing file, and one go into a non-top-level directory)
          • A file from outside the shared directory can be drag-and-dropped into the shared directory
          • A folder from outside the shared directory can be drag-and-dropped into the shared directory (and its contents retained)
          • A file from outside the shared directory can be cut-pasted into the shared directory
          • A folder from outside the shared directory can be cut-pasted into the shared directory
          • A file from outside the shared directory can be copy-pasted into the shared directory
          • A folder from outside the shared directory can be copy-pasted into the shared directory
        • Within
          • A file from inside the shared directory cannot be drag-and-dropped to another folder inside the shared directory: a dismissible "Unsupported Action" dialog is shown
          • A folder from inside the shared directory cannot be drag-and-dropped to another folder inside the shared directory: a dismissible "Unsupported Action" dialog is shown
          • A file from inside the shared directory cannot be cut-pasted to another folder inside the shared directory: a dismissible "Unsupported Action" dialog is shown
          • A folder from inside the shared directory cannot be cut-pasted to another folder inside the shared directory: a dismissible "Unsupported Action" dialog is shown
          • A file from inside the shared directory can be copy-pasted to another folder inside the shared directory
          • A folder from inside the shared directory can be copy-pasted to another folder inside the shared directory (and its contents retained)
    • RBAC
      • Give the user one role that explicitly disables directory sharing (desktop_directory_sharing: false) and confirm that the option to share a directory doesn't appear in the menu
  • Per-Session MFA (try webauthn on each of Chrome, Safari, and Firefox; u2f only works with Firefox)
    • Attempting to start a session with no keys registered shows an error message
    • Attempting to start a session with a u2f key registered shows an error message (N/A now that u2f support has been removed)
    • Attempting to start a session with a webauthn key registered pops up the "Verify Your Identity" dialog
      • Hitting "Cancel" shows an error message
      • Hitting "Verify" causes your browser to prompt you for MFA
      • Cancelling that browser MFA prompt shows an error
      • Successful MFA verification allows you to connect
  • Session Recording
    • Verify sessions are not recorded if all of a user's roles disable recording
    • Verify sync recording (mode: node-sync or mode: proxy-sync)
    • Verify async recording (mode: node or mode: proxy)
    • Sessions show up in session recordings UI with desktop icon
    • Sessions can be played back, including play/pause functionality
    • A session that ends with a TDP error message can be played back, ends by displaying the error message,
      and the progress bar progresses to the end.
    • Attempting to play back a session that doesn't exist (i.e. by entering a non-existing session id in the url) shows
      a relevant error message.
    • RBAC for sessions: ensure users can only see their own recordings when
      using the RBAC rule from our
      docs
  • Audit Events (check these after performing the above tests)
    • windows.desktop.session.start (TDP00I) emitted on start
    • windows.desktop.session.start (TDP00W) emitted when session fails to
      start (due to RBAC, for example)
    • windows.desktop.session.end (TDP01I) emitted on end
    • desktop.clipboard.send (TDP02I) emitted for local copy -> remote
      paste
    • desktop.clipboard.receive (TDP03I) emitted for remote copy -> local
      paste
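
The audit event checks above can be partly automated by scanning exported audit log entries for the expected event/code pairs. The sketch below is a minimal illustration, assuming each log entry is a single-line JSON object carrying `event` and `code` fields; the helper name `missing_events` and the sample entries are hypothetical.

```python
import json

# Event name / code pairs taken from the checklist above.
EXPECTED = {
    ("windows.desktop.session.start", "TDP00I"),
    ("windows.desktop.session.start", "TDP00W"),
    ("windows.desktop.session.end", "TDP01I"),
    ("desktop.clipboard.send", "TDP02I"),
    ("desktop.clipboard.receive", "TDP03I"),
}


def missing_events(lines):
    """Return the expected (event, code) pairs not found in JSON-lines input.

    Assumption: every non-empty line is one JSON object with "event" and
    "code" keys. Entries without those keys are simply ignored.
    """
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        seen.add((entry.get("event"), entry.get("code")))
    return EXPECTED - seen


if __name__ == "__main__":
    # Hypothetical sample entries, not real audit log output.
    sample = [
        '{"event": "windows.desktop.session.start", "code": "TDP00I"}',
        '{"event": "windows.desktop.session.end", "code": "TDP01I"}',
    ]
    for event, code in sorted(missing_events(sample)):
        print(f"not seen: {event} ({code})")
```

An empty result from `missing_events` means every listed pair appeared at least once; it does not confirm that each event was emitted at the right moment, so the manual checks still apply.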

Binaries compatibility @fheinecke

  • Verify that teleport/tsh/tctl/tbot run on:
    • CentOS 7
    • CentOS 8
    • Ubuntu 18.04
    • Ubuntu 20.04
    • Debian 9
  • Verify tsh runs on:
    • Windows 10
    • macOS

Machine ID @timothyb89

SSH

With a default Teleport instance configured with an SSH node:

  • Verify you are able to create a new bot user with tctl bots add robot --roles=access. Follow the instructions provided in the output to start tbot
  • Verify you are able to connect to the SSH node using openssh with the generated ssh_config in the destination directory
  • Verify that after the renewal period (default 20m, but this can be reduced via configuration), newly generated certificates are placed in the destination directory
  • Verify that sending both SIGUSR1 and SIGHUP to a running tbot process causes a renewal and new certificates to be generated
  • Verify that you are able to make a connection to the SSH node using the ssh_config provided by tbot after each phase of a manual CA rotation.
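
One way to confirm that a renewal actually rewrote the files in the destination directory is to compare file modification times before and after the renewal period or signal. The sketch below is a rough helper under stated assumptions: the function names are hypothetical, the signal helper is Unix-only, and a changed mtime only shows that a file was rewritten, not that the certificate inside it is valid.

```python
import os
import signal
import tempfile


def snapshot_mtimes(directory):
    """Map each regular file in `directory` to its mtime in nanoseconds."""
    return {
        entry.name: entry.stat().st_mtime_ns
        for entry in os.scandir(directory)
        if entry.is_file()
    }


def changed_files(before, after):
    """Names that are new or whose mtime differs between two snapshots."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)


def request_renewal(pid, sig=signal.SIGUSR1):
    """Send a signal to a running process; pass signal.SIGHUP for the other case."""
    os.kill(pid, sig)


if __name__ == "__main__":
    # Self-contained demo: a temporary directory stands in for the
    # destination directory, and os.utime simulates a file being rewritten.
    with tempfile.TemporaryDirectory() as dest:
        path = os.path.join(dest, "ssh_config")
        with open(path, "w") as f:
            f.write("placeholder\n")
        os.utime(path, ns=(1_000_000_000, 1_000_000_000))
        before = snapshot_mtimes(dest)
        os.utime(path, ns=(2_000_000_000, 2_000_000_000))
        print(changed_files(before, snapshot_mtimes(dest)))
```

In a real run, take a snapshot, call `request_renewal` with the tbot process ID (or wait out the renewal period), then take a second snapshot and inspect `changed_files`.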

Ensure the above tests are completed for both:

  • Directly connecting to the auth server
  • Connecting to the auth server via the proxy reverse tunnel

DB Access

With a default Postgres DB instance, a Teleport instance configured with DB access and a bot user configured:

  • Verify you are able to connect to and interact with a database using tbot db while tbot start is running

Host users creation @lxea

Host users creation docs
Host users creation RFD

  • Verify host users creation functionality
    • non-existing users are created automatically
    • users are added to groups
      • non existing configured groups are created
      • created users are added to the teleport-system group
    • users are cleaned up after their session ends
      • cleanup occurs if a program was left running after session ends
    • sudoers file creation is successful
      • Invalid sudoers files are not created
    • existing host users are not modified
    • setting disable_create_host_user: true stops user creation from occurring

CA rotations @espadolini

  • Verify the CA rotation functionality itself (by checking in the backend or with tctl get cert_authority)
    • standby phase: only active_keys, no additional_trusted_keys
    • init phase: active_keys and additional_trusted_keys
    • update_clients and update_servers phases: the certs from the init phase are swapped
    • standby phase: only the new certs remain in active_keys, nothing in additional_trusted_keys
    • rollback phase (second pass, after completing a regular rotation): same content as in the init phase
    • standby phase after rollback: same content as in the previous standby phase
  • Verify functionality in all phases (clients might have to log in again in lieu of waiting for credentials to expire between phases)
    • SSH session in tsh from a previous phase
    • SSH session in web UI from a previous phase
    • New SSH session with tsh
    • New SSH session with web UI
    • New SSH session in a child cluster on the same major version
    • New SSH session in a child cluster on the previous major version
    • New SSH session from a parent cluster
    • Application access through a browser
    • Application access through curl with tsh app login
    • kubectl get po after tsh kube login
    • Database access (no configuration change should be necessary if the database CA isn't rotated, other Teleport functionality should not be affected if only the database CA is rotated)
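
The per-phase expectations for `active_keys` and `additional_trusted_keys` listed above can be written down as a small lookup and checked against a parsed cert authority resource. This is only a sketch: it assumes the resource has already been loaded into a Python dict with a `spec` mapping, the nested shape of each key set is hypothetical, and it checks only whether each key set is populated, not that the certs were swapped between phases.

```python
# (active_keys populated, additional_trusted_keys populated) per phase,
# as described in the checklist above.
EXPECTED_BY_PHASE = {
    "standby": (True, False),
    "init": (True, True),
    "update_clients": (True, True),
    "update_servers": (True, True),
    "rollback": (True, True),
}


def populated(spec, field):
    """True if the named key set exists and holds at least one non-empty entry."""
    keys = spec.get(field) or {}
    return any(keys.values())


def matches_phase(ca, phase):
    """Compare which key sets are populated with what the phase should show."""
    spec = ca.get("spec", {})
    observed = (
        populated(spec, "active_keys"),
        populated(spec, "additional_trusted_keys"),
    )
    return observed == EXPECTED_BY_PHASE[phase]


if __name__ == "__main__":
    # Hypothetical resource shapes for illustration only.
    standby_ca = {"spec": {"active_keys": {"tls": [{"cert": "A"}]}}}
    init_ca = {
        "spec": {
            "active_keys": {"tls": [{"cert": "A"}]},
            "additional_trusted_keys": {"tls": [{"cert": "B"}]},
        }
    }
    print(matches_phase(standby_ca, "standby"), matches_phase(init_ca, "init"))
```

Confirming the swap in the update_clients and update_servers phases still requires comparing the actual cert contents against those recorded during the init phase.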

EC2 Discovery @lxea

EC2 Discovery docs

  • Verify EC2 instance discovery
    • Only EC2 instances matching given AWS tags have the installer executed on them
    • Only the IAM permissions mentioned in the discovery docs are required for operation
    • Custom scripts specified in different matchers are executed
    • Custom SSM documents specified in different matchers are executed
    • New EC2 instances with matching AWS tags are discovered and added to the teleport cluster
      • Large numbers of EC2 instances (51+) are all successfully added to the cluster
    • Nodes that have been discovered do not have the install script run on the node multiple times

Resources

Quick GitHub/SAML/OIDC Setup Tips

@r0mant r0mant added the test-plan A list of tasks required to ship a successful product release. label Oct 3, 2022
@rosstimothy
Contributor

I ran into some issues with the new config v3 changes: #17118

@nklaassen
Contributor

nklaassen commented Oct 7, 2022

So far I am unable to ssh to an OpenSSH node using tsh or the Web UI.

On the ssh node, I see userauth_pubkey: unsupported public key algorithm: [email protected] [preauth]

tsh and the Web UI show ERROR: access denied to ec2-user connecting to <ip> on cluster <my cluster>.

The connection works using the OpenSSH client connecting through the teleport proxy

The SSH node is an ec2 instance running the latest amazon linux 2, sshd version is OpenSSH_7.4p1

edit: I get the same error running Teleport v10.0.0
edit 2: with a newer sshd, tsh begins to work but the openssh client stops working. Filed an issue with details: #17197

@mdwn
Contributor

mdwn commented Oct 7, 2022

tsh ssh tests

tsh ssh host command spams an auditd error for regular or remote nodes running in docker: #17185

tsh play seems to have a default API domain of teleport.cluster.local when attempting to play a remote recording: #17192

application access tests

teleport app start outputs the wrong flags during a misconfiguration: #17264

teleport configure for app_servers produces invalid/deprecated YAML: #17268

general

tctl create with no arguments blocks forever: #17271

@Joerger
Contributor

Joerger commented Oct 7, 2022

tsh proxy ssh -J <leaf-proxy> doesn't work with root shut down - #17184

@ibeckermayer
Contributor

Desktop Access clipboard sharing is broken -- #17195

@jakule
Contributor

jakule commented Oct 8, 2022

Enhanced recording, aka BPF, seems to be broken on v11.

#17203

@espadolini
Contributor

v10 leaf clusters are mostly unusable from v11 roots: #17211

@rosstimothy
Contributor

etcd Load Testing

Agent Mesh

10k Tunnel Nodes

image

https://teleportcoreteam.grafana.net/goto/c6BFvMI4z?orgId=1

10k Direct Dial Nodes

image

https://teleportcoreteam.grafana.net/goto/SX6JDGI4z?orgId=1

500 Trusted Cluster

image

https://teleportcoreteam.grafana.net/goto/tuTUDGIVz?orgId=1

Soak Test

----Direct Dial Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-77d968c88-d8mlt ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         162 ms
50         167 ms
75         173 ms
90         181 ms
95         189 ms
99         211 ms
100        484 ms

tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-77d968c88-d8mlt ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         163 ms
50         168 ms
75         174 ms
90         181 ms
95         189 ms
99         208 ms
100        434 ms

----Reverse Tunnel Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-785fb8fc99-999nx ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         164 ms
50         169 ms
75         174 ms
90         181 ms
95         186 ms
99         203 ms
100        404 ms

tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-785fb8fc99-999nx ps aux

* Requests originated: 17998
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         164 ms
50         170 ms
75         175 ms
90         181 ms
95         187 ms
99         208 ms
100        456 ms

Proxy Peering

10k Tunnel Nodes

image

https://teleportcoreteam.grafana.net/goto/XXiMOGIVk?orgId=1

10k Direct Dial Nodes

image

https://teleportcoreteam.grafana.net/goto/CKcndGI4z?orgId=1

500 Trusted Cluster

image

https://teleportcoreteam.grafana.net/goto/34V4OGSVk?orgId=1

Soak Test

----Direct Dial Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-77d968c88-vtkdv ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         157 ms
50         162 ms
75         167 ms
90         173 ms
95         178 ms
99         200 ms
100        427 ms

tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-77d968c88-vtkdv ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         158 ms
50         162 ms
75         167 ms
90         172 ms
95         176 ms
99         198 ms
100        425 ms


----Reverse Tunnel Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-785fb8fc99-tgdc8 ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         162 ms
50         167 ms
75         173 ms
90         179 ms
95         185 ms
99         204 ms
100        438 ms

tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-785fb8fc99-tgdc8 ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         162 ms
50         167 ms
75         174 ms
90         181 ms
95         188 ms
99         208 ms
100        336 ms
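
To compare soak runs like the ones above without reading each table by eye, the `tsh bench` output can be parsed into numbers. The sketch below is a hedged illustration that assumes the output layout shown in this comment (a "Requests originated" line, a "Requests failed" line, and "percentile / N ms" rows); the function names are hypothetical.

```python
import re


def parse_bench(text):
    """Extract request counts and the percentile table from bench output.

    Assumes the text layout shown in the comment above; returns
    (originated, failed, {percentile: milliseconds}).
    """
    originated = int(re.search(r"Requests originated:\s*(\d+)", text).group(1))
    failed = int(re.search(r"Requests failed:\s*(\d+)", text).group(1))
    rows = re.findall(r"^[ \t]*(\d+)[ \t]+(\d+) ms[ \t]*$", text, re.MULTILINE)
    percentiles = {int(p): int(ms) for p, ms in rows}
    return originated, failed, percentiles


def requests_per_second(originated, duration_seconds):
    """Average request rate over the run."""
    return originated / duration_seconds


if __name__ == "__main__":
    # First direct dial run from the Agent Mesh soak test above.
    sample = """
* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration
---------- -----------------
25         162 ms
50         167 ms
75         173 ms
90         181 ms
95         189 ms
99         211 ms
100        484 ms
"""
    originated, failed, pct = parse_bench(sample)
    rate = requests_per_second(originated, 30 * 60)
    print(originated, failed, pct[99], f"{rate:.2f} req/s")
```

With 17999 requests over a 30 minute run, the average works out to just under 10 requests per second, which is consistent across the runs reported here.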

@fspmarshall
Contributor

fspmarshall commented Oct 11, 2022

DynamoDB

10k Direct Dial Scaling

loadtest-v11-10k-non-iot

Direct Dial Soak

$ tsh bench --duration=30m <user>@<host> ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         171 ms            
50         179 ms            
75         188 ms            
90         197 ms            
95         205 ms            
99         259 ms            
100        1845 ms

$ tsh bench --duration=30m --interactive <user>@<host> ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         177 ms            
50         186 ms            
75         194 ms            
90         205 ms            
95         215 ms            
99         306 ms            
100        2251 ms

10k Tunnel Scaling

loadtest-v11-10k-iot

Tunnel Soak

$ tsh bench --duration=30m <user>@<host> ls

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         155 ms            
50         161 ms            
75         179 ms            
90         184 ms            
95         188 ms            
99         214 ms            
100        1186 ms

$ tsh bench --duration=30m --interactive <user>@<host> ps aux

* Requests originated: 17999
* Requests failed: 0

Histogram

Percentile Response Duration 
---------- ----------------- 
25         155 ms            
50         161 ms            
75         179 ms            
90         184 ms            
95         188 ms            
99         211 ms            
100        469 ms

500 Trusted Clusters

500-tc

Upgrade At Scale

In addition to normal scaling tests, I did a step-by-step upgrade of a 10K node dynamo cluster in order to assess the DynamoDB usage differences between v10.2 and v11.0.0-alpha.2. This was done in order to assess the effects of #16911 on DynamoDB read capacity.

Below are two dynamo DB stat page images. The first shows a v10.2 cluster being restarted, and the second shows the same restart procedure being used to apply an upgrade from v10.2 to v11.0.0-alpha.2 (we use a non-upgrading restart as the comparison point since it helps us control for the load created by cache resets and disruption of heartbeats):

loadtest-v10-restart-2

loadtest-v11-upgrade

Note the difference in the "read usage" sections between the restart and upgrade cases. Both have a similar large spike immediately after restart due to cache resets, with the upgrade case stabilizing at a much higher average read usage (~29 vs ~1.5). In theory, a read usage of 29 for a 10k cluster is practically nothing, but the proportional difference between the resting rate before and after #16911 does make me nervous. Such a jump might negatively impact users with very high numbers of peak concurrent sessions if they have fine-tuned their dynamo read capacity to just barely accommodate their existing load. We don't recommend doing things like that, and we generally encourage people to use on-demand, but it still gives me pause. Haven't made up my mind yet, but I think I might revert the compare-and-swap semantics introduced in #16911 in favor of an approach that has a lower impact.
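
To put the proportional difference in concrete terms, the resting read usage figures quoted above (~1.5 before, ~29 after) give roughly a 19x increase. The sketch below works through that arithmetic and projects it onto a hypothetical existing read load. It assumes usage scales proportionally, which the measurements above do not establish, so treat the projection as a rough worst-case illustration only.

```python
def relative_increase(before, after):
    """Ratio of the new resting read usage to the old one."""
    return after / before


def projected_usage(current, before=1.5, after=29.0):
    """Scale a current read usage by the observed ratio.

    Assumption: the increase is proportional to existing load. The default
    before/after values are the approximate figures quoted in the comment.
    """
    return current * relative_increase(before, after)


if __name__ == "__main__":
    ratio = relative_increase(1.5, 29.0)
    print(f"ratio: {ratio:.1f}x")
    # Hypothetical cluster currently averaging 10 read units at rest.
    print(f"projected: {projected_usage(10.0):.0f}")
```

A deployment whose provisioned read capacity sits only slightly above its current usage would be well short of the projected figure, which is the scenario the comment is worried about.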

@smallinsky
Contributor

smallinsky commented Oct 11, 2022

Small issue with Snowflake DB Access: tctl auth sign call on leaf cluster in case of multi trusted clusters setup: #17262

PR with a fix #17263

@fspmarshall
Contributor

Opted to revert compare-and-swap node heartbeats based on dynamo stats in #16951 (comment).

PR with fix: #17308

@jdconti

jdconti commented Oct 11, 2022

Can we please add X11 tests as a non-root user to this (and future) test plans? Thanks!

@ibeckermayer
Contributor

ibeckermayer commented Oct 11, 2022

Desktop Access clipboard sharing is broken -- #17195

Webapps PR with the fix is here gravitational/webapps#1250
Ideally gravitational/webapps#1251 gets merged and backported as well

Update: resolved

@Joerger
Contributor

Joerger commented Oct 14, 2022

Hardware key support broke between v11.0.0-alpha.2 and v11.0.0-beta.1 - #17415

Edit: False alarm, it only fails in proxy recording mode, which is expected. I've added the Hardware Key Support tests to the test plan to double check everything with v11.0.0-beta.1.

@jakule
Contributor

jakule commented Oct 14, 2022

/var/log/wtmp is not being updated correctly #17416

@tigrato
Contributor

tigrato commented Oct 14, 2022

Teleport Kube Agent Chart hook is failing due to a wrong find & replace #17437

@espadolini
Contributor

espadolini commented Oct 17, 2022

@hugoShaka
Contributor

Onelogin SSO integration guide still works but a couple of screenshots and concepts would need an update: #17485

@codingllama
Contributor

tsh / Windows: tsh mfa add for OTPs doesn't show me the QR code. (Typing the key still works.)

FYI @tobiaszheller

@codingllama
Contributor

Raised #17563 and #17564, neither is blocking for the release.

@jakule
Contributor

jakule commented Oct 19, 2022

Created #17572

@r0mant r0mant mentioned this issue Oct 21, 2022
30 tasks
@r0mant r0mant closed this as completed Oct 26, 2022