Teleport 11 Test Plan #16951
I ran into some issues with the new config v3 changes: #17118
So far I am unable to ssh to an OpenSSH node using
On the ssh node, I see
The connection works using the OpenSSH client connecting through the teleport proxy
The SSH node is an EC2 instance running the latest Amazon Linux 2; the sshd version is OpenSSH_7.4p1.
Edit: I get the same error running Teleport v10.0.0
tsh ssh tests
application access tests
general
Desktop Access clipboard sharing is broken -- #17195
Enhanced recording, aka BPF, seems to be broken on v11.
v10 leaf clusters are mostly unusable from v11 roots: #17211
etcd Load Testing
Agent Mesh
10k Tunnel Nodes
https://teleportcoreteam.grafana.net/goto/c6BFvMI4z?orgId=1
10k Direct Dial Nodes
https://teleportcoreteam.grafana.net/goto/SX6JDGI4z?orgId=1
500 Trusted Cluster
https://teleportcoreteam.grafana.net/goto/tuTUDGIVz?orgId=1
Soak Test
----Direct Dial Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-77d968c88-d8mlt ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 162 ms
50 167 ms
75 173 ms
90 181 ms
95 189 ms
99 211 ms
100 484 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-77d968c88-d8mlt ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 163 ms
50 168 ms
75 174 ms
90 181 ms
95 189 ms
99 208 ms
100 434 ms
----Reverse Tunnel Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-785fb8fc99-999nx ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 164 ms
50 169 ms
75 174 ms
90 181 ms
95 186 ms
99 203 ms
100 404 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-785fb8fc99-999nx ps aux
* Requests originated: 17998
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 164 ms
50 170 ms
75 175 ms
90 181 ms
95 187 ms
99 208 ms
100 456 ms
Proxy Peering
10k Tunnel Nodes
https://teleportcoreteam.grafana.net/goto/XXiMOGIVk?orgId=1
10k Direct Dial Nodes
https://teleportcoreteam.grafana.net/goto/CKcndGI4z?orgId=1
500 Trusted Cluster
https://teleportcoreteam.grafana.net/goto/34V4OGSVk?orgId=1
Soak Test
----Direct Dial Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@node-77d968c88-vtkdv ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 157 ms
50 162 ms
75 167 ms
90 173 ms
95 178 ms
99 200 ms
100 427 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@node-77d968c88-vtkdv ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 158 ms
50 162 ms
75 167 ms
90 172 ms
95 176 ms
99 198 ms
100 425 ms
----Reverse Tunnel Node Test----
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m root@iot-node-785fb8fc99-tgdc8 ls
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 162 ms
50 167 ms
75 173 ms
90 179 ms
95 185 ms
99 204 ms
100 438 ms
tsh --insecure --proxy=monster.gravitational.co:3080 -i /etc/teleport/auth bench --duration=30m --interactive root@iot-node-785fb8fc99-tgdc8 ps aux
* Requests originated: 17999
* Requests failed: 0
Histogram
Percentile Response Duration
---------- -----------------
25 162 ms
50 167 ms
75 174 ms
90 181 ms
95 188 ms
99 208 ms
100 336 ms
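Comparing the soak results above by eye gets tedious across eight runs. Below is a minimal Python sketch that parses one result block in the format shown above and flags percentiles where one run is slower than another. The function names and the `tolerance_ms` threshold are hypothetical and not part of `tsh`; the parser only assumes the `* Requests ...` and `<percentile> <n> ms` line shapes seen in this comment.

```python
import re


def parse_bench(text):
    """Parse one pasted bench result block into counts and a percentile table."""
    result = {"originated": None, "failed": None, "percentiles": {}}
    for raw in text.splitlines():
        line = raw.strip()
        # Lines such as "* Requests originated: 17999"
        m = re.match(r"\* Requests (originated|failed): (\d+)", line)
        if m:
            result[m.group(1)] = int(m.group(2))
            continue
        # Lines such as "25 162 ms" (percentile, duration in milliseconds)
        m = re.match(r"(\d+)\s+(\d+) ms", line)
        if m:
            result["percentiles"][int(m.group(1))] = int(m.group(2))
    return result


def slower_percentiles(baseline, candidate, tolerance_ms=25):
    """Return percentiles where candidate exceeds baseline by more than tolerance_ms.

    tolerance_ms is an arbitrary illustrative threshold, not a Teleport default.
    """
    flagged = []
    for pct, base_ms in baseline["percentiles"].items():
        cand_ms = candidate["percentiles"].get(pct)
        if cand_ms is not None and cand_ms - base_ms > tolerance_ms:
            flagged.append(pct)
    return sorted(flagged)
```

Feeding it the Agent Mesh and Proxy Peering blocks for the same node type gives a quick list of percentiles worth a second look; the long tail (percentile 100) is noisy, so a single flagged value there is not necessarily a regression.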
DynamoDB
10k Direct Dial Scaling
Direct Dial Soak
10k Tunnel Scaling
Tunnel Soak
500 Trusted Clusters
Upgrade At Scale
In addition to normal scaling tests, I did a step by step upgrade of a 10K node dynamo cluster in order to assess the DynamoDB usage differences between
Below are two DynamoDB stat page images. The first shows a
Note the difference in the "read usage" sections between the restart and upgrade cases. Both have a similar large spike immediately after restart due to cache resets, with the upgrade case stabilizing at a much higher average read usage (~29 vs ~1.5).
In theory, a read usage of 29 for a 10k cluster is practically nothing, but the proportional difference between the resting rate before and after #16911 does make me nervous. Such a jump might negatively impact users with very high numbers of peak concurrent sessions if they have fine-tuned their dynamo read capacity to just barely accommodate their existing load. We don't recommend doing things like that, and we generally encourage people to use on-demand, but it still gives me pause.
Haven't made up my mind yet, but I think I might revert the compare-and-swap semantics introduced in #16911 in favor of an approach that has a lower impact.
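To put the proportional concern into numbers, here is a small Python sketch of the arithmetic. It assumes, purely for illustration, that resting read usage scales linearly by the ratio observed in this one test (~1.5 before, ~29 after); that linearity is an assumption, and the function names and capacity figures are hypothetical.

```python
def projected_read_usage(current_usage, before=1.5, after=29.0):
    """Scale a resting read usage by the before/after ratio observed in the test.

    Assumes linear scaling, which is an illustrative simplification.
    """
    return current_usage * (after / before)


def fits_capacity(current_usage, provisioned_capacity, before=1.5, after=29.0):
    """Check whether the projected usage still fits a provisioned read capacity."""
    return projected_read_usage(current_usage, before, after) <= provisioned_capacity
```

With the numbers above the ratio is roughly 19x, so under this assumption a table provisioned with only a small margin over its current resting usage would be exceeded after the upgrade, which is the scenario the comment worries about.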
Opted to revert compare-and-swap node heartbeats based on dynamo stats in #16951 (comment). PR with fix: #17308
Can we please add X11 tests as a non-root user to this (and future) test plans? Thanks!
Webapps PR with the fix is here: gravitational/webapps#1250
Update: resolved
Hardware key support broke between
Edit: False alarm, it only doesn't work in proxy recording mode as expected... I've added the Hardware Key Support tests to the test plan to double check everything with
Teleport Kube Agent Chart hook is failing due to a wrong find & replace #17437
Onelogin SSO integration guide still works but a couple of screenshots and concepts would need an update: #17485
FYI @tobiaszheller |
Created #17572 |
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport.
These tests should be run on both a fresh install of the version to be released
as well as an upgrade of the previous version of Teleport.
Adding nodes to a cluster @EdwardDowling
Labels @EdwardDowling
Trusted Clusters @lxea
RBAC @atburke
Make sure that invalid and valid attempts are reflected in audit log.
Verify that custom PAM environment variables are available as expected. @jakule
Users @codingllama
With every user combination, try to login and signup with invalid second
factor, invalid password to see how the system reacts.
WebAuthn in the release `tsh` binary is implemented using libfido2 for Linux/macOS. Ask for a statically built pre-release binary for realistic tests. (`tsh fido2 diag` should work in our binary.) WebAuthn in the Windows build is implemented using `webauthn.dll`. (`tsh webauthn diag` with a security key selected in the dialog should work.)
Touch ID requires a signed `tsh`; ask for a signed pre-release binary so you may run the tests.
Windows WebAuthn requires Windows 10 19H1 and a device capable of Windows Hello.
Adding Users Password Only
Adding Users OTP
Adding Users WebAuthn
Adding Users via platform authenticator
Managing MFA devices
tsh mfa add
tsh mfa add
tsh mfa add
tsh mfa ls
tsh mfa rm
tsh mfa rm
`second_factor: on` in `auth_service`, should fail
`second_factor: optional` in `auth_service`, should succeed
Login Password Only
Login with MFA
tsh mfa add
U2F devices must be registered in a previous version of Teleport.
Using Teleport v9, set `auth_service.authentication.second_factor = u2f`, restart the server and then register a U2F device (`tsh mfa add`). Upgrade the install to the current Teleport version (one major at a time) and try to log in using the U2F device as your second factor. It should work.
Deleting Users
SSO @camscale
Backends @Joerger
Session Recording @strideynet
Audit Log @capnspacehook
Failed login attempts are recorded
Interactive sessions have the correct Server ID
Node/Proxy ID may be found at `/var/lib/teleport/host_uuid` on the corresponding machine.
Node IDs may also be queried via `tctl nodes ls`.
Exec commands are recorded
`scp` commands are recorded
Subsystem results are recorded
Subsystem testing may be achieved using both Recording Proxy mode and OpenSSH integration.
Assuming the proxy is `proxy.example.com:3023` and `node1` is a node running OpenSSH/sshd, you may use the following command to trigger a subsystem audit log:
sftp -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@proxy.example.com -s proxy:%h:%p" root@node1
Interact with a cluster using `tsh` @mdwn
These commands should ideally be tested for recording and non-recording modes as they are implemented in different ways.
Interact with a cluster using `ssh` @tobiaszheller
Make sure to test both recording and regular proxy modes.
Verify proxy jump functionality @Joerger
Log into leaf cluster via root, shut down the root proxy and verify proxy jump works.
Interact with a cluster using the Web UI @capnspacehook
User accounting @jakule
`/var/run/utmp` on Linux.
`/var/log/wtmp` on Linux.
Combinations @nklaassen
For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.
Teleport with EKS/GKE @tigrato
Teleport with multiple Kubernetes clusters @AntonAM
Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
Minikube is the only caveat - it's not reachable publicly so don't run a proxy there.
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has your cluster
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh`
`tsh login`, check that `tsh kube ls` has both clusters
`tsh kube login`
`kubectl get nodes`, `kubectl exec -it $SOME_POD -- sh` on the new cluster
`tsh login`, check that `tsh kube ls` has all clusters
`name` and `labels`
Step 2
login value matching the rows `name` column
`name` or `labels` in the search bar works
`name` column
Kubernetes auto-discovery @tigrato
Kubernetes Secret Storage @tigrato
`Statefulset`
`host_uuid` is never stored in Kubernertes Secret and does not survive Pod w/out storage restart #17474)
`Statefulset` resource and if it contains the new ENV variables
`Deployment` was correctly converted into a Statefulset and if the old `Deployment` object was removed after a successful upgrade
Teleport with FIPS mode @alistanis
ACME @alistanis
Migrations @jakule
SSH should work for both main and old clusters
SSH should work
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
Teleport with SSO Providers
`tctl sso` family of commands @Tener
For help with setting up SSO connectors, check out the Quick GitHub/SAML/OIDC Setup Tips
`tctl sso configure` helps to construct a valid connector definition:
`tctl sso configure github ...` creates valid connector definitions
`tctl sso configure oidc ...` creates valid connector definitions
`tctl sso configure saml ...` creates valid connector definitions
`tctl sso test` tests a provided connector definition, which can be loaded from a file or piped in with `tctl sso configure` or `tctl get --with-secrets`. Valid connectors are accepted, invalid ones are rejected with sensible error messages.
`tctl sso test`.
Teleport Plugins @hugoShaka
AWS Node Joining @nklaassen
Docs
`ec2:DescribeInstances` permissions for local account:
`TELEPORT_TEST_EC2=1 go test ./integration -run TestEC2NodeJoin`
TELEPORT_TEST_EC2=1 go test ./integration -run TestIAMNodeJoin
Passwordless @codingllama
Passwordless requires `tsh` compiled with libfido2 for most operations (apart from Touch ID). Ask for a statically-built `tsh` binary for realistic tests.
Touch ID requires a properly built and signed `tsh` binary. Ask for a pre-release binary, so you may run the tests.
This section complements "Users -> Managing MFA devices". `tsh` binaries for each operating system (Linux, macOS and Windows) must be tested separately for FIDO2 items.
Diagnostics
Commands should pass all tests.
`tsh fido2 diag` (macOS/Linux)
`tsh touchid diag` (macOS only)
`tsh webauthnwin diag` (Windows only)
Registration
(`tsh mfa add`, choose WEBAUTHN and passwordless)
(`tsh mfa add`, choose TOUCHID)
(`tsh mfa add`, choose WEBAUTHN and passwordless)
Login
(`tsh login --auth=passwordless`)
(`tsh login --auth=passwordless`)
`tsh login --auth=passwordless --mfa-mode=cross-platform` uses FIDO2
`tsh login --auth=passwordless --mfa-mode=platform` uses platform authenticator
`tsh login --auth=passwordless --mfa-mode=auto` prefers platform authenticator
(`auth_service.authentication.passwordless = false`)
(`auth_service.authentication.connector_name = passwordless`)
(`tsh login --auth=local`)
Touch ID support commands
`tsh touchid ls` works
`tsh touchid rm` works (careful, may lock you out!)
Hardware Key Support @Joerger
Hardware Key Support is an Enterprise feature and is not available for OSS.
You will need a YubiKey 4.3+ to test this feature.
This feature has additional build requirements, so it should be tested with a pre-release build from Drone (e.g. `https://get.gravitational.com/teleport-ent-v11.0.0-alpha.2-linux-amd64-bin.tar.gz`).
These tests should be carried out sequentially. `tsh` tests should be carried out on Linux, macOS, and Windows.
`tsh login` as user with WebAuthn login and no hardware key requirement.
`role.role_options.require_session_mfa: hardware_key` - `tsh login --request-roles=hardware_key_required`
`tsh ssh`
`role.role_options.require_session_mfa: hardware_key_touch` - `tsh login --request-roles=hardware_key_touch_required`
`tsh ssh`
`tsh logout` and `tsh login` as the user with no hardware key requirement.
`auth_service.authentication.require_session_mfa: hardware_key`
(`tsh ls`) should force automatic re-login with yubikey
`tsh ssh`
`auth_service.authentication.require_session_mfa: hardware_key_touch`
(`tsh ls`) should force automatic re-login with yubikey
`tsh ssh`
Performance @rosstimothy @fspmarshall
Perform all tests on the following configurations:
Soak Test
Run 30 minute soak test with a mix of interactive/non-interactive sessions for both direct and reverse tunnel nodes:
Observe prometheus metrics for goroutines, open files, RAM, CPU, Timers and make sure there are no leaks
Concurrent Session Test
Run a concurrent session test that will spawn 5 interactive sessions per node in the cluster:
Teleport with Cloud Providers
AWS @hugoShaka
GCP @AntonAM
IBM @atburke
Application Access @mdwn
`debug_app: true` works.
`name.rootProxyPublicAddr` as well as `publicAddr`.
`name.rootProxyPublicAddr`.
`app.session.start` and `app.session.chunk` events are created in the Audit Log.
`app.session.chunk` points to a 5 minute session archive with multiple `app.session.request` events inside.
`tsh play <chunk-id>` can fetch and print a session chunk archive.
`tsh app login`.
`tsh aws` commands.
`tctl create`.
`tctl create -f`.
`tctl rm`.
`Add Application` dialogue works (refresh app screen to see it registered)
Database Access @smallinsky + db access team
`db.session.start` is emitted when you connect.
`db.session.end` is emitted when you disconnect.
`db.session.query` is emitted when you execute a SQL query.
`tsh db ls` shows only databases matching role's `db_labels`. @gabrielcorado
`db_users`. @gabrielcorado
`db_names`. @gabrielcorado
`db.session.start` is emitted when connection attempt is denied.
`db_names`. @gabrielcorado
`db.session.query` is emitted when command fails due to permissions.
`tsh db connect`.
`tctl create`.
`tctl create -f`.
`tctl rm`.
`name`, `description`, `type`, and `labels`
Step 2
login value matching the rows `name` column
`labels`
TLS Routing @smallinsky
`v2` configuration starts only a single listener. @smallinsky
`multiplex` mode
`auth_service.proxy_listener_mode: "multiplex"` @smallinsky
`web_proxy_addr == tunnel_addr`
`tsh db connect` works through proxy running in `multiplex` mode
`tsh proxy db` with a GUI client. @smallinsky @GavinFrazar @greedy52 @Tener @gabrielcorado
`multiplex` mode
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh" [email protected]
ssh -o "ForwardAgent yes" -o "ProxyCommand tsh proxy ssh --user=%r --cluster=leaf-cluster %h:%p" [email protected]
`tsh ssh` access through proxy running in multiplex mode
`multiplex` mode
Desktop Access @ibeckermayer @probakowski @LKozlowski
`listen_addr`):
`hosts` section.
`hosts` section.
`windows_desktop_service`s to the same Teleport cluster, verify that connections to desktops on different AD domains work. (Attempt to connect several times to verify that you are routed to the correct `windows_desktop_service`)
verify all keys are processed correctly in each supported browser. Known issues: F11 cannot be captured by the browser without special configuration on macOS.
the desktop should show a Windows menu, not a browser context menu)
Horizontal Scroll Test
`client_idle_timeout` to a small value and verify that idle sessions are terminated (the session should end and an audit event will confirm it was due to idle connection)
`teleport.dev/origin` label.
`teleport.dev` labels for OS, OSVersion, DNS hostname.
origin.
`desktop_directory_sharing: false`) and confirm that the option to share a directory doesn't appear in the menu
Attempting to start a session with a u2f key registered shows an error message (N/A now that u2f support has been removed)
`mode: node-sync` or `mode: proxy-sync`)
`mode: node` or `mode: proxy`)
and the progress bar progresses to the end.
a relevant error message.
using the RBAC rule from our docs
`windows.desktop.session.start` (`TDP00I`) emitted on start
`windows.desktop.session.start` (`TDP00W`) emitted when session fails to start (due to RBAC, for example)
`windows.desktop.session.end` (`TDP01I`) emitted on end
`desktop.clipboard.send` (`TDP02I`) emitted for local copy -> remote paste
`desktop.clipboard.receive` (`TDP03I`) emitted for remote copy -> local paste
Binaries compatibility @fheinecke
Machine ID @timothyb89
SSH
With a default Teleport instance configured with a SSH node:
`tctl bots add robot --roles=access`. Follow the instructions provided in the output to start `tbot`
`ssh_config` in the destination directory
`SIGUSR1` and `SIGHUP` to a running tbot process causes a renewal and new certificates to be generated
`ssh_config` provided by `tbot` after each phase of a manual CA rotation.
Ensure the above tests are completed for both:
DB Access
With a default Postgres DB instance, a Teleport instance configured with DB access and a bot user configured:
`tbot db` while `tbot start` is running
Host users creation @lxea
Host users creation docs
Host users creation RFD
`teleport-system` group
`disable_create_host_user: true` stops user creation from occurring
CA rotations @espadolini
`tctl get cert_authority`)
`standby` phase: only `active_keys`, no `additional_trusted_keys`
`init` phase: `active_keys` and `additional_trusted_keys`
`update_clients` and `update_servers` phases: the certs from the `init` phase are swapped
`standby` phase: only the new certs remain in `active_keys`, nothing in `additional_trusted_keys`
`rollback` phase (second pass, after completing a regular rotation): same content as in the `init` phase
`standby` phase after `rollback`: same content as in the previous `standby` phase
`tsh app login`
`kubectl get po` after `tsh kube login`
EC2 Discovery @lxea
EC2 Discovery docs
Resources
Quick GitHub/SAML/OIDC Setup Tips