Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize tsh db ls performance #14092

Merged
merged 17 commits into from
Jul 9, 2022

Conversation

greedy52
Copy link
Contributor

@greedy52 greedy52 commented Jul 5, 2022

fix #14075

Changes:

  • Added new API GetCurrentUserRoles that streams all user roles
  • tsh db ls to reuse proxy client
  • tsh db ls --all to run fetch in parallel per profile

@greedy52 greedy52 added the WIP label Jul 5, 2022
@greedy52 greedy52 requested a review from smallinsky July 5, 2022 02:48
@greedy52 greedy52 force-pushed the STeve/14075_improve_tsh_db_ls_performance branch from 78315ad to 3a21023 Compare July 5, 2022 02:49
lib/auth/auth_with_roles.go Outdated Show resolved Hide resolved
lib/auth/grpcserver.go Show resolved Hide resolved
lib/services/role.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
Copy link
Contributor

@smallinsky smallinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can cover this flow somehow in the tsh test.
For example, we can use our own dialer with custom timeout and inject this in tsh test to simulate low latency between the tsh client and teleport proxy.

Copy link
Contributor

@espadolini espadolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered the idea of having the allowed database users retrieved and returned as part of ListDatabases instead, to avoid the extra roundtrips to also fetch the rolesets?

sidenote: we don't seem to care about showing the list of database users when we're outputting databases in json format, perhaps we could skip the extra fetches in that case?

lib/services/role.go Outdated Show resolved Hide resolved
lib/services/role.go Outdated Show resolved Hide resolved
lib/auth/grpcserver.go Show resolved Hide resolved
lib/auth/auth_with_roles.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
@espadolini
Copy link
Contributor

additional question: do we absolutely need all the changes to parallelize ListDatabases and ConnectToProxy/FetchAllClusterRoles instead of opening one less ProxyClient and using the same one for databases and roles?

Copy link
Contributor

@espadolini espadolini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional point raised by @rosstimothy: we should be careful about parallelizing connections - a cluster with a lot of leaf clusters can cause a massive amount of outbound connections when doing listDatabasesAllClusters.

@russjones russjones requested a review from rosstimothy July 6, 2022 17:19
@greedy52 greedy52 removed the WIP label Jul 7, 2022
@greedy52 greedy52 requested review from espadolini and smallinsky July 7, 2022 04:59
@greedy52 greedy52 marked this pull request as ready for review July 7, 2022 04:59
@github-actions github-actions bot added the tsh tsh - Teleport's command line tool for logging into nodes running Teleport. label Jul 7, 2022
@github-actions github-actions bot requested review from atburke and r0mant July 7, 2022 05:00
Copy link
Contributor

@smallinsky smallinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I have left one comment but aport from that the fix looks good for me.

tool/tsh/db.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
tool/tsh/db.go Outdated Show resolved Hide resolved
Copy link
Contributor

@rosstimothy rosstimothy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We probably want to consider adding some tracing bits to the new stuff added here. It might also be worthwhile to capture traces before and after this change. That would really show the performance gains and also help identify any areas that could still be improved upon.

@greedy52
Copy link
Contributor Author

greedy52 commented Jul 7, 2022

Looks good to me. We probably want to consider adding some tracing bits to the new stuff added here. It might also be worthwhile to capture traces before and after this change. That would really show the performance gains and also help identify any areas that could still be improved upon.

@rosstimothy traceability for new changes works out of the box. Old code services.FetchRoles does not take the context so I have to do some local changes to capture. I will leave instrumenting old code for a separate change.

Here is the comparison. Tested against my cluster in AWS.

Before this change

Screen Shot 2022-07-07 at 1 41 59 PM

  • ConnectToProxy is the most expansive call and we were doing that 2 times.
  • GetRole n times for each role my user has. In the original issue, GetRole takes 100ms~200ms against TeleportCloud so they add up quick.

With this change

Screen Shot 2022-07-07 at 1 32 55 PM

  • ConnectToProxy once
  • No longer calling GetSites to figure out current cluster name.
  • A single GetCurrentUserRoles

Really love the tracing!

@greedy52 greedy52 enabled auto-merge (squash) July 7, 2022 17:57
@rosstimothy
Copy link
Contributor

Looks good!

I think that listDatabasesAllClusters might potentially benefit from a span per profile, but that could be added later too I suppose.

@greedy52 greedy52 merged commit 13abca6 into master Jul 9, 2022
@github-actions
Copy link

github-actions bot commented Jul 9, 2022

@greedy52 See the table below for backport results.

Branch Result
branch/v10 Create PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tsh tsh - Teleport's command line tool for logging into nodes running Teleport.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tsh db ls execution time depends on number of roles assigned to a user
4 participants