Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: better determination if scp -R flag can be used #109036

Merged
merged 1 commit into from
Aug 23, 2023

Conversation

RaduBerinde
Copy link
Member

When uploading a file to a cluster, we use the "tree dist" algorithm by default. This uploads the file to a single node, then we copy the file from that node to the other nodes (up to 10).

This only makes sense if the remote-to-remote transfers can happen directly, which only happens if we pass the -R -A flags to scp. Unfortunately older versions don't support these flags. Currently the flags are only passed if the OS is darwin.

This commits improves the determination - we run ssh -V (once) and check if the SSL major version is three. For reference, some examples of what ssh -V returns:

  • recent MacOSX: OpenSSH_9.0p1, LibreSSL 3.3.6
  • Ubuntu 22.04: OpenSSH_8.9p1 Ubuntu-3ubuntu0.3, OpenSSL 3.0.2 15 Mar 2022

In addition, if the version is not 3, we disable the use of "tree dist".

Epic: none
Release note: None

@RaduBerinde RaduBerinde requested a review from a team as a code owner August 18, 2023 20:06
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde RaduBerinde force-pushed the roachprod-scp-treedist-fix branch from 2bfc137 to 506552b Compare August 18, 2023 20:07
@RaduBerinde RaduBerinde changed the title roachprod: better determination if -R flag can be used roachprod: better determination if scp -R flag can be used Aug 18, 2023
@RaduBerinde RaduBerinde requested a review from AlexTalks August 18, 2023 20:07
cmd := exec.Command("ssh", "-V")
out, err := cmd.CombinedOutput()
if err != nil {
panic(fmt.Sprintf("error running ssh -V: %v", err))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be rare, think it's worth logging the output as well?

if err != nil {
panic(fmt.Sprintf("error running ssh -V: %v", err))
}
sshVersion3Internal.value = strings.Contains(string(out), "SSL 3.")
Copy link
Collaborator

@stevendanna stevendanna Aug 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could try parsing to version to future-proof this against version 4. But eh, I suppose that is next decades problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus if history is any guide, next version might be 5. Or 9? :)

@aliher1911
Copy link
Contributor

Neat! That would also match versions strings on Debian and Arch.

@RaduBerinde RaduBerinde force-pushed the roachprod-scp-treedist-fix branch from 506552b to 83cdf9c Compare August 23, 2023 17:53
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @aliher1911, @herkolategan, @smg260, and @stevendanna)


pkg/roachprod/install/cluster_synced.go line 2454 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

This should be rare, think it's worth logging the output as well?

Done.

@RaduBerinde RaduBerinde force-pushed the roachprod-scp-treedist-fix branch from 83cdf9c to ae1f902 Compare August 23, 2023 18:08
@RaduBerinde
Copy link
Member Author

bors r+

Comment on lines 1756 to 1758
if c.UseTreeDist && sshVersion3() {
detail = " (dist)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused -- this diff is changing the detail that is logged when transferring files, but not changing the actual logic of transferring these files. IIUC, this will change message logged on TC nightly builds (where agents run Ubuntu 20.04) but that's confusing since the transferring algorithm won't have changed.

Am I missing something?

@RaduBerinde
Copy link
Member Author

bors r-

@craig
Copy link
Contributor

craig bot commented Aug 23, 2023

Canceled.

When uploading a file to a cluster, we use the "tree dist" algorithm
by default. This uploads the file to a single node, then we copy the
file from that node to the other nodes (up to 10).

This only makes sense if the remote-to-remote transfers can happen
directly, which only happens if we pass the `-R -A` flags to `scp`.
Unfortunately older versions don't support these flags. Currently the
flags are only passed if the OS is `darwin`.

This commits improves the determination - we run `ssh -V` (once) and
check if the `SSL` major version is three. For reference, some
examples of what `ssh -V` returns:
 - recent MacOSX: `OpenSSH_9.0p1, LibreSSL 3.3.6`
 - Ubuntu 22.04: `OpenSSH_8.9p1 Ubuntu-3ubuntu0.3, OpenSSL 3.0.2 15 Mar 2022`

In addition, if the version is not 3, we disable the use of "tree
dist".

Epic: none
Release note: None
@RaduBerinde RaduBerinde force-pushed the roachprod-scp-treedist-fix branch from ae1f902 to 4240ac6 Compare August 23, 2023 19:36
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @aliher1911, @herkolategan, @renatolabs, @smg260, and @stevendanna)


pkg/roachprod/install/cluster_synced.go line 1757 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

I'm confused -- this diff is changing the detail that is logged when transferring files, but not changing the actual logic of transferring these files. IIUC, this will change message logged on TC nightly builds (where agents run Ubuntu 20.04) but that's confusing since the transferring algorithm won't have changed.

Am I missing something?

Ah.. oops, 😊 Fixed, PTAL

Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@RaduBerinde
Copy link
Member Author

bors r+

@craig craig bot merged commit 44f66d4 into cockroachdb:master Aug 23, 2023
@craig
Copy link
Contributor

craig bot commented Aug 23, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants