Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: report health status during teardown #102636

Merged

Conversation

smg260
Copy link
Contributor

@smg260 smg260 commented Apr 28, 2023

roachtest: report health status of all nodes in teardown

In #102622, we changed how to determine if a node was suitable to connect to for post test validations, to first check a node's health?ready=1 endpoint.

This extends that PR by extracting the http://node:port/health?ready=1 functionality, and using it to report on each node during teardown by logging the status and/or body or error. Finding a live node piggy backs of this information to create a connection to any node with 200 status.

This could be useful to add to test_interface

Epic: none
Release note: none

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@smg260 smg260 force-pushed the XXXXX_102603_roachtest_health_status branch from 64768ea to 32f3211 Compare April 28, 2023 20:37
@smg260 smg260 force-pushed the XXXXX_102603_roachtest_health_status branch from 32f3211 to 6558c93 Compare May 8, 2023 15:33
@smg260 smg260 changed the title Xxxxx 102603 roachtest health status roachtest: report health status during teardown May 15, 2023
@smg260 smg260 marked this pull request as ready for review May 15, 2023 22:01
@smg260 smg260 requested a review from a team as a code owner May 15, 2023 22:01
@smg260 smg260 requested review from herkolategan and renatolabs and removed request for a team May 15, 2023 22:01
Copy link
Collaborator

@herkolategan herkolategan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @renatolabs and @smg260)


pkg/cmd/roachtest/test_runner.go line 1106 at r1 (raw file):

		var validationNode int
		for _, s := range statuses {
			if s.Err != nil {

Nit: use of a switch or two continues might be cleaner / more go idiomatic like.


pkg/cmd/roachtest/test_runner.go line 1125 at r1 (raw file):

		// not seem to be enough.
		//
		// TODO(testinfra): figure out why this can still get stuck despite the

Nit: Should we update this comment or is the issue mentioned still likely to happen (leave the comment for future ref)?

@smg260 smg260 force-pushed the XXXXX_102603_roachtest_health_status branch from 6558c93 to 024c02d Compare May 16, 2023 13:40
Copy link
Contributor Author

@smg260 smg260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @herkolategan and @renatolabs)


pkg/cmd/roachtest/test_runner.go line 1106 at r1 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Nit: use of a switch or two continues might be cleaner / more go idiomatic like.

Thanks. Updated to be more in line with style guide


pkg/cmd/roachtest/test_runner.go line 1125 at r1 (raw file):

Previously, herkolategan (Herko Lategan) wrote…

Nit: Should we update this comment or is the issue mentioned still likely to happen (leave the comment for future ref)?

It is possible that the changes in this PR help, but I'd prefer to keep the comment there for now.

@smg260
Copy link
Contributor Author

smg260 commented May 16, 2023

pkg/cmd/roachtest/cluster.go line 1427 at r2 (raw file):

Previously, renatolabs (Renato Costa) wrote…

Can't this entire block just be results[node-1] = getStatus(ctx, node)? The main blocking call in that function is httputil.Get which gets passed the same context.

If there is a reason for not trusting context cancelation, we should document it.

Yes it can, thanks. An early version used http.Get which did not accept a context.

of all nodes in test teardown. Validations can use a db connection
from any node which has a status of 200.

Epic: none
Release note: none
Informs: cockroachdb#102603
@smg260 smg260 force-pushed the XXXXX_102603_roachtest_health_status branch from 024c02d to 67e1af7 Compare May 16, 2023 15:49
@renatolabs
Copy link
Contributor

Might be good to run a few roachtests on this branch before merging.

@smg260
Copy link
Contributor Author

smg260 commented May 17, 2023

TC job for this branch running a subset of roachtests

TC result: Tests passed: 45, ignored: 6; number of tests 51

@smg260
Copy link
Contributor Author

smg260 commented May 17, 2023

TFTR

bors r=herkolategan,renatolabs

@craig
Copy link
Contributor

craig bot commented May 17, 2023

Build succeeded:

@craig craig bot merged commit b02f647 into cockroachdb:master May 17, 2023
@smg260 smg260 deleted the XXXXX_102603_roachtest_health_status branch May 17, 2023 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants