Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: remove panic in Finish #83625

Merged
merged 1 commit into from
Jul 1, 2022
Merged

Conversation

stevendanna
Copy link
Collaborator

@stevendanna stevendanna commented Jun 29, 2022

In #81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:

  1. The assertion was on a code path shared with RecordingVerbose
    spans; and,

  2. A RecordingStructured span can have a RecordingVerbose child. The
    RecordingVerbose child is likely to have more than one span recording.

As a result, we have seen roachtests failing with

panic: RecordingStructured has 12 recordings; expected 1

Here, we remove the assertion.

Fixes #83502

Release note: None

@stevendanna stevendanna requested a review from a team June 29, 2022 22:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna requested a review from adityamaru June 29, 2022 22:52
Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix 💯

pkg/util/tracing/crdbspan.go Show resolved Hide resolved
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @stevendanna)


pkg/util/tracing/crdbspan.go line 547 at r1 (raw file):

		// We don't have space for this recording. Let's collect just the structured
		// records.
		for ci := range childRecording {

I think a better fix is to get rid of the assertion in the structured case below, and make that code deal with multiple spans too. That's how it was before #81079, and I'm not sure if there was a specific reason why it changed.

I think it's important for all this code to be fairly permissive in what recording it accepts. For example, as written, I think this code would crash if an RPC is made within a RecordingStructured span, and then the remote child is dynamically changed via \tracez to RecordingVerbose: a verbose, multi-span recording will come back and the client span will choke on it.

Please add the following test with this occassion:

// Test that a RecordingStructured parent does not panic when asked to ingest a
// remote verbose recording. Ingesting a recording of different type is unusual,
// since children are created with the parent's recording mode, but it can
// happen if the child's recording mode was changed dynamically.
func TestRemoteSpanWithDifferentRecordingMode(t *testing.T) {
	tr := NewTracer()
	s1 := tr.StartSpan("p", WithRecording(tracingpb.RecordingStructured))
	s2 := tr.StartSpan("c", WithRemoteParentFromSpanMeta(s1.Meta()), WithRecording(tracingpb.RecordingVerbose))
	s3 := tr.StartSpan("cc", WithParent(s2), WithRecording(tracingpb.RecordingVerbose))
	s3.Finish()
	r := s2.FinishAndGetConfiguredRecording()
	require.NotPanic(s1.ImportRemoteRecording(r))
	s1.Finish()
}

pkg/util/tracing/span_test.go line 408 at r1 (raw file):

}

// TestRecordingFinishWithMaxSpans is a regression test to ensure that a verbose

let's rephrase this test as Aditya was suggesting, and then I don't have to nitpick about this comment.
Call it TestRecordingDowngradesToStructuredIfTooBig and check that you end up with a recording with just one or two spans or whatever, and all the structured events.

@stevendanna stevendanna changed the title tracing: fix panic in Finish for verbose recordings tracing: remove panic in Finish Jun 30, 2022
Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @andreimatei)


pkg/util/tracing/crdbspan.go line 547 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think a better fix is to get rid of the assertion in the structured case below, and make that code deal with multiple spans too. That's how it was before #81079, and I'm not sure if there was a specific reason why it changed.

I think it's important for all this code to be fairly permissive in what recording it accepts. For example, as written, I think this code would crash if an RPC is made within a RecordingStructured span, and then the remote child is dynamically changed via \tracez to RecordingVerbose: a verbose, multi-span recording will come back and the client span will choke on it.

Please add the following test with this occassion:

// Test that a RecordingStructured parent does not panic when asked to ingest a
// remote verbose recording. Ingesting a recording of different type is unusual,
// since children are created with the parent's recording mode, but it can
// happen if the child's recording mode was changed dynamically.
func TestRemoteSpanWithDifferentRecordingMode(t *testing.T) {
	tr := NewTracer()
	s1 := tr.StartSpan("p", WithRecording(tracingpb.RecordingStructured))
	s2 := tr.StartSpan("c", WithRemoteParentFromSpanMeta(s1.Meta()), WithRecording(tracingpb.RecordingVerbose))
	s3 := tr.StartSpan("cc", WithParent(s2), WithRecording(tracingpb.RecordingVerbose))
	s3.Finish()
	r := s2.FinishAndGetConfiguredRecording()
	require.NotPanic(s1.ImportRemoteRecording(r))
	s1.Finish()
}

Done.


pkg/util/tracing/span_test.go line 408 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

let's rephrase this test as Aditya was suggesting, and then I don't have to nitpick about this comment.
Call it TestRecordingDowngradesToStructuredIfTooBig and check that you end up with a recording with just one or two spans or whatever, and all the structured events.

Done. We don't get all of the structured events because of the size of the ring buffer, but I've checked that we get as many as possible given the size of the buffer.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: but let's wait for Aditya to seewhat he says about that assertion. I think at some point he was relying on it, but maybe that changed.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @stevendanna)


pkg/util/tracing/span_test.go line 460 at r2 (raw file):

	s3.Finish()
	r := s2.FinishAndGetConfiguredRecording()
	require.NotPanics(t, func() { s1.ImportRemoteRecording(r) })

please also test that s1's recording only has one span in it

In cockroachdb#81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:

1) The assertion was on a code path shared with RecordingVerbose
   spans; and,

2) A RecordingStructured span can have a RecordingVerbose child. The
   RecordingVerbose child is likely to have more than one span recording.

As a result, we have seen roachtests failing with

    panic: RecordingStructured has 12 recordings; expected 1

Here, we remove the assertion.

Release note: None
@stevendanna
Copy link
Collaborator Author

bors r=adityamaru,

@craig
Copy link
Contributor

craig bot commented Jul 1, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 1, 2022

Build succeeded:

@craig craig bot merged commit 78caa5c into cockroachdb:master Jul 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: follower-reads/mixed-version/single-region failed
4 participants