Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: aggregate OperationMetadata on span Finish() #81079

Merged
merged 1 commit into from
Jun 27, 2022

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented May 6, 2022

This change adds a ChildrenMetadata map to crdbspan that is a
mapping from operation to the operations' aggregated metadata. This
map is updated whenever a child of the crdbSpan finishes, with metadata
from all spans in the finishing childs' Recording. The map is therefore
a bucketed view of all the operations being traced by a span.

The motivation for this change is to surface more metadata about the
suboperations being traced in a spans' Recording. This could in turn provide
more o11y into why a job is slow/stuck, or where the performance of a
distributed operation is bottlenecked.

As part of a span Finish()ing, the span fetches its Recording with the spans'
configured verbosity. Prior to this change the recording would then be
processed as follows:

Verbose Recording

In the case of Verbose recording the spans in the recording are added to the parents'
finishedChildren slice provided we have not exceeded the maximum number of
children a parent can track.

Structured Recording

In the case of a Structured recording, only the StructuredEvents from the spans in
the recording are copied into the parent.

With this change, in both the Verbose and Structured recording mode, a finishing span
is also responsible for rolling up the OperationMetadata of all the spans in its
recording. This involves updating the parents' childrenMetadata mapping with:

  1. an entry for the finishing span.
  2. an entry for each of the finishing spans' Finish()ed children.
  3. an entry for each of the finishing spans' open children, and their children recursively

The logic for 2) and 3) is subsumed in the method responsible for getting
the finishing spans' recording. Notably, GetRecording(...) for both Structured and Verbose
recordings, populate the root of the recording with OperationMetadata of all
finished and open children in the recording.

As an example when we are done finishing child:

parent
  child (finished_C: 4s, finished_D: 3s)
    open_A (finished_B: 1s)
      finished_B
    finished_C (finished_D: 3s)
      finished_D

We'd expect parent to have:
{child: 10s, finished_C: 4s, finished_D: 3s, open_A: 3s, finished_B: 1s}

Given that Finish()ing a child, and importing a remote recording into a span
share the same code path, the above semantics also apply to a remote recording
being imported into a parent span.

Fixes: #80391

Release note: None

@adityamaru adityamaru requested a review from a team May 6, 2022 00:16
@adityamaru adityamaru requested review from a team as code owners May 6, 2022 00:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@adityamaru adityamaru requested review from andreimatei and stevendanna and removed request for a team May 6, 2022 00:17
@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from fda6549 to ec175fc Compare May 6, 2022 00:19
@adityamaru
Copy link
Contributor Author

I'd review #81020 first, and then I'll rebase this on top of that.

@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from ec175fc to c6667ea Compare May 16, 2022 00:48
@adityamaru adityamaru requested a review from a team as a code owner May 16, 2022 00:48
@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from c6667ea to 2e263fb Compare May 18, 2022 16:46
@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from 2e263fb to e522300 Compare June 7, 2022 02:02
@adityamaru adityamaru changed the title tracing: roll up Finish()ed children durations in parent span tracing: aggregate OperationMetadata on span Finish() Jun 7, 2022
@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch 2 times, most recently from 1487c26 to 607e18a Compare June 7, 2022 14:39
@adityamaru adityamaru requested a review from benbardin June 7, 2022 14:39
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @benbardin, and @stevendanna)


-- commits line 15 at r3:
"its" what?


pkg/util/tracing/crdbspan.go line 128 at r3 (raw file):

	// children in its Recording. childrenMetadata therefore provides a bucketed
	// view of the various operations that are being traced as part of a span.
	childrenMetadata map[string]tracingpb.RecordedSpan_OperationMetadata

does this belong in recording? (i.e., it's only populated when the span is recording, right?)


pkg/util/tracing/crdbspan.go line 543 at r3 (raw file):

		//       finished_D
		//
		// We'd expect `parent` to have:

s/we'd expect parent to have/parent will have...


pkg/util/tracing/crdbspan.go line 547 at r3 (raw file):

		rootChild := childRecording[0]
		// Record the finished rootChilds' metadata.
		s.mu.childrenMetadata[rootChild.Operation] = s.mu.childrenMetadata[rootChild.Operation].Combine(

Instead of all this, I think it'd be better if recordings always included the ChildrenMetadata. Have you considered having GetRecording() do all this aggregation?


pkg/util/tracing/crdbspan.go line 555 at r3 (raw file):

		// Record the metadata of rootChilds' finished children.
		//
		// ChildrenMetadata was populated in GetRecording(...) with rootChilds'

I think this comment lies. GetRecording() doesn't seem to have a role here, does it?


pkg/util/tracing/tracingpb/recorded_span.proto line 112 at r2 (raw file):

  reserved 5,10,11;

  // Next ID: 19

nit: I don't see this elsewhere around here. I don't think it's necessary.

Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, @benbardin, and @stevendanna)


pkg/util/tracing/crdbspan.go line 547 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Instead of all this, I think it'd be better if recordings always included the ChildrenMetadata. Have you considered having GetRecording() do all this aggregation?

Every tracingpb.RecordedSpan in a Recording will always have its ChildrenMetadata populated in GetRecording() via the call to getRecordingNoChildrenLocked. This represents the spans' Finish()ed children. When the span is finishing we need to do more than just that:

  1. We need to create an entry for the finishing span itself, which is what we are doing here.
  2. We need to recurse on the finishing spans' open children and create entries for them and their children.

I don't think it makes sense to do 1) in GetRecording because I do not expect the span whose recording I am getting to be part of the ChildrenMetadata map I am populating in the recording. That is why we do it here when we are finishing the span and updating its parent.

  1. is done during GetRecording for Structured because of the fact that we only return 1 tracingpb.RecordedSpan. I'm not convinced it makes sense to do it in the case of GetRecording for Verbose since we do return all tracingpb.RecordedSpans in the Recording, so we can do the recursion on render/finish. This seems more in line with our definition of the two modes?

pkg/util/tracing/crdbspan.go line 555 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think this comment lies. GetRecording() doesn't seem to have a role here, does it?

getRecordingNoChildrenLocked which is called from GetRecording in the Verbose mode is responsible for populating ChildrenMetadata in each tracingpb.RecordedSpan. It is populated with all the Finish()ed children of the span at the time the recording is pulled.

@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from 607e18a to 0379b5c Compare June 8, 2022 15:14
Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @benbardin, and @stevendanna)


-- commits line 15 at r3:

Previously, andreimatei (Andrei Matei) wrote…

"its" what?

its Recording - fixed


pkg/util/tracing/crdbspan.go line 128 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

does this belong in recording? (i.e., it's only populated when the span is recording, right?)

Done.


pkg/util/tracing/crdbspan.go line 543 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/we'd expect parent to have/parent will have...

Done.

@adityamaru adityamaru requested a review from andreimatei June 8, 2022 15:14
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, @benbardin, and @stevendanna)


pkg/util/tracing/crdbspan.go line 547 at r3 (raw file):

I don't think it makes sense to do 1) in GetRecording because I do not expect the span whose recording I am getting to be part of the ChildrenMetadata map I am populating in the recording. That is why we do it here when we are finishing the span and updating its parent.

This part makes sense.

2 is done during GetRecording for Structured because of the fact that we only return 1 tracingpb.RecordedSpan. I'm not convinced it makes sense to do it in the case of GetRecording for Verbose since we do return all tracingpb.RecordedSpans in the Recording, so we can do the recursion on render/finish. This seems more in line with our definition of the two modes?

I think verbose and structured recordings should behave the same with respect to returning the childrenMetadata; I think they should both return it, and they should both include the stats about open children. Even though one can recurse through a verbose recording and compute whatever they want, I think it'd be very useful that this standardized childrenMeta is always there (by the argument that a verbose recording can be processed by the caller, you wouldn't need to include the stats of the finished spans either). Otherwise, I'm pretty sure the difference in this respect between structured recordings and verbose recordings is going to surprise us every time.
Also, I think the distinction you're trying to draw about open children and finished children is inconsistent. In a structure like the following:

root
   c1
     c11
  c2 

if c1 finishes, and I collect root's (verbose) recording, I get: {c11, unfinished:true}, {c1}. So, I get info about some unfinished children (c11) but not others (c2). That's surprising!

WDYT?

Not to mention that, if we make verbose recordings include the stats for open children, the code in this function should be uniform regardless of the recording mode. As it stands, there's too much code :P. I think we should aim for this code to only need to look at rootChild and not the other spans.


pkg/util/tracing/crdbspan.go line 555 at r3 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

getRecordingNoChildrenLocked which is called from GetRecording in the Verbose mode is responsible for populating ChildrenMetadata in each tracingpb.RecordedSpan. It is populated with all the Finish()ed children of the span at the time the recording is pulled.

I see.


pkg/util/tracing/crdbspan.go line 178 at r4 (raw file):

	// children in its Recording. childrenMetadata therefore provides a bucketed
	// view of the various operations that are being traced as part of a span.
	childrenMetadata map[string]tracingpb.RecordedSpan_OperationMetadata

since we're using this proto in code that otherwise does not have much to do with RecordedSpan, let's make the OperationMetadata be defined as a top-level message, instead of a sub-message inside of RecordedSpan


pkg/util/tracing/crdbspan.go line 573 at r4 (raw file):

		// been flattened into `childRecording` by `GetRecording(...)`
		for _, rec := range childRecording[1:] {
			if !rec.Finished {

see, I think this if !rec.Finished is really ugly. Ain't nobody gonna understand what's going on.

Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @benbardin, and @stevendanna)


pkg/util/tracing/crdbspan.go line 547 at r3 (raw file):

That's surprising!

Indeed, you're right. It makes sense to push this logic into GetRecording for both Verbose and Structured. I've changed the PR to reflect this.


pkg/util/tracing/crdbspan.go line 555 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I see.

Done.


pkg/util/tracing/crdbspan.go line 178 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

since we're using this proto in code that otherwise does not have much to do with RecordedSpan, let's make the OperationMetadata be defined as a top-level message, instead of a sub-message inside of RecordedSpan

Done.


pkg/util/tracing/crdbspan.go line 573 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

see, I think this if !rec.Finished is really ugly. Ain't nobody gonna understand what's going on.

agreed, moved this logic to GetRecording.

@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from 0379b5c to b31ca53 Compare June 20, 2022 18:30
Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider including the new stats in Recording.String(). If it turns out to be overwhelming, we can back it out, or trim it, or something. But I'd be curious to see how the full thing looks for starters.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, @benbardin, and @stevendanna)

@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from e94c3ec to 4da1e18 Compare June 23, 2022 17:55
Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do this yup, I was going to work on making this usable from the built-in that pulls a recording as the next PR. As part of that I'll also stick this in the stringer.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @benbardin, and @stevendanna)


pkg/util/tracing/crdbspan.go line 492 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: join these two paragraphs into one, to make it clear that they're in contrast to one another. Otherwise, I don't think the first one will mean much to readers.

I think you were referring to the two comments. I've deleted one of them since it wasn't really adding much.


pkg/util/tracing/crdbspan.go line 496 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Looks better!

Done.


pkg/util/tracing/crdbspan.go line 430 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Instead of taking this lock, I think it'd be better to take the data you need from openChildRecording[0]. It's all there, right?

Done.


pkg/util/tracing/crdbspan.go line 443 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can't this be

childrenMetadata[opName] = childrenMetadata[opName].Combine(metadata)

?

If so, consider extracting this loop into a function, called perhaps RollupChildrenMeta(childrenMeta, openChildRecording[0].ChildrenMetadata), cause you have the same loop elsewhere and I think it'd benefit from a name.

Done.


pkg/util/tracing/crdbspan.go line 457 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/from s' children/from s' open children

I scratched my head for a bit trying to understand if the Combine calls below are necessary

childrenMetadata contains entries from both open and finished children, so here we're just rolling up all those entries into the root of the recording.


pkg/util/tracing/crdbspan.go line 495 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

instead of adding this new includeRootStructuredEvents param, I think I'd rather keep getChildrenStructuredEventsRecursivelyLocked (and rename it back to getStructuredEventsRecursivelyLocked), always including the root, and assign res.StructuredRecords = buffer below. And then the whole bit below about managing the space in res.StructuredRecords goes away. I think the only thing that we lose is wasted work copying the structured events in getRecordingNoChildrenLocked, but I don't think that's a big deal.

I'd write this as

res := s.getRecordingNoChildrenLocked(
	tracingpb.RecordingStructured,
	false, // finishing - since we're only asking for the structured recording, the argument doesn't matter
)
res.StructuredRecords  = s.getStructuredEventsRecursivelyLocked(res.StructuredRecords[:], includeDetachedChildren)

hmm res.StructuredRecords = s.getStructuredEventsRecursivelyLocked(res.StructuredRecords[:], includeDetachedChildren) would double count the structured records in the root won't it? res.StructuredRecords[:] would already have the events for the root, and then we'd again append the roots' events to that slice.

I've changed it to take in a buffer instead and then deleted the entire block around memory management below.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @benbardin, and @stevendanna)


pkg/util/tracing/crdbspan.go line 457 at r6 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

childrenMetadata contains entries from both open and finished children, so here we're just rolling up all those entries into the root of the recording.

oh so then can we replace this whole block with rootSpan.ChildrenMetada = childrenMetadata ?


pkg/util/tracing/crdbspan.go line 495 at r6 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

hmm res.StructuredRecords = s.getStructuredEventsRecursivelyLocked(res.StructuredRecords[:], includeDetachedChildren) would double count the structured records in the root won't it? res.StructuredRecords[:] would already have the events for the root, and then we'd again append the roots' events to that slice.

I've changed it to take in a buffer instead and then deleted the entire block around memory management below.

ugh, sorry, I meant res.StructuredRecords[:0]. I think you can use that to avoid allocating buffer.


pkg/util/tracing/crdbspan.go line 411 at r7 (raw file):

	}

	s.mu.Lock()

nit: put everything until the unlock in a { } block for visibility, now that the block has gotten pretty large.


pkg/util/tracing/crdbspan.go line 480 at r7 (raw file):

	defer s.mu.Unlock()

	// Fetch s' recording without including its children.

nit: this comment doesn't add anything to the function name


pkg/util/tracing/crdbspan.go line 494 at r7 (raw file):

	// open.
	childrenMetadata := make(map[string]tracingpb.OperationMetadata)
	s.getChildrenMetadataRecursivelyLocked(childrenMetadata,

can we pass in res.ChildrenMetadata here, and get rid of childrenMetadata

@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from 4da1e18 to e78b1e4 Compare June 24, 2022 15:30
Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @benbardin, and @stevendanna)


pkg/util/tracing/crdbspan.go line 457 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

oh so then can we replace this whole block with rootSpan.ChildrenMetada = childrenMetadata ?

Done.


pkg/util/tracing/crdbspan.go line 495 at r6 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

ugh, sorry, I meant res.StructuredRecords[:0]. I think you can use that to avoid allocating buffer.

Done.


pkg/util/tracing/crdbspan.go line 494 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can we pass in res.ChildrenMetadata here, and get rid of childrenMetadata

Done.

@adityamaru
Copy link
Contributor Author

I'm seeing the CI failure on other builds in staging too https://teamcity.cockroachdb.com/viewLog.html?buildId=5547912&tab=buildResultsDiv&buildTypeId=Cockroach_UnitTests_CheckGeneratedCode.

I'll bring it up with whoever owns web UI tests but I don't think it is related to this diff.

TFTR!

bors r=andreimatei

@craig
Copy link
Contributor

craig bot commented Jun 24, 2022

Build failed:

@adityamaru
Copy link
Contributor Author

bors r=andreimatei

@craig
Copy link
Contributor

craig bot commented Jun 27, 2022

Build failed:

This change adds a `ChildrenMetadata` map to `crdbspan` that is a
mapping from operation to the operations' aggregated metadata. This
map is updated whenever a child of the `crdbSpan` finishes, with metadata
from all spans in the finishing childs' Recording. The map is therefore
a bucketed view of all the operations being traced by a span.

The motivation for this change is to surface more metadata about the
suboperations being traced in a spans' Recording. This could in turn provide
more o11y into why a job is slow/stuck, or where the performance of a
distributed operation is bottlenecked.

As part of a span Finish()ing, the span fetches its Recording with the spans'
configured verbosity. Prior to this change the recording would then be
processed as follows:

*Verbose Recording*

In the case of Verbose recording the spans in the recording are added to the parents'
`finishedChildren` slice provided we have not exceeded the maximum number of
children a parent can track.

*Structured Recording*

In the case of a Structured recording, only the StructuredEvents from the spans in
the recording are copied into the parent.

With this change, in both the Verbose and Structured recording mode, a finishing span
is also responsible for rolling up the OperationMetadata of all the spans in its
recording. This involves updating the parents' `childrenMetadata` mapping with:

1) an entry for the finishing span.
2) an entry for each of the finishing spans' Finish()ed children.
3) an entry for each of the finishing spans' open children, and their children recursively

The logic for 2) and 3) is subsumed in the method responsible for getting
the finishing spans' recording. Notably, GetRecording(...) for both Structured and Verbose
recordings, populate the root of the recording with OperationMetadata of all
finished and open children in the recording.

As an example when we are done finishing `child`:

```
parent
  child
    grandchild
```

We'd expect `parent` to have:
`{child: 2s, grandchild: 1s}`

Given that Finish()ing a child, and importing a remote recording into a span
share the same code path, the above semantics also apply to a remote recording
being imported into a parent span.

Fixes: cockroachdb#80391

Release note: None
@adityamaru adityamaru force-pushed the aggregate-child-op-duration branch from e78b1e4 to 4ddc350 Compare June 27, 2022 14:34
@adityamaru
Copy link
Contributor Author

Now its TestDockerCLI acting up 😢 , let's try again.

bors r=andreimatei

@craig
Copy link
Contributor

craig bot commented Jun 27, 2022

Build failed:

@adityamaru
Copy link
Contributor Author

Another flake, this time on #83253

bors r=andreimatei

@craig craig bot merged commit 9dfdf1b into cockroachdb:master Jun 27, 2022
@craig
Copy link
Contributor

craig bot commented Jun 27, 2022

Build succeeded:

@adityamaru adityamaru deleted the aggregate-child-op-duration branch June 29, 2022 14:52
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Jun 29, 2022
When the children of a finished span won't fit in a parent with a
verbose recording type, we don't store the children but still want
their structured records.

Unfortunately, we did this in the recordFinishedChildredLocked case by
falling through from the verbose recording case to the structured
recording case. In cockroachdb#81079, we added an assertion in the structured
recording case that only applies to structured recordings.

The result is Finish() failing with:

   panic: RecordingStructured has 12 recordings; expected 1

Now, we don't fall through and instead gather the structured records
of all children.

Release note: None
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Jun 30, 2022
In cockroachdb#81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:

1) The assertion was on a code path shared with RecordingVerbose
   spans; and,

2) A RecordingStructured span can have a RecordingVerbose child. The
   RecordingVerbose child is likely to have more than one span recording.

As a result, we have seen roachtests failing with

    panic: RecordingStructured has 12 recordings; expected 1

Here, we remove the assertion.

Release note: None
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Jun 30, 2022
In cockroachdb#81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:

1) The assertion was on a code path shared with RecordingVerbose
   spans; and,

2) A RecordingStructured span can have a RecordingVerbose child. The
   RecordingVerbose child is likely to have more than one span recording.

As a result, we have seen roachtests failing with

    panic: RecordingStructured has 12 recordings; expected 1

Here, we remove the assertion.

Release note: None
craig bot pushed a commit that referenced this pull request Jul 1, 2022
83625: tracing: remove panic in Finish r=adityamaru, a=stevendanna

In #81079, we added an assertion that failed if a child recording of a
RecordingStructured span had more than one span recording. However,
this is problematic for couple of reasons:

1) The assertion was on a code path shared with RecordingVerbose
   spans; and,

2) A RecordingStructured span can have a RecordingVerbose child. The
   RecordingVerbose child is likely to have more than one span recording.

As a result, we have seen roachtests failing with

    panic: RecordingStructured has 12 recordings; expected 1

Here, we remove the assertion.

Fixes #83502

Release note: None

83642: server/systemconfigwatcher/systemconfigwatchertest: fix race r=ajwerner a=ajwerner

Fixes #83258

```
==================
WARNING: DATA RACE
Write at 0x00c001b75c00 by goroutine 264:
  github.com/cockroachdb/cockroach/pkg/roachpb.KeyValueByKey.Swap()
      github.com/cockroachdb/cockroach/pkg/roachpb/pkg/roachpb/data.go:2400 +0x190
  github.com/cockroachdb/cockroach/pkg/roachpb.(*KeyValueByKey).Swap()
      <autogenerated>:1 +0xb3
  sort.medianOfThree()
      GOROOT/src/sort/sort.go:90 +0x72
  sort.doPivot()
      GOROOT/src/sort/sort.go:114 +0x9e
  sort.quickSort()
      GOROOT/src/sort/sort.go:203 +0xac
  sort.Sort()
      GOROOT/src/sort/sort.go:231 +0x64
  github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest.runTest.func2()
      github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest/test_system_config_watcher.go:110 +0x4d1
  github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest.runTest.func3.1()
      github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest/test_system_config_watcher.go:118 +0x4e
  github.com/cockroachdb/cockroach/pkg/testutils.SucceedsWithinError.func1()
      github.com/cockroachdb/cockroach/pkg/testutils/soon.go:69 +0x7a
  github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration()
      github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:207 +0x191
  github.com/cockroachdb/cockroach/pkg/testutils.SucceedsWithinError()
      github.com/cockroachdb/cockroach/pkg/testutils/soon.go:75 +0x1d1
  github.com/cockroachdb/cockroach/pkg/testutils.SucceedsWithin()
      github.com/cockroachdb/cockroach/pkg/testutils/soon.go:57 +0x96
  github.com/cockroachdb/cockroach/pkg/testutils.SucceedsSoon()
      github.com/cockroachdb/cockroach/pkg/testutils/soon.go:40 +0x8d
  github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest.runTest.func3()
      github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest/test_system_config_watcher.go:117 +0x164
  github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest.runTest()
      github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest/test_system_config_watcher.go:121 +0x5cf
  github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest.TestSystemConfigWatcher.func1()
      github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/systemconfigwatchertest/test_system_config_watcher.go:62 +0xdc
  testing.tRunner()
      GOROOT/src/testing/testing.go:1259 +0x22f
  testing.(*T).Run·dwrap·21()
      GOROOT/src/testing/testing.go:1306 +0x47

Previous read at 0x00c001b75c00 by goroutine 502:
  runtime.slicecopy()
      GOROOT/src/runtime/slice.go:284 +0x0
  github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedbuffer.MergeKVs()
      github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedbuffer/kvs.go:47 +0x354
  github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher.(*Cache).handleUpdate()
      github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/cache.go:253 +0x431
  github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher.(*Cache).handleUpdate-fm()
      github.com/cockroachdb/cockroach/pkg/server/systemconfigwatcher/cache.go:232 +0x11e
  github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedcache.(*Watcher).handleUpdate()
      github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go:346 +0x277
  github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedcache.(*Watcher).Run()
      github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go:327 +0x1b04
  github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedcache.Start.func1()
      github.com/cockroachdb/cockroach/pkg/kv/kvclient/rangefeed/rangefeedcache/watcher.go:189 +0x3d1
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:494 +0x551
```

Clone the values to make sure we don't have a race.

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tracing: parent span should aggregate durations for finished child spans
3 participants