
bulk: introduce an ingestionTracer to the bulk ingestion pipeline #80198

Closed
wants to merge 1 commit into from

Conversation

adityamaru
Contributor

This change introduces an ingestionTracer that is associated with
a tracing span and is responsible for adding ingestion information
to the trace in the form of lazy tags. It currently aggregates information
about AddSSTable, AdminSplit and AdminScatter requests that are issued during
ingestion.

As part of introducing the tracer, all RPC requests during ingestion are
now made in their own child span, allowing us to inspect the returned trace
and aggregate relevant information. This is beneficial if, for example, there is a stuck
AdminScatter, as it will show up on the tracez page as a long-running span.

The Render methods responsible for displaying the aggregated information are
just stubs that can be iterated on in the future. For example, instead of displaying
the throughput/duration of each RPC request at the level of the import/restore
processor, we might want to bubble the information up to the flow coordinator,
which is then responsible for aggregating it across nodes and exposing
it. The ingestion tracer gives us a building block for capturing relevant
trace information and pushing it further up the stack.

Release note: None
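
To make the shape of this concrete, here is a minimal, self-contained sketch of the idea rather than the code in this PR: a mutex-guarded aggregator whose Render output is only computed when the tag is actually displayed. The type and field names below are invented for illustration, and the real LazyTag interface in pkg/util/tracing renders structured key-values rather than a string.

package bulk

import (
    "fmt"
    "sync"
    "time"
)

// ingestionTagSketch is a hypothetical, simplified stand-in for the lazy tag
// this PR attaches to the ingestion span. It aggregates per-request stats
// under a mutex and renders them on demand, so the formatting cost is only
// paid when the tracez page (or a trace dump) asks for the tag's value.
type ingestionTagSketch struct {
    mu struct {
        sync.Mutex
        addSSTableCount int           // number of AddSSTable requests observed
        dataSizeBytes   int64         // total bytes ingested via AddSSTable
        sendWait        time.Duration // cumulative time spent waiting on ingestion RPCs
    }
}

// notify records the outcome of one ingestion RPC.
func (t *ingestionTagSketch) notify(dataSizeBytes int64, sendWait time.Duration) {
    t.mu.Lock()
    defer t.mu.Unlock()
    t.mu.addSSTableCount++
    t.mu.dataSizeBytes += dataSizeBytes
    t.mu.sendWait += sendWait
}

// Render produces a human-readable summary of the aggregated stats.
func (t *ingestionTagSketch) Render() string {
    t.mu.Lock()
    defer t.mu.Unlock()
    return fmt.Sprintf("AddSSTable: count=%d size=%dB sendWait=%s",
        t.mu.addSSTableCount, t.mu.dataSizeBytes, t.mu.sendWait)
}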

@adityamaru adityamaru requested review from dt and andreimatei April 19, 2022 21:22
@adityamaru adityamaru requested a review from a team as a code owner April 19, 2022 21:22
@adityamaru adityamaru requested a review from a team April 19, 2022 21:22
@cockroach-teamcity
Member

This change is Reviewable

@adityamaru
Contributor Author

There's obviously a lot more to do, but I wanted to start somewhere and get your take on this. Some follow-up work I have in mind:

  • Teach every processor to periodically send relevant traces back to the coordinator. Display this bucketed information on the job coordinator's tracing span so that we can see per-node throughput. I was thinking we could also report something like "for the past x epochs, node 1 has been seeing 20MB/s AddSSTable throughput"; if this drops below some threshold, we can highlight it on the tracez page.

  • Add tracing to our external storage call sites, and try to get a throughput metric for reads/writes.

  • Make lazy tags/tags collapsible so that we don't clutter the tracez page when a span has a lot of these tags displaying information.

  • Make structured events on the tracez page way less noisy: https://cockroachlabs.slack.com/archives/C01CNRP6TSN/p1650328483705979

@adityamaru adityamaru force-pushed the tracing-experiment branch 2 times, most recently from 3fcf1d9 to c996b3a Compare April 19, 2022 23:27
@tbg tbg requested review from a team and removed request for a team April 20, 2022 07:58
@adityamaru adityamaru force-pushed the tracing-experiment branch 2 times, most recently from 4b1b207 to 40228bc Compare April 20, 2022 16:30
Contributor

@andreimatei andreimatei left a comment

Teach every processor to send back relevant traces to the coordinator periodically. Display this bucketed information on the job coordinator tracing span so that we can see the per-node throughput.

I've got two thoughts here:

  1. Since jobs record their traces, it shouldn't be necessary to send information from... some RPC spans to the coordinator span. I think it should be enough to teach tracez to collect spans from around the cluster (which I hope we'll do soon), and to aggregate data at render time. Modulo the fact that there is a limit to the number of child spans that a parent can accumulate. I think we should have rollup logic that keeps stats around when this limit is hit.
    Depending exactly on how these import jobs collect their traces, you might also have a problem because the recordings of RPC spans might be lost if these RPCs are not Batch. Batch is the only RPC that can return a recording from the server-side. DistSQL processors have their own way of collecting recordings. We can talk about this more. Perhaps after I read this PR I'll understand what RPCs we're talking about.
  2. FWIW, I've been thinking a lot about timeseries recently, in particular the timeseries of the "dynamic", high-cardinality kind. Like, for example, different timeseries with a job_id label. With this Observability Service project that I'm agitating for I want to make it feasible and pleasant to have such timeseries. So my hope is that, in the future, you'll be able to have each node participating in a job expose counters related to that job, and have them all sampled into timeseries that are easy to aggregate, etc. So it should be feasible, perhaps easy, to have a page in the console that gives you an ingestion rate for a job over time.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @dt)


pkg/kv/sender.go, line 458 at r2 (raw file):

	ah roachpb.AdmissionHeader,
	args roachpb.Request,
) (*roachpb.BatchResponse, roachpb.Response, *roachpb.Error) {

mmm I don't know about this change. I think returning both BatchResponse and Response will be very confusing. I'd leave this function alone and copy the code over in SendWrappedWithTracing.
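
For what it's worth, a sketch of what that copied-over version could look like, assuming the in-repo helpers behave as discussed in this thread (the signatures below are approximations, not the exact repo API):

import (
    "context"

    "github.com/cockroachdb/cockroach/pkg/kv"
    "github.com/cockroachdb/cockroach/pkg/roachpb"
    "github.com/cockroachdb/cockroach/pkg/util/tracing"
)

// sendWrappedWithTracingSketch issues a single request in its own child span
// and hands back the span's recording alongside the response, leaving
// kv.SendWrappedWithAdmission's return values untouched. Sketch only.
func sendWrappedWithTracingSketch(
    ctx context.Context,
    sender kv.Sender,
    h roachpb.Header,
    ah roachpb.AdmissionHeader,
    args roachpb.Request,
    opName string,
) (roachpb.Response, tracing.Recording, *roachpb.Error) {
    // A dedicated child span means a stuck request shows up on the tracez
    // page as a long-running span.
    ctx, sp := tracing.ChildSpan(ctx, opName)
    resp, pErr := kv.SendWrappedWithAdmission(ctx, sender, h, ah, args)
    // Note: what this returns depends on the parent span's recording mode,
    // which is the subtlety discussed later in this review.
    rec := sp.FinishAndGetConfiguredRecording()
    return resp, rec, pErr
}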


pkg/kv/bulk/ingestion_tracer.go, line 41 at r2 (raw file):

		syncutil.Mutex

		sendWait time.Duration

comment the fields pls


pkg/kv/bulk/ingestion_tracer.go, line 116 at r2 (raw file):

}

func (a *addSSTableTag) notify(rec tracing.Recording) error {

comment the function and the arg pls


pkg/kv/bulk/ingestion_tracer.go, line 122 at r2 (raw file):

	sp, found := rec.FindSpan(string(AddSSTableOpName))
	if !found {
		return errors.AssertionFailedf("expected recording to contain %s span", AddSSTableOpName)

I wonder if this error is wise, as opposed to tolerating whatever recording (for example an empty one). Generally speaking I think we should treat recordings as best-effort. I don't know if in this particular case the callers are tight enough to be in a position to make strong assumptions.


pkg/kv/bulk/ingestion_tracer.go, line 148 at r2 (raw file):

		// Should only ever really be one iteration but if somehow it isn't,
		// e.g. if a request was redirected, go ahead and count it against all
		// involved stores; if it is small this edge case is immaterial, and

who's "it"? This comment seemingly says "should ever be one" but then gives an example of how that might not be the case. Which one is it?
If possible, I think it'd be good to tighten AddSSTableStats to a single store, perhaps in recordRequestStats(). Like, maybe we should just take the last one? Or maybe we should move away from the vagaries of BatchResponse.RangeInfos, and get more clear information from the AddSSTableResponse?

Code quote:

if it is small

pkg/kv/bulk/ingestion_tracer.go, line 191 at r2 (raw file):

// Render implements the LazyTag interface.
//
// TODO(adityamaru): TBD on what to render.

nit: I don't know if this TODO serves any purpose. I'd either say more or delete it.


pkg/kv/bulk/sender.go, line 35 at r1 (raw file):

const AdminScatterOpName = "AdminScatter"

func recordRequestStats(

please give this a comment and specify which args can be nil


pkg/kv/bulk/sender.go, line 36 at r1 (raw file):

func recordRequestStats(
	sp *tracing.Span, req roachpb.Request, resp roachpb.Response, br *roachpb.BatchResponse,

I think it's confusing to pass in both resp and br. I'd either make it take resp and RangeInfos, or just br. In the latter case you can make the function iterate over the potentially multiple responses in the BatchResponse, even though in practice there'll probably only ever be one.
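
A rough sketch of the "just pass br" variant, iterating over the (typically single) response and switching on its concrete type; what exactly gets recorded for each case is left as comments, since the stats payloads are still being shaped in this review:

import (
    "github.com/cockroachdb/cockroach/pkg/roachpb"
    "github.com/cockroachdb/cockroach/pkg/util/tracing"
)

// recordRequestStatsSketch takes only the BatchResponse and walks its
// responses. In practice the batch holds a single ingestion request, but
// iterating keeps the function correct if that ever changes. Sketch only.
func recordRequestStatsSketch(sp *tracing.Span, br *roachpb.BatchResponse) {
    if sp == nil || br == nil {
        return
    }
    for _, union := range br.Responses {
        switch resp := union.GetInner().(type) {
        case *roachpb.AddSSTableResponse:
            _ = resp
            // Record an AddSSTableStats structured event on sp here
            // (data size, stores that evaluated the request, ...).
        case *roachpb.AdminScatterResponse:
            _ = resp
            // Record an AdminScatterStats structured event on sp here.
        }
    }
}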


pkg/kv/bulk/sender.go, line 45 at r1 (raw file):

	case *roachpb.AddSSTableResponse:
		storeIDs := make([]roachpb.StoreID, 0)
		if br != nil && len(br.BatchResponse_Header.RangeInfos) > 0 {

nit: I don't think you need the len(br.BatchResponse_Header.RangeInfos) > 0 part


pkg/kv/bulk/sender.go, line 60 at r1 (raw file):

	case *roachpb.AdminScatterResponse:
		if resp.MVCCStats != nil {
			moved := resp.MVCCStats.Total()

please take the opportunity to give AdminScatterResponse.MVCCStats a comment. And also AdminScatterResponse.RangeInfos, for that matter.


pkg/kv/bulk/sender.go, line 60 at r1 (raw file):

	case *roachpb.AdminScatterResponse:
		if resp.MVCCStats != nil {
			moved := resp.MVCCStats.Total()

I can't tell if resp.MVCCStats corresponds to "data moved". I think we accumulate the stats of all ranges (*) in the scatter's key span, but we don't seem to account for how many replicas moved (which could be many, or zero, right?).

(*) But actually, do we? I don't see us touching the MVCCStats in combine(). Is that missing?


pkg/kv/bulk/sender.go, line 63 at r1 (raw file):

			sp.RecordStructured(&roachpb.AdminScatterStats{
				DataMoved: moved,

nit: inline the moved


pkg/kv/bulk/sender.go, line 72 at r1 (raw file):

It is not valid to pass a nil
// context; since the method is expected to be used with a tracing span that is
// recording.

Nil contexts are rarely acceptable. Let's drop this comment unless you had specific reasons for it. And also drop the protection.


pkg/kv/bulk/sender.go, line 23 at r2 (raw file):

)

type requestOpName string

nit: I suspect this type will cause more annoyance because of the cast than benefit


pkg/kv/bulk/sender.go, line 74 at r2 (raw file):

}

// SendWrappedWithTracing is a convenience function which wraps the request in a

does this function need to be exported? I'd much rather if it wasn't. In fact I find the semantics of this function weird because FinishAndGetConfiguredRecording() asks for the "configured" recording, but the function itself doesn't configure any recording. So the parent's recording mode matters, which is pretty hidden. Since it only has one caller, I'd get rid of this function. I also liked the old code in sst_batcher.go better when it was constructing a roachpb.Batch by hand. These SendWrapped functions are generally unnecessary fluff, in my opinion.


pkg/kv/bulk/sender.go, line 102 at r2 (raw file):

}

// AdminSplitWithTracing is a utility method that expects a non-nil context and

Document that the parent's recording mode matters and nil is returned if the parent is not recording.
Consider taking the recording mode in explicitly. And/or consider moving this function inside SSTBatcher.doFlush.

At the very least, I wouldn't export this.


pkg/kv/bulk/sst_batcher.go, line 133 at r1 (raw file):

func MakeSSTBatcher(
	ctx context.Context,
	name string,

consider taking the opportunity to comment what this name is used for.


pkg/kv/bulk/sst_batcher.go, line 141 at r1 (raw file):

) (*SSTBatcher, error) {
	sp := tracing.SpanFromContext(ctx)
	i, err := newIngestionTracer(name, sp)

nit: find a better name than i


pkg/kv/bulk/sst_batcher.go, line 142 at r1 (raw file):

	sp := tracing.SpanFromContext(ctx)
	i, err := newIngestionTracer(name, sp)
	if err != nil {

nit: inline sp


pkg/roachpb/api.proto, line 1596 at r1 (raw file):

message AdminScatterStats {
  int64 data_moved = 1;

data_moved_bytes


pkg/roachpb/api.proto, line 1595 at r2 (raw file):

}

message AdminScatterStats {

please give this a comment


pkg/roachpb/api.proto, line 1803 at r2 (raw file):

}

message AddSSTableStats {

please give comments to all the fields


pkg/roachpb/api.proto, line 1804 at r2 (raw file):

message AddSSTableStats {
  int64 data_size = 1;

data_size_bytes


pkg/roachpb/api.proto, line 1805 at r2 (raw file):

message AddSSTableStats {
  int64 data_size = 1;
  repeated int32 store_ids = 2 [(gogoproto.casttype) = "StoreID"];

nit: perhaps just stores

@adityamaru
Contributor Author

Modulo the fact that there is a limit to the number of child spans that a parent can accumulate. I think we should have rollup logic that keeps stats around when this limit is hit.

This was one of the reasons I thought a processor should periodically roll up its information into ComponentStats and send it back. I guess sending it back is similar to just emitting the aggregated information as a structured event in the processor. The tracez cluster-wide fanout can then collect these rolled-up events and render them.

Contributor Author

@adityamaru adityamaru left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @dt)


pkg/kv/sender.go, line 458 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

mmm I don't know about this change. I think returning both BatchResponse and Response will be very confusing. I'd leave this function alone and copy the code over in SendWrappedWithTracing.

Done.


pkg/kv/bulk/ingestion_tracer.go, line 41 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

comment the fields pls

Done.


pkg/kv/bulk/ingestion_tracer.go, line 116 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

comment the function and the arg pls

Done.


pkg/kv/bulk/ingestion_tracer.go, line 122 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I wonder if this error is wise, as opposed to tolerating whatever recording (for example an empty one). Generally speaking I think we should treat recordings as best-effort. I don't know if in this particular case the callers are tight enough to be in a position to make strong assumptions.

good point, changed it to a warning.


pkg/kv/bulk/ingestion_tracer.go, line 148 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

who's "it"? This comment seemingly says "should ever be one" but then gives an example of how that might not be the case. Which one is it?
If possible, I think it'd be good to tighten AddSSTableStats to a single store, perhaps in recordRequestStats(). Like, maybe we should just take the last one? Or maybe we should move away from the vagaries of BatchResponse.RangeInfos, and get more clear information from the AddSSTableResponse?

The comment above range_infos is:

    // The server may also include additional RangeInfo objects if it suspects
    // that the client may be interested in them. This is currently the case
    // immediately after a Range split, where a stale client_range_info from
    // before the split is taken to be an indication that the client may be
    // interested in information about both the LHS and RHS post-split ranges.

Which makes me believe that we can't just take the last one, because we don't know which one actually served the request. I've improved the comment here and in our existing code, but I'm still looking into whether we can add the StoreID to AddSSTableResponse to at least get an authoritative list of all the stores that did in fact evaluate the request.

I also see we don't have a combine() for AddSSTableRequest, which seems buggy. I'll address these in a follow-up since this PR is already doing a fair bit.


pkg/kv/bulk/ingestion_tracer.go, line 191 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I don't know if this TODO serves any purpose. I'd either say more or delete it.

Removed; we're probably going to come back and change what we show and how we show it, but that doesn't warrant a TODO.


pkg/kv/bulk/sender.go, line 35 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please give this a comment and specify which args can be nil

Done.


pkg/kv/bulk/sender.go, line 36 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think it's confusing to pass in both resp and br. I'd either make it take resp and RangeInfos or just br. In the latter case you can make the function iterate over the potentially multiple responses in the BatchResponse, even though in practice perhaps there'll ever only be one.

done, switched to just passing in br.


pkg/kv/bulk/sender.go, line 45 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I don't think you need the len(br.BatchResponse_Header.RangeInfos) > 0 part

copy paste from elsewhere, deleted.


pkg/kv/bulk/sender.go, line 60 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please take the opportunity to give AdminScatterResponse.MVCCStats. And also AdminScatterResponse.RangeInfos for that matter.

Done.


pkg/kv/bulk/sender.go, line 60 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I can't tell if resp.MVCCStats corresponds to "data moved". I think we accumulate the stats of all ranges (*) in the scatter's key span, but we don't seem to account for what how many replicas moved (which could be many and also zero, right?).

(*) But actually, do we? I don't see us touching the MVCCStats in combine(). Is that missing?

You're right, I think this was added while debugging a slow IMPORT, to sanity-check that the ranges targeted by the AdminScatters issued during import were mostly empty. To that effect, I think calling this dataInScatteredRanges might be more appropriate, though I do agree the more useful metric is indeed the size of the ranges that were actually scattered. I can send a patch that inspects the output of the replica queue and increments the size only when the replica has actually moved.

And yes, the combine() looks like a bug. Sending out a patch.


pkg/kv/bulk/sender.go, line 72 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

It is not valid to pass a nil
// context; since the method is expected to be used with a tracing span that is
// recording.

Nil contexts are rarely acceptable. Let's drop this comment unless you had specific reasons for it. And also drop the protection.

All the SendWrappedWith* methods in pkg/kv/sender.go have a comment saying a nil ctx is okay and that, in that case, they'll use an empty context. That's the only reason I added this comment, since we pull a child span from the context, which would NPE.


pkg/kv/bulk/sender.go, line 23 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I suspect this type will cause more annoyance because of the cast then benefit

When the method was taking an opName from the caller, I wanted to make sure the user was particular about using one of these defined opNames. This is because the ingestionTracer uses tracing.FindSpan(opName) to find the relevant trace in the recording, and so it was important that they both matched up. This doesn't hold anymore since I have removed opName as a parameter, and made it the method's responsibility to resolve the opName based on the type of the request.
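
For readers following along, the request-type-to-opName resolution could look roughly like the sketch below. The function name is illustrative; the AddSSTable and AdminScatter values mirror the constants in this PR, and the AdminSplit value is assumed.

import "github.com/cockroachdb/cockroach/pkg/roachpb"

// opNameForRequest is an illustrative sketch of resolving the child span's
// operation name from the request type, so callers no longer pass an opName.
func opNameForRequest(req roachpb.Request) string {
    switch req.(type) {
    case *roachpb.AddSSTableRequest:
        return "AddSSTable" // AddSSTableOpName in this PR
    case *roachpb.AdminScatterRequest:
        return "AdminScatter" // AdminScatterOpName in this PR
    case *roachpb.AdminSplitRequest:
        return "AdminSplit" // assumed op name for splits
    default:
        return "unknown-bulk-request"
    }
}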


pkg/kv/bulk/sender.go, line 74 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

does this function need to be exported? I'd much rather if it wasn't. In fact I find the semantics of this function weird because FinishAndGetConfiguredRecording() asks for the "configured" recording, but the function itself doesn't configure any recording. So the parent's recording mode matters, which is pretty hidden. Since it only has one caller, I'd get rid of this function. I also liked the old code in sst_batcher.go better when it was constructing a roachpb.Batch by hand. These SendWrapped functions are generally unnecessary fluff, in my opinion.

I anticipate ExportRequest during backup also calling this method. Currently it uses SendWrappedWithAdmissionHeader directly. Your point about the lack of clarity about the trace configuration is a good one, and similar to your suggestion below I'll change this method to take in the recording type explicitly.


pkg/kv/bulk/sender.go, line 102 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Document that the parent's recording mode matters and nil is returned if the parent is not recording.
Consider taking the recording mode in explicitly. And/or consider moving this function inside SSTBatcher.doFlush.

At the very least, I wouldn't export this.

yup this makes sense. I've unexported the method and changed it to take the recording mode explicitly. If in the future someone outside the package wants to use it, we can export it.


pkg/kv/bulk/sst_batcher.go, line 133 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider taking the opportunity to comment what this name is used for.

added a comment to the field in the SSTBatcher.


pkg/kv/bulk/sst_batcher.go, line 141 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: find a better name than i

Done.


pkg/roachpb/api.proto, line 1596 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

data_moved_bytes

changed to data_in_scattered_ranges_bytes based on your comment above about this not really being the data we moved.


pkg/roachpb/api.proto, line 1595 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please give this a comment

Done.


pkg/roachpb/api.proto, line 1803 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please give comments to all the fields

Done.


pkg/roachpb/api.proto, line 1804 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

data_size_bytes

Done.


pkg/roachpb/api.proto, line 1805 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: perhaps just stores

Done.

@adityamaru
Contributor Author

adityamaru commented Apr 21, 2022

I also see we don't have a combine() for AddSSTableRequest which seems buggy
https://reviewable.io/reviews/cockroachdb/cockroach/80198#-N06y_060IbSfre5OG5u

I was wrong about this. AddSSTableRequests are marked as unsplittable, so if the range splits we would get a RangeKeyMismatchError, causing us to split the SST into two and try again. Given this invariant, I think we can just have the replica evaluating the request send back the store it is on as part of the response. This way we should only have one such value for each AddSSTable response; I can fix this separately.

Contributor

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @dt)


pkg/kv/sender.go, line 444 at r3 (raw file):

	ctx context.Context, sender Sender, h roachpb.Header, args roachpb.Request,
) (roachpb.Response, *roachpb.Error) {
	resp, pErr := SendWrappedWithAdmission(ctx, sender, h, roachpb.AdmissionHeader{}, args)

undo this change


pkg/kv/bulk/buffering_adder.go, line 89 at r3 (raw file):

	}

	sp := tracing.SpanFromContext(ctx)

nit: consider inlining sp


pkg/kv/bulk/buffering_adder.go, line 336 at r3 (raw file):

	width := len(b.curBuf.entries) / b.initialSplits
	var toScatter []roachpb.Key
	recordingType := tracing.RecordingOff

consider moving this inside the loop


pkg/kv/bulk/buffering_adder.go, line 336 at r3 (raw file):

	width := len(b.curBuf.entries) / b.initialSplits
	var toScatter []roachpb.Key
	recordingType := tracing.RecordingOff

I think you can use tracing.SpanFromContext(ctx).RecordingType() always, even if there is no span (it'll be RecordingOff).
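
As a small, self-contained illustration of why that works (assuming tracing.Span follows the usual nil-tolerant receiver pattern, which is what the comment above implies), a method on a nil pointer receiver can simply report "off":

package main

import "fmt"

type recordingType int

const (
    recordingOff recordingType = iota
    recordingVerbose
)

// spanLike is a stand-in for a tracing span; the real tracing.Span is assumed
// to behave analogously for the purposes of this illustration.
type spanLike struct {
    recType recordingType
}

// RecordingType is safe to call on a nil *spanLike: it simply reports "off".
func (s *spanLike) RecordingType() recordingType {
    if s == nil {
        return recordingOff
    }
    return s.recType
}

func main() {
    var noSpan *spanLike // e.g. there was no span in the context
    fmt.Println(noSpan.RecordingType() == recordingOff) // prints true
}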


pkg/kv/bulk/ingestion_tracer.go, line 122 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

good point, changed it to a warning.

Even a warning is too much IMO.

Also you might as well look for all AddSSTable spans, not just the first one.


pkg/kv/bulk/ingestion_tracer.go, line 148 at r2 (raw file):

Which makes me believe that we can't just take the last one cause we don't know which one actually served the request?

That's why I wouldn't use the RangeInfos. I wrote that as a best-effort mechanism to update DistSender caches, not as a way for the client to get precise info about anything. So yeah, let's introduce some response fields that do what we want.

I also see we don't have a combine() for AddSSTableRequest which seems buggy. I'll address these in a follow-up since this PR is already doing a fair bit.

I don't think we need one, because AddSSTable is not a range request (i.e. it can't be split into per-range sub-requests... or can it?).


pkg/kv/bulk/ingestion_tracer.go, line 124 at r3 (raw file):

}

// notify locks the addSSTableTag, and updates its fields with information about

nitty nit: I wouldn't comment about locking or not locking.


pkg/kv/bulk/ingestion_tracer.go, line 130 at r3 (raw file):

	defer a.mu.Unlock()

	sp, found := rec.FindSpan(string(AddSSTableOpName))

I don't think you need this case any more.


pkg/kv/bulk/ingestion_tracer.go, line 298 at r3 (raw file):

	defer s.mu.Unlock()

	sp, found := rec.FindSpan(string(AdminScatterOpName))

do you still need this case?


pkg/kv/bulk/ingestion_tracer.go, line 421 at r3 (raw file):

	err := i.addSSTTag.notify(ctx, rec)
	if err != nil {
		log.Warningf(ctx, "failed to notify the ingestion trace of an AddSSTable: %+v", err)

Consider moving this logging inside notify in the one place where it returns an error, and get rid of the error return. I doubt that anyone can meaningfully do something with an error from this kind of notify method.


pkg/kv/bulk/ingestion_tracer.go, line 430 at r3 (raw file):

func (i *ingestionTracer) notifyAdminSplit(ctx context.Context, rec tracing.Recording) {
	if rec == nil {
		log.Warning(ctx, "received empty AdminSplit trace recording")

Is this really unexpected? Are you relying on jobs always being traced? I'd rather not rely on that and tolerate any recording with no warnings.


pkg/kv/bulk/sender.go, line 60 at r1 (raw file):

I can send a patch that inspects the output of the replica queue and increments the size only when the replica has actually moved.

IMO, I'd rather do that and avoid introducing this DataInScatteredRangesBytes.


pkg/kv/bulk/sender.go, line 72 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

All the SendWith methods in pkg/kv/sender.go have a comment saying nil ctx is okay and in that case, they'll use an empty context. That's the only reason I added this comment since we pull a child span from the context which would NPE.

Those comments say that a nil ctx is OK - which is weird. In the regular case where a nil ctx is not OK, you don't have to say anything or program defensively (and I'd much prefer it if we didn't).


pkg/kv/bulk/sender.go, line 74 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

I anticipate ExportRequest during backup also calling this method. Currently it uses SendWrappedWithAdmissionHeader directly. Your point about the lack of clarity about the trace configuration is a good one, and similar to your suggestion below I'll change this method to take in the recording type explicitly.

The more I look at this function, the less I like it :)
For one, the opName determination is not exhaustive; is calling ChildSpan() with an empty name OK?

Is there a reason why callers can't create the span and collect its recording themselves? Cause if there isn't, I think it'd be better if they did. In fact, the construction seems convoluted to me: IIUC, recordRequestStats adds a structured event, and then addSSTableTag.notify() fishes for that event. And then we can't make up our minds about how generic or specific the notify() function should be in the recording that it accepts. Let's pass in the event directly to notify(), I'd say.
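
A sketch of the "pass the event directly to notify()" shape suggested here; the addSSTableTag layout and the field names on both sides are assumptions based on snippets elsewhere in this review, not the final code:

// notify, reworked to take the structured event directly instead of fishing
// it out of a recording. Sketch only; field names are assumptions.
func (a *addSSTableTag) notify(ctx context.Context, stats *roachpb.AddSSTableStats) {
    if stats == nil {
        // Tolerate missing stats without warning; these events are best-effort.
        return
    }
    a.mu.Lock()
    defer a.mu.Unlock()
    // Fold the event into the aggregated fields guarded by a.mu.
    a.mu.dataSizeBytes += stats.DataSizeBytes // hypothetical field names on both sides
}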


pkg/kv/bulk/sender.go, line 130 at r3 (raw file):

	ctx context.Context,
	db *kv.DB,
	splitKey interface{},

does splitKey really need to be generic? Just because db.AdminSplit takes a generic one is not reason enough for all the callers to also be generic.

Contributor Author

@adityamaru adityamaru left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @dt)


pkg/kv/sender.go, line 444 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

undo this change

Done.


pkg/kv/bulk/buffering_adder.go, line 336 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

consider moving this inside the loop

inlined it given your comment below


pkg/kv/bulk/buffering_adder.go, line 336 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think you can use tracing.SpanFromContext(ctx).RecordingType() always, even if there is no span (it'll be RecordingOff.

ah good catch, done.


pkg/kv/bulk/ingestion_tracer.go, line 122 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Even a warning is too much IMO.

Also you might as well look for all AddSSTable spans, not just the first one.

This has changed now that we take in the event itself.


pkg/kv/bulk/ingestion_tracer.go, line 148 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Which makes me believe that we can't just take the last one cause we don't know which one actually served the request?

That's why I wouldn't use the RangeInfos. I wrote that as a best-effort mechanism to update DistSender caches, not as a way for the client to get precise info about anything. So yeah, let's introduce some response fields that do what we want.

I also see we don't have a combine() for AddSSTableRequest which seems buggy. I'll address these in a follow-up since this PR is already doing a fair bit.

I don't think we need one because AddSSTable is not a range request (i.e. it can't be split into per-range sub-requests... or, can it?) ?

Yup, I was wrong about this. AddSSTableRequests are marked as unsplittable, so if the range splits we would get a RangeKeyMismatchError, causing us to split the SST into two and try again. Given this invariant, I think we can just have the replica evaluating the request send back the store it is on as part of the response. This way we should only have one such value for each AddSSTable response; I can fix this separately if that's cool?


pkg/kv/bulk/ingestion_tracer.go, line 124 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nitty nit: I wouldn't comment about locking or not locking.

Done.


pkg/kv/bulk/ingestion_tracer.go, line 130 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't think you need this case any more.

Done.


pkg/kv/bulk/ingestion_tracer.go, line 298 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do you still need this case?

Done.


pkg/kv/bulk/ingestion_tracer.go, line 421 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Consider moving this logging inside notify in the one place where it returns an error, and get rid of the error return. I doubt that anyone can meaningfully do something with an error from this kind of notify method.

The code has changed after the refactor, making this comment no longer applicable.


pkg/kv/bulk/ingestion_tracer.go, line 430 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is this really unexpected? Are you relying on jobs always being traced? I'd rather not rely on that and tolerate any recording with no warnings.

Makes sense. The code has changed, but I've followed your suggestion of tolerating nil stats with no warnings.


pkg/kv/bulk/sender.go, line 60 at r1 (raw file):

DataInScatteredRangesBytes
yup fair, #80338


pkg/kv/bulk/sender.go, line 72 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Those comments say that a nil ctx is OK - which is weird. In the regular case where a nil ctx is not OK, you don't have to say anything or program defensively (and I'd much prefer it if we didn't).

Done.


pkg/kv/bulk/sender.go, line 74 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

The more I look at this function, the less I like it :)
For one, the opName determination is not exhaustive; is calling ChildSpan() with an empty name OK?

Is there a reason why callers can't create the span and collect its recording themselves? Cause if there isn't, I think it'd be better if they did. In fact, the construction seems convoluted to me: IIUC, recordRequestStats adds a structured event, and then addSSTableTag.notify() fishes for that event. And then we can't make up our minds about how generic or specific the notify() function should be in the recording that it accepts. Let's pass in the event directly to notify(), I'd say.

Thanks for pushing to make this cleaner; I refactored it, let me know what you think. I do like the idea of having all bulk KV requests in a single file, so I've kept the request helpers in the pkg/kv/bulk.go file. I plan on adding ExportRequest to this file as well.


pkg/kv/bulk/sender.go, line 130 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

does splitKey really need to be generic? Just because db.AdminSplit takes a generic one is not reason enough for all the callers to also be generic.

nope, changed to roachpb.Key

Code quote:

adminSplitWithTracing

Contributor

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @dt)


pkg/kv/bulk/sender.go line 43 at r4 (raw file):

// associated with the request.
//
// The recording mode used when fetching the span recording is specified using

I don't understand this parameter. The one caller doesn't even care what it passes in; it simply passes the parent's recording mode. The callee could do that itself. But better yet, I think the callee wants RecordingStructured, as that's the cheapest of the two and is sufficient. But actually, I don't think the callee wants the recording at all; see below.


pkg/kv/bulk/sender.go line 100 at r4 (raw file):

	var duration time.Duration
	rec := childSpan.FinishAndGetRecording(recordingType)
	addSSTSpan, ok := rec.FindSpan(AddSSTableOpName)

The AddSSTableOpName span is the root of this recording (if there is a recording); let's access it as rec[0].
But I don't really understand what's going here. Why are we looking at the trace at all for duration? Let's just measure it ourselves in this function; it'd be a lot more direct.

If you do that, I'd consider getting rid of childSpan, and any mention of tracing in this function. It seems to me that the RecordStructured(addSSTStats) would function even better on the parent span.
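
A sketch of the "measure it ourselves" direction: time the request directly, hand the duration back to the caller, and record the structured stats on the span that's already in the context. The helper name is illustrative, and the stats fields are left unset since their final shape is still under discussion:

import (
    "context"
    "time"

    "github.com/cockroachdb/cockroach/pkg/kv"
    "github.com/cockroachdb/cockroach/pkg/roachpb"
    "github.com/cockroachdb/cockroach/pkg/util/timeutil"
    "github.com/cockroachdb/cockroach/pkg/util/tracing"
)

// addSSTableTimedSketch measures the request duration directly instead of
// reading it back out of a child span's recording. Sketch only.
func addSSTableTimedSketch(
    ctx context.Context, sender kv.Sender, req *roachpb.AddSSTableRequest,
) (roachpb.Response, time.Duration, *roachpb.Error) {
    begin := timeutil.Now()
    resp, pErr := kv.SendWrapped(ctx, sender, req)
    sendDuration := timeutil.Since(begin)
    if sp := tracing.SpanFromContext(ctx); sp != nil && pErr == nil {
        // Record the stats on the span already in the context; whether they
        // are kept depends on that span's recording. The fields (data size,
        // stores, possibly the duration itself) are elided here.
        sp.RecordStructured(&roachpb.AddSSTableStats{})
    }
    return resp, sendDuration, pErr
}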

Contributor

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @andreimatei, and @dt)


pkg/kv/bulk/sender.go line 100 at r4 (raw file):

But I don't really understand what's going here. Why are we looking at the trace at all for duration? Let's just measure it ourselves in this function; it'd be a lot more direct.

I wanted to say here further that the caller can measure the duration, which would be even better. Unless we put the duration in AddSSTableStats, which I think would be reasonable.

@adityamaru
Contributor Author

Given our conversation last week, I think I'm going to pick up pieces from this PR while we build out a more generalized aggregator in #80388 instead of this ingestion tracer. We're still most likely going to need the changes in pkg/kv/bulk/sender.go and the tags introduced here, so I will address your remaining comments as I add the pieces in future PRs.

@dt
Member

dt commented Apr 25, 2022

does distsender already make a new child span for each request?

@adityamaru
Contributor Author

Closing since we're pursuing this elsewhere.

@adityamaru adityamaru closed this May 28, 2022