
Fix race when checking for dirty aggregations #3886

Merged
merged 9 commits into master on Nov 3, 2021

Conversation

ryanhall07 (Collaborator):

This fixes the following race:

  1. Writer creates a new aggregation and adds it to the dirty set.
  2. Flusher processes the dirty set.
  3. Writer updates the aggregation value.

This race was introduced with the recent refactor of resending to use an explicit dirty set. However, the race also existed historically, before resending introduced the notion of a dirty bit.

The fix is to ensure the individual aggregation lock is held when reading or writing the dirty bit.

It was getting confusing which state truly needs to be guarded by the aggregation lock, so this also refactors the code to introduce an aggFlushState holding flusher-local state that doesn't need any synchronization. Now all fields on the lockedAggregation must be accessed with the lock held.
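For illustration, here is a minimal sketch of the locking pattern described above, using stand-in names rather than the real m3 aggregator types: both the writer and the flusher touch the dirty bit only while holding the per-aggregation lock, so a write that races with a flush simply re-marks the aggregation dirty for the next pass.

```go
package sketch

import "sync"

// lockedAggregation is a stand-in for the real type; only the fields needed
// to show the locking pattern are included.
type lockedAggregation struct {
	sync.Mutex
	dirty bool
	value float64
}

// Writer path: update the value and set the dirty bit under the lock.
func (a *lockedAggregation) add(v float64) {
	a.Lock()
	a.value += v
	a.dirty = true
	a.Unlock()
}

// Flusher path: read and clear the dirty bit, and copy the value out, all
// under the same lock. A write that lands after this returns sets dirty again
// and is picked up on the next flush instead of being silently dropped.
func (a *lockedAggregation) consumeIfDirty() (float64, bool) {
	a.Lock()
	defer a.Unlock()
	if !a.dirty {
		return 0, false
	}
	a.dirty = false
	return a.value, true
}
```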

What this PR does / why we need it:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:


Does this PR require updating code package or user-facing documentation?:


@@ -75,25 +71,26 @@ type CounterElem struct {

// startTime -> agg (new one per every resolution)
values map[xtime.UnixNano]timedCounter
// startTime -> state. this is local state to the flusher and does not need to be guarded with a lock.
// values and flushState should always have the exact same key set.
flushState map[xtime.UnixNano]aggFlushState
ryanhall07 (Collaborator, Author):

aggFlushState now holds all the state that is local to the flusher and does not need to be synchronized. Much easier to reason about.
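For orientation, a rough sketch of what such flusher-local, per-start-time state could look like. The fields are inferred from the hunks quoted in this conversation (prevStartTime, consumedValues, flushed, plus a copy of the locked aggregation's data); the real aggFlushState may well differ in detail.

```go
package sketch

import xtime "github.com/m3db/m3/src/x/time"

// aggFlushState (sketch): bookkeeping owned by the flusher for a single
// start-aligned timestamp. Only the flusher reads and writes it, so no
// synchronization is needed.
type aggFlushState struct {
	startAt        xtime.UnixNano // start-aligned timestamp this state belongs to
	prevStartTime  xtime.UnixNano // start time of the previous flush state, 0 if none
	flushed        bool           // true once this aggregation has been flushed
	consumedValues []float64      // values consumed at the last flush, indexed by agg type
	values         []float64      // copy of the lockedAgg data, taken while holding its lock
}
```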

elemBase: newElemBase(opts),
dirty: make([]xtime.UnixNano, 0, defaultNumAggregations), // in most cases values will have two entries
values: make(map[xtime.UnixNano]timedCounter),
flushState: make(map[xtime.UnixNano]aggFlushState),
ryanhall07 (Collaborator, Author):

Now there is a separate flushState map that is also start-aligned. This removes all the confusion of some things being start-aligned and some things being end-aligned.

prevV.previousTimeNanos = xtime.UnixNano(timestampNanosFn(int64(e.minStartTime), resolution))
e.toExpire = append(e.toExpire, prevV)
// can't expire flush state until after the flushing, so we save the time to expire later.
e.flushStateToExpire = append(e.flushStateToExpire, e.minStartTime)
ryanhall07 (Collaborator, Author):

flushState is kind of like the old consumedValues/previousTimeNanos, although it's much easier to reason about because those fields are scoped to a single timestamp on the flush state. If you need the previous consumed values, you just look them up from the previous flush state.


// note: flushState might be empty for the first flush
flushState := e.flushState[dirtyTime]
// copy the lockedAgg data to the flushState while holding the lock.
ryanhall07 (Collaborator, Author):

the flush state now holds a copy of the lockedAgg values, so we don't need to hold the lock when actually processing the values.
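A sketch of that flush pattern with simplified stand-in types (not the real element code): hold each aggregation's lock only long enough to snapshot its data into the flusher-local flushState map, then consume the copies with no locks held.

```go
package sketch

import "sync"

type unixNano int64 // stand-in for xtime.UnixNano

type lockedAgg struct {
	sync.Mutex
	dirty  bool
	values []float64
}

type flushData struct {
	flushed bool
	values  []float64
}

type elem struct {
	values     map[unixNano]*lockedAgg
	flushState map[unixNano]flushData
}

// flushDirty snapshots each dirty aggregation under its own lock, then
// consumes the flusher-local copies without holding any lock.
func (e *elem) flushDirty(dirtyTimes []unixNano, consume func(flushData)) {
	for _, t := range dirtyTimes {
		agg, ok := e.values[t]
		if !ok {
			continue // the real code treats this as an invariant violation
		}
		fs := e.flushState[t] // zero value on the first flush

		agg.Lock()
		fs.values = append(fs.values[:0], agg.values...)
		agg.dirty = false
		agg.Unlock()

		fs.flushed = true
		e.flushState[t] = fs
	}

	// Only flusher-local state is touched from here on.
	for _, t := range dirtyTimes {
		consume(e.flushState[t])
	}
}
```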

if _, ok := e.consumedValues[prevTimeNanos]; ok {
prev = e.consumedValues[prevTimeNanos][aggTypeIdx]
if flushState.prevStartTime > 0 {
prevFlushState, ok := e.flushState[flushState.prevStartTime]
ryanhall07 (Collaborator, Author) on Oct 29, 2021:

Hopefully this is simpler. Now a flushState can point to a previous flushState through prevStartTime, and from there you can get the previous consumedValues.
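A small sketch of that lookup; the field and map shapes follow the hunks quoted above, and everything else is simplified.

```go
package sketch

import xtime "github.com/m3db/m3/src/x/time"

type aggFlushState struct {
	prevStartTime  xtime.UnixNano
	consumedValues []float64
}

// prevConsumedValues follows the prevStartTime pointer to the previous flush
// state and returns its consumed values, if any.
func prevConsumedValues(
	flushStates map[xtime.UnixNano]aggFlushState,
	fs aggFlushState,
) ([]float64, bool) {
	if fs.prevStartTime <= 0 {
		return nil, false // no previous flush state
	}
	prev, ok := flushStates[fs.prevStartTime]
	if !ok {
		return nil, false // would be an invariant violation in the real code
	}
	return prev.consumedValues, true
}
```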

@@ -176,15 +176,46 @@ type elemBase struct {
listType metricListType

// Mutable states.
tombstoned bool
closed bool
cachedSourceSetsLock sync.Mutex // nolint: structcheck
ryanhall07 (Collaborator, Author):

don't need this lock anymore because the values lock is always held when cachedSourceSets is accessed. so it's redundant.

codecov bot commented Oct 29, 2021

Codecov Report

Merging #3886 (15e41cf) into master (b3fe0f1) will decrease coverage by 0.1%.
The diff coverage is 100.0%.


@@           Coverage Diff            @@
##           master   #3886     +/-   ##
========================================
- Coverage    56.8%   56.7%   -0.2%     
========================================
  Files         553     553             
  Lines       63440   63325    -115     
========================================
- Hits        36072   35940    -132     
- Misses      24175   24187     +12     
- Partials     3193    3198      +5     
Flag         Coverage Δ
aggregator   63.2% <100.0%> (-0.6%) ⬇️
cluster      ∅ <ø> (∅)
collector    58.4% <ø> (ø)
dbnode       60.3% <ø> (-0.1%) ⬇️
m3em         46.4% <ø> (ø)
metrics      19.7% <ø> (ø)
msg          74.4% <ø> (+<0.1%) ⬆️

Flags with carried forward coverage won't be shown.


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b3fe0f1...15e41cf.

if ok && !nextAgg.lockedAgg.dirty {
nextAgg.previousTimeNanos = xtime.UnixNano(timestampNanosFn(int64(dirtyTime), resolution))
e.toConsume = append(e.toConsume, nextAgg)
if ok && (i == len(dirtyTimes)-1 || dirtyTimes[i+1] != nextAgg.startAtNanos) {
Collaborator:

Is this new behavior to prevent duplicates?

ryanhall07 (Collaborator, Author):

It's not new behavior. Previously we only cascaded the update if the nextAgg wasn't dirty. However, again, we were incorrectly reading the dirty bit of the next agg without holding the lock. I could grab the nextAgg lock here, but I want to avoid more locking; checking whether the next agg is already in the dirty set achieves basically the same thing.

Collaborator:

Ok yeah this makes sense
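A small sketch of the check being discussed; the condition mirrors the diff above, while the function and its signature are just illustrative. Because dirtyTimes is kept sorted, "is the next aggregation already dirty?" reduces to looking at the next entry in the dirty set, with no need to read nextAgg.lockedAgg.dirty without its lock.

```go
package sketch

// shouldCascade reports whether the update for dirtyTimes[i] needs to be
// cascaded into the following aggregation. If that aggregation's start time
// is already the next entry in the sorted dirty set, it will be consumed on
// its own this pass and no cascade is needed.
func shouldCascade(dirtyTimes []int64, i int, nextStartAtNanos int64, nextExists bool) bool {
	if !nextExists {
		return false
	}
	return i == len(dirtyTimes)-1 || dirtyTimes[i+1] != nextStartAtNanos
}
```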

ts := dirtyTime.ToTime()
instrument.EmitAndLogInvariantViolation(e.opts.InstrumentOptions(), func(l *zap.Logger) {
l.Error("dirty timestamp not in values map", zap.Time("ts", ts))
})
rallen090 (Collaborator) on Oct 29, 2021:

I think in this case we need to return false otherwise agg is empty and we'd panic on agg.lockedAgg right?

ryanhall07 (Collaborator, Author):

Good catch. It's hard to have test coverage for these paths, which should never happen and which panic in tests.
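A sketch of the resulting guard, built around the fragment quoted above. The enclosing function, its return values, and the trimmed-down types are assumptions; the EmitAndLogInvariantViolation call is the one already shown in the diff.

```go
package sketch

import (
	"github.com/m3db/m3/src/x/instrument"
	xtime "github.com/m3db/m3/src/x/time"
	"go.uber.org/zap"
)

type lockedAgg struct{ dirty bool }

type timedAgg struct {
	lockedAgg *lockedAgg // simplified stand-in
}

type elem struct {
	values map[xtime.UnixNano]timedAgg
	iopts  instrument.Options
}

// dirtyAgg looks up the aggregation for a dirty timestamp. On a miss it logs
// the invariant violation and returns false so the caller never dereferences
// the zero-value agg (which is what would have panicked).
func (e *elem) dirtyAgg(dirtyTime xtime.UnixNano) (timedAgg, bool) {
	agg, ok := e.values[dirtyTime]
	if !ok {
		ts := dirtyTime.ToTime()
		instrument.EmitAndLogInvariantViolation(e.iopts, func(l *zap.Logger) {
			l.Error("dirty timestamp not in values map", zap.Time("ts", ts))
		})
		return timedAgg{}, false
	}
	return agg, true
}
```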

ryanhall07 force-pushed the rhall-resend-race-condition branch from 671ae6f to 6e09a00 on November 2, 2021 at 22:57
ryanhall07 force-pushed the rhall-resend-race-condition branch from 6e09a00 to 30003f1 on November 2, 2021 at 22:59
@@ -1071,6 +1073,80 @@ func TestAddUntimed_ResendEnabled(t *testing.T) {
require.True(t, ok)
}

func TestAddUntimed_ClosedAggregation(t *testing.T) {
ryanhall07 (Collaborator, Author):

this test does a pretty good job illustrating the race condition and leak of the dirty set.

@@ -293,10 +294,8 @@ func (e *GenericElem) expireValuesWithLock(
}
resolution := e.sp.Resolution().Window

// start after the minimum to ensure we always keep at least one value in the map for binary transformations.
currStart := e.minStartTime.Add(resolution)
currStart := e.minStartTime
ryanhall07 (Collaborator, Author):

we actually want to run this loop for e.minStartTime so we close the aggregation.

e.toConsume = append(e.toConsume, flushState)
// potentially consume the nextAgg as well in case we need to cascade an update from a previously flushed
// value.
if flushState.flushed {
ryanhall07 (Collaborator, Author):

figured this guard would help reduce some unnecessary lookups in the common case.

Collaborator:

So this can be done since a "nextAgg" can only precede a "flushed current agg"? Might be worth commenting

ryanhall07 (Collaborator, Author):

It's more that we only care to cascade an update when a previously flushed value has been updated. Will add a comment/example.

if flushState.consumedValues == nil {
	flushState.consumedValues = make([]float64, len(e.aggTypes))
}
flushState.consumedValues[aggTypeIdx] = curr.Value
ryanhall07 (Collaborator, Author):

This guard was dumb. It just causes a potential nil pointer when accessing the previous consumedValues above. Instead of having to add a branch for nil, just add NaN, which the code can already handle anyway.

if prevV, ok := e.values[e.minStartTime]; ok {
// if we're currently pointing at the start skip this there is no previous for the start. this ensures
// we always keep at least one value in the map for binary transformations.
if prevV, ok := e.values[e.minStartTime]; ok && currStart != e.minStartTime {
Collaborator:

Ah ok this makes sense. nit: this comment wording I think is missing a "because"

}
prev.Value = prevFlushState.consumedValues[aggTypeIdx]
Collaborator:

Why's this unnecessary now? I guess anything "previous" must have consumedVals? That'd only be the case if the previous had a binary op though right?

ryanhall07 (Collaborator, Author) on Nov 3, 2021:

The only time the consumed value of the previous aggregation could be nil is if the consumed value was NaN. This can happen for certain aggregations, like Max, when there is no data yet. I removed the guard below that did not populate consumedValues if the value was NaN, so this can never happen now. The code could already handle a NaN value, so it just seemed overly complicated. Removed 2 if statements that provided no value.

"That'd only be the case if the previous had a binary op though right?" - You're in the binaryOp case, so you're guaranteed the previous was a binary op as well. The pipeline can't change within an aggregation (it would create a new aggregation).

Also see the other comment below.

prevFlushState, ok := e.flushState[flushState.prevStartTime]
if !ok {
ts := flushState.prevStartTime.ToTime()
instrument.EmitAndLogInvariantViolation(e.opts.InstrumentOptions(), func(l *zap.Logger) {
Collaborator:

I think this is another case where not panicking here (e.g. in prod) will result in a panic below because prevFlushState will have an empty consumedValues

Collaborator:

For something like this or some other bug it might be worth still keeping around that check against a nil consumedValues - just to prevent a panic. And we could just do EmitAndLogInvariantViolation if nil

ryanhall07 (Collaborator, Author):

yea good catch

ryanhall07 enabled auto-merge (squash) on November 3, 2021 at 22:52
ryanhall07 merged commit a4aff33 into master on Nov 3, 2021
ryanhall07 deleted the rhall-resend-race-condition branch on November 3, 2021 at 23:02
soundvibe added a commit that referenced this pull request Nov 4, 2021
* master:
  Fix race when checking for dirty aggregations (#3886)
  [aggregator] Add test coverage to expireValues (#3898)
  [aggregator] Propagate cancellation through tick (#3895)