
fix(tsm1): fix ring and related test #20172

Closed
wants to merge 1 commit

Conversation


StoneYunZhao commented Nov 25, 2020

I accidentally discovered that ring.go is not rigorous. I had opened two identical PRs, and the CircleCI test failed for one but succeeded for the other.

  1. The failed CircleCI run was triggered by feat: No need to compute point hash if there is only one shard #20117.
  2. The successful CircleCI run was triggered by feat: No need to compute point hash if there is only one shard #20118.

After reading ring.go, I'm confident that my commit was not what caused the failure. This PR fixes the following issues (a sketch of the key changes follows the list):

  1. e9db11a changed the maximum number of partitions to 16 but did not update the benchmark test, so the benchmark fails whenever the partition count is larger than 16.
  2. newring() does not check that the partition count is a power of two, and TestRing_newRing does not cover that case.
  3. benchmarkRingkeys should not add() a nil entry, because it causes a panic when keys() is called.
  4. reset() must reset keysHint atomically; otherwise there is a very small chance of the failure shown below.
  5. remove() is safe for use by multiple goroutines, so it must ensure keysHint never becomes negative; a negative value can cause a panic, because keys() uses it as make([][]byte, 0, atomic.LoadInt64(&r.keysHint)).
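
For reference, here is a minimal sketch of fixes 2, 4, and 5. It uses a stripped-down stand-in for the ring type with only a keysHint counter and a partition slice; the names newring/reset/remove/keysHint come from ring.go, but the bodies below are illustrative of the technique rather than the exact diff:

```go
package ring

import (
	"fmt"
	"sync/atomic"
)

// ring is a reduced stand-in for tsm1's ring; only the state relevant
// to these fixes is shown. The real type also holds the hash and
// partition machinery.
type ring struct {
	keysHint   int64 // approximate key count; always accessed atomically
	partitions []*partition
}

type partition struct{}

// newring rejects a partition count that is not a power of two (fix 2).
// n&(n-1) == 0 holds exactly when n is a power of two; the upper bound
// of 16 matches the maximum introduced by e9db11a (fix 1).
func newring(n int) (*ring, error) {
	if n <= 0 || n > 16 || n&(n-1) != 0 {
		return nil, fmt.Errorf("invalid number of partitions: %d", n)
	}
	return &ring{partitions: make([]*partition, n)}, nil
}

// reset clears keysHint with an atomic store (fix 4), so concurrent
// readers such as keys(), which does atomic.LoadInt64(&r.keysHint),
// never race with it.
func (r *ring) reset() {
	atomic.StoreInt64(&r.keysHint, 0)
}

// remove decrements keysHint and clamps it at zero (fix 5): a negative
// hint would make keys() call make([][]byte, 0, hint) with a negative
// capacity, which panics.
func (r *ring) remove() {
	if atomic.AddInt64(&r.keysHint, -1) < 0 {
		atomic.StoreInt64(&r.keysHint, 0)
	}
}
```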

Below is the error log from CircleCI:

==================
WARNING: DATA RACE
Write at 0x00c00039c300 by goroutine 90:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*ring).reset()
      /root/influxdb/tsdb/engine/tsm1/ring.go:81 +0xac
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).ClearSnapshot()
      /root/influxdb/tsdb/engine/tsm1/cache.go:449 +0x3b1
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WriteSnapshot()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1934 +0x5e2
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).CreateSnapshot()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1959 +0xbd
  github.com/influxdata/influxdb/tsdb.(*Shard).CreateSnapshot()
      /root/influxdb/tsdb/shard.go:1185 +0xa6
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent.func2()
      /root/influxdb/tsdb/shard_test.go:475 +0x1ef

Previous read at 0x00c00039c300 by goroutine 100:
  sync/atomic.LoadInt64()
      /usr/local/go/src/runtime/race_amd64.s:211 +0xb
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*ring).keys()
      /root/influxdb/tsdb/engine/tsm1/ring.go:121 +0x52
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Keys()
      /root/influxdb/tsdb/engine/tsm1/cache.go:504 +0x97
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeriesRange()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1694 +0x7f8
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteSeriesRangeWithPredicate()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1527 +0x8ff
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteSeriesRange()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1419 +0x91
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteMeasurement()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1870 +0x30f
  github.com/influxdata/influxdb/tsdb.(*Shard).DeleteMeasurement()
      /root/influxdb/tsdb/shard.go:795 +0xb0
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent.func1()
      /root/influxdb/tsdb/shard_test.go:453 +0xfa

Goroutine 90 (running) created at:
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent()
      /root/influxdb/tsdb/shard_test.go:466 +0xec6
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:909 +0x199

Goroutine 100 (running) created at:
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent()
      /root/influxdb/tsdb/shard_test.go:450 +0xe60
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:909 +0x199
==================
FAIL	github.com/influxdata/influxdb/tsdb	23.041s

danxmoran self-requested a review November 25, 2020 18:06
danxmoran changed the base branch from 1.8 to master-1.x February 23, 2021 16:16
danxmoran changed the base branch from master-1.x to 1.8 February 23, 2021 16:17
@danxmoran

@StoneYunZhao apologies it's taken so long for this to be reviewed.

Could you rebase your branch on the master-1.x branch and re-target this PR to merge there? We're trying to identify the source of a performance regression on 1.8, so commits there are effectively frozen for now.


StoneYunZhao commented Feb 25, 2021

> @StoneYunZhao apologies it's taken so long for this to be reviewed.
>
> Could you rebase your branch on the master-1.x branch and re-target this PR to merge there? We're trying to identify the source of a performance regression on 1.8, so commits there are effectively frozen for now.

Sorry, I didn't see your reply until now. I found that you've already done this work in #20802. Thanks a lot! Is there anything else I need to do?

@danxmoran

Yes, we can reopen the backport to 1.8 when the release branch is ready. Thanks again for the submission!

danxmoran closed this Feb 25, 2021
@danxmoran

Ack, sorry, didn't notice you edited your comment. I think closing is the right move, no extra work needed on your part.

StoneYunZhao deleted the zhaoyun/fix-ring branch June 10, 2021 09:03