
fix(tsm1): fix ring and related test #20172

Closed
wants to merge 1 commit

Conversation


StoneYunZhao commented Nov 25, 2020

I accidentally discovered that ring.go is not rigorous. I had opened two identical PRs, and the CircleCI test failed for one but succeeded for the other.

  1. The failed CircleCI run was triggered by feat: No need to compute point hash if there is only one shard #20117.
  2. The successful CircleCI run was triggered by feat: No need to compute point hash if there is only one shard #20118.

After reading ring.go, I'm confident that my commit was not what caused the failure. This PR fixes the following issues (a sketch of the key changes follows the list):

  1. e9db11a changed the maximum number of partitions to 16 but did not update the benchmark test, so the benchmark fails whenever the partition count is larger than 16.
  2. newring() does not check that the partition count is a power of two, and TestRing_newRing does not cover that case.
  3. benchmarkRingkeys should not add() a nil entry, because it causes a panic when keys() is called.
  4. reset() must reset keysHint atomically; otherwise there is a very small chance of the failure shown below.
  5. remove() is safe for use by multiple goroutines, so it must ensure keysHint never becomes negative; a negative value can cause a panic, because keys() uses it as make([][]byte, 0, atomic.LoadInt64(&r.keysHint)).
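
For reference, here is a minimal sketch of fixes 2, 4, and 5. It uses a stripped-down stand-in for the ring type with only a keysHint counter and a partition slice; the names newring/reset/remove/keysHint come from ring.go, but the bodies below are illustrative of the technique rather than the exact diff:

```go
package ring

import (
	"fmt"
	"sync/atomic"
)

// ring is a reduced stand-in for tsm1's ring; only the state relevant
// to these fixes is shown. The real type also holds the hash and
// partition machinery.
type ring struct {
	keysHint   int64 // approximate key count; always accessed atomically
	partitions []*partition
}

type partition struct{}

// newring rejects a partition count that is not a power of two (fix 2).
// n&(n-1) == 0 holds exactly when n is a power of two; the upper bound
// of 16 matches the maximum introduced by e9db11a (fix 1).
func newring(n int) (*ring, error) {
	if n <= 0 || n > 16 || n&(n-1) != 0 {
		return nil, fmt.Errorf("invalid number of partitions: %d", n)
	}
	return &ring{partitions: make([]*partition, n)}, nil
}

// reset clears keysHint with an atomic store (fix 4), so concurrent
// readers such as keys(), which does atomic.LoadInt64(&r.keysHint),
// never race with it.
func (r *ring) reset() {
	atomic.StoreInt64(&r.keysHint, 0)
}

// remove decrements keysHint and clamps it at zero (fix 5): a negative
// hint would make keys() call make([][]byte, 0, hint) with a negative
// capacity, which panics.
func (r *ring) remove() {
	if atomic.AddInt64(&r.keysHint, -1) < 0 {
		atomic.StoreInt64(&r.keysHint, 0)
	}
}
```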

Below is the error log from CircleCI:

==================
WARNING: DATA RACE
Write at 0x00c00039c300 by goroutine 90:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*ring).reset()
      /root/influxdb/tsdb/engine/tsm1/ring.go:81 +0xac
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).ClearSnapshot()
      /root/influxdb/tsdb/engine/tsm1/cache.go:449 +0x3b1
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).WriteSnapshot()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1934 +0x5e2
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).CreateSnapshot()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1959 +0xbd
  github.com/influxdata/influxdb/tsdb.(*Shard).CreateSnapshot()
      /root/influxdb/tsdb/shard.go:1185 +0xa6
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent.func2()
      /root/influxdb/tsdb/shard_test.go:475 +0x1ef

Previous read at 0x00c00039c300 by goroutine 100:
  sync/atomic.LoadInt64()
      /usr/local/go/src/runtime/race_amd64.s:211 +0xb
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*ring).keys()
      /root/influxdb/tsdb/engine/tsm1/ring.go:121 +0x52
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Keys()
      /root/influxdb/tsdb/engine/tsm1/cache.go:504 +0x97
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).deleteSeriesRange()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1694 +0x7f8
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteSeriesRangeWithPredicate()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1527 +0x8ff
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteSeriesRange()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1419 +0x91
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).DeleteMeasurement()
      /root/influxdb/tsdb/engine/tsm1/engine.go:1870 +0x30f
  github.com/influxdata/influxdb/tsdb.(*Shard).DeleteMeasurement()
      /root/influxdb/tsdb/shard.go:795 +0xb0
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent.func1()
      /root/influxdb/tsdb/shard_test.go:453 +0xfa

Goroutine 90 (running) created at:
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent()
      /root/influxdb/tsdb/shard_test.go:466 +0xec6
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:909 +0x199

Goroutine 100 (running) created at:
  github.com/influxdata/influxdb/tsdb_test.TestShard_WritePoints_FieldConflictConcurrent()
      /root/influxdb/tsdb/shard_test.go:450 +0xe60
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:909 +0x199
==================
FAIL	github.com/influxdata/influxdb/tsdb	23.041s

danxmoran self-requested a review November 25, 2020 18:06
danxmoran changed the base branch from 1.8 to master-1.x February 23, 2021 16:16
danxmoran changed the base branch from master-1.x to 1.8 February 23, 2021 16:17
@danxmoran

@StoneYunZhao apologies it's taken so long for this to be reviewed.

Could you rebase your branch on the master-1.x branch and re-target this PR to merge there? We're trying to identify the source of a performance regression on 1.8, so commits there are effectively frozen for now.


StoneYunZhao commented Feb 25, 2021

> @StoneYunZhao apologies it's taken so long for this to be reviewed.
>
> Could you rebase your branch on the master-1.x branch and re-target this PR to merge there? We're trying to identify the source of a performance regression on 1.8, so commits there are effectively frozen for now.

Sorry, I didn't see your reply until now. I found that you've already done this work in #20802. Thanks a lot! Is there anything else I need to do?

@danxmoran

Yes, we can reopen the backport to 1.8 when the release branch is ready. Thanks again for the submission!

danxmoran closed this Feb 25, 2021
@danxmoran

Ack, sorry, didn't notice you edited your comment. I think closing is the right move, no extra work needed on your part.

StoneYunZhao deleted the zhaoyun/fix-ring branch June 10, 2021 09:03