Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/changefeedccl: TestChangefeedDataTTL failed under stress #36369

Closed
cockroach-teamcity opened this issue Apr 1, 2019 · 3 comments · Fixed by #36891
Closed

ccl/changefeedccl: TestChangefeedDataTTL failed under stress #36369

cockroach-teamcity opened this issue Apr 1, 2019 · 3 comments · Fixed by #36891
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/668162cc99e4f3198b663b1abfa51858eeb3ccb8

Parameters:

TAGS=
GOFLAGS=

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestChangefeedDataTTL PKG=github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1212391&tab=buildLog

I190401 05:35:38.545298 26532 sql/event_log.go:135  [n1,intExec=initializeClusterSecret] Event: "set_cluster_setting", target: 0, info: {SettingName:cluster.secret Value:0335c4a5-31c0-41a0-b4b2-c2c536b7b26f User:root}
I190401 05:35:38.551632 26495 sql/event_log.go:135  [n1,intExec=create-default-db] Event: "create_database", target: 50, info: {DatabaseName:defaultdb Statement:CREATE DATABASE IF NOT EXISTS defaultdb User:root}
I190401 05:35:38.554466 26563 sql/event_log.go:135  [n1,intExec=create-default-db] Event: "create_database", target: 51, info: {DatabaseName:postgres Statement:CREATE DATABASE IF NOT EXISTS postgres User:root}
I190401 05:35:38.560298 26293 server/server.go:1653  [n1] done ensuring all necessary migrations have run
I190401 05:35:38.560333 26293 server/server.go:1656  [n1] serving sql connections
I190401 05:35:38.562954 26578 sql/event_log.go:135  [n1] Event: "node_join", target: 1, info: {Descriptor:{NodeID:1 Address:127.0.0.1:36033 Attrs: Locality: ServerVersion:2.1-10 BuildTag:v19.1.0-beta.20190318-526-g668162c StartedAt:1554096938449926888 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:f23eda40-f846-4ab4-bcc9-88e13cafccbe StartedAt:1554096938449926888 LastUp:1554096938449926888}
I190401 05:35:38.563682 26543 server/server_update.go:67  [n1] no need to upgrade, cluster already at the newest version
I190401 05:35:38.575111 26582 sql/event_log.go:135  [n1,client=127.0.0.1:44984,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:kv.rangefeed.enabled Value:true User:root}
I190401 05:35:38.579478 26582 sql/event_log.go:135  [n1,client=127.0.0.1:44984,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:kv.closed_timestamp.target_duration Value:00:00:01 User:root}
I190401 05:35:38.582632 26582 sql/event_log.go:135  [n1,client=127.0.0.1:44984,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:changefeed.experimental_poll_interval Value:00:00:00.01 User:root}
I190401 05:35:38.585828 26582 sql/event_log.go:135  [n1,client=127.0.0.1:44984,user=root] Event: "create_database", target: 52, info: {DatabaseName:d Statement:CREATE DATABASE d User:root}
I190401 05:35:38.590023 26582 sql/event_log.go:135  [n1,client=127.0.0.1:44984,user=root] Event: "create_table", target: 53, info: {TableName:d.public.foo Statement:CREATE TABLE foo (a INT8 PRIMARY KEY, b STRING) User:root}
I190401 05:35:38.591505 26601 storage/replica_command.go:246  [n1,split,s1,r20/1:/{Table/24-Max}] initiating a split of this range at key /Table/53 [r21] (zone config)
I190401 05:35:38.656546 26293 util/stop/stopper.go:546  quiescing; tasks left:
12     [async] storage.IntentResolver: cleanup txn records
1      [async] intent_resolver_ir_batcher
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
W190401 05:35:38.656580 26537 storage/intentresolver/intent_resolver.go:820  failed to gc transaction record: could not GC completed transaction anchored at /Table/SystemConfigSpan/Start: node unavailable; try another peer
I190401 05:35:38.656627 26293 util/stop/stopper.go:546  quiescing; tasks left:
11     [async] storage.IntentResolver: cleanup txn records
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
I190401 05:35:38.656682 26293 util/stop/stopper.go:546  quiescing; tasks left:
3      [async] storage.IntentResolver: cleanup txn records
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
I190401 05:35:38.656711 26293 util/stop/stopper.go:546  quiescing; tasks left:
2      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-rangefeed-subscriber
I190401 05:35:38.656744 26293 util/stop/stopper.go:546  quiescing; tasks left:
1      [async] storage.IntentResolver: cleanup txn records
1      [async] closedts-rangefeed-subscriber
I190401 05:35:38.656769 26293 util/stop/stopper.go:546  quiescing; tasks left:
1      [async] closedts-rangefeed-subscriber
    --- FAIL: TestChangefeedDataTTL/sinkless (0.24s)
        changefeed_test.go:1237: expected "must be after replica GC threshold" error got: <nil>

@cockroach-teamcity cockroach-teamcity added this to the 19.1 milestone Apr 1, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Apr 1, 2019
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/668162cc99e4f3198b663b1abfa51858eeb3ccb8

Parameters:

TAGS=
GOFLAGS=-race

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=TestChangefeedDataTTL PKG=github.com/cockroachdb/cockroach/pkg/ccl/changefeedccl TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1213009&tab=buildLog

I190401 06:33:22.971356 29212 sql/event_log.go:135  [n1,intExec=initializeClusterSecret] Event: "set_cluster_setting", target: 0, info: {SettingName:cluster.secret Value:85489398-4141-4f30-bba0-1b0caaccf07a User:root}
I190401 06:33:23.012355 29314 sql/event_log.go:135  [n1,intExec=create-default-db] Event: "create_database", target: 50, info: {DatabaseName:defaultdb Statement:CREATE DATABASE IF NOT EXISTS defaultdb User:root}
I190401 06:33:23.038227 29276 sql/event_log.go:135  [n1,intExec=create-default-db] Event: "create_database", target: 51, info: {DatabaseName:postgres Statement:CREATE DATABASE IF NOT EXISTS postgres User:root}
I190401 06:33:23.110241 29061 server/server.go:1653  [n1] done ensuring all necessary migrations have run
I190401 06:33:23.110381 29061 server/server.go:1656  [n1] serving sql connections
I190401 06:33:23.128666 29321 sql/event_log.go:135  [n1] Event: "node_join", target: 1, info: {Descriptor:{NodeID:1 Address:127.0.0.1:34907 Attrs: Locality: ServerVersion:2.1-10 BuildTag:v19.1.0-beta.20190318-526-g668162c StartedAt:1554100402372901894 LocalityAddress:[] XXX_NoUnkeyedLiteral:{} XXX_sizecache:0} ClusterID:faababa7-15cd-49e0-a2dc-bfedc38fc950 StartedAt:1554100402372901894 LastUp:1554100402372901894}
I190401 06:33:23.131093 29318 server/server_update.go:67  [n1] no need to upgrade, cluster already at the newest version
I190401 06:33:23.169224 29054 sql/event_log.go:135  [n1,client=127.0.0.1:57464,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:kv.rangefeed.enabled Value:true User:root}
I190401 06:33:23.202390 29054 sql/event_log.go:135  [n1,client=127.0.0.1:57464,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:kv.closed_timestamp.target_duration Value:00:00:01 User:root}
I190401 06:33:23.230782 29054 sql/event_log.go:135  [n1,client=127.0.0.1:57464,user=root] Event: "set_cluster_setting", target: 0, info: {SettingName:changefeed.experimental_poll_interval Value:00:00:00.01 User:root}
I190401 06:33:23.257610 29054 sql/event_log.go:135  [n1,client=127.0.0.1:57464,user=root] Event: "create_database", target: 52, info: {DatabaseName:d Statement:CREATE DATABASE d User:root}
I190401 06:33:23.289914 29054 sql/event_log.go:135  [n1,client=127.0.0.1:57464,user=root] Event: "create_table", target: 53, info: {TableName:d.public.foo Statement:CREATE TABLE foo (a INT8 PRIMARY KEY, b STRING) User:root}
I190401 06:33:23.300002 29248 storage/replica_command.go:246  [n1,split,s1,r20/1:/{Table/24-Max}] initiating a split of this range at key /Table/53 [r21] (zone config)
W190401 06:33:23.526993 29253 storage/intentresolver/intent_resolver.go:820  failed to gc transaction record: could not GC completed transaction anchored at /Table/SystemConfigSpan/Start: node unavailable; try another peer
I190401 06:33:23.526999 29061 util/stop/stopper.go:546  quiescing; tasks left:
12     [async] storage.IntentResolver: cleanup txn records
1      [async] transport racer
1      [async] intent_resolver_ir_batcher
1      [async] intent_resolver_gc_batcher
1      [async] closedts-rangefeed-subscriber
I190401 06:33:23.528276 29061 util/stop/stopper.go:546  quiescing; tasks left:
6      [async] storage.IntentResolver: cleanup txn records
1      [async] transport racer
1      [async] closedts-rangefeed-subscriber
I190401 06:33:23.528546 29061 util/stop/stopper.go:546  quiescing; tasks left:
5      [async] storage.IntentResolver: cleanup txn records
1      [async] transport racer
1      [async] closedts-rangefeed-subscriber
I190401 06:33:23.530048 29061 util/stop/stopper.go:546  quiescing; tasks left:
1      [async] transport racer
1      [async] closedts-rangefeed-subscriber
I190401 06:33:23.531548 29061 util/stop/stopper.go:546  quiescing; tasks left:
1      [async] transport racer
I190401 06:33:23.598782 29234 kv/transport_race.go:113  transport race promotion: ran 58 iterations on up to 301 requests
    --- FAIL: TestChangefeedDataTTL/sinkless (1.39s)
        changefeed_test.go:1237: expected "must be after replica GC threshold" error got: <nil>

@tbg
Copy link
Member

tbg commented Apr 2, 2019

Saw this in CI today too

@danhhz
Copy link
Contributor

danhhz commented Apr 2, 2019

I don't see any failures of this before 2 days ago and now we have these two plus the one Tobi saw in CI? Seems like something changed, I wonder if it was #36132 (which as mentioned in #36432 is getting ripped out).

danhhz added a commit to danhhz/cockroach that referenced this issue Apr 16, 2019
This test is yet another instance of too prescriptive an assumption
about how many duplicate we will (or won't get). Make it more resilient
to them.

Closes cockroachdb#36369

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this issue Apr 17, 2019
This test is yet another instance of too prescriptive an assumption
about how many duplicate we will (or won't get). Make it more resilient
to them.

Closes cockroachdb#36369

Release note: None
craig bot pushed a commit that referenced this issue Apr 17, 2019
36891: changefeedccl: deflake TestChangefeedDataTTL/sinkless r=tbg a=danhhz

This test is yet another instance of too prescriptive an assumption
about how many duplicate we will (or won't) get. Make it more resilient
to them.

Closes #36369

Release note: None

Co-authored-by: Daniel Harrison <[email protected]>
@craig craig bot closed this as completed in #36891 Apr 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants