roachtest: follower-reads/nodes=3 failed #34814

Closed
cockroach-teamcity opened this issue Feb 12, 2019 · 3 comments
Labels: C-test-failure (broken test, automatically or manually discovered), O-roachtest, O-robot (originated from a bot)

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/1634c6bbf48d82ef8994386f29750c9dc3f6163a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=follower-reads/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1133009&tab=buildLog

The test failed on provisional_201902112134_v2.1.5:
	follower_reads.go:88,test.go:1206: failed to disable load based splitting: pq: unknown cluster setting 'kv.range_split.by_load_enabled'

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Feb 12, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Feb 12, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8a6262b773ff8ddf761dfe9669e5d1f66e67c3fb

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=follower-reads/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1133211&tab=buildLog

The test failed on release-2.1:
	follower_reads.go:88,test.go:1206: failed to disable load based splitting: pq: unknown cluster setting 'kv.range_split.by_load_enabled'

@tbg
Member

tbg commented Feb 12, 2019

cc @ajwerner, probably just skip the test in 2.1

@ajwerner
Contributor

@tbg Thanks!

craig bot pushed a commit that referenced this issue Feb 12, 2019
34387: kv: cleanup txn more eagerly r=andreimatei a=andreimatei

Before this patch, when a txn got a non-retriable error, its heartbeat
loop (if any) was left running until the client sent a rollback.
This patch makes the txn cleanup more eager - we do it immediately on
receiving the error.
Besides seeming sane (why wait for the client when we know what must
happen), this also makes the TxnCoordSender state more internally
consistent: if a non-retriable error contains an Aborted txn, we now
stop the hb loop before that loop has the opportunity to freak out about
running with an Aborted txn. It's unclear if non-retriable errors could
contain Aborted txns, but see below.

This patch also refactors the state update code in an attempt to make it
more readable.

In #34337 we see a crash due to the fact that a heartbeat is running for
a transaction whose proto status is no longer PENDING. It's not entirely
clear to me how that can happen since we "clean up the txn" - i.e. stop
the hb loop - after commits and rollbacks, as well as on
TransactionAbortedErrors, but it's also not very convincing that it
can't happen. The thing is that the protocol between the "client" and
the "server" wrt communicating txn updates from the server is lax and
it's not very clear what kind of responses can carry an Aborted or
Committed proto in them.

This patch also makes leaf TxnCoordSender nimbler by not using
interceptors needed only by roots.

Release note: None

34828: roachtest: disable follower_reads test for versions prior to v2.2.0 r=ajwerner a=ajwerner

Fixed #34814

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig craig bot closed this as completed in #34828 Feb 12, 2019
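
For readers wondering what gating a test on version might look like, here is a minimal sketch. It assumes the cluster version is available as a string such as `v2.1.5`; the helpers `parseVersion` and `atLeast` are hypothetical and are not the roachtest API or the code from #34828.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns a string such as "v2.1.5" into [major, minor, patch].
// Hypothetical helper for illustration; anything after a '-' is ignored.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	trimmed := strings.TrimPrefix(s, "v")
	if i := strings.IndexByte(trimmed, '-'); i >= 0 {
		trimmed = trimmed[:i]
	}
	parts := strings.Split(trimmed, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("malformed version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("malformed version %q: %v", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether version v is greater than or equal to minimum.
func atLeast(v, minimum string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := 0; i < 3; i++ {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"v2.1.5", "v2.2.0"} {
		ok, err := atLeast(v, "v2.2.0")
		switch {
		case err != nil:
			fmt.Println("cannot decide:", err)
		case ok:
			fmt.Println(v, "-> run follower-reads")
		default:
			fmt.Println(v, "-> skip: kv.range_split.by_load_enabled is unknown here")
		}
	}
}
```

With a check like this, a 2.1 cluster would skip the test before it tries to set `kv.range_split.by_load_enabled`, the setting both failures above reported as unknown.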