TiDB panic when running sysbench oltp_read_write #30140

Closed
dbsid opened this issue Nov 25, 2021 · 10 comments · Fixed by #30181
Labels: found/automation (found by automation tests), severity/critical, type/bug (the issue is confirmed as a bug)

Comments

@dbsid
Contributor

dbsid commented Nov 25, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Run sysbench oltp_read_write against 32 tables, each with 10,000,000 rows.

2. What did you expect to see? (Required)

no panic

3. What did you see instead (Required)

tidb panic

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2000955]

goroutine 14357480 [running]:
github.com/pingcap/tidb/sessionctx/stmtctx.(*StatementContext).GetResourceGroupTagByLabel(0xc04a4311e0, 0x2, 0x25, 0x2, 0x1)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/sessionctx/stmtctx/stmtctx.go:308 +0x375
github.com/pingcap/tidb/sessionctx/stmtctx.(*StatementContext).GetResourceGroupTagger.func1(0xc05dab9200)
        /home/jenkins/agent/workspace/build_tidb_multi_branch_master/go/src/github.com/pingcap/tidb/sessionctx/stmtctx/stmtctx.go:292 +0x7d
github.com/tikv/client-go/v2/txnkv/transaction.actionCommit.handleSingleBatch(0x0, 0xc0ccb86480, 0xc0292ce090, 0x4ce, 0x1, 0x1f2, 0x430b168, 0xc02a6a70c0, 0x0, 0x0, ...)
        /nfs/cache/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/commit.go:77 +0x2166
github.com/tikv/client-go/v2/txnkv/transaction.(*batchExecutor).startWorker.func1(0xc06ca00740, 0xc04cb85560, 0x4ce, 0x1, 0x1f2, 0x430b168, 0xc02a6a70c0, 0x0)
        /nfs/cache/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/2pc.go:1804 +0x197
created by github.com/tikv/client-go/v2/txnkv/transaction.(*batchExecutor).startWorker
        /nfs/cache/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/2pc.go:1787 +0x19e
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x20008b9]


4. What is your TiDB version? (Required)

latest nightly build

[2021/11/25 09:17:25.069 +08:00] [INFO] [printer.go:34] ["Welcome to TiDB."] ["Release Version"=v5.4.0-alpha-222-g5916672ce] [Edition=Community] ["Git Commit Hash"=5916672ce97672ff0e3a46cca69300fe61cde715] ["Git Branch"=master] ["UTC Build Time"="2021-11-24 11:26:42"] [GoVersion=go1.16.4] ["Race Enabled"=false] ["Check Table Before Drop"=false] ["TiKV Min Version"=v3.0.0-60965b006877ca7234adaced7890d7b029ed1306]
@dbsid dbsid added type/bug The issue is confirmed as a bug. severity/critical labels Nov 25, 2021
@dbsid dbsid added the found/automation Found by automation tests label Nov 25, 2021
@ChenPeng2013 ChenPeng2013 added the sig/transaction SIG:Transaction label Nov 25, 2021
@cfzjywxk
Contributor

This seems to have been introduced by tikv/client-go#368.
@crazycs520 @mornyx PTAL

@cfzjywxk cfzjywxk removed their assignment Nov 25, 2021
@cfzjywxk cfzjywxk removed the sig/transaction SIG:Transaction label Nov 25, 2021
@mornyx
Contributor

mornyx commented Nov 26, 2021

/assign

@mornyx
Contributor

mornyx commented Nov 26, 2021

Hi @dbsid, thanks for your feedback. I tried to reproduce this panic but failed. Can you reproduce it reliably? If so, please provide more details, such as how you deployed TiDB, the full sysbench command you ran, etc. Thanks!

@sticnarf
Contributor

sticnarf commented Nov 29, 2021

@mornyx I can easily reproduce this problem. I'm running a 1 TiKV + 1 TiDB cluster deployed using TiUP. My sysbench command is like:
sysbench oltp_write_only --rand-type=special --tables=32 --table-size=1000000 --threads=256 run

The panic happens so frequently that I cannot finish any benchmark. Hope it will be resolved soon.

@mornyx
Contributor

mornyx commented Nov 29, 2021

> @mornyx I can easily reproduce this problem. I'm running a 1 TiKV + 1 TiDB cluster deployed using TiUP. My sysbench command is like: sysbench oltp_write_only --rand-type=special --tables=32 --table-size=1000000 --threads=256 run
>
> The panic happens so frequently that I cannot finish any benchmark. Hope it will be resolved soon.

Thanks for this information. I'm working on a fix.

@wjhuang2016
Member

I encountered this by:

cd cmd/explaintest
./run-tests.sh -r explain_generate_column_substitute

@github-actions

Please check whether the issue should be labeled with 'affects-x.y' or 'fixes-x.y.z', and then remove 'needs-more-info' label.

@mornyx
Contributor

mornyx commented Nov 29, 2021

I have removed the code that may cause a panic, and now the master branch should no longer panic. cc @dbsid @sticnarf @wjhuang2016 @Tammyxia

I reproduced this panic locally with go1.16. I could not reproduce it earlier because I was using go1.17 locally, and the two versions behave differently here. For now, I suspect the difference comes from the register-based calling convention in go1.17, which is interesting. I will keep investigating, and if I reach a conclusion, I will write it here.
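
For readers following the trace: the panicking frame is the closure returned by GetResourceGroupTagger, invoked later from a client-go commit worker. The sketch below shows one way a trace of this shape could arise, under the assumption (not confirmed in this thread) that the state the closure reads is cleared before the closure runs. All names here (`stmtContext`, `digest`, `tagger`, `safeTagger`, `callRecovered`) are hypothetical and are not TiDB's actual code.

```go
package main

import "fmt"

// digest is a hypothetical stand-in for per-statement data used to build a tag.
type digest struct{ bytes []byte }

// stmtContext is a hypothetical, simplified context; it is NOT TiDB's StatementContext.
type stmtContext struct {
	sqlDigest *digest
}

// tagger returns a closure that reads sc lazily, at call time rather than at creation time.
func (sc *stmtContext) tagger() func() []byte {
	return func() []byte {
		return sc.sqlDigest.bytes // nil pointer dereference if sqlDigest was cleared
	}
}

// safeTagger returns a closure that tolerates a cleared context.
func (sc *stmtContext) safeTagger() func() []byte {
	return func() []byte {
		d := sc.sqlDigest // read the field once, then check it
		if d == nil {
			return nil
		}
		return d.bytes
	}
}

// callRecovered runs f and reports whether it panicked.
func callRecovered(f func() []byte) (tag []byte, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	return f(), false
}

func main() {
	sc := &stmtContext{sqlDigest: &digest{bytes: []byte("abc")}}
	lazy, guarded := sc.tagger(), sc.safeTagger()

	*sc = stmtContext{} // assumption: the context is reset while the closures are still held

	_, p1 := callRecovered(lazy)
	_, p2 := callRecovered(guarded)
	fmt.Println("lazy panicked:", p1, "guarded panicked:", p2)
}
```

The sketch is sequential on purpose so that it is deterministic. In a real concurrent setting a nil check alone would not remove a data race; snapshotting the needed values when the closure is created, or synchronizing access, would be the safer design.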

@cyliu0
Contributor

cyliu0 commented Nov 30, 2021

We got another index out of range panic after this fix while running tpcc.

panic: runtime error: index out of range [4294967295] with length 2

goroutine 24492959 [running]:
github.com/tikv/client-go/v2/internal/unionstore.(*nodeAllocator).getNode(...)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb_arena.go:213
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).getNode(...)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:754
github.com/tikv/client-go/v2/internal/unionstore.memdbNodeAddr.getRight(...)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:786
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).traverse(0xc0175c32b0, 0xc045da3e30, 0x13, 0x13, 0x0, 0x60, 0x7f86257193c8)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:375 +0xd51
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).GetFlags(0xc0175c32b0, 0xc045da3e30, 0x13, 0x13, 0x3bbecc0, 0x42a7f01, 0xc04336f6e0)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/memdb.go:223 +0x52
github.com/tikv/client-go/v2/internal/unionstore.(*KVUnionStore).HasPresumeKeyNotExists(0xc0acc30180, 0xc045da3e30, 0x13, 0x13, 0x13)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/internal/unionstore/union_store.go:141 +0x4c
github.com/tikv/client-go/v2/txnkv/transaction.actionPessimisticLock.handleSingleBatch(0xc00581ebe0, 0xc05f786000, 0xc0a1cbbd40, 0x488, 0x5, 0x98, 0x4319f38, 0xc0a6b869c0, 0x0, 0x12fe4c5, ...)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/pessimistic.go:94 +0x1c7
github.com/tikv/client-go/v2/txnkv/transaction.(*batchExecutor).startWorker.func1(0xc0ab1b3740, 0xc0a6b86ae0, 0x488, 0x5, 0x98, 0x4319f38, 0xc0a6b869c0, 0x0)
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/2pc.go:1804 +0x197
created by github.com/tikv/client-go/v2/txnkv/transaction.(*batchExecutor).startWorker
	/nfs/cache/mod/github.com/tikv/client-go/[email protected]/txnkv/transaction/2pc.go:1787 +0x19e

@sticnarf
Contributor

@cyliu0 It is unrelated to this issue. It looks like #26832.
