Concurrency issue between rename partitioned table and applyTable #9233

Open
JaySon-Huang opened this issue Jul 15, 2024 · 1 comment
Labels
affects-7.5 This bug affects the 7.5.x(LTS) versions. component/storage severity/moderate type/bug The issue is confirmed as a bug.

Comments


Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

2. What did you expect to see? (Required)

3. What did you see instead (Required)

https://ci.pingcap.net/blue/organizations/jenkins/tiflash-ghpr-integration-tests/detail/tiflash-ghpr-integration-tests/16561/pipeline/

fullstack-test2-logs.tar.gz

[2024-07-12T04:46:15.000Z] fullstack-test2/ddl/rename_table_across_databases.test: Running
[2024-07-12T04:46:33.020Z]   File: fullstack-test2/ddl/rename_table_across_databases.test
[2024-07-12T04:46:33.020Z]   Error line: 117
[2024-07-12T04:46:33.020Z]   Error: set session tidb_isolation_read_engines='tiflash'; select * from test_new.part4 order by id;
[2024-07-12T04:46:33.020Z]   Result:
[2024-07-12T04:46:33.020Z]     ERROR 1105 (HY000) at line 1: other error for mpp stream: Code: 107, e.displayText() = DB::Exception: Cannot open file /tmp/tiflash/data/db/metadata/db_708/t_713.sql, errno: 2, strerror: No such file or directory, e.what() = DB::Exception,
[2024-07-12T04:46:33.020Z]   Expected:
[2024-07-12T04:46:33.020Z]     +----+----------+------+
[2024-07-12T04:46:33.020Z]     | id | store_id | c1   |
[2024-07-12T04:46:33.020Z]     +----+----------+------+
[2024-07-12T04:46:33.020Z]     |  1 |        1 | NULL |
[2024-07-12T04:46:33.020Z]     |  2 |        2 | NULL |
[2024-07-12T04:46:33.020Z]     |  3 |        3 | NULL |
[2024-07-12T04:46:33.020Z]     | 11 |       11 | NULL |
[2024-07-12T04:46:33.020Z]     | 16 |       16 | NULL |
[2024-07-12T04:46:33.020Z]     +----+----------+------+

4. What is your TiFlash version? (Required)

release-7.5


JaySon-Huang commented Jul 15, 2024

This is a concurrency issue triggered by renaming a partitioned table across databases. TiFlash recovers by itself on subsequent queries, so the severity is marked as moderate.


Thread-A enters TiDBSchemaSyncer::syncSchemaDiffs and applies the RenameTable diff, which overwrites the id_mapping entry for table_id=710 (old_database_id=2, new_database_id=708).

[2024/07/12 12:46:31.169 +08:00] [INFO] [TiDBSchemaSyncer.cpp:261] ["Sync table schema begin, table_id=712"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.170 +08:00] [WARN] [SchemaBuilder.cpp:1638] ["table is not exist in TiKV, applyTable need retry, get_by_mvcc=false database_id=2 logical_table_id=710"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.170 +08:00] [WARN] [TiDBSchemaSyncer.cpp:274] ["Can not apply table schema because the table_id_map is not up-to-date, try to syncSchemas. physical_table_id=712 database_id=2 logical_table_id=710"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.170 +08:00] [INFO] [TiDBSchemaSyncer.cpp:96] ["Start to sync schemas. current version is: 916 and try to sync schema version to: 926"] [source="keyspace=4294967295"] [thread_id=788]
...
[2024/07/12 12:46:31.214 +08:00] [TRACE] [SchemaBuilder.cpp:262] ["applyDiff accept type=RenameTable"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.214 +08:00] [WARN] [TableIDMap.cpp:41] ["table_id to database_id is being overwrite, table_id=710 old_database_id=2 new_database_id=708"] [source="keyspace=4294967295"] [thread_id=788]

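Thread-B cannot find the table's .sql file under database_id=708 and raises an error. At this point the id mapping already points to database_id=708, but Thread-A has not yet finished renaming db_2.t_713 to db_708.t_713.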

[2024/07/12 12:46:31.222 +08:00] [INFO] [TiDBSchemaSyncer.cpp:261] ["Sync table schema begin, table_id=713"] [source="keyspace=4294967295"] [thread_id=790]
[2024/07/12 12:46:31.228 +08:00] [INFO] [SchemaBuilder.cpp:1698] ["Alter table db_708.t_713 begin, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=790]
[2024/07/12 12:46:31.233 +08:00] [ERROR] [MPPTask.cpp:644] ["task running meets error: Code: 107, e.displayText() = DB::Exception: Cannot open file /tmp/tiflash/data/db/metadata/db_708/t_713.sql, errno: 2, strerror: No such file or directory, e.what() = DB::Exception, Stack trace:\n\n\n       0x42d99be\tStackTrace::StackTrace() [tiflash+70097342]\n       0x42c8262\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+70025826]\n       0x42fcc7a\tDB::ErrnoException::ErrnoException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) [tiflash+70241402]\n       0x42f82fe\tDB::throwFromErrno(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int) [tiflash+70222590]\n       0xc83433c\tDB::PosixRandomAccessFile::PosixRandomAccessFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, std::__1::shared_ptr<DB::ReadLimiter> const&, std::__1::shared_ptr<DB::FileSegment> const&) [tiflash+209929020]\n       0xc828beb\tDB::PosixRandomAccessFile* std::__1::construct_at<DB::PosixRandomAccessFile, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, DB::PosixRandomAccessFile*>(DB::PosixRandomAccessFile*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209882091]\n       0xc82897b\tvoid std::__1::allocator_traits<std::__1::allocator<DB::PosixRandomAccessFile> >::construct<DB::PosixRandomAccessFile, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, void, void>(std::__1::allocator<DB::PosixRandomAccessFile>&, DB::PosixRandomAccessFile*, std::__1::basic_string<char, 
std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209881467]\n       0xc82867e\tstd::__1::__shared_ptr_emplace<DB::PosixRandomAccessFile, std::__1::allocator<DB::PosixRandomAccessFile> >::__shared_ptr_emplace<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&>(std::__1::allocator<DB::PosixRandomAccessFile>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209880702]\n       0xc8284a4\tstd::__1::shared_ptr<DB::PosixRandomAccessFile> std::__1::allocate_shared<DB::PosixRandomAccessFile, std::__1::allocator<DB::PosixRandomAccessFile>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, void>(std::__1::allocator<DB::PosixRandomAccessFile> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209880228]\n       0xc827a47\tstd::__1::shared_ptr<DB::PosixRandomAccessFile> std::__1::make_shared<DB::PosixRandomAccessFile, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&, void>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&, std::__1::shared_ptr<DB::ReadLimiter> const&) [tiflash+209877575]\n       0xc8238ef\tDB::FileProvider::newRandomAccessFile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::EncryptionPath const&, std::__1::shared_ptr<DB::ReadLimiter> const&, int) const [tiflash+209860847]\n       0xc84fb5c\tDB::ReadBufferFromFileProvider::ReadBufferFromFileProvider(std::__1::shared_ptr<DB::FileProvider> const&, 
std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::EncryptionPath const&, unsigned long, std::__1::shared_ptr<DB::ReadLimiter> const&, int, char*, unsigned long) [tiflash+210041692]\n       0xc7ff9a9\tDB::DatabaseTiFlash::alterTable(DB::Context const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ColumnsDescription const&, std::__1::function<void (DB::IAST&)> const&) [tiflash+209713577]\n       0xd63faef\tDB::updateDeltaMergeTableCreateStatement(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<DB::SortColumnDescription, std::__1::allocator<DB::SortColumnDescription> > const&, DB::ColumnsDescription const&, DB::OrderedNameSet const&, std::__1::optional<std::__1::reference_wrapper<TiDB::TableInfo const> >, unsigned long, DB::Context const&) [tiflash+224656111]\n       0xd640515\tDB::StorageDeltaMerge::alterSchemaChange(std::__1::shared_ptr<DB::RWLock::LockHolderImpl> const&, TiDB::TableInfo&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::Context const&) [tiflash+224658709]\n       0xde9d7b8\tDB::SchemaBuilder<DB::SchemaGetter, DB::SchemaNameMapper>::applyTable(long, long, long, bool) [tiflash+233428920]\n       0xde6fa32\tDB::TiDBSchemaSyncer<false, false>::trySyncTableSchema(DB::Context&, long, DB::SchemaGetter&, bool, char const*) [tiflash+233241138]\n       0xde6ecf9\tDB::TiDBSchemaSyncer<false, false>::syncTableSchema(DB::Context&, long) [tiflash+233237753]\n       0xc9a8858\tDB::TiDBSchemaSyncerManager::syncTableSchema(DB::Context&, unsigned int, long) [tiflash+211454040]\n       0xe316822\tDB::DAGStorageInterpreter::getAndLockStorages(long)::$_8::operator()(long) 
const [tiflash+238118946]\n       0xe30f884\tDB::DAGStorageInterpreter::getAndLockStorages(long) [tiflash+238090372]\n       0xe308487\tDB::DAGStorageInterpreter::prepare() [tiflash+238060679]\n       0xe309311\tDB::DAGStorageInterpreter::execute(DB::PipelineExecutorContext&, DB::PipelineExecGroupBuilder&) [tiflash+238064401]\n       0xe7152d1\tDB::PhysicalTableScan::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+242307793]\n       0xe64935b\tDB::PhysicalPlanNode::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+241472347]\n       0xe64935b\tDB::PhysicalPlanNode::buildPipeline(DB::PipelineBuilder&, DB::Context&, DB::PipelineExecutorContext&) [tiflash+241472347]\n       0xe640c12\tDB::PhysicalPlan::toPipeline(DB::PipelineExecutorContext&, DB::Context&) [tiflash+241437714]\n       0xe5b327d\tDB::PipelineExecutor::PipelineExecutor(std::__1::shared_ptr<MemoryTracker> const&, DB::AutoSpillTrigger*, std::__1::function<void (std::__1::shared_ptr<DB::OperatorSpillContext> const&)> const&, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+240857725]\n       0xe23d0ef\tstd::__1::__unique_if<DB::PipelineExecutor>::__unique_single std::__1::make_unique<DB::PipelineExecutor, std::__1::shared_ptr<MemoryTracker>&, std::nullptr_t, std::nullptr_t, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&>(std::__1::shared_ptr<MemoryTracker>&, std::nullptr_t&&, std::nullptr_t&&, DB::Context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [tiflash+237228271]\n       0xe237e73\tDB::(anonymous namespace)::executeAsPipeline(DB::Context&, bool) [tiflash+237207155]\n       0xe23747b\tDB::queryExecute(DB::Context&, bool) [tiflash+237204603]\n       0xe4d8a18\tDB::MPPTask::preprocess() [tiflash+239962648]"] [source="MPP<gather_id:<gather_id:3, 
query_ts:1720759591160913143, local_query_id:163, server_id:1709, start_ts:451086802253250576, resource_group: default>,task_id:2>"] [thread_id=790]
...

Thread-A finishes renaming the partitioned table; the rename of table_id=713 ends at 12:46:31.237, after Thread-B's error at 12:46:31.233.

[2024/07/12 12:46:31.232 +08:00] [INFO] [SchemaBuilder.cpp:740] ["Rename table db_2.t_713 (display name: t_713) to db_708.t_713 begin, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=788]
[2024/07/12 12:46:31.237 +08:00] [INFO] [SchemaBuilder.cpp:763] ["Rename table db_2.t_713 (display name: t_713) to db_708.t_713 end, database_id=708 table_id=713"] [source="keyspace=4294967295"] [thread_id=788]
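
The interleaving above can be replayed in a small single-threaded model. This is only a sketch under assumptions, not TiFlash code: `SchemaModel`, `overwriteMapping`, `moveFile`, `readerCanOpen` and `replayInterleaving` are hypothetical names, and the idea that a reader resolves the database of partition 713 through the mapping entry of logical table 710 is inferred from the log lines, not confirmed from the source.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical model, NOT TiFlash code: a table_id -> database_id mapping plus
// the set of metadata files that currently exist "on disk".
struct SchemaModel
{
    std::map<long, long> table_to_db; // stands in for the id mapping
    std::set<std::string> files; // stands in for the metadata directory

    static std::string metaPath(long db_id, long table_id)
    {
        return "db_" + std::to_string(db_id) + "/t_" + std::to_string(table_id) + ".sql";
    }

    // First half of the rename: only the mapping changes.
    void overwriteMapping(long logical_table_id, long new_db_id)
    {
        table_to_db[logical_table_id] = new_db_id;
    }

    // Second half of the rename: the metadata file moves to the new database.
    void moveFile(long physical_table_id, long old_db_id, long new_db_id)
    {
        files.erase(metaPath(old_db_id, physical_table_id));
        files.insert(metaPath(new_db_id, physical_table_id));
    }

    // A reader resolves the database from the mapping, then looks for the file.
    bool readerCanOpen(long logical_table_id, long physical_table_id) const
    {
        auto it = table_to_db.find(logical_table_id);
        if (it == table_to_db.end())
            return false;
        return files.count(metaPath(it->second, physical_table_id)) > 0;
    }
};

// Replays the order of events seen in the logs.
// Returns {reader result inside the window, reader result after the rename ends}.
std::pair<bool, bool> replayInterleaving()
{
    SchemaModel m;
    m.table_to_db[710] = 2;
    m.files.insert(SchemaModel::metaPath(2, 713));

    m.overwriteMapping(710, 708); // Thread-A: mapping now says database 708
    bool during = m.readerCanOpen(710, 713); // Thread-B: db_708/t_713.sql is missing
    m.moveFile(713, 2, 708); // Thread-A: rename of t_713 ends
    bool after = m.readerCanOpen(710, 713); // a later query finds the file
    return {during, after};
}
```

In this model the first lookup fails and the second succeeds, which matches the observation that the error is transient and later queries recover.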

@JaySon-Huang JaySon-Huang changed the title Concurrency issue between rename table and applyTable Concurrency issue between rename partitioned table and applyTable Jul 15, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 23, 2024
ref #9233

Make renaming `DeltaMergeStore::db_name` to be atomic
JaySon-Huang added a commit to JaySon-Huang/tiflash that referenced this issue Dec 2, 2024
ref pingcap#9233

Make renaming `DeltaMergeStore::db_name` to be atomic
@JaySon-Huang JaySon-Huang added the affects-7.5 This bug affects the 7.5.x(LTS) versions. label Dec 3, 2024
ti-chi-bot bot pushed a commit that referenced this issue Dec 3, 2024
ref #9233

Make renaming `DeltaMergeStore::db_name` to be atomic

Signed-off-by: JaySon-Huang <[email protected]>
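
The referenced commits are titled "Make renaming `DeltaMergeStore::db_name` to be atomic". The patch itself is not shown in this issue, so the following is only a generic sketch of what an atomic rename of a name field can look like, assuming a mutex-guarded pair of strings. `NamedStore`, `rename` and `getNames` are hypothetical names and are not taken from TiFlash.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a store that carries a database name and a table
// name. The layout is an assumption made for illustration only.
class NamedStore
{
public:
    NamedStore(std::string db, std::string table)
        : db_name(std::move(db))
        , table_name(std::move(table))
    {}

    // The writer replaces both names while holding the lock, so no reader can
    // observe the new database name paired with a stale table name.
    void rename(const std::string & new_db, const std::string & new_table)
    {
        std::lock_guard<std::mutex> lock(mtx);
        db_name = new_db;
        table_name = new_table;
    }

    // Readers copy both names under the same lock and therefore see either
    // the old pair or the new pair, never a mix.
    std::pair<std::string, std::string> getNames() const
    {
        std::lock_guard<std::mutex> lock(mtx);
        return {db_name, table_name};
    }

private:
    mutable std::mutex mtx;
    std::string db_name;
    std::string table_name;
};
```

Returning copies instead of references keeps callers from holding a pointer into a string that a concurrent `rename` may reallocate.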