[BugFix] fix lock release issue in cloud native pk table (backport #53878) #53884

Merged: 2 commits into branch-3.3 from mergify/bp/branch-3.3/pr-53878, Dec 12, 2024

@mergify[bot] commented on Dec 12, 2024

Why I'm doing:

In `prepare_primary_index`, if the PK index fails to load, the index is removed from the cache:

StatusOr<IndexEntry*> UpdateManager::prepare_primary_index(
        const TabletMetadataPtr& metadata, MetaFileBuilder* builder, int64_t base_version, int64_t new_version,
        std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>& guard) {
    // ...
    // Fetch lock guard before `lake_load`
    guard = index.fetch_guard(); // acquire the index lock guard (owned by the caller)
    Status st = index.lake_load(_tablet_mgr, metadata, base_version, builder);
    // ...
    if (!st.ok()) {
        // ...
        _index_cache.remove(index_entry); // remove the index while `guard` still holds its lock
        // ...
        return Status::InternalError(msg);
    }
    // ...
}

However, `PrimaryKeyTxnLogApplier` still holds the lock guard. When `PrimaryKeyTxnLogApplier` is destroyed, the guard releases a lock whose address is no longer valid, because the PK index that owns it has already been removed.

Accessing this invalid address is a use-after-free, which leads to undefined behavior such as the process getting stuck here:
[Screenshot of the stuck process]
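The lifetime rule behind this bug can be illustrated with a minimal, standard-library-only sketch. `std::lock_guard` unlocks its mutex in its destructor, so whatever object owns the guard performs an `unlock()` on the mutex when it is destroyed, and that call is only valid while the mutex still exists. `ApplierLike`, `other_thread_try_lock`, `other_thread_lock`, and `demo_owner_destruction_unlocks` are hypothetical names for illustration; they are not the real StarRocks types.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <mutex>
#include <shared_mutex>

using IndexGuard = std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>;

// Hypothetical stand-in for PrimaryKeyTxnLogApplier: it keeps the index lock
// guard alive for exactly as long as the applier itself is alive.
struct ApplierLike {
    IndexGuard guard;
};

// Asks another thread to take the mutex exclusively and release it again.
// try_lock() may fail spuriously, so `false` only means "not acquired".
bool other_thread_try_lock(std::shared_timed_mutex& m) {
    return std::async(std::launch::async, [&m] {
               if (!m.try_lock()) return false;
               m.unlock();
               return true;
           }).get();
}

// Blocks until another thread can take the mutex exclusively, then releases it.
void other_thread_lock(std::shared_timed_mutex& m) {
    std::async(std::launch::async, [&m] {
        std::unique_lock<std::shared_timed_mutex> lock(m);
    }).get();
}

// The applier's destructor is what performs the unlock on the index mutex.
void demo_owner_destruction_unlocks() {
    std::shared_timed_mutex index_mutex;  // in the real code, this mutex belongs to the PK index
    {
        ApplierLike applier;
        applier.guard = std::make_unique<std::lock_guard<std::shared_timed_mutex>>(index_mutex);
        const bool taken = other_thread_try_lock(index_mutex);
        assert(!taken);  // still held through the applier's guard
        (void)taken;
    }  // ~ApplierLike -> ~unique_ptr -> ~lock_guard -> index_mutex.unlock()
    other_thread_lock(index_mutex);  // returns because the unlock above happened
    // If index_mutex had been destroyed before ~ApplierLike ran, that unlock()
    // would have operated on freed memory: the bug described above.
}
```

In the buggy path, the PK index (and the mutex inside it) is destroyed by `_index_cache.remove(...)` while the applier's guard is still alive, so the guard's eventual `unlock()` lands on freed memory.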

What I'm doing:

If the PK index fails to load, release the lock guard before the PK index is removed from the cache.
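The corrected ordering can be sketched as follows, using hypothetical types rather than the real `UpdateManager` code. `IndexEntry`, `IndexCache`, `prepare_index`, and the `load_ok` flag (a stand-in for the status returned by `lake_load`) are assumptions for illustration. On the failure path, the caller-owned guard is reset first, which unlocks the mutex while the entry still exists, and only then is the entry erased from the cache.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

using IndexGuard = std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>;

// Hypothetical index entry: it owns the mutex that the guard locks.
struct IndexEntry {
    std::shared_timed_mutex mutex;
    IndexGuard fetch_guard() { return std::make_unique<std::lock_guard<std::shared_timed_mutex>>(mutex); }
};

// Hypothetical cache: erasing an entry destroys the entry and its mutex.
using IndexCache = std::unordered_map<std::int64_t, std::unique_ptr<IndexEntry>>;

// Sketch of the prepare step. `load_ok` stands in for the status returned by
// the real load call. Returns nullptr when the load fails.
IndexEntry* prepare_index(IndexCache& cache, std::int64_t tablet_id, bool load_ok, IndexGuard& guard) {
    std::unique_ptr<IndexEntry>& slot = cache[tablet_id];
    if (!slot) slot = std::make_unique<IndexEntry>();
    IndexEntry* entry = slot.get();
    guard = entry->fetch_guard();  // lock before loading, as in the original code
    if (!load_ok) {
        guard.reset();           // the fix: unlock while the mutex still exists
        cache.erase(tablet_id);  // only now destroy the entry (and its mutex)
        return nullptr;
    }
    return entry;  // success: the caller keeps the lock through `guard`
}

// Usage: a failed load leaves neither a held guard nor a cached entry behind.
void demo_failed_load() {
    IndexCache cache;
    IndexGuard guard;
    IndexEntry* entry = prepare_index(cache, /*tablet_id=*/42, /*load_ok=*/false, guard);
    assert(entry == nullptr && guard == nullptr && cache.count(42) == 0);
    (void)entry;
}
```

Resetting the guard inside the failure path, instead of leaving it for the caller to drop later, means the caller's `unique_ptr` is already empty by the time the caller (the applier, in the real code) is destroyed, so no later destructor can touch a mutex that no longer exists.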

This pull request improves error handling in the `UpdateManager` class and adds a test case that covers index load failures. The key changes are releasing the lock guard before removing the index entry when the load or prepare operation fails, and adding a new test case to `LakePrimaryKeyPublishTest`.

Improvements to error handling:

New test case:

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

luohaha previously approved these changes Dec 12, 2024
Signed-off-by: Yixin Luo
@wanpengfei-git merged commit 225a9bf into branch-3.3 on Dec 12, 2024
28 of 29 checks passed
@wanpengfei-git deleted the mergify/bp/branch-3.3/pr-53878 branch on December 12, 2024 at 15:47