[BugFix] fix lock release issue in cloud native pk table (backport #53878) #53885

Merged
wanpengfei-git merged 2 commits into branch-3.1 from mergify/bp/branch-3.1/pr-53878
Dec 12, 2024

@mergify mergify bot commented Dec 12, 2024

Why I'm doing:

In prepare_primary_index, if the PK index load fails, the index is removed from the cache:

StatusOr<IndexEntry*> UpdateManager::prepare_primary_index(
        const TabletMetadataPtr& metadata, MetaFileBuilder* builder, int64_t base_version, int64_t new_version,
        std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>& guard) {
    ...
    // Fetch lock guard before `lake_load`
    guard = index.fetch_guard(); // <---- the lock guard is acquired here
    Status st = index.lake_load(_tablet_mgr, metadata, base_version, builder);
    ...
    if (!st.ok()) {
        ...
        _index_cache.remove(index_entry); // <---- the index is removed while the guard is still held
        ...
        return Status::InternalError(msg);
    }

But PrimaryKeyTxnLogApplier still holds the lock guard. When PrimaryKeyTxnLogApplier is destroyed, the lock guard releases a lock whose address is no longer valid, because the PK index has already been removed.

This causes an invalid address access, which leads to unexpected behavior such as getting stuck here:
[Screenshot: img_v3_02hg_05c4b134-a793-412d-8fec-67c4dc70577g]

What I'm doing:

If the PK index load fails, we need to release the lock before the PK index is removed.

This pull request includes changes to improve error handling in the UpdateManager class and adds a new test case to cover index load failures. The most important changes include releasing the lock guard before removing the index entry when load or prepare operations fail, and adding a new test case in LakePrimaryKeyPublishTest.
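
The ordering described above can be illustrated with a minimal, self-contained sketch. It is not the actual patch: SketchIndex, SketchCache, and prepare_index_sketch are hypothetical stand-ins for the PK index, _index_cache, and prepare_primary_index, and only the order of operations on the failure path is meant to match the description.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for a PK index that owns the mutex protecting it.
struct SketchIndex {
    std::shared_timed_mutex mutex;
    bool load_ok = true; // simulates whether the load step succeeds

    // Same idea as `fetch_guard()` in the snippet above: return an owning guard
    // that keeps this index's mutex locked until the guard is destroyed.
    std::unique_ptr<std::lock_guard<std::shared_timed_mutex>> fetch_guard() {
        return std::make_unique<std::lock_guard<std::shared_timed_mutex>>(mutex);
    }

    bool load() const { return load_ok; }
};

using Guard = std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>;

// Hypothetical cache: erasing an entry destroys the index together with its mutex.
using SketchCache = std::unordered_map<int, std::unique_ptr<SketchIndex>>;

// Returns true if the index was prepared. The caller keeps `guard` on success.
bool prepare_index_sketch(SketchCache& cache, int tablet_id, Guard& guard) {
    auto it = cache.find(tablet_id);
    if (it == cache.end()) {
        return false;
    }
    SketchIndex& index = *it->second;

    guard = index.fetch_guard(); // lock is taken before loading
    if (!index.load()) {
        // Release the lock first, while the mutex is still alive...
        guard.reset();
        assert(guard == nullptr);
        // ...and only then destroy the index (and the mutex inside it).
        cache.erase(it);
        return false;
    }
    return true;
}
```

If the two statements on the failure path were swapped, the caller's guard would later unlock a mutex that no longer exists, which is the invalid access described under "Why I'm doing".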

Improvements to error handling:

New test case:

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Signed-off-by: luohaha <[email protected]>
(cherry picked from commit 35043dc)

# Conflicts:
#	be/test/storage/lake/primary_key_publish_test.cpp

mergify bot commented Dec 12, 2024

Cherry-pick of 35043dc has failed:

On branch mergify/bp/branch-3.1/pr-53878
Your branch is up to date with 'origin/branch-3.1'.

You are currently cherry-picking commit 35043dc0ad.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   be/src/storage/lake/update_manager.cpp

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   be/test/storage/lake/primary_key_publish_test.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally


mergify bot commented Dec 12, 2024

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

@luohaha luohaha reopened this Dec 12, 2024
@wanpengfei-git wanpengfei-git enabled auto-merge (squash) December 12, 2024 15:28
Signed-off-by: Yixin Luo <[email protected]>
@wanpengfei-git wanpengfei-git merged commit 645c222 into branch-3.1 Dec 12, 2024
30 of 31 checks passed
@wanpengfei-git wanpengfei-git deleted the mergify/bp/branch-3.1/pr-53878 branch December 12, 2024 15:48