
[BugFix] fix lock release issue in cloud native pk table #53878

Merged: 1 commit merged into StarRocks:main on Dec 12, 2024

luohaha (Contributor) commented on Dec 12, 2024

Why I'm doing:

In `prepare_primary_index`, if the PK index load fails, the index entry is removed from the cache:

StatusOr<IndexEntry*> UpdateManager::prepare_primary_index(
        const TabletMetadataPtr& metadata, MetaFileBuilder* builder, int64_t base_version, int64_t new_version,
        std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>& guard) {
    // ...
    // Fetch lock guard before `lake_load`
    guard = index.fetch_guard();  // <-- acquires the lock guard
    Status st = index.lake_load(_tablet_mgr, metadata, base_version, builder);
    // ...
    if (!st.ok()) {
        // ...
        _index_cache.remove(index_entry);  // <-- removes the index from the cache
        // ...
        return Status::InternalError(msg);
    }
    // ...
}

However, `PrimaryKeyTxnLogApplier` still holds the lock guard. When `PrimaryKeyTxnLogApplier` is destroyed, the guard releases a lock whose address is no longer valid, because the PK index has already been removed.

This is an invalid memory access (a use-after-free), and it leads to undefined behavior, such as a thread getting stuck at the point shown in the screenshot attached to the original PR.
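To make the lifetime problem concrete, here is a minimal, self-contained C++17 model of the pre-fix ordering. It is not StarRocks code: `Index`, `Guard`, `Applier`, `prepare_buggy`, and `run_buggy_scenario` are hypothetical names. A plain `locked` flag stands in for the index's `std::shared_timed_mutex`, because unlocking a real mutex after its owner is freed would make the demo itself undefined. A shared `alive` flag lets the guard detect that its index is gone without touching freed memory.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical model of the bug, not StarRocks code. `locked` stands in for
// the index's mutex so that this demo has no undefined behavior of its own.
struct Index {
    bool locked = false;
    // Shared flag that outlives the Index, so a guard can detect that its
    // owner is gone without dereferencing freed memory.
    std::shared_ptr<bool> alive = std::make_shared<bool>(true);
    ~Index() { *alive = false; }
};

// Stand-in for the lock guard handed back to the caller.
class Guard {
public:
    explicit Guard(Index& idx) : idx_(&idx), alive_(idx.alive) { idx_->locked = true; }
    ~Guard() {
        if (*alive_) {
            idx_->locked = false;  // normal unlock
        } else {
            ++dangling_unlocks;    // the real guard would unlock freed memory here
        }
    }
    static inline int dangling_unlocks = 0;

private:
    Index* idx_;
    std::shared_ptr<bool> alive_;
};

using Cache = std::map<int, std::unique_ptr<Index>>;

// Mirrors the pre-fix ordering: take the guard, fail the load, evict the index.
bool prepare_buggy(Cache& cache, int id, bool load_ok, std::unique_ptr<Guard>& guard) {
    guard = std::make_unique<Guard>(*cache.at(id));
    if (!load_ok) {
        cache.erase(id);  // destroys the Index while `guard` still refers to it
        return false;
    }
    return true;
}

// Hypothetical stand-in for the txn log applier that owns the guard.
struct Applier {
    std::unique_ptr<Guard> guard;
};

// Returns how many unlocks targeted an already-destroyed index.
int run_buggy_scenario() {
    Cache cache;
    cache[1] = std::make_unique<Index>();
    {
        Applier applier;
        prepare_buggy(cache, 1, /*load_ok=*/false, applier.guard);
    }  // applier (and its guard) destroyed after the index
    return Guard::dangling_unlocks;
}
```

A single call to `run_buggy_scenario()` returns 1: the guard owned by the applier is released after the cache has already destroyed the index, which is the ordering described above.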

What I'm doing:

If the PK index load fails, release the lock guard before removing the PK index from the cache.
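The fix amounts to reordering the failure path: reset the guard first, then evict the index. Below is a minimal sketch of that ordering, assuming simplified hypothetical types (`PkIndex`, `IndexCache`, `prepare_index`, and a `load_ok` parameter that simulates the result of `lake_load`). The guard type matches the `std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>` used by `prepare_primary_index`, but this is not the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical, simplified stand-ins; not the StarRocks classes.
using Guard = std::unique_ptr<std::lock_guard<std::shared_timed_mutex>>;

struct PkIndex {
    std::shared_timed_mutex mu;  // owned by the index, destroyed with it
    Guard fetch_guard() { return std::make_unique<std::lock_guard<std::shared_timed_mutex>>(mu); }
};

using IndexCache = std::map<int64_t, std::unique_ptr<PkIndex>>;

// Returns false on a simulated load failure. On failure the guard is
// released before the index that owns the mutex is destroyed.
bool prepare_index(IndexCache& cache, int64_t tablet_id, bool load_ok, Guard& guard) {
    PkIndex& index = *cache.at(tablet_id);
    guard = index.fetch_guard();  // take the lock before loading
    if (!load_ok) {
        guard.reset();            // 1. unlock while the mutex still exists
        cache.erase(tablet_id);   // 2. only now destroy the index
        return false;
    }
    return true;                  // success: the caller keeps holding the lock
}
```

Resetting the `unique_ptr` runs the `lock_guard` destructor, so the mutex is unlocked while the `PkIndex` that owns it still exists; only then is the entry erased. On success the caller keeps the guard, as it does in the original code.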

This pull request improves error handling in the `UpdateManager` class and adds a test case covering PK index load failures.

Improvements to error handling:

  • In src/storage/lake/update_manager.cpp, release the lock guard before removing the index entry from the cache when the load or prepare operation fails.

New test case:

  • Add a test to `LakePrimaryKeyPublishTest` (be/test/storage/lake/primary_key_publish_test.cpp) that covers PK index load failure.
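A regression test for this bug needs an oracle for the release ordering. One option, sketched below under hypothetical names (`OwnerCheckingGuard`, `run_failure_path`) and not taken from `LakePrimaryKeyPublishTest`, is a guard that records, at release time, whether its cache entry still exists.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical regression-test helper, not StarRocks code.
using Cache = std::map<int, int>;  // tablet id -> dummy index payload

// Records, when released, whether its index entry was still in the cache.
class OwnerCheckingGuard {
public:
    OwnerCheckingGuard(const Cache* cache, int id) : cache_(cache), id_(id) {}
    ~OwnerCheckingGuard() { released_while_owner_alive = cache_->count(id_) == 1; }
    static inline bool released_while_owner_alive = false;

private:
    const Cache* cache_;
    int id_;
};

// Simulates a failed load; `release_first` selects the fixed ordering.
bool run_failure_path(bool release_first) {
    Cache cache{{1, 42}};
    auto guard = std::make_unique<OwnerCheckingGuard>(&cache, 1);
    if (release_first) {
        guard.reset();   // fixed: release, then evict
        cache.erase(1);
    } else {
        cache.erase(1);  // pre-fix: evict, then release
        guard.reset();
    }
    return OwnerCheckingGuard::released_while_owner_alive;
}
```

`run_failure_path(true)` (release, then evict) returns `true`; `run_failure_path(false)` (evict, then release, as before the fix) returns `false`.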

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

luohaha marked this pull request as ready for review on December 12, 2024 12:32
luohaha requested a review from a team as a code owner on December 12, 2024 12:32

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[FE Incremental Coverage Report]

pass : 0 / 0 (0%)


[BE Incremental Coverage Report]

pass : 4 / 4 (100.00%)

file detail

| path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|
| 🔵 src/storage/lake/update_manager.cpp | 4 | 4 | 100.00% | [] |

luohaha enabled auto-merge (squash) on December 12, 2024 13:06
luohaha merged commit 35043dc into StarRocks:main on Dec 12, 2024 (60 checks passed)

@Mergifyio backport branch-3.4

github-actions bot removed the 3.4 label on Dec 12, 2024

@Mergifyio backport branch-3.3

github-actions bot removed the 3.3 label on Dec 12, 2024

@Mergifyio backport branch-3.2

github-actions bot removed the 3.2 label on Dec 12, 2024

@Mergifyio backport branch-3.1

github-actions bot removed the 3.1 label on Dec 12, 2024

mergify bot (Contributor) commented on Dec 12, 2024

backport branch-3.4

✅ Backports have been created

mergify bot (Contributor) commented on Dec 12, 2024

backport branch-3.3

✅ Backports have been created

mergify bot (Contributor) commented on Dec 12, 2024

backport branch-3.2

✅ Backports have been created

mergify bot (Contributor) commented on Dec 12, 2024

backport branch-3.1

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 12, 2024
mergify bot pushed a commit that referenced this pull request Dec 12, 2024
mergify bot pushed a commit that referenced this pull request Dec 12, 2024
mergify bot pushed a commit that referenced this pull request Dec 12, 2024
Signed-off-by: luohaha <[email protected]>
(cherry picked from commit 35043dc)

# Conflicts:
#	be/test/storage/lake/primary_key_publish_test.cpp
wanpengfei-git pushed a commit that referenced this pull request Dec 12, 2024
wanpengfei-git pushed a commit that referenced this pull request Dec 12, 2024
wanpengfei-git pushed a commit that referenced this pull request Dec 12, 2024

3 participants