[Fix](merge-on-write) Fix MergeIndexDeleteBitmapCalculator::calculate_one() coredump #44284

Merged: 2 commits into apache:master on Nov 20, 2024

Conversation

bobhan1

@bobhan1 bobhan1 commented Nov 19, 2024

What problem does this PR solve?

Problem Summary:

MergeIndexDeleteBitmapCalculatorContext::get_current_key() may return a non-OK status when it encounters a memory allocation failure. When that happens, MergeIndexDeleteBitmapCalculatorContext::Comparator::operator() returns an incorrect result, which breaks assumptions made during the multiway merge and leads to a coredump.

 1# 0x00007F507D0B3520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill at ./nptl/pthread_kill.c:89
 3# raise at ../sysdeps/posix/raise.c:27
 4# abort at ./stdlib/abort.c:81
 5# 0x000055E3A805DD7D in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 6# 0x000055E3A805047A in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 7# google::LogMessage::SendToLog() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 8# google::LogMessage::Flush() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
 9# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
10# doris::MergeIndexDeleteBitmapCalculatorContext::seek_at_or_after(doris::Slice const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
11# doris::MergeIndexDeleteBitmapCalculator::calculate_one(doris::RowLocation&) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/delete_bitmap_calculator.cpp:197
12# doris::MergeIndexDeleteBitmapCalculator::calculate_all(std::shared_ptr) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
13# doris::Tablet::calc_delete_bitmap_between_segments(std::shared_ptr, std::vector, std::allocator > > const&, std::shared_ptr) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:4075
14# doris::Tablet::update_delete_bitmap_without_lock(std::shared_ptr const&, std::vector, std::allocator > > const*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:3468
15# doris::Tablet::revise_tablet_meta(std::vector, std::allocator > > const&, std::vector, std::allocator > > const&, bool) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/tablet.cpp:415
16# doris::EngineCloneTask::_finish_incremental_clone(doris::Tablet*, std::shared_ptr const&, long) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:795
17# doris::EngineCloneTask::_finish_clone(doris::Tablet*, std::__cxx11::basic_string, std::allocator > const&, long, bool) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
18# doris::EngineCloneTask::_do_clone() in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
19# doris::EngineCloneTask::execute() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/olap/task/engine_clone_task.cpp:159
20# doris::clone_callback(doris::StorageEngine&, doris::TMasterInfo const&, doris::TAgentTaskRequest const&) in /mnt/hdd01/ci/doris-deploy-branch-2.1-local/be/lib/doris_be
21# std::_Function_handler(doris::TAgentTaskRequest const&) const::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
22# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:551
23# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:499
24# start_thread at ./nptl/pthread_create.c:442
25# 0x00007F507D197850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
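
The failure mode described above can be avoided in general by making sure a fallible key lookup never runs inside the heap comparator. The following is a minimal C++ sketch of that idea, not the change made in this PR: `Cursor`, `HeapEntry`, `EntryGreater`, and `merge_sorted` are hypothetical names, and `std::nullopt` stands in for a non-OK status. Each key is fetched and checked before it enters the heap, so the comparator only ever sees valid cached keys and the merge can return an error instead of continuing with a broken ordering.

```cpp
#include <cstddef>
#include <optional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-segment cursor whose key lookup can fail.
struct Cursor {
    std::vector<std::string> keys;  // sorted keys of one segment
    std::size_t pos = 0;            // current position in `keys`
    // Position at which the lookup fails (simulated allocation failure).
    std::size_t fail_at = static_cast<std::size_t>(-1);

    bool valid() const { return pos < keys.size(); }

    // Returns std::nullopt to model a non-OK status.
    std::optional<std::string> current_key() const {
        if (pos == fail_at) return std::nullopt;
        return keys[pos];
    }
};

// The heap stores an already-fetched key, so comparing entries cannot fail.
struct HeapEntry {
    std::string key;
    std::size_t cursor_id;
};

// Orders the priority queue as a min-heap on (key, cursor_id).
struct EntryGreater {
    bool operator()(const HeapEntry& a, const HeapEntry& b) const {
        if (a.key != b.key) return a.key > b.key;
        return a.cursor_id > b.cursor_id;
    }
};

// Merges all cursors into `out` in sorted order.
// Returns false as soon as any key lookup fails.
bool merge_sorted(std::vector<Cursor>& cursors, std::vector<std::string>& out) {
    std::priority_queue<HeapEntry, std::vector<HeapEntry>, EntryGreater> heap;

    // Fetches the current key of one cursor and pushes it, checking for failure.
    auto push_current = [&](std::size_t id) -> bool {
        if (!cursors[id].valid()) return true;  // exhausted, nothing to push
        std::optional<std::string> key = cursors[id].current_key();
        if (!key) return false;                 // surface the error to the caller
        heap.push(HeapEntry{std::move(*key), id});
        return true;
    };

    for (std::size_t id = 0; id < cursors.size(); ++id) {
        if (!push_current(id)) return false;
    }
    while (!heap.empty()) {
        HeapEntry top = heap.top();
        heap.pop();
        out.push_back(top.key);
        ++cursors[top.cursor_id].pos;
        if (!push_current(top.cursor_id)) return false;
    }
    return true;
}
```

With two healthy cursors the merge yields every key in sorted order. If one cursor is set up to fail at some position, `merge_sorted` returns `false` at that point and the caller can propagate the error rather than abort.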

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

dataroaring
dataroaring previously approved these changes Nov 19, 2024

@dataroaring dataroaring left a comment

LGTM

bobhan1 commented Nov 19, 2024

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 19, 2024

PR approved by at least one committer and no changes requested.


PR approved by anyone and no changes requested.


clang-tidy review says "All clean, LGTM! 👍"

@bobhan1 bobhan1 requested a review from zhannngchen November 19, 2024 13:04
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 19, 2024
bobhan1 commented Nov 19, 2024

run buildall


clang-tidy review says "All clean, LGTM! 👍"

bobhan1 commented Nov 20, 2024

run cloud_p0


@zhannngchen zhannngchen left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 20, 2024

PR approved by at least one committer and no changes requested.

@bobhan1 bobhan1 requested a review from dataroaring November 20, 2024 05:54

@dataroaring dataroaring left a comment

LGTM

@zhannngchen zhannngchen merged commit 1601d75 into apache:master Nov 20, 2024
33 of 36 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 20, 2024
…e_one()` coredump (#44284)
hello-stephen pushed a commit that referenced this pull request Nov 20, 2024
…or::calculate_one()` coredump #44284 (#44330)

Cherry-picked from #44284

Co-authored-by: bobhan1 <[email protected]>
dataroaring pushed a commit that referenced this pull request Nov 26, 2024
…or::calculate_one()` coredump #44284 (#44328)

Cherry-picked from #44284

Co-authored-by: bobhan1 <[email protected]>
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
4 participants