Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: successive exec of flushdb may cause delete old db fail #2790

Merged

Conversation

cheniujh
Copy link
Collaborator

@cheniujh cheniujh commented Jul 9, 2024

该PR修复了 Issue #2698
问题原因:
Pika的flushdb, 删除旧实例的目录是异步执行,client这边直接返回ok,但如果之前的旧目录还没有删除完毕,新到达的flushdb会实际上执行失败。 如果用户短时间内提交好几条flushdb, 其实就第一条会成功,后面几条很有可能会失败(尤其是之前旧DB目录比较大的情况下),然后pika是不管flushdb是否成功都返回了ok。Issue中gdb抓取到的异常,是由于旧的db_path_deleting目录还存在(异步执行删除),所以下一条flushdb试图执行rename逻辑时(将db_path,rename为db_path_deleting)会发现db_path_deleting目录已经存在,所以rename失败,但是原代码中没有对这种失败的处理,尽管db目录没有清除掉,但是照样会写binlog,会清空缓存,还会告知用户flushdb已经成功, 这是完全错误的。

问题修复:
1 给flushdb增加了is_flushing_db标志位和异步回调函数,这样下一条flushdb如果在前一个flushdb的异步任务(删除旧db目录)没有执行完之前,就会执行失败。
(更换了一种修复方式,直接在rename旧db时,在后缀deleteing后面加上时间戳,这样即使有多个flushdb连续达到,也不会因为旧目录没有删除掉而触发rename失败)
2 给flushdb链路增加了返回值,如果flushdb失败,会告知用户执行失败,而不是无脑返回OK
3 如果flushdb失败了,就不写binlog,也不去清空缓存
4 对于flushall命令,但凡有某个db flush失败,都返回告知用户

This PR fixes Issue #2698.

Cause of the Problem:

The issue arises from Pika's flush operation, where the deletion of the old instance directory is performed asynchronously. The client side immediately receives an OK, but if the old directory has not been fully deleted, any new flushdb commands will actually fail. If a user submits several flushdb commands in quick succession, only the first one is likely to succeed. The subsequent ones will most likely fail (particularly if the old DB directory is quite large). Pika returned OK regardless of whether the flushdb succeeded. The exception captured by gdb in the issue is due to the old db_path_deleting directory still existing (as the deletion is asynchronous). Consequently, the next flushdb tries to execute the rename operation (renaming db_path to db_path_deleting), but finds that the db_path_deleting directory already exists and thus the rename fails. The original code didn't handle this failure; the binlog was still written, the cache was cleared, and the user was informed that flushdb was successful despite the fact that the DB directory was not cleaned up.

Problem Fix:

  1. A flag is_flushing_db and a callback function were added to the flushdb. This ensures that any subsequent flushdb commands will fail if the asynchronous task of deleting the old DB directory from a previous flushdb has not completed.
    (An alternative fix was implemented: by appending a timestamp to the suffix of db_path_deleting during the rename operation. Therefore, even if multiple flushdb commands arrive consecutively, the rename will not fail due to the old directory not being deleted.)

  2. flushdb now returns a value. If flushdb fails, it will inform the user of the failure instead of blindly returning OK.

  3. If flushdb fails, neither the binlog is written nor is the cache cleared.

  4. For the flushall command, if any flushdb fails, the user is informed of the failure.

Summary by CodeRabbit

  • New Features

    • Added improved error handling and logging for database flush operations, providing more detailed information about success or failure.
    • Introduced callback functionality for directory purging, allowing users to specify actions upon completion.
  • Bug Fixes

    • Enhanced error handling during flush operations to ensure correct database instance management and conflict avoidance.
  • Improvements

    • Added new methods and flags to track the success of database flush operations, improving reliability and user feedback.

2 add an atomic flag for flushdb
3 add callback function when deleted old path
4 revised some ugly code
Copy link

coderabbitai bot commented Jul 9, 2024

Walkthrough

Recent updates introduce more robust control and error handling for database flush operations, including new methods and improved logging. Key changes involve adding callback support for directory purges, enhancing flushing logic to append timestamps for conflict resolution, and better tracking the success of operations. These modifications ensure smoother and more informative administrative commands and transactions in the system.

Changes

File Path Change Summary
include/pika_admin.h Modified FlushallCmd and FlushdbCmd to return bool and added new methods for binlog operations.
include/pika_db.h Added ResetIsFlushingDBToFalse method to reset flushing flags.
include/pika_server.h Introduced PurgeDirWithCallBack for directory purging with callbacks.
src/pika_admin.cc Enhanced flush operation tracking and error logging in FlushallCmd and FlushdbCmd.
src/pika_db.cc Improved conflict handling during database flush by appending timestamps to paths.
src/pika_server.cc Added PurgeDirArgsWithCallBackArgs struct and DoPurgeDirWithCallBack function for callback support.
src/pika_transaction.cc Modified FlushdbCmd method names for consistency.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PikaAdmin
    participant PikaDB
    participant PikaServer

    User->>PikaAdmin: Initiate FlushallCmd
    PikaAdmin->>PikaDB: Call FlushAllWithoutLock
    PikaDB-->>PikaAdmin: Return success/fail status
    PikaAdmin->>PikaAdmin: Update flushall_succeed_ flag
    PikaAdmin->>PikaServer: Call DoBinlog if flushall_succeed_
    PikaServer-->>PikaAdmin: Binlog operation status
    PikaAdmin->>User: Return operation result

    User->>PikaServer: Request directory purge with callback
    PikaServer->>PikaServer: Schedule PurgeDirTask
    PikaServer->>PikaServer: Execute DoPurgeDirWithCallBack
    PikaServer-->>User: Confirm purge completion
Loading

Poem

In the code where bits do play,
A flush command clears the way.
With timestamps now, no conflicts stay,
Callbacks guide each purge's day.
Logs and flags ensure we say,
Our data's safe, come what may.
CodeRabbit smiles, hip-hip-hooray! 🐰✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Jul 9, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 12090e4 and 7b125d8.

Files selected for processing (7)
  • include/pika_admin.h (2 hunks)
  • include/pika_db.h (2 hunks)
  • include/pika_server.h (1 hunks)
  • src/pika_admin.cc (5 hunks)
  • src/pika_db.cc (3 hunks)
  • src/pika_server.cc (2 hunks)
  • src/pika_transaction.cc (1 hunks)
Additional comments not posted (25)
include/pika_db.h (1)

151-153: LGTM!

The new method ResetIsFlushingDBToFalse correctly sets the is_flushing_db_ flag to false.

src/pika_transaction.cc (1)

Line range hint 63-67:
Ensure proper error handling for FlushdbCmd.

The DoWithoutLock method now returns a boolean indicating success or failure. Ensure that the failure case is handled appropriately.

include/pika_server.h (1)

205-205: LGTM! Verify callback usage.

The new method PurgeDirWithCallBack correctly adds functionality for directory purging with a callback. Ensure that the callback function is used correctly in all relevant places.

src/pika_db.cc (3)

47-47: LGTM!

The is_flushing_db_ flag is correctly initialized to false in the constructor.


201-205: LGTM! Verify the correctness of the logic.

The logic to handle the is_flushing_db_ flag in the FlushDBWithoutLock method ensures that only one flush operation can occur at a time. Proper logging is included for failure cases.


228-230: LGTM! Verify callback usage.

The usage of the callback function to reset the is_flushing_db_ flag in the FlushDBWithoutLock method is correct and ensures that the flag is reset after the directory is purged.

include/pika_admin.h (9)

188-188: Change return type to bool for FlushAllWithoutLock.

Changing the return type to bool is a good practice to indicate success or failure. Ensure that this method is properly tested and that the return values are correctly handled in the calling code.


189-189: Add DoBinlog method.

The addition of the DoBinlog method is noted. Ensure that this method is correctly implemented and integrated with the rest of the class.


193-193: Change return type to bool for DoWithoutLock.

Changing the return type to bool is a good practice to indicate success or failure. Ensure that this method is properly tested and that the return values are correctly handled in the calling code.


194-194: Add Clear method.

The addition of the Clear method is noted. This method resets the flushall_succeed_ flag, which is crucial for maintaining the state of the command. Ensure that this method is correctly called where necessary.


196-196: Add flushall_succeed_ flag.

The addition of the flushall_succeed_ flag is noted. This flag is essential for tracking the success of the flush operation. Ensure that this flag is correctly set and checked in the relevant methods.


212-212: Add DoBinlog method.

The addition of the DoBinlog method is noted. Ensure that this method is correctly implemented and integrated with the rest of the class.


213-213: Change return type to bool for DoWithoutLock.

Changing the return type to bool is a good practice to indicate success or failure. Ensure that this method is properly tested and that the return values are correctly handled in the calling code.


217-220: Add Clear method.

The addition of the Clear method is noted. This method resets the flush_succeed_ flag and clears the db_name_, which is crucial for maintaining the state of the command. Ensure that this method is correctly called where necessary.


222-222: Add flush_succeed_ flag.

The addition of the flush_succeed_ flag is noted. This flag is essential for tracking the success of the flush operation. Ensure that this flag is correctly set and checked in the relevant methods.

src/pika_server.cc (3)

813-816: Verify callback behavior.

The PurgeDirWithCallBack function schedules a directory purge operation with a callback. Ensure that the callback function does not introduce side effects or unexpected behavior.


51-57: Ensure callback execution under all circumstances.

The DoPurgeDirWithCallBack function calls the callback after the directory deletion. Ensure that the callback is executed even if pstd::DeleteDir fails.


44-49: Verify appropriate usage of callback.

The PurgeDirArgsWithCallBackArgs struct encapsulates the arguments for directory purge operations with a callback. Ensure that the std::function<void()> callback is used appropriately and doesn't cause memory leaks or dangling references.

src/pika_admin.cc (7)

495-495: Initialization of flushall_succeed_

The initialization of the flushall_succeed_ flag to false is a good practice to ensure the success state is correctly tracked.


514-524: Usage of flushall_succeed_ flag

The update to use the flushall_succeed_ flag to track the success of the flush operation is appropriate. Ensure that the flag is correctly set and checked to communicate the operation's success.


533-536: Check for flushall_succeed_ flag

The check for the flushall_succeed_ flag before clearing the cache is a good practice to prevent unnecessary cache clearing in case of a flush failure.


543-554: Return type change to bool

The change to return a bool indicating the success of the flush operation allows for better error handling and communication of the operation's success.


557-570: Return type change to bool

The change to return a bool indicating the success of the flush operation allows for better error handling and communication of the operation's success. The function also updates the cache if the flush operation is successful.


571-574: Addition of DoBinlog function

The addition of the DoBinlog function ensures that binlog operations are handled correctly after a successful flush operation.


Line range hint 578-646:
Updates to FlushdbCmd class

The updates to the FlushdbCmd class, including the use of a success flag and bool return type for the DoWithoutLock method, ensure consistency in handling flush operations across different commands. These changes improve error handling and communication.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7b125d8 and a4a1d2b.

Files selected for processing (2)
  • src/pika_db.cc (1 hunks)
  • src/pika_server.cc (2 hunks)
Files skipped from review as they are similar to previous changes (2)
  • src/pika_db.cc
  • src/pika_server.cc

@cheniujh cheniujh force-pushed the fix_successive_flushdb_exec branch from a4a1d2b to 86578e2 Compare July 10, 2024 06:55
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between a4a1d2b and 86578e2.

Files selected for processing (3)
  • src/pika_admin.cc (5 hunks)
  • src/pika_db.cc (1 hunks)
  • src/pika_server.cc (2 hunks)
Files skipped from review as they are similar to previous changes (3)
  • src/pika_admin.cc
  • src/pika_db.cc
  • src/pika_server.cc

@cheniujh cheniujh requested review from chejinge and Mixficsol July 10, 2024 08:08
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 86578e2 and 41d82f4.

Files selected for processing (1)
  • src/pika_db.cc (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/pika_db.cc

@cheniujh cheniujh requested a review from AlexStocks July 10, 2024 09:26
src/pika_admin.cc Show resolved Hide resolved
src/pika_admin.cc Outdated Show resolved Hide resolved
@cheniujh cheniujh requested a review from Mixficsol July 15, 2024 08:30
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 41d82f4 and 7cd4d95.

Files selected for processing (1)
  • src/pika_admin.cc (5 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/pika_admin.cc

src/pika_admin.cc Outdated Show resolved Hide resolved
src/pika_admin.cc Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7cd4d95 and 52e14ee.

Files selected for processing (1)
  • src/pika_admin.cc (5 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/pika_admin.cc

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 52e14ee and 66daa77.

Files selected for processing (1)
  • src/pika_admin.cc (5 hunks)
Files skipped from review as they are similar to previous changes (1)
  • src/pika_admin.cc

@wanghenshui
Copy link
Collaborator

please add test

@AlexStocks
Copy link
Contributor

Yeah, please add a test.

@baixin01 baixin01 merged commit 394f517 into OpenAtomFoundation:unstable Jul 26, 2024
13 checks passed
chejinge pushed a commit that referenced this pull request Aug 1, 2024
* 1 add return flag for flushdb

* removed duplicated logic

---------

Co-authored-by: cheniujh <[email protected]>
cheniujh added a commit to cheniujh/pika that referenced this pull request Sep 24, 2024
…mFoundation#2790)

* 1 add return flag for flushdb
2 add an atomic flag for flushdb
3 add callback function when deleted old path
4 revised some ugly code

* add timestamp for deleting suffix when flushdb

* remove debug
remove debug log

* change a return value to fixed "false"

* add failed reason to user resp

* removed duplicated logic

---------

Co-authored-by: cheniujh <[email protected]>
cheniujh added a commit to cheniujh/pika that referenced this pull request Sep 24, 2024
…mFoundation#2790)

* 1 add return flag for flushdb

* removed duplicated logic

---------

Co-authored-by: cheniujh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.5.5 4.0.1 ☢️ Bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants