-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Improvement] trim redundant read when scanning entry log. #4161
base: master
Are you sure you want to change the base?
[Improvement] trim redundant read when scanning entry log. #4161
Conversation
Could you help to reviw this PR? thanks! |
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/EntryLogScanner.java
Outdated
Show resolved
Hide resolved
bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very good, I left one final comment about the "switch" statement
return; | ||
} | ||
scanner.process(ledgerId, offset, data); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the "default" case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure of the extent of the improvements that can be achieved with this change, given the following concerns:
- All the changes in this PR are specific to uncommon cases, occurring only when the entrylog file's index is missing.
- The readFromLogChannel function utilizes BufferedLogChannel, which is a RandomAccessFile channel. When reading data from the file channel, the data is pre-fetched into the OS PageCache, with a default pre-fetch size of 4KB. Even though we only retrieve the entryId, the actual data read from the disk is a minimum of 4KB.
There is a potential risk associated with this PR:
- We solely read the entryID without validating the entry data. If the file is corrupted, the scan operation will be unable to detect it and will incorrectly populate the index with the wrong ledger size. This introduces a high level of risk.
We have encountered cases that the ledger map in entry log is missed, maybe because the bookie crashed before flushing entry log. As long as the corrupted entry log file exists, bookie will scan the entry log by
It is common that the size of one entry reach 4MB ( we set the max batch size of Pulsar Client to be 4MB), so we can decrease 99.9% of the disk read with this enhancement.
The fault detecting logic is not changed, in the old logic we just read |
Descriptions of the changes in this PR:
Remove the redundant disk read when scannig the entry log file, which could reduce the pressure of disk a lot.
Motivation
We have
org.apache.bookkeeper.bookie.storage.EntryLogScanner
to process the entry data when scanning the entry log file, in whichprocess
method is used to process the entry data.We have a lot of implementations for
EntryLogScanner
for various reason.But some of them do not need to access all data in the entry log file.
For example:
extractEntryLogMetadataByScanning
only need to know theentrySize
field, in such case we do not need to read any data in the entry.org.apache.bookkeeper.bookie.InterleavedStorageRegenerateIndexOp#initiate
only need to know theentryId
field, in such case we only need to read the first 16 bytes(ledgerId+entryId).So, we do not need to read the entry data when scanning the entry log in some cases.
Changes
Add method
getLengthToRead
inEntryLogScanner
to indicate how much data do we need to read, and read this mount of data only.