[Improvement] trim redundant read when scanning entry log. #4161

Open · wants to merge 13 commits into master
Conversation

@thetumbled (Member) commented Dec 21, 2023

Descriptions of the changes in this PR:
Remove the redundant disk read when scanning the entry log file, which can significantly reduce disk pressure.

Motivation

We have org.apache.bookkeeper.bookie.storage.EntryLogScanner to process entry data when scanning an entry log file; its process method is invoked for each entry.
There are many implementations of EntryLogScanner, written for various reasons.
[Screenshot: the implementations of EntryLogScanner]
But some of them do not need to access all data in the entry log file.
For example:

  • the scanner implemented in extractEntryLogMetadataByScanning only needs the entrySize field, so no entry data needs to be read at all.
  • the scanner implemented in org.apache.bookkeeper.bookie.InterleavedStorageRegenerateIndexOp#initiate only needs the entryId field, so only the first 16 bytes (ledgerId + entryId) need to be read.
  • many more similar situations exist.

So, in some cases we do not need to read the entry payload at all when scanning the entry log; the sketch below illustrates such a header-only scan.
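To make the layout concrete, here is a minimal, hypothetical sketch (not code from this PR) of a header-only scan, assuming the usual entry log layout of a 4-byte size header followed by an 8-byte ledgerId and an 8-byte entryId:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

// Hypothetical sketch, not code from this PR: read just the 4-byte size
// header plus the 8-byte ledgerId and 8-byte entryId, then skip the payload.
public class HeaderOnlyScan {
    public static void scan(RandomAccessFile log, long startOffset) throws IOException {
        long pos = startOffset;
        ByteBuffer header = ByteBuffer.allocate(4 + 8 + 8);
        while (pos + header.capacity() <= log.length()) {
            header.clear();
            log.seek(pos); // the RandomAccessFile and its channel share a position
            if (log.getChannel().read(header) < header.capacity()) {
                break; // truncated file: stop scanning
            }
            header.flip();
            int entrySize = header.getInt();  // size of ledgerId + entryId + data
            long ledgerId = header.getLong();
            long entryId = header.getLong();
            System.out.printf("ledger=%d entry=%d size=%d%n", ledgerId, entryId, entrySize);
            pos += 4 + entrySize; // advance past the rest of the entry without reading it
        }
    }
}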

Changes

Add a getLengthToRead method to EntryLogScanner that indicates how much data needs to be read, and read only that amount.
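A rough sketch of what the extended interface could look like. getLengthToRead, READ_NOTHING, and READ_LEDGER_ENTRY_ID are named in this PR; the enum shape, READ_ALL, and the default method are illustrative assumptions:

import io.netty.buffer.ByteBuf;
import java.io.IOException;

// Rough sketch of the extended interface; everything beyond the names taken
// from this PR is an illustrative assumption.
public interface EntryLogScanner {
    enum LengthToRead {
        READ_NOTHING,         // only the entrySize header is needed
        READ_LEDGER_ENTRY_ID, // first 16 bytes: ledgerId + entryId
        READ_ALL              // full entry payload, the old behaviour
    }

    boolean accept(long ledgerId);

    void process(long ledgerId, long offset, ByteBuf entry) throws IOException;

    // How much of each entry this scanner needs; defaults to the old full read
    // so existing implementations are unaffected.
    default LengthToRead getLengthToRead() {
        return LengthToRead.READ_ALL;
    }
}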

@thetumbled (Member Author) commented:
Could you help review this PR? Thanks!
@eolivelli @dlg99 @hangc0276 @nicoloboschi @shoothzj @zymap @wenbingshen

@thetumbled thetumbled changed the title [Improvement] skip read rebundunct content when scanning entry log. [Improvement] trim rebundunct read when scanning entry log. Dec 21, 2023
@thetumbled thetumbled changed the title [Improvement] trim rebundunct read when scanning entry log. [Improvement] trim redundant read when scanning entry log. Dec 21, 2023
@thetumbled thetumbled requested a review from eolivelli December 21, 2023 12:46
@eolivelli (Contributor) left a comment:

very good, I left one final comment about the "switch" statement

return;
}
scanner.process(ledgerId, offset, data);
break;
A Contributor commented on the diff above:
please add the "default" case
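For illustration, a hedged sketch of the dispatch with the requested default case, continuing the hypothetical interface sketch above (so the LengthToRead names and imports come from there):

// Hypothetical continuation of the interface sketch above: the dispatch on
// the scanner's read type, now with the requested default case.
static void dispatch(EntryLogScanner scanner, long ledgerId, long offset,
                     ByteBuf data) throws IOException {
    switch (scanner.getLengthToRead()) {
        case READ_NOTHING:
            break; // header-only scan: nothing to hand to the scanner
        case READ_LEDGER_ENTRY_ID:
        case READ_ALL:
            scanner.process(ledgerId, offset, data);
            break;
        default:
            // fail fast on unhandled read types instead of skipping silently
            throw new IllegalStateException(
                    "unknown read type: " + scanner.getLengthToRead());
    }
}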

The Member Author (@thetumbled) replied:
Done, PTAL.

@hangc0276 (Contributor) left a comment:

I'm unsure of the extent of the improvements that can be achieved with this change, given the following concerns:

  • All the changes in this PR are specific to uncommon cases, occurring only when the entrylog file's index is missing.
  • The readFromLogChannel function utilizes BufferedLogChannel, which is a RandomAccessFile channel. When reading data from the file channel, the data is pre-fetched into the OS PageCache, with a default pre-fetch size of 4KB. Even though we only retrieve the entryId, the actual data read from the disk is a minimum of 4KB.

There is a potential risk associated with this PR:

  • We solely read the entryID without validating the entry data. If the file is corrupted, the scan operation will be unable to detect it and will incorrectly populate the index with the wrong ledger size. This introduces a high level of risk.

@hangc0276 hangc0276 requested review from merlimat and zymap December 25, 2023 03:25
@thetumbled (Member Author) commented Dec 25, 2023

All the changes in this PR are specific to uncommon cases, occurring only when the entrylog file's index is missing.

We have encountered cases where the ledger map in an entry log is missing, possibly because the bookie crashed before flushing the entry log. As long as the corrupted entry log file exists, the bookie will scan it via extractEntryLogMetadataByScanning to generate the EntryLogMetadataMap on every GC run, i.e. every gcWaitTime milliseconds (default 15 min).
The other cases are related to index rebuilding, which may be rarely used.

The readFromLogChannel function utilizes BufferedLogChannel, which is a RandomAccessFile channel. When reading data from the file channel, the data is pre-fetched into the OS PageCache, with a default pre-fetch size of 4KB. Even though we only retrieve the entryId, the actual data read from the disk is a minimum of 4KB.

It is common for a single entry to reach 4 MB (we set the max batch size of the Pulsar client to 4 MB), so this enhancement can cut disk reads by roughly 99.9%: a 4 KB page-cache prefetch instead of a 4 MB entry read is about 0.1% of the bytes.

There is a potential risk associated with this PR:
We solely read the entryID without validating the entry data. If the file is corrupted, the scan operation will be unable to detect it and will incorrectly populate the index with the wrong ledger size. This introduces a high level of risk.

The fault-detection logic is unchanged. In the old logic we read entrySize bytes of data and checked that we actually read that amount. With the new read types, READ_NOTHING and READ_LEDGER_ENTRY_ID, we still advance the read position in the loop and check that the expected amount of data was read. I don't think this is a breaking change; see the sketch below.
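For illustration, a hypothetical sketch (not the PR's actual code) of the truncation check being described, where the position always advances by the full entry size regardless of how much was read:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch, not the PR's actual code: even when only part of an
// entry is read, the loop verifies the read and advances by the full entry
// size, so a truncated or corrupted file is detected exactly as before.
final class PartialReadCheck {
    static long readAndAdvance(FileChannel ch, long pos, int entrySize,
                               int lengthToRead) throws IOException {
        if (lengthToRead > 0) {
            ByteBuffer buf = ByteBuffer.allocate(lengthToRead);
            int read = ch.read(buf, pos + 4); // skip the 4-byte size header
            if (read != lengthToRead) {
                // Short read => truncated/corrupted file: abort the scan
                // instead of populating the index with wrong metadata.
                throw new IOException("short read at offset " + pos);
            }
        }
        return pos + 4 + entrySize; // always skip the whole entry
    }
}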
