Reduce receipt size #6602

Merged
merged 31 commits into from
Mar 26, 2024
jframe (Contributor) commented Feb 22, 2024

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Most advanced CI tests are deferred until PR approval, but you could:

  • locally run all unit tests via: ./gradlew build
  • locally run all acceptance tests via: ./gradlew acceptanceTest
  • locally run all integration tests via: ./gradlew integrationTest
  • locally run all reference tests via: ./gradlew ethereum:referenceTests:referenceTests

PR description

Reduce receipt size by introducing a new compact receipt encoding, saving approximately 78 GiB on a freshly snap-synced mainnet node. The new compact encoding omits the bloom filter and trims leading zeros from the log topics and log data, similar to what was done in the Geth and Nethermind receipt encoding PRs.
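Every log topic is a fixed 32-byte value, so trimming its leading zeros is lossless: left-padding the stored bytes back to 32 bytes restores the original topic. The sketch below shows that round trip in plain Java 17; the class and method names are hypothetical and are not Besu's API. Log data has no fixed width, so restoring trimmed data would also need its original length; the sketch covers topics only.

```java
import java.util.Arrays;
import java.util.HexFormat;

// Hypothetical helper illustrating leading-zero trimming of 32-byte log topics.
public final class TopicCompaction {
  static final int TOPIC_SIZE = 32;

  // Drop leading zero bytes; an all-zero topic becomes an empty array.
  static byte[] trimLeadingZeros(byte[] topic) {
    int i = 0;
    while (i < topic.length && topic[i] == 0) {
      i++;
    }
    return Arrays.copyOfRange(topic, i, topic.length);
  }

  // Left-pad a trimmed topic back to 32 bytes to recover the original value.
  static byte[] restoreTopic(byte[] trimmed) {
    if (trimmed.length > TOPIC_SIZE) {
      throw new IllegalArgumentException("topic longer than 32 bytes");
    }
    byte[] topic = new byte[TOPIC_SIZE];
    System.arraycopy(trimmed, 0, topic, TOPIC_SIZE - trimmed.length, trimmed.length);
    return topic;
  }

  public static void main(String[] args) {
    // A topic holding a left-padded 20-byte address, as in ERC-20 Transfer events.
    byte[] topic = HexFormat.of().parseHex(
        "000000000000000000000000" + "d8da6bf26964af9d7eed9e03e53415d37aa96045");
    byte[] stored = trimLeadingZeros(topic);
    System.out.println(stored.length);                              // 20
    System.out.println(Arrays.equals(topic, restoreTopic(stored))); // true
  }
}
```

Address-valued topics are where the saving is largest, since 12 of their 32 bytes are always zero.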

I've added a new version for the database format, so users won't be able to downgrade once this is merged.

  • Create a new compact receipt encoding with no bloom filter, and reduce log size by trimming leading zeros from the log topics and log data
  • The new format is detected by the absence of the log bloom filter, which is encoded as a null/zero in RLP
  • Besu supports both the existing receipt format and the compacted receipt format
  • New format enabled by default
  • Added a feature flag, --receipt-compaction-enabled, to disable the feature where RPC performance is critical
  • Added a new database metadata format v3 for Forest and Bonsai to avoid runtime errors if a user downgrades Besu to a version that doesn't support receipt compaction
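In RLP, an empty byte string (which is also how a zero or null scalar is encoded) is the single byte 0x80, whereas a 256-byte bloom filter is encoded as 0xb9 0x01 0x00 followed by the 256 bytes. A reader can therefore tell the two receipt formats apart from the item in the bloom position alone. The minimal Java sketch below shows only that check; it assumes the caller has already located the encoded bloom item inside the receipt list, and `BloomSlot`, `encodeString` and `isCompactedBloomSlot` are hypothetical names, not Besu's RLP API.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: distinguishing a compacted receipt by its bloom slot.
public final class BloomSlot {
  static final int BLOOM_SIZE = 256;

  // RLP-encode a byte string (lengths up to 65535 bytes, enough for a bloom).
  static byte[] encodeString(byte[] s) {
    if (s.length == 1 && (s[0] & 0xff) < 0x80) {
      return new byte[] {s[0]}; // a single byte below 0x80 encodes as itself
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (s.length <= 55) {
      out.write(0x80 + s.length); // short string: 0x80 + length (empty -> 0x80)
    } else if (s.length <= 0xff) {
      out.write(0xb8); // long string with a 1-byte length
      out.write(s.length);
    } else if (s.length <= 0xffff) {
      out.write(0xb9); // long string with a 2-byte big-endian length
      out.write(s.length >> 8);
      out.write(s.length & 0xff);
    } else {
      throw new IllegalArgumentException("string too long for this sketch");
    }
    out.write(s, 0, s.length);
    return out.toByteArray();
  }

  // Assumed rule: an empty string (0x80) in the bloom position marks a compacted receipt.
  static boolean isCompactedBloomSlot(byte[] encodedBloomItem) {
    return encodedBloomItem.length == 1 && (encodedBloomItem[0] & 0xff) == 0x80;
  }

  public static void main(String[] args) {
    byte[] fullBloom = encodeString(new byte[BLOOM_SIZE]);
    byte[] emptyBloom = encodeString(new byte[0]);
    System.out.println(fullBloom.length);                 // 259
    System.out.println(isCompactedBloomSlot(fullBloom));  // false
    System.out.println(isCompactedBloomSlot(emptyBloom)); // true
  }
}
```

Because the check only inspects a prefix byte, both formats can coexist in the same database without a separate marker per receipt.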

Testing

  • Snap sync on mainnet
  • Checkpoint sync on mainnet
  • Compared RPC results against a version without this change for eth_debugGetRawReceipts, eth_feeHistory, eth_getBlockReceipts, eth_getMinerDataByBlockHash, eth_getTransactionReceipt, eth_getLogs, eth_newFilter, eth_getFilterChanges, eth_getFilterLogs
  • RPC load test of eth_getLogs and eth_getBlockReceipts at 2 req/s over 5 minutes

Engine API performance

Under normal load, performance is similar to that of nodes not using receipt compaction.
[Screenshot, 2024-02-29: Engine API performance]

Load testing

eth_getLogs at 2 req/s over 5 minutes, run with Gatling against the node while it was syncing. Response times in ms, one row per run:

Variant    min   max   mean   std dev   95th pct   99th pct
receipts   2     147   12     11        26         53
receipts   3     107   12     8         24         43
receipts   3     73    12     8         23         52
control    2     140   16     14        41         58
control    2     73    10     6         21         32
control    3     118   11     10        20         55
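Averaging the three runs per variant (plain arithmetic on the numbers above, all in ms) shows why the comparison is hard to call:

```latex
\begin{aligned}
\text{mean:}\quad & \tfrac{12+12+12}{3} = 12 \ \text{(receipts)} & \tfrac{16+10+11}{3} \approx 12.3 \ \text{(control)} \\
\text{95th pct:}\quad & \tfrac{26+24+23}{3} \approx 24.3 \ \text{(receipts)} & \tfrac{41+21+20}{3} \approx 27.3 \ \text{(control)} \\
\text{99th pct:}\quad & \tfrac{53+43+52}{3} \approx 49.3 \ \text{(receipts)} & \tfrac{58+32+55}{3} \approx 48.3 \ \text{(control)}
\end{aligned}
```

Receipts come out marginally ahead on mean and 95th percentile and marginally behind on 99th percentile, and the first control run (95th percentile of 41 ms) alone pulls control's 95th-percentile average up, so the gap appears to be within the run-to-run variance.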

Storage improvement

Column family BLOCKCHAIN:

Node       Keys         Total Size   SST Files Size   Blob Files Size
control    2372566258   948 GiB      167 GiB          781 GiB
receipts   2372566326   870 GiB      167 GiB          702 GiB

Disk space saving of ~78 GiB.
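The saving follows directly from the storage figures; SST file size is unchanged, so it shows up entirely in the blob files:

```latex
\begin{aligned}
\Delta_{\text{total}} &= 948\ \text{GiB} - 870\ \text{GiB} = 78\ \text{GiB} \\
\Delta_{\text{blob}}  &= 781\ \text{GiB} - 702\ \text{GiB} = 79\ \text{GiB} \\
\Delta_{\text{SST}}   &= 167\ \text{GiB} - 167\ \text{GiB} = 0 \\
\frac{\Delta_{\text{total}}}{948\ \text{GiB}} &\approx 8.2\%
\end{aligned}
```

The 1 GiB gap between the total and blob deltas is presumably rounding in the reported sizes.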

Fixed Issue(s)

fixes #6476

jframe added the doc-change-required label Mar 4, 2024
Signed-off-by: Jason Frame <[email protected]>
jframe marked this pull request as ready for review March 4, 2024 06:40
gfukushima (Contributor):

I like the results of this PR; it's looking promising! I had a very quick first pass, and the first thing I saw was some tests creating a "PrefixedKeyBlockchainStorage" with the flag defaulted to "false". It makes sense to test both scenarios where possible (I think you already do for unit tests). In cases where the blockchain isn't actually being tested but is only created to fulfil a dependency, I would rather have it set to true, since this is going to be the default value.

siladu (Contributor) commented Mar 5, 2024

> Added new Database metadata format v3 for forest and bonsai to avoid runtime errors if a user downgrade Besu to a version that doesn't have support for receipt compact

Will the user get a startup error instead?

> min: 2, max: 147, mean: 12, std dev: 11, response 95th percentile: 26, response 99th percentile: 53

Is this showing response time measured in ms?

What's your interpretation of the results - seems like receipts are better than control (if significant)?

jframe (Contributor Author) commented Mar 5, 2024

> Added new Database metadata format v3 for forest and bonsai to avoid runtime errors if a user downgrade Besu to a version that doesn't have support for receipt compact

> Will the user get a startup error instead?

They will get a startup error saying the database version is incompatible.
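A minimal sketch of how such a guard can work, assuming the database records a numeric format version in its metadata and each release knows which versions it can open. `VersionGuard`, `checkVersion` and the version sets are hypothetical; Besu's actual metadata handling may differ. Java 17+.

```java
import java.util.Set;

// Hypothetical startup guard: refuse to open a database written in an unknown format.
public final class VersionGuard {

  static final class IncompatibleDatabaseException extends RuntimeException {
    IncompatibleDatabaseException(String message) {
      super(message);
    }
  }

  // Fails fast at startup instead of hitting decoding errors at runtime.
  static void checkVersion(int storedVersion, Set<Integer> supportedVersions) {
    if (!supportedVersions.contains(storedVersion)) {
      throw new IncompatibleDatabaseException(
          "Database version " + storedVersion + " is incompatible; supported versions: "
              + supportedVersions);
    }
  }

  public static void main(String[] args) {
    Set<Integer> olderRelease = Set.of(2);    // assumed: predates receipt compaction
    Set<Integer> newerRelease = Set.of(2, 3); // assumed: also understands version 3
    checkVersion(3, newerRelease);            // passes
    try {
      checkVersion(3, olderRelease);          // the downgrade scenario
    } catch (IncompatibleDatabaseException e) {
      System.out.println("startup aborted: " + e.getMessage());
    }
  }
}
```

Failing at startup keeps an older binary from ever trying to decode a compacted receipt it does not understand.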

> min: 2, max: 147, mean: 12, std dev: 11, response 95th percentile: 26, response 99th percentile: 53

> Is this showing response time measured in ms?

> What's your interpretation of the results - seems like receipts are better than control (if significant)?

Response time is measured in ms.

It's not clear to me from the data that it's better, as there is a fair bit of variance between test runs. The only thing I think I can say is that receipts seem at least as good as the control.

…t compaction to true in tests

Signed-off-by: Jason Frame <[email protected]>
siladu (Contributor) left a comment

Good PR description and tests 👍

Are we planning on enabling this by default soon? If so, might be worth making this the default config in the integration/acceptance tests?

LGTM but wouldn't mind a second opinion particularly about the new database versioning stuff, downgrading etc.

CHANGELOG.md (outdated)
@@ -4,6 +4,7 @@

### Breaking Changes
- RocksDB database metadata format has changed to be more expressive, the migration of an existing metadata file to the new format is automatic at startup. Before performing a downgrade to a previous version it is mandatory to revert to the original format using the subcommand `besu --data-path=/path/to/besu/datadir storage revert-metadata v2-to-v1`.
- RocksDB database version incremented for receipt compaction. It will not be possible to downgrade to the previous Besu version.
Contributor:

Should we be wary of merging this before the Dencun dust has settled?

Contributor Author:

Sounds sensible. We could even make it disabled by default and toggle it on by default at a later date, if that makes more sense.

@@ -61,6 +62,12 @@ public class DataStorageOptions implements CLIOptions<DataStorageConfiguration>
arity = "1")
private Long bonsaiMaxLayersToLoad = DEFAULT_BONSAI_MAX_LAYERS_TO_LOAD;

@Option(
names = "--receipt-compaction-enabled",
Contributor:

👍

jframe (Contributor Author) commented Mar 7, 2024

@fab-10 Would you be able to take a look at the use of the database versioning I'm using in the PR?

joaniefromtheblock removed the doc-change-required label Mar 7, 2024
…e receipt compaction to true in tests"

This reverts commit 7cffb80.

Signed-off-by: Jason Frame <[email protected]>
fab-10 (Contributor) left a comment

Overall looks good. Some minor code suggestions, and some open questions on the UX:

  1. When enabling compact receipts on an existing DB, past receipts will not be touched, what about a subcommand to convert them to compact?
  2. Downgrade is only possible with a full resync; evaluate the option of adding a revert subcommand
  3. Prevent the user from doing enable->disable?

@@ -6,6 +6,7 @@
- RocksDB database metadata format has changed to be more expressive, the migration of an existing metadata file to the new format is automatic at startup. Before performing a downgrade to a previous version it is mandatory to revert to the original format using the subcommand `besu --data-path=/path/to/besu/datadir storage revert-metadata v2-to-v1`.

### Upcoming Breaking Changes
- Receipt compaction will be enabled by default in a future version of Besu. After this change it will not be possible to downgrade to the previous Besu version.
Contributor:

Have you evaluated the possibility of adding a storage subcommand to revert to the previous format? Does it make sense?

Contributor Author:

I tested some migration code as part of my testing, and it took approximately 17 hours on mainnet; with it taking that long, it would be better to just resync. It's not the exact code I would use for a subcommand, as it put the receipts in a different column family so I could compare sizes, but I think a subcommand would take a similar time.

…ersioned with receipt compaction

Signed-off-by: Jason Frame <[email protected]>
Signed-off-by: Jason Frame <[email protected]>
Signed-off-by: Jason Frame <[email protected]>
siladu (Contributor) left a comment

I've been thinking about the database versioning and wanted to discuss the following options:

  1. Current state of PR: new format enabled/disabled in tandem with optional flag. Backwards compatible but not forwards compatible if a user downgraded.

  2. Update the database version regardless of the feature flag, i.e. version 3 supports --receipt-compaction-enabled behaviour even when the flag is disabled. Effectively the same as (1) but versioning is coupled to the release rather than flag.

  3. Avoid updating the database version for this: a justification could be that we introduce a convention where we only use the versioning for "schema" changes rather than changes to the data contents. Non-forwards-compatible downgrade issue remains though, once the feature has been enabled.
    I think you said the only benefit of the updated database version is the ability to generate an appropriate warning message if they downgrade?


Orthogonal to the versioning: I think the original plan was to enable the flag by default. Maybe this can be reconsidered now that Dencun has passed. Do any exclusions apply to this (e.g. FULL sync?), or do we rely on announcement/breaking-change warnings for users who require historic RPC to evaluate this feature?

/**
 * Current Bonsai version, with receipts using compaction, in order to make Receipts use less disk
 * space
 */
BONSAI_WITH_RECEIPT_COMPACTION(DataStorageFormat.BONSAI, 3);
Contributor:

Is the intent here to be "BONSAI_WITH_VARIABLES_AND_WITH_RECEIPT_COMPACTION" (V2 + RECEIPT_COMPACTION)
or "BONSAI_ORIGINAL_WITH_RECEIPT_COMPACTION"
and you'd need both versions to make up the right combination of features?

Contributor:

If it's the former, i.e. a one-way cumulation of features, then I'm a bit dubious of using feature names for the different versions. Could this be confusing when we're several versions down the line?

Contributor Author:

It's a one-way cumulation of features. You are right, the name BONSAI_ORIGINAL_WITH_RECEIPT_COMPACTION isn't quite right, as it isn't Bonsai original + receipts; it also includes variables.

What about BONSAI_VARIABLES_WITH_RECEIPT_COMPACTION?

Contributor Author:

On second thought, I think it will become a mess if we just keep accumulating the changes from every version into the name. I'd rather keep it as is.

Contributor:

Naming this was not easy for me, so I appreciate suggestions for improvement. Perhaps we can just name them like BONSAI_V3 and leverage the javadoc to specify that it is based on V2 plus receipt compaction, moving the accumulation of changes from the name to the javadoc.

Note that the *_ORIGINAL versions are there only for reference, since it is no longer possible to use them, so we could remove them, or better document that they are no longer supported.
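A hypothetical sketch of the numbered naming suggested above, with the cumulative feature set moved into javadoc. `DataStorageFormat`, the Forest/Bonsai split and version 3 for receipt compaction come from this thread; the `*_V1`/`*_V2` names, assigning 1 and 2 to the original and variables versions, and `fromStored` are assumptions for illustration only. Java 17+.

```java
// Hypothetical sketch of number-based version names with features documented in javadoc.
public final class VersionNamingSketch {

  enum DataStorageFormat { FOREST, BONSAI }

  enum VersionedStorageFormat {
    /** Original Bonsai layout; kept for reference only, no longer supported. */
    BONSAI_V1(DataStorageFormat.BONSAI, 1),
    /** V1 plus variables storage. */
    BONSAI_V2(DataStorageFormat.BONSAI, 2),
    /** V2 plus receipt compaction (no bloom filter, trimmed topics and data). */
    BONSAI_V3(DataStorageFormat.BONSAI, 3),
    /** Original Forest layout; kept for reference only, no longer supported. */
    FOREST_V1(DataStorageFormat.FOREST, 1),
    /** V1 plus variables storage. */
    FOREST_V2(DataStorageFormat.FOREST, 2),
    /** V2 plus receipt compaction. */
    FOREST_V3(DataStorageFormat.FOREST, 3);

    final DataStorageFormat format;
    final int version;

    VersionedStorageFormat(DataStorageFormat format, int version) {
      this.format = format;
      this.version = version;
    }

    // Map the format and version read from the metadata file back to a constant.
    static VersionedStorageFormat fromStored(DataStorageFormat format, int version) {
      for (VersionedStorageFormat v : values()) {
        if (v.format == format && v.version == version) {
          return v;
        }
      }
      throw new IllegalArgumentException("Unknown " + format + " version " + version);
    }
  }

  public static void main(String[] args) {
    System.out.println(VersionedStorageFormat.fromStored(DataStorageFormat.BONSAI, 3)); // BONSAI_V3
  }
}
```

Names stay short as versions accumulate, and each javadoc line only has to say what changed relative to the previous version.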

if (runtimeVersion == BONSAI_WITH_VARIABLES || runtimeVersion == FOREST_WITH_VARIABLES) {
LOG.warn(
"Database contains compacted receipts but receipt compaction is not enabled, new receipts will "
+ "be not stored in the compacted format. If you want to remove compacted receipts from the "
Contributor:

Suggested change
+ "be not stored in the compacted format. If you want to remove compacted receipts from the "
+ "be not stored in the compacted format. If you want to restore complete receipts to the "

Contributor Author:

I think describing the receipts as complete is a bit misleading. The omitted data is redundant, so nothing is actually missing. I'm not sure of the best terminology here, but I was going with compacted and non-compacted.

jframe (Contributor Author) commented Mar 18, 2024


Had a discussion with @siladu on the versioning options above, and the outcome was to leave the PR as is, using option 1.

Option 2 has the benefit of a single version that is tied directly to the capabilities rather than to the configuration of the --receipt-compaction-enabled flag, but if the version were updated regardless of the flag, then no users would be able to downgrade to the previous version of Besu.

By making the version upgrade conditional, the downgrade restriction only affects users who have explicitly enabled this feature rather than everyone, and it gives people plenty of time to become aware of the change. In a future release, the plan is to enable this by default and remove the conditional database version upgrade. By then, there will be several versions of Besu that users can downgrade to if there is an issue with the release.
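The conditional upgrade (option 1) can be sketched as a small decision function. Assumptions, not taken from Besu: versions are plain integers, 2 means the pre-compaction format and 3 means receipt compaction, and the function returns the version to persist plus whether to log the "database contains compacted receipts but receipt compaction is not enabled" warning discussed in the review. `ConditionalUpgrade`, `Decision` and `decide` are hypothetical names. Java 17+.

```java
// Hypothetical sketch of option 1: bump the database version only when compaction is enabled.
public final class ConditionalUpgrade {
  static final int PRE_COMPACTION_VERSION = 2; // assumed numbering
  static final int COMPACTION_VERSION = 3;     // matches BONSAI_WITH_RECEIPT_COMPACTION's 3

  record Decision(int versionToStore, boolean warnCompactedReceiptsPresent) {}

  static Decision decide(int storedVersion, boolean receiptCompactionEnabled) {
    if (storedVersion == COMPACTION_VERSION) {
      // Never step back down: compacted receipts may already be on disk.
      return new Decision(COMPACTION_VERSION, !receiptCompactionEnabled);
    }
    if (receiptCompactionEnabled) {
      // Opting in is what makes the database unreadable by older releases.
      return new Decision(COMPACTION_VERSION, false);
    }
    // Flag off on a pre-compaction database: leave the version alone so downgrades still work.
    return new Decision(storedVersion, false);
  }

  public static void main(String[] args) {
    System.out.println(decide(PRE_COMPACTION_VERSION, false)); // Decision[versionToStore=2, warnCompactedReceiptsPresent=false]
    System.out.println(decide(PRE_COMPACTION_VERSION, true));  // Decision[versionToStore=3, warnCompactedReceiptsPresent=false]
    System.out.println(decide(COMPACTION_VERSION, false));     // Decision[versionToStore=3, warnCompactedReceiptsPresent=true]
  }
}
```

Only the second case is irreversible, which is the property option 1 is after: users who never set the flag keep the ability to downgrade.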

jframe (Contributor Author) commented Mar 19, 2024

Doing another sync on mainnet to ensure the receipt changes still work after the latest changes.

jframe enabled auto-merge (squash) March 26, 2024 00:34
jframe merged commit 15d54af into hyperledger:main Mar 26, 2024
42 checks passed
jframe deleted the receipt-size branch March 26, 2024 05:53
jflo pushed a commit to jflo/besu that referenced this pull request Mar 26, 2024
Signed-off-by: Jason Frame <[email protected]>
Signed-off-by: Justin Florentine <[email protected]>
amsmota pushed a commit to Citi/besu that referenced this pull request Apr 16, 2024
Signed-off-by: Jason Frame <[email protected]>
Signed-off-by: amsmota <[email protected]>
amsmota pushed a commit to Citi/besu that referenced this pull request Apr 16, 2024
Signed-off-by: Jason Frame <[email protected]>
Signed-off-by: amsmota <[email protected]>
matthew1001 pushed a commit to kaleido-io/besu that referenced this pull request Jun 7, 2024

Successfully merging this pull request may close these issues.

Reduce Receipt Size
5 participants