
Move processing cache out of DA #5420

Merged: 6 commits merged into sigp:unstable from non-da-processing-cache on Apr 10, 2024

Conversation

dapplion (Collaborator)

Current unstable has a processing cache that tracks both blocks and blobs while they are being processed. To be specific, in the case of blobs "processing" means the time to KZG-verify the proofs and insert them into the data availability overflow cache:

self.data_availability_checker
    .notify_rpc_blobs(slot, block_root, &blobs);
// < processing starts here
check_slashable_header(blobs);
let availability = self.data_availability_checker.put_rpc_blobs(block_root, blobs)?;
// < processing ends here
let r = self.process_availability(slot, availability).await;
self.remove_notified(&block_root, r)

The purpose of the cache is to:

  1. Satisfy recent requirement (late '23) to serve blocks over RPC that are gossip verified but not fully verified / imported
  2. Prevent duplicate requests from the (now deleted) delayed lookup logic (Remove delayed lookups #4992)

Let's address each one for blocks and blobs

1 / blocks

Block processing can take a few hundred milliseconds due to execution validation. Some cache must exist to cover that window, but it does not need to be tied to DA.
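As an illustration, here is a minimal sketch of such a cache decoupled from DA: it holds gossip-verified blocks keyed by root purely so they can be served over ReqResp while execution validation runs. All names and types below are hypothetical simplifications, not Lighthouse's actual API.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Hash256 = [u8; 32];
// Placeholder for the real SSZ block type.
type SignedBeaconBlock = Vec<u8>;

#[derive(Default, Clone)]
pub struct PreImportBlockCache {
    blocks: Arc<Mutex<HashMap<Hash256, Arc<SignedBeaconBlock>>>>,
}

impl PreImportBlockCache {
    /// Called once a block passes gossip validation, before execution validation starts.
    pub fn insert(&self, root: Hash256, block: Arc<SignedBeaconBlock>) {
        self.blocks.lock().unwrap().insert(root, block);
    }

    /// Serves blocks-by-root requests while the block is still being processed.
    pub fn get(&self, root: &Hash256) -> Option<Arc<SignedBeaconBlock>> {
        self.blocks.lock().unwrap().get(root).cloned()
    }

    /// Called when the block is imported into fork-choice (or processing fails).
    pub fn remove(&self, root: &Hash256) {
        self.blocks.lock().unwrap().remove(root);
    }
}

fn main() {
    let cache = PreImportBlockCache::default();
    let root = [0u8; 32];
    cache.insert(root, Arc::new(vec![1, 2, 3]));
    assert!(cache.get(&root).is_some()); // servable over ReqResp during processing
    cache.remove(&root);
    assert!(cache.get(&root).is_none());
}

Note this cache has no notion of blobs or availability; it only answers "do I hold this block right now?", which is all the RPC-serving requirement needs.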

1 / blobs

This cache is only useful if we receive a blobs-by-root request for a blob we have just received, within the few milliseconds that KZG proof validation takes. I can be convinced otherwise, but this use case does not justify the complexity.

2

After deleting the delayed lookup logic this argument is mostly void. In rare conditions we may end up re-downloading data we already hold, which is an acceptable cost (see the notes below).

Benefits

The AvailabilityView abstraction spills complexity that should only concern the accumulators of the availability cache and block lookups into the processing cache as well.

By making the processing cache not concerned with DA, Lighthouse becomes a bit more maintainable and easier to reason about.

@dapplion force-pushed the non-da-processing-cache branch from 3c5af19 to 0c86593 on March 17, 2024 at 14:39
@@ -305,6 +305,11 @@ impl<T: BeaconChainTypes> Critical<T> {
        Ok(())
    }

    /// Returns true if the block root is known, without altering the LRU ordering
    pub fn has_block(&self, block_root: &Hash256) -> bool {
        self.in_memory.peek(block_root).is_some() || self.store_keys.get(block_root).is_some()
    }
dapplion (Collaborator, Author):
@ethDreamer is this line right, i.e. should I also check the store keys?

Reply (Member):
I think so; the store keys reference the entries in the data availability cache that have overflowed to disk. So it's really an extension of the availability cache.

Side note: once we've merged tree states we should consider getting rid of the overflow logic. It would be a nice complexity reduction.

realbigsean (Member) commented Mar 17, 2024:

Overall I agree with your reasoning and support the change! This probably could/should have happened along with the removal of the delayed lookup logic.

dapplion (Collaborator, Author) commented Apr 6, 2024:

Sharing some notes justifying the change as-is. We can address the de-duplication gadget in another PR.

Lighthouse processing cache

The processing cache is tied to the AvailabilityView trait, complicating extensions like Data Availability Sampling.

  • Do processing cache items need to implement AvailabilityView?
  • Is the processing cache really necessary?

The processing cache purposes are:

  • Prevent re-downloads of blocks and blobs (e.g. while a block from gossip is undergoing EL verification, we receive an attestation that triggers an unknown-head lookup)
  • Make blocks and blobs that pass gossip validation available to ReqResp consumers

Gossip block journey

  • Receive gossip block
  • Spawn worker for process_gossip_block
    • Run process_gossip_unverified_block
    • Insert + check into duplicate_cache.check_and_insert(block_root)
    • Run process_gossip_verified_block
      • Insert into processing_cache
      • Run process_block
        • Perform execution validation (SLOW)
        • Insert into data_availability_checker
          IF AVAILABLE
          • Evict from data_availability_checker
          • Import block into fork-choice
          • Add block + blobs to early_attester_cache
          • Evict from processing_cache
    • Evict from duplicate_cache

Gossip blob journey

  • Receive gossip blob
  • Spawn worker for process_gossip_blob
    • Run chain.verify_blob_sidecar_for_gossip
    • Run process_gossip_verified_blob
      • Check if known to fork_choice_read_lock, if true ignore
      • Insert into processing_cache
      • Run KZG verification (~1ms)
      • Insert into data_availability_checker
        IF AVAILABLE
        • Evict from data_availability_checker
        • Import block into fork-choice (~4ms, *see below)
        • Add block + blobs to early_attester_cache
        • Evict from processing_cache
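Both journeys above branch on whether the data_availability_checker reports the block as available once all components arrive. Below is a hedged sketch of that branching with hypothetical, simplified types; Lighthouse's real enum carries the fully verified block and blobs rather than a bare placeholder.

type Hash256 = [u8; 32];

enum Availability<Block> {
    /// All components (block + blobs) are present: import into fork-choice,
    /// add to the early_attester_cache, evict from the caches.
    Available(Block),
    /// Still waiting for the block or for one or more blobs; the components
    /// stay in the data_availability_checker.
    MissingComponents(Hash256),
}

fn next_step<B>(availability: &Availability<B>) -> &'static str {
    match availability {
        Availability::Available(_) => "import into fork-choice + early_attester_cache",
        Availability::MissingComponents(_) => "wait in the data_availability_checker",
    }
}

fn main() {
    let pending: Availability<&str> = Availability::MissingComponents([0u8; 32]);
    let ready: Availability<&str> = Availability::Available("block + blobs");
    println!("{}", next_step(&pending));
    println!("{}", next_step(&ready));
}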

Unknown head attestation journey

  • Receive gossip attestation
  • Spawn worker for process_gossip_attestation
  • Early on, run verify_head_block_is_known to check against
        chain.canonical_head.fork_choice_read_lock()
        .get_block(&attestation.data.beacon_block_root)

IF UNKNOWN

  • Inform sync: send(SyncMessage::UnknownBlockHashFromAttestation())
  • Schedule for re-processing

ON SYNC

  • Receive UnknownBlockHashFromAttestation
  • Create a new_current_lookup for block_root
  • Check block_already_downloaded
    • Checks self.da_checker.has_block(block_root)
      • Checks self.processing_cache.has_block(block_root)
  • Check blobs_already_downloaded
    • Checks processing_cache with
let Some(processing_components) = da_checker.processing_cache.get(block_root)
else { return MissingBlobs::fetch_all_for_block(block_root) };
da_checker.get_missing_blob_ids(block_root, processing_components)
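As a standalone illustration of what get_missing_blob_ids has to compute once the processing cache is gone, here is a hedged sketch working only from the components already known for a block; the helper name and types are hypothetical, not Lighthouse's actual signatures.

use std::collections::HashSet;

/// Hypothetical helper: given the blob indices already held for a block and the
/// number of blobs the block commits to, return the indices that still need to
/// be requested from peers.
fn missing_blob_indices(known_indices: &HashSet<u64>, expected_blob_count: u64) -> Vec<u64> {
    (0..expected_blob_count)
        .filter(|index| !known_indices.contains(index))
        .collect()
}

fn main() {
    // Blobs 0 and 2 were already received (e.g. via gossip); the block commits to 4 blobs.
    let known: HashSet<u64> = [0u64, 2].into_iter().collect();
    assert_eq!(missing_blob_indices(&known, 4), vec![1u64, 3]);
    // If nothing is known about the block yet, all indices must be fetched.
    assert_eq!(missing_blob_indices(&HashSet::new(), 4), vec![0u64, 1, 2, 3]);
}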

Is the processing cache necessary for blobs?

For early ReqResp serving

No

The processing cache is not checked when serving blobs over ReqResp. And that's okay: blobs are inserted into the data_availability_checker almost immediately after being inserted into the processing cache.

For de-duplication

No

If we rely only on the data_availability_checker + fork-choice, we lose knowledge of the blob during two windows:

  • During KZG verification (~1 ms)
  • During block import into fork-choice (~4 ms, *see below)

[image: beacon_fork_choice_process_block_seconds metric, showing most runs are instant, with one run per epoch lasting ~100ms]

This matches the long-term average of ~4ms seen below (roughly 100ms / 32 slots per epoch):

[image: long-term average of beacon_fork_choice_process_block_seconds]

In the unlikely case that sync attempts to download a blob during a slow run of fork-choice block import, the worst case is an unnecessary re-download of a set of blobs.

Is the processing cache necessary for blocks?

For early ReqResp serving

Yes

The duplicate_cache does not hold the block itself, only the root.

For de-duplication

No*

Blocks enjoy another cache, the duplicate_cache, with the following properties:

  • Blocks are inserted into the duplicate_cache slightly before the processing_cache
  • When a block is evicted from the duplicate_cache there are three outcomes:
    • Block was available and got imported into fork_choice
    • Block was not available and got inserted into data_availability_checker
    • There was an error

So duplicate_cache + data_availability_checker cover all paths of the processing_cache, with the exception of block import into fork-choice (same as for blobs).
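A minimal sketch of those duplicate_cache properties follows, using hypothetical names rather than Lighthouse's exact implementation: the cache stores only roots (so it cannot serve ReqResp) and evicts on drop, regardless of which of the three outcomes occurred.

use std::collections::HashSet;
use std::sync::{Arc, Mutex};

type Hash256 = [u8; 32];

#[derive(Clone, Default)]
struct DuplicateCache {
    // Only roots are stored, which is why this cache cannot serve early ReqResp requests.
    roots: Arc<Mutex<HashSet<Hash256>>>,
}

/// Guard held by the processing worker; dropping it evicts the root whether the
/// block was imported, parked in the data_availability_checker, or errored.
struct ProcessingGuard {
    cache: DuplicateCache,
    root: Hash256,
}

impl Drop for ProcessingGuard {
    fn drop(&mut self) {
        self.cache.roots.lock().unwrap().remove(&self.root);
    }
}

impl DuplicateCache {
    /// Returns `None` if the root is already being processed by another worker.
    fn check_and_insert(&self, root: Hash256) -> Option<ProcessingGuard> {
        if self.roots.lock().unwrap().insert(root) {
            Some(ProcessingGuard { cache: self.clone(), root })
        } else {
            None
        }
    }
}

fn main() {
    let cache = DuplicateCache::default();
    let root = [1u8; 32];
    let guard = cache.check_and_insert(root).expect("first worker wins");
    assert!(cache.check_and_insert(root).is_none()); // concurrent duplicate is rejected
    drop(guard); // processing finished (any outcome)
    assert!(cache.check_and_insert(root).is_some()); // root can be processed again
}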

realbigsean (Member), commit a6cd314:

Removed an outdated TODO, removed the processing cache file, and updated the missing blob ids calculation to consider whether we're in Deneb (keeps us from requesting blobs pre-Deneb).

@realbigsean added the ready-for-merge label and removed the work-in-progress label on Apr 9, 2024.
realbigsean (Member):

@Mergifyio queue

mergify bot commented Apr 9, 2024:

queue

🛑 The pull request has been removed from the queue default

Pull request #5420 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

realbigsean (Member):

@Mergifyio requeue

mergify bot commented Apr 9, 2024:

requeue

❌ This pull request head commit has not been previously disembarked from queue.

realbigsean (Member):

@Mergifyio dequeue

mergify bot commented Apr 9, 2024:

dequeue

✅ The pull request has been removed from the queue default

realbigsean (Member):

@Mergifyio requeue

mergify bot commented Apr 9, 2024:

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

mergify bot commented Apr 9, 2024:

queue

🛑 The pull request has been removed from the queue default

The queue conditions cannot be satisfied due to failing checks.

mergify bot added a commit that referenced this pull request Apr 10, 2024
jimmygchen (Member):

@mergify requeue

mergify bot commented Apr 10, 2024:

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

mergify bot commented Apr 10, 2024:

queue

🛑 The pull request has been removed from the queue default

The queue conditions cannot be satisfied due to failing checks.

realbigsean (Member):

@Mergifyio queue

mergify bot commented Apr 10, 2024:

queue

🛑 The pull request has been removed from the queue default

The queue conditions cannot be satisfied due to failing checks.

realbigsean (Member):

@Mergifyio requeue

mergify bot commented Apr 10, 2024:

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

mergify bot commented Apr 10, 2024:

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 30dc260

mergify bot merged commit 30dc260 into sigp:unstable on Apr 10, 2024
29 checks passed
@dapplion deleted the non-da-processing-cache branch on January 24, 2025 at 17:57