
feat(era_manager): add era support + pre-fetching #1375

Merged

merged 8 commits into ethereum:master from add-era-support-to-era-manager on Aug 29, 2024

Conversation

@KolbyML KolbyML (Member) commented Aug 13, 2024

What was wrong?

  • we need to support era files so we can continue executing up to HEAD - 8192
  • downloading era files and recovering senders takes a lot of time; prefetch the next era file to execute in the background

How was it fixed?

  • I implemented era support
  • I added prefetching of era files
  • I made it so we recover the transaction senders on prefetch, so executing transactions isn't blocked by it (see the sketch below)
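A minimal sketch of that prefetch pattern, with simplified stand-in types (ProcessedEra and download_and_process_era here are illustrative; the real EraManager in this PR differs in detail):

use tokio::task::JoinHandle;

// Simplified stand-ins for the real types; names and fields are illustrative only.
struct ProcessedEra {
    epoch_index: u64,
    // blocks would live here, with transaction senders already recovered
}

// Download the era file, decode the blocks, and recover senders off the execution path.
async fn download_and_process_era(epoch_index: u64) -> ProcessedEra {
    ProcessedEra { epoch_index }
}

struct EraManager {
    current_era: Option<ProcessedEra>,
    // handle to the era file already being fetched and processed in the background
    next_era: Option<JoinHandle<ProcessedEra>>,
}

impl EraManager {
    async fn advance_era(&mut self) -> anyhow::Result<&ProcessedEra> {
        // Take the prefetched era if one is in flight, otherwise fetch the first one now.
        let era = match self.next_era.take() {
            Some(handle) => handle.await?,
            None => download_and_process_era(0).await,
        };
        // Immediately kick off the next era file in the background.
        self.next_era = Some(tokio::spawn(download_and_process_era(era.epoch_index + 1)));
        Ok(self.current_era.insert(era))
    }
}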

@KolbyML KolbyML self-assigned this Aug 13, 2024
@KolbyML KolbyML requested a review from morph-dev August 13, 2024 02:09
@KolbyML KolbyML force-pushed the add-era-support-to-era-manager branch 6 times, most recently from 06190da to 6c543f2 on August 21, 2024 16:56
const ERA_DIR_URL: &str = "https://mainnet.era.nimbus.team/";

/// Fetches era file download links
pub async fn get_era_file_download_links(http_client: &Client) -> anyhow::Result<Vec<String>> {
Collaborator

nit: Are we sure there is no API that we can use that returns better structured content (e.g. JSON), instead of parsing HTML ourselves?

nit: Maybe return HashMap<u64, String>, where the key is the epoch number (keep the logic of processing this file in one place)?

Member Author

nit: Are we sure there is no API that we can use that returns better structured content (e.g. JSON), instead of parsing HTML ourselves?

Not that I am aware of

nit: Maybe return HashMap<u64, String>, where the key is the epoch number (keep the logic of processing this file in one place)?

sure

@@ -80,6 +80,25 @@ impl Era {
Ok(era_state.state)
}

/// Iterate over beacon blocks.
pub fn iter_blocks(raw_era1: Vec<u8>) -> impl Iterator<Item = CompressedSignedBeaconBlock> {
Collaborator

I think this should return an iterator over Result<CompressedSignedBeaconBlock>.

I know it doesn't look as nice, but it is the standard way of doing things (as far as I can tell); for example, std::io::Lines implements Iterator<Item=Result<String, std::io::Error>>.
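For illustration, a minimal sketch of the suggested shape with simplified stand-in types (Entry, CompressedSignedBeaconBlock, and try_from_entry here are placeholders, not the real e2store API):

// Stand-ins for the real entry/block types.
struct Entry(Vec<u8>);
struct CompressedSignedBeaconBlock(Vec<u8>);

impl CompressedSignedBeaconBlock {
    fn try_from_entry(entry: &Entry) -> anyhow::Result<Self> {
        // real code would snappy-decode and SSZ-deserialize here
        Ok(Self(entry.0.clone()))
    }
}

// The iterator yields Result items; a bad entry becomes an error the caller can handle,
// instead of a panic inside the iterator.
fn iter_blocks(
    entries: Vec<Entry>,
) -> impl Iterator<Item = anyhow::Result<CompressedSignedBeaconBlock>> {
    entries
        .into_iter()
        .map(|entry| CompressedSignedBeaconBlock::try_from_entry(&entry))
}

// Caller side, consumed the same way std::io::Lines is:
fn first_block(entries: Vec<Entry>) -> anyhow::Result<Option<CompressedSignedBeaconBlock>> {
    iter_blocks(entries).next().transpose()
}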

@@ -80,6 +80,25 @@ impl Era {
Ok(era_state.state)
}

/// Iterate over beacon blocks.
pub fn iter_blocks(raw_era1: Vec<u8>) -> impl Iterator<Item = CompressedSignedBeaconBlock> {
let file = E2StoreMemory::deserialize(&raw_era1).expect("invalid era1 file");
Collaborator

I'm a bit confused here. Are these era or era1 files? Why are we processing era1 files in era.rs?

.expect("missing block index entry")
.slot_index;

let mut next_slot = block_index.starting_slot;
Collaborator

I see that this logic is already implemented at line 52. Maybe extract it into a separate function?

Also, a question about this logic, because I'm not sure I have all the knowledge and I would like to make sure I understand it. We can't always increment next_slot by one because slots can be skipped. And we need to know the slot index in order to figure out the fork (in order to parse the block correctly). What happens if the slot that is right before the new fork is skipped? Wouldn't this logic fail?

Member Author

This logic can only break during the transition to the next fork. This is a best-effort approach. Currently it should work for the 3 forks we support, but because we don't know what the next fork's epoch is, we have to guess; every time a new fork happens we just have to configure it to work, and era files shouldn't change, I would think at least.

What happens if the slot that is right before the new fork is skipped?

This shouldn't matter as long as we fix it within a 6-month window of the fork activating. The reason is that we can get the last 6 months of blocks from the CL itself.

So even if we get it wrong when a new fork is launched, we just have to fix it within 6 months, so it's not the biggest deal.
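For context, a rough sketch of what the slot-to-fork mapping can look like, assuming the published mainnet fork epochs (fork slot = fork epoch * 32); the actual helper in the codebase may differ:

/// Forks whose block formats appear in era files with execution payloads.
/// Pre-Bellatrix forks are ignored here, since those blocks carry no execution payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BeaconFork {
    Bellatrix,
    Capella,
    Deneb,
}

// Mainnet fork boundary slots: fork epoch * 32 slots per epoch.
const CAPELLA_FORK_SLOT: u64 = 194_048 * 32;
const DENEB_FORK_SLOT: u64 = 269_568 * 32;

/// Map a slot to the fork whose block format applies to it.
fn get_beacon_fork(slot: u64) -> BeaconFork {
    if slot >= DENEB_FORK_SLOT {
        BeaconFork::Deneb
    } else if slot >= CAPELLA_FORK_SLOT {
        BeaconFork::Capella
    } else {
        BeaconFork::Bellatrix
    }
}

Supporting a future fork then amounts to adding one boundary constant and one match arm, which is the "configure it to work" step described above.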

Collaborator

Now that I understand the structure of the era file better, there is a proper way of doing this, using information from the SlotIndexBlockEntry. The indices field can tell us which slots are present and which ones are not (a value of 0 means the slot is absent).

I think something like this would work (but it would be good to test it first):

block_index.indices
    .iter()
    .enumerate()
    .filter_map(|(i, index)| if *index != 0 { Some(block_index.starting_slot + i as u64) } else { None })
    .zip_eq(file.entries)  // zip_eq from the itertools crate
    .map(|(slot, entry)| {
        let fork = get_beacon_fork(slot);
        CompressedSignedBeaconBlock::try_from(&entry, fork)
    })
    .collect::<Result<Vec<_>, _>>()

Member Author

The implementation I did is a little different than the suggestion, but it follows the same idea.

@@ -174,6 +174,17 @@ impl SignedBeaconBlock {
SignedBeaconBlock::Deneb(block) => block.message.slot,
}
}

/// Returns execution block number.
pub fn execution_block_number(&self) -> u64 {
Collaborator

nit: it feels a bit weird to be this specific and return the execution block number directly from SignedBeaconBlock. Maybe make this one return ExecutionPayload so the caller can get the block number directly from there?

Member Author

This would make us do a match on ExecutionPayload every time we want to get the block number, because each fork has its own ExecutionPayload type.

match execution_payload {
    Bellatrix(payload) => payload.block_number,
    Capella(payload) => payload.block_number,
    Deneb(payload) => payload.block_number,
}

I think the current implementation is the cleanest, but if this is a major concern we can discuss it more.

async fn fetch_era_file(&mut self, block_number: u64) -> anyhow::Result<ProcessedEra> {
info!("Fetching the next era file");

// If the current is already downloaded, return it, else download the current one
Collaborator

nit: I find this comment a bit confusing in regard to the logic. Shouldn't it be something like:
If the next_era is already downloading, then wait and return it. Otherwise, download it

Member Author

Not exactly. I tried to rewrite the comment to make it clearer.

.expect("to be able to download compressed beacon block");
let entry = Entry::deserialize(&compressed_beacon_block)?;

// download slot_index_state to get the starting slot so we get the correct fork
Collaborator

I looked more into the spec, and I believe that everything after this point is unnecessary and actually wrong.
Here is what I found:

  1. We don't need to fetch the Content-Length header first, because according to the Range documentation we can use Range: bytes=-32 to fetch the last 32 bytes (I tried it and it works)
  2. slot_index_state.slot_index.starting_slot (line 133) will give us the slot number of the CompressedBeaconState entry, which corresponds to the slot after executing all blocks for this era file, so we would use the wrong fork for decoding the first block of this era file (when a fork happens in the middle of the era file)
  3. (and most importantly) I believe that we can calculate the slot number that belongs to the first block of an era file directly! If I'm not mistaken, each era file has exactly 8192 slots (but it can contain fewer blocks). The mainnet-00000-... file has no slots/blocks and is the exception. So if we are downloading the first block from the era file mainnet-xxxxx-..., the slot number of the first block should be (x-1)*8192, which we can use to figure out the fork and decode the first block.
    • there is a small caveat to this: if enough blocks are skipped at the start of an era (and the blocks don't actually exist), we might already be in the following fork and this would fail (but we don't have to optimize for this case)

So all in all, I believe that we only need to pass the era epoch number, download the first block (as you did above), calculate the slot number and fork based on the era epoch number, and decode the block.
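A rough sketch of points 1 and 3, using reqwest purely for illustration (the project keeps its own HTTP client and constants), and following the (x - 1) * 8192 convention used in this PR:

use reqwest::{header::RANGE, Client};

/// Fetch only the last `n` bytes of a remote era file via an HTTP range request,
/// avoiding a separate Content-Length lookup.
async fn fetch_last_bytes(client: &Client, url: &str, n: u64) -> anyhow::Result<Vec<u8>> {
    let response = client
        .get(url)
        .header(RANGE, format!("bytes=-{n}"))
        .send()
        .await?;
    Ok(response.bytes().await?.to_vec())
}

/// Slot of the first block in era file `mainnet-xxxxx-...`, per the (x - 1) * 8192 convention
/// used in this PR; that slot then determines the fork and therefore the decoder to use.
fn first_slot_of_era(era_index: u64) -> u64 {
    (era_index - 1) * 8192
}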

Err(anyhow::anyhow!("Block not found in any era file"))
}

fn does_era_contain_block(era: &Era, block_number: u64) -> bool {
Collaborator

Can't we just check if block_number is in the range of the execution_block_number of the first and last blocks? Also, should this maybe be in the era.rs file?

.expect("to be able to parse epoch index");

while start_epoch_index <= end_epoch_index {
let mid = end_epoch_index + (start_epoch_index - end_epoch_index) / 2;
Collaborator

nit: it's probably cleaner if this were:
let mid = (start_epoch_index + end_epoch_index) / 2;

I have an idea for how this can be optimized (meaning we can guess the midpoint more accurately), but we can leave that for a follow-up PR.

Collaborator

That matters only if start_epoch_index + end_epoch_index can overflow, which shouldn't happen in our case.

I'm fine with your approach as well, no big deal. But I think you also didn't do it correctly: the variables are u64, so your formula should panic, right? Shouldn't it be start + (end - start) / 2?
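A tiny self-contained example of the difference; the subtraction in the original formula underflows for u64 whenever start < end:

fn main() {
    let (start, end): (u64, u64) = (10, 20);

    // end + (start - end) / 2  -> 10u64 - 20u64 underflows and panics in debug builds

    // Overflow-safe midpoint: the subtraction is always end - start, with start <= end.
    let mid = start + (end - start) / 2;
    assert_eq!(mid, 15);
}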

let mid_block_number = mid_block.block.execution_block_number();

// minimize the worst case by fetching the first block number of the .era file after mid
let mid_plus_one_block = EraBinarySearch::download_first_beacon_block_from_era(
Collaborator

This will not work if the epoch we are looking for is the last epoch, or later (an epoch not yet created).

But I think that we can avoid looking for mid_plus_one on every loop iteration. For example, we can simply check where block_number is in relation to mid_block_number and mid_block_number + 8192, something like this:

(start_epoch_index, end_epoch_index) = match block_number {
    ..=(mid_block_number - 1) => (start_epoch_index, mid - 1),
    (mid_block_number + 8192).. => (mid + 1, end_epoch_index),
    _ => {
        // mid is a likely candidate, but we are not sure
        // we should peek at the following epochs until we are sure
        let mut target = mid;
        while target < <last_epoch> {
            let next_block = download_first_beacon_block_from_era(target + 1);
            if next_block.block.execution_block_number() < block_number {
                // target is our epoch
                todo!("download and return target")
            } else {
                target += 1;
            }
        }
        // we are at the last epoch. we should download it and try it
        // note: this can probably be optimized by downloading the SlotIndexBlock and verifying,
        // but we can ignore that optimization
        todo!("download and try")
    },
}

This is just an example. I'm sure it can be made a bit more readable and maybe some functionality can be extracted.

Collaborator

I realized that I have to work around the issue when mid_block_number == 0 (first epoch with execution block). I think this can be as simple as

if mid_block_number == 0 {
    mid_block_number = MERGE_BLOCK;
}

Member Author (@KolbyML, Aug 22, 2024)

This will not work if the epoch we are looking for is the last epoch, or later (an epoch not yet created).

For blocks from the last 6 months we should just be able to get them from the CL client. So if we are at the last epoch we can just throw an error saying to continue from the CL client.

Collaborator

Returning an error is fine. But the current code will panic when mid is equal to the last epoch (which is a valid case). It will panic here:

let mid_plus_one_block = EraBinarySearch::download_first_beacon_block_from_era(
    era_links[mid as usize + 1].clone(),
    http_client.clone(),
)

because mid as usize + 1 is a non-existent index of era_links.

Member Author

I will add an if case to throw an error in this case.

@KolbyML KolbyML force-pushed the add-era-support-to-era-manager branch from d5d8849 to 2ce7d59 on August 24, 2024 23:37
@KolbyML KolbyML (Member Author) commented Aug 24, 2024

@morph-dev ready for another look

@KolbyML KolbyML requested a review from morph-dev August 26, 2024 03:47
@@ -49,12 +49,28 @@ impl Era {
let slot_index_state = SlotIndexStateEntry::try_from(&file.entries[entries_length - 1])?;

// Iterate over the block entries. Skip the first and last 3 entries.
Collaborator

nit: I think this comment is no longer relevant. If you want, move it down close to the ensure! to explain why there is a - 4, but rephrase it in that case.

let block_index =
SlotIndexBlockEntry::try_from(&file.entries[entries_length - 2])?.slot_index;

let slot_indexes = block_index
Collaborator

nit: since this is repeated functionality, maybe it can be extracted into a separate function?

const ERA_DIR_URL: &str = "https://mainnet.era.nimbus.team/";

/// Fetches era file download links
pub async fn get_era_file_download_links(
Collaborator

I don't think that there is much difference between this and the function above, other than:

  • they use different urls
  • the era1 version checks that the length matches and shuffles the files at the end

I would make a separate function that accepts the url as a parameter, fetches the links, and returns Result<HashMap<u64, String>> (rough sketch below).

Then I would:

  • create 2 functions, get_era_files and get_era1_files, that call it and return the same type (era1 would also check the length)
  • have get_shuffled_era1_files call get_era1_files
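A rough sketch of that refactor, as mentioned above; the era1 URL constant, ERA1_FILE_COUNT value, and file-name parsing are illustrative stand-ins for what the crate actually has:

use std::collections::HashMap;

use anyhow::ensure;

// Illustrative constants; the real URL constants and file count live in the crate.
const ERA1_DIR_URL: &str = "https://era1.example.org/"; // placeholder URL
const ERA1_FILE_COUNT: usize = 1897; // hypothetical value

/// Shared helper: map each era/era1 file name from a directory listing to (epoch index, full url).
/// `hrefs` stands in for the links already scraped out of the HTML listing at `base_url`.
fn index_era_links(base_url: &str, hrefs: Vec<String>) -> anyhow::Result<HashMap<u64, String>> {
    hrefs
        .into_iter()
        .map(|href| {
            // file names look like "mainnet-01234-xxxxxxxx.era1"; the middle field is the epoch index
            let epoch_index: u64 = href
                .split('-')
                .nth(1)
                .ok_or_else(|| anyhow::anyhow!("unexpected era file name: {href}"))?
                .parse()?;
            Ok((epoch_index, format!("{base_url}{href}")))
        })
        .collect()
}

/// era1 wrapper: the same helper plus the length check; get_shuffled_era1_files would call this.
fn get_era1_files(hrefs: Vec<String>) -> anyhow::Result<HashMap<u64, String>> {
    let era1_files = index_era_links(ERA1_DIR_URL, hrefs)?;
    ensure!(
        era1_files.len() == ERA1_FILE_COUNT,
        "unexpected number of era1 files: {}",
        era1_files.len()
    );
    Ok(era1_files)
}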

(epoch_index, format!("{ERA_DIR_URL}{href}"))
})
.collect();
Ok(era_files)
Collaborator

nit: what do you think about checking that the keys start from 0 and are consecutive?

Member Author

I am not sure what you mean. Do you want me to loop from 0 to the max era count and check that they all exist?

Because since we are using a hashmap, the values are stored in positions based on the hash of the keys, which wouldn't be in sorted order if we were to iterate over the buckets in order.

Collaborator

I think looping and checking is simple enough. Something like:

ensure!(
    (0..era_files.len() as u64).all(|epoch| era_files.contains_key(&epoch)),
    "Epoch indices are not starting from zero or not consecutive",
);

You can also calculate min and max and check if they are as desired (this still requires looping, so no savings performance-wise).

})
}

pub async fn download_raw_era(era_path: String, http_client: Client) -> anyhow::Result<Vec<u8>> {
Collaborator

This doesn't seem to be era specific; it just downloads files with retry logic. Maybe rename the function and move it to a more general place (e.g. trin-utils)?

Member Author

I want to limit the number of workspace crates trin-execution pulls from, in case it becomes its own project.

I am not aware of a reason why trin would want to download a file over HTTP.

Also, ideally this retry logic is geared toward downloading era files.
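For illustration, a sketch of what a download-with-retry helper geared toward era files can look like; reqwest and the backoff policy here are assumptions, not the project's actual client:

use std::time::Duration;

/// Download a (large) era file, retrying transient failures with a simple backoff.
async fn download_with_retry(
    client: &reqwest::Client,
    url: &str,
    max_attempts: u32,
) -> anyhow::Result<Vec<u8>> {
    let mut attempt = 0;
    loop {
        attempt += 1;
        match client.get(url).send().await {
            Ok(response) if response.status().is_success() => {
                return Ok(response.bytes().await?.to_vec());
            }
            Ok(response) if attempt >= max_attempts => {
                anyhow::bail!("failed to download {url}: status {}", response.status());
            }
            Err(err) if attempt >= max_attempts => return Err(err.into()),
            // transient failure: back off before retrying, since era mirrors can be flaky
            _ => tokio::time::sleep(Duration::from_secs(2 * u64::from(attempt))).await,
        }
    }
}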

.expect("to be able to parse epoch index");

while start_epoch_index <= end_epoch_index {
let mid = end_epoch_index + (start_epoch_index - end_epoch_index) / 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That matters only if start_epoch_index + end_epoch_index can go overflow, which shouldn't happen in our case.

I'm fine with your approach as well, no big deal. But, I think you also didn't do it correctly. I think variables are u64 so your formula should panic, right? Shouldn't it be start + (end - start) / 2?

.expect("to be able to download compressed beacon block");
let entry = Entry::deserialize(&compressed_beacon_block)?;

let slot_index = (era_index - 1) * 8192;
Collaborator

nit: you have this constant in era1.rs.
You could even extract this as a mini function there (e.g. pub fn start_slot_index(...)).

block_number: u64,
) -> anyhow::Result<ProcessedEra> {
let era_links = get_era_file_download_links(&http_client).await?;
let last_epoch_index = era_links.len() - 1;
Collaborator

nit: maybe move last_epoch_index a few lines below: let last_epoch_index = end_epoch_index;

http_client: Client,
block_number: u64,
) -> anyhow::Result<ProcessedEra> {
let era_links = get_era_file_download_links(&http_client).await?;
Collaborator

nit: we know the first execution block that is in an era file, right?
I would add a check that block_number is not less than that.

let mid_block_number = mid_block.block.execution_block_number();

if mid + 1 > last_epoch_index as u64 {
return Err(anyhow::anyhow!("Block not found in any era file"));
Collaborator

This isn't true. It can be in mid_block, and we could return it.

@KolbyML KolbyML (Member Author) commented Aug 27, 2024

@morph-dev ready for another look

@morph-dev morph-dev (Collaborator) left a comment

Almost there. A few minor comments, mostly in EraManager.

})
.collect::<Vec<u64>>();

// an era file should has 4 entries which are not blocks
Collaborator

Suggested change
- // an era file should has 4 entries which are not blocks
+ // an era file has 4 entries which are not blocks


pub async fn get_era1_files(http_client: &Client) -> anyhow::Result<HashMap<u64, String>> {
let era1_files = download_era_links(http_client, ERA1_DIR_URL).await?;
ensure!(
Collaborator

nit: similar to era files, we can also check that all keys are in the range of (0..ERA1_FILE_COUNT)

current_era: Option<ProcessedEra>,
next_era: Option<JoinHandle<ProcessedEra>>,
http_client: Client,
era1_files: Vec<String>,
}

impl EraManager {
pub async fn new() -> anyhow::Result<Self> {
pub async fn new(starting_block: u64) -> anyhow::Result<Self> {
Collaborator

Looking at the code, I no longer like the idea of the get_current_block(). It seems to be used only in the case of the starting block. Also, I don't think the constructor should be async and blocking. Instead, I think we can:

  1. Have the constructor leave self.current_era as None, but start fetching the required era (based on starting_block) and store its handle in self.next_era.
  2. Remove get_current_block
  3. Have the first call to get_next_block return the block with index starting_block (the one passed to the constructor)
  4. Rename self.current_block_number to self.next_block_number (also rename the getter).

Member Author

Looking at the code, I no longer like the idea of the get_current_block(). It seems to be used only in the case of the starting block.

Well, none of this logic is used in this PR. I will remove it for now, and if we need it in the future we can add it back.

current_era: Option<ProcessedEra>,
next_era: Option<JoinHandle<ProcessedEra>>,
http_client: Client,
era1_files: Vec<String>,
}

impl EraManager {
pub async fn new() -> anyhow::Result<Self> {
pub async fn new(starting_block: u64) -> anyhow::Result<Self> {
let http_client: Client = Config::new()
.add_header("Content-Type", "application/xml")
.expect("to be able to add header")
.try_into()?;
let era1_files = get_shuffled_era1_files(&http_client).await?;
Collaborator

Now we should probably use get_era1_files and keep it as a hashmap. This would remove the need to look up the right url below.

Member Author

Now we should probably use get_era1_files and keep it as a hashmap. This would remove the need to look up the right url below.

My concern is what happens if a new epoch is added, but I guess it doesn't matter since we will continue syncing from the running CL client anyway, so it doesn't matter whether we have a few new era files or not.

Collaborator

I wasn't thinking of storing era_files, just era1_files like before. But store them as a HashMap.

With that being said, I'm fine if we also store era_files as well, since they are not frequently updated, and as you said, we can rely on CL clients for the most recent blocks anyway.

Additionally, we can store era_files and update them only when we need something more recent than what we have, but we can leave this for a future PR (maybe create an issue to track this).

Ok(processed_era.get_block(self.current_block_number))
}

async fn fetch_era_file_link(
Collaborator

Instead of passing block_number, we could pass EraType. That would make the meaning of epoch_index a bit more explicit.

It can be simplified if self.era1_files is a hash map (which it can be).

Also, this function can panic, and I don't think that is desired, especially if we are trying to fetch an era that doesn't exist yet (e.g. we are processing the latest era file and want to start pre-processing the next one, which doesn't exist, so we panic). So I would make it return Result<Option<String>> and handle it outside.

};

// Download the next era file
let mut next_epoch_index = current_era.epoch_index + 1;
Collaborator

I find this whole block (lines 108:117) a bit hard to read and understand. I think we can make it very simple:

let mut next_era_type = current_era.era_type;
let mut next_epoch_index = current_era.epoch_index + 1;
// Handle transition from era1 to era
if next_era_type == EraType::Era1 && next_epoch_index == ERA1_FILE_COUNT {
    next_era_type = EraType::Era;
    next_epoch_index = FIRST_ERA_EPOCH_WITH_EXECUTION_PAYLOAD;
}
let next_era_path = self.fetch_era_file_link(next_era_type, next_epoch_index).await?;

@KolbyML KolbyML (Member Author) commented Aug 29, 2024

@morph-dev ready for another review

@KolbyML KolbyML requested a review from morph-dev August 29, 2024 02:55
@morph-dev morph-dev (Collaborator) left a comment

Looks great! A few more small comments. Thank you for the patience.

:shipit:

self.current_era.as_ref().expect("current_era to be some")
}
_ => {
let current_era = self.fetch_era_file().await?;
self.current_era.insert(current_era)
}
};
Ok(processed_era.get_block(self.current_block_number))
}
let block = processed_era.get_block(self.next_block_number);
Collaborator

nit: It should, but I think we can't be sure that processed_era actually contains next_block_number (because of the _ => { case above).

It's better to check than panic (in my opinion), so I would add an ensure! just before this line.

let era_path = self.fetch_era_file_link(block_number, epoch_index).await?;
let raw_era1 = download_raw_era(era_path, self.http_client.clone()).await?;
let Some(era1_path) = era1_files.get(&epoch_index).cloned() else {
return Err(anyhow::anyhow!(
Collaborator

nit: I think you can replace return Err(anyhow::anyhow!( with anyhow::bail!(

@@ -370,6 +370,8 @@ mod tests {
let raw_era1 = fs::read("../test_assets/era1/mainnet-00000-5ec1ffb8.era1").unwrap();
Collaborator

I don't think we should be using both the raw file and EraManager (it should be one or the other).

I would say we shouldn't use EraManager in tests if we can avoid it (because EraManager would download from public sources).

I think we can obtain ProcessedEra directly from raw_era1, and remove the usage of EraManager in this test.

era::utils::process_era1_file(raw_era1, /* epoch_index= */ 0)

EraType::Era1 => self.era1_files.get(&next_epoch_index).cloned(),
EraType::Era => self.era_files.get(&next_epoch_index).cloned(),
}) else {
return Err(anyhow::anyhow!("Unable to find next era file's path: index {next_epoch_index} type {next_era_type:?}"));
Collaborator

I'm not sure we should return an error here. This is a valid thing to happen if we are processing the most recent era file.

I would say we have two approaches:

  1. pass next_era_path as Option<String> to the spawned task below, which would return the error
    • or maybe even better, in the case of EraType::Era, it can try to download the most recent era_files
  2. if next_era_path is None, skip the next few lines and leave self.next_era empty (and update the message at line 89)

I think I like option 1 more, but up to you.

Ok(era_files[&(epoch_index)].clone())
}
}
Ok(block)
}

async fn fetch_era_file(&mut self) -> anyhow::Result<ProcessedEra> {
Collaborator

nit: this function name doesn't make much sense to me anymore. I would rename it to something like get_next_processed_era and add a comment describing what it actually does (get the processed era, and start processing the next one).

@@ -101,30 +86,29 @@ impl EraManager {
let current_era = self.next_era.take();
Collaborator

nit: I find this naming a bit confusing. How about:

let processed_era = match self.next_era.take() {
    Some(era_handle) => era_handle.await??,
    None => bail!("The next_era handle can't be None"),
};

Or at least new_current_era if you don't like processed_era?

Also, the comment above this line is no longer valid.

let era1_files = era_manager.era1_files.clone();
let http_client = era_manager.http_client.clone();
let join_handle = tokio::spawn(async move {
Self::init_current_era(era1_files, http_client, next_block_number).await
Collaborator

nit: It is a bit weird to call the function init_current_era and then assign the result to self.next_era (while we have self.current_era). How about we rename the function to something like fetch_and_process_era_by_block_number()?

Smallest nit: up to you, but I don't think join_handle adds much to readability, so I would just have

era_manager.next_era = Some(tokio::spawn(async move {

or at least rename join_handle to next_era or next_era_handle.

@KolbyML KolbyML merged commit e5f9020 into ethereum:master Aug 29, 2024
8 checks passed
@KolbyML KolbyML deleted the add-era-support-to-era-manager branch January 22, 2025 07:52