Optimize reschedule_recovered #545

Closed
jennijuju opened this issue Aug 12, 2022 · 6 comments · Fixed by #677
Assignees
Labels
enhancement New feature or request P2

Comments

@jennijuju
Member

Optimize the following code block so that we don't have to loop through all sectors every time, which is expensive in gas:

```rust
self.iter_while_mut(|_epoch, expiration_set| {
    let on_time_sectors: BTreeSet<SectorNumber> = expiration_set
        .on_time_sectors
        .bounded_iter(ENTRY_SECTORS_MAX)
        .context("too many sectors to reschedule")?
        .map(|i| i as SectorNumber)
        .collect();

    let early_sectors: BTreeSet<SectorNumber> = expiration_set
        .early_sectors
        .bounded_iter(ENTRY_SECTORS_MAX)
        .context("too many sectors to reschedule")?
        .map(|i| i as SectorNumber)
        .collect();

    // This loop could alternatively be done by constructing bitfields and intersecting them, but it's not
    // clear that would be much faster (O(max(N, M)) vs O(N+M)).
    // If faults are correlated, the first queue entry likely has them all anyway.
    // The length of sectors has a maximum of one partition size.
    for sector in sectors.iter() {
        let sector_number = sector.sector_number;
        let power = power_for_sector(sector_size, sector);
        let mut found = false;

        if on_time_sectors.contains(&sector_number) {
            found = true;
            // If the sector expires on-time at this epoch, leave it here but change faulty power to active.
            // The pledge is already part of the on-time pledge at this entry.
            expiration_set.faulty_power -= &power;
            expiration_set.active_power += &power;
        } else if early_sectors.contains(&sector_number) {
            found = true;
            // If the sector expires early at this epoch, remove it for re-scheduling.
            // It's not part of the on-time pledge number here.
            expiration_set.early_sectors.unset(sector_number);
            expiration_set.faulty_power -= &power;
            sectors_rescheduled.push(sector);
        }

        if found {
            recovered_power += &power;
            remaining.remove(&sector.sector_number);
        }
    }

    expiration_set.validate_state()?;
    let keep_going = !remaining.is_empty();
    Ok(keep_going)
})?;
```
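The code comment above mentions that the per-sector membership probes could instead be done by constructing bitfields and intersecting them. A minimal sketch of that alternative, using plain `BTreeSet`s as stand-ins for the actor's `BitField` type (the function name and signature here are hypothetical, not the actor API):

```rust
use std::collections::BTreeSet;

// Split the recovering sectors of one expiration-queue entry into those that
// expire on time at this entry and those that expire early, via two set
// intersections instead of a per-sector contains() loop.
fn split_recovered(
    recovering: &BTreeSet<u64>,
    on_time: &BTreeSet<u64>,
    early: &BTreeSet<u64>,
) -> (BTreeSet<u64>, BTreeSet<u64>) {
    let on_time_hits: BTreeSet<u64> = recovering.intersection(on_time).copied().collect();
    let early_hits: BTreeSet<u64> = recovering.intersection(early).copied().collect();
    (on_time_hits, early_hits)
}
```

With real bitfields the intersections are run-length operations, which is exactly the O(max(N, M)) vs O(N+M) trade-off the comment weighs.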

@Stebalien
Member

Note, the comment:

```rust
// Traverse the expiration queue once to find each recovering sector and remove it from early/faulty there.
// We expect this to find all recovering sectors within the first FaultMaxAge/WPoStProvingPeriod entries
// (i.e. 14 for 14-day faults), but if something has gone wrong it's safer not to fail if that's not met.
```

should say 40, not 14, because the fault max age is 40 days (due to https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0026.md).

That means we:

  1. Iterate over up to 40 queue entries.
  2. Iterate over all the sectors being recovered (often all the sectors in the partition).

Which isn't great.


What we should do is:

  1. From the sector infos (sectors), we should group the sectors by quantized expiration.
  2. For each entry in the expiration queue, only compare on_time_sectors with the sectors known to expire at that epoch. Possibly using a bitfield intersection if faster.
  3. For each entry, only compare early_sectors if it is non-empty (we expect one big early_sectors bitfield corresponding to a single expiration). Ideally, we'd do this with a bitfield intersection as well (if faster).
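Step 1 above could be sketched roughly as follows. This is a simplified illustration, not the actor code: `SectorOnChainInfo` is reduced to the two fields needed here, and `quantize_up` mirrors the idea of rounding an epoch up to the next quantization boundary (the `unit`/`offset` parameters are assumptions standing in for the real `QuantSpec`):

```rust
use std::collections::BTreeMap;

// Simplified stand-in for the on-chain sector info (hypothetical fields only).
#[derive(Clone, Copy)]
struct SectorOnChainInfo {
    sector_number: u64,
    expiration: i64, // ChainEpoch
}

// Round an epoch up to the next quantization boundary.
fn quantize_up(epoch: i64, unit: i64, offset: i64) -> i64 {
    let offset = offset % unit;
    let remainder = (epoch - offset).rem_euclid(unit);
    if remainder == 0 {
        epoch
    } else {
        epoch - remainder + unit
    }
}

// Group recovering sectors by their quantized expiration epoch, so each
// expiration-queue entry is only compared against the sectors that can
// actually live at that epoch.
fn group_by_quantized_expiration(
    sectors: &[SectorOnChainInfo],
    unit: i64,
    offset: i64,
) -> BTreeMap<i64, Vec<u64>> {
    let mut groups: BTreeMap<i64, Vec<u64>> = BTreeMap::new();
    for s in sectors {
        groups
            .entry(quantize_up(s.expiration, unit, offset))
            .or_default()
            .push(s.sector_number);
    }
    groups
}
```

With the groups in hand, step 2 becomes a lookup of the current entry's epoch in the map rather than a scan of every recovering sector per entry.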

@jennijuju jennijuju moved this to Todo / In Scope in Network nv17 Aug 12, 2022
@jennijuju jennijuju moved this from Todo / In Scope to In Progress in Network nv17 Aug 25, 2022
@Stebalien
Member

An additional optimization: Compact these queues. Currently, we associate quantized epochs with values (i.e., epochs rounded up to the next deadline). Instead, we index by ceil(EPOCH/EPOCHS_PER_DEADLINE) which would reduce the number of objects we'd need to read/write when messing with this queue.
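A minimal sketch of that compaction idea. `EPOCHS_PER_DEADLINE` here is an assumed illustrative constant, not the actual network parameter, and the function names are hypothetical:

```rust
// Assumed value for illustration only.
const EPOCHS_PER_DEADLINE: i64 = 60;

// Dense queue key for a positive epoch: ceil(epoch / EPOCHS_PER_DEADLINE).
// Consecutive deadlines map to consecutive integers, so fewer AMT nodes
// need to be read/written when messing with the queue.
fn queue_key(epoch: i64) -> i64 {
    (epoch + EPOCHS_PER_DEADLINE - 1) / EPOCHS_PER_DEADLINE
}

// Recover the last epoch covered by a key, for scheduling back to epochs.
fn key_to_epoch(key: i64) -> i64 {
    key * EPOCHS_PER_DEADLINE
}
```

Compared with keying by quantized epochs directly, this removes the `EPOCHS_PER_DEADLINE`-sized gaps between adjacent keys.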

macro-ss added a commit to macro-ss/builtin-actors that referenced this issue Sep 1, 2022
@macro-ss
Contributor

macro-ss commented Sep 1, 2022

Here is a patch for this issue, basically following the comments above except for the grouping step and the additional comments. The test passes with `cargo test reschedule_recover_restores_all_sector_stats -- --nocapture`. Please help review it:
d5e46c0
I found `group_expiration_set` in the same file, but I'm not sure it can be used here, because it ignores the early_sectors queue.

@marco-storswift

builtin-actors/actors/miner/src/expiration_queue.rs

Lines 394 to 396 in ac3d3e8
The power value should only be initialized once the found condition is met.

@anorth
Member

anorth commented Oct 2, 2022

@Kubuxu did you do something here? Could you link it to this issue? Shall we close it?

@jennijuju
Member Author

Closed by #677.

Repository owner moved this from In Progress to Done in Network nv17 Oct 3, 2022