
feat: Dynamically adjust proposal block ranges #1139

Open · wants to merge 23 commits into master

Conversation

@pxrl (Contributor) commented Jan 5, 2024

This commit introduces dynamic adjustment of the block ranges that the proposer will propose. The intention of this change is to make the proposer more robust to incomplete or inconsistent data being supplied by one or more RPC providers.

Dynamic adjustment is enacted in the event that any of the following conditions are detected:

  • An invalid fill traces to a missing deposit (no deposit found for the specified depositId).
  • A deposit is known to be filled, without any associated FilledRelay event.

In the first case, this results in narrowing of the block range on both the origin and destination chains. For the latter, only the destination chain is narrowed.
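The two narrowing rules can be sketched as follows. This is only an illustrative sketch: the type names, fields, and `narrowBlockRanges` function here are assumptions for exposition, not the PR's actual implementation.

```typescript
type BlockRange = [number, number]; // [startBlock, endBlock]

enum InvalidFillCode {
  DepositIdNotFound, // invalid fill traces to a missing deposit
  FillEventMissing, // deposit known to be filled, but no FilledRelay event found
}

interface InvalidFillInfo {
  code: InvalidFillCode;
  originChainId: number;
  destinationChainId: number;
  originBlock: number; // block of the last known deposit preceding the missing one
  destinationBlock: number; // block of the offending fill
}

function narrowBlockRanges(
  ranges: Record<number, BlockRange>,
  invalidFills: InvalidFillInfo[]
): Record<number, BlockRange> {
  const updated: Record<number, BlockRange> = { ...ranges };
  for (const f of invalidFills) {
    // In both cases, narrow the destination chain to exclude the offending fill.
    const [dStart, dEnd] = updated[f.destinationChainId];
    updated[f.destinationChainId] = [dStart, Math.min(dEnd, f.destinationBlock - 1)];
    // When the deposit itself is missing, narrow the origin chain as well.
    if (f.code === InvalidFillCode.DepositIdNotFound) {
      const [oStart, oEnd] = updated[f.originChainId];
      updated[f.originChainId] = [oStart, Math.min(oEnd, f.originBlock)];
    }
  }
  return updated;
}
```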

Some additional comments on why this approach was chosen:

  • It's minimally invasive to the existing dataworker proposal flow, and it's easy to disable in the event that any issue arises in prod. Ideally we'd refactor in favour of progressively incrementing the block ranges as deposits and fills are validated, rather than attempting to propose over the entire range and then winding back in case of issue. That would, however, require a much larger refactor, and there's no appetite for that given the pending v3 changes.

Closes ACX-1767

@pxrl (Contributor, Author) left a comment:

Just adding some context.

linear bot commented Jan 5, 2024

@pxrl (Contributor, Author) commented Jan 5, 2024

fwiw, I'm aware of a single test failure associated with this change. It's not obvious to me why it's failing at the moment; I will dig into it shortly.

Update - resolved here: ccb5df0

@pxrl (Contributor, Author) left a comment:

nb. In general, this new feature probably warrants some additional test cases to stress the implementation. This is especially important because it's not possible to backtest the change against previous bundles.

@nicholaspai (Member) left a comment:

Left some comments on first read through

// - Narrow the destination block range to exclude the invalid fill.
allInvalidFills
  .filter(({ code }) => code === InvalidFill.DepositIdNotFound)
  .forEach(({ fill: { depositId, originChainId, destinationChainId, blockNumber } }) => {
Member commented:

Can we sort these by block number by chain to allow us to exit the following map earlier?

@pxrl (Contributor, Author) commented Jan 17, 2024:

I'm not totally confident on this, but I'm not sure that we can actually simply sort and exit earlier. This is because we need to avoid making assumptions about the ordering of fills vs. deposits.

For example, the first invalid fill on a destination chain might correspond to the last missing deposit on the origin chain. Then, some later invalid fill on the destination chain might actually fill an earlier missing deposit on the origin chain. This also seems more likely to occur in the case that the RPCs are serving inconsistent data.

If we group/sort by destinationChainId and destination block number, we might not narrow the originChainId block range correctly. If we group/sort by originChainId and origin block number, we might not narrow the destinationChainId block range correctly.

In general, even when it's really bad, the number of missing deposits we typically see is about 20-30. In order to identify the earliest block for both the origin and destination chains, we might end up looping multiple times over all of the invalid fills. In the case where we only have tens of missing events, it seems like it might be cheaper and simpler to just process them one by one.

Full disclosure: I might have overlooked something really obvious here. wdyt?
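For what it's worth, the order-independence concern discussed here can also be handled with a single pass that tracks the tightest endBlock seen per chain, with no sorting at all. The sketch below is illustrative only; the interface and function names are assumptions, not code from the PR.

```typescript
interface MissingDepositFill {
  originChainId: number;
  destinationChainId: number;
  originEndBlock: number; // last safe origin block implied by this fill
  destinationEndBlock: number; // last safe destination block implied by this fill
}

// One pass over all invalid fills: keep the minimum candidate endBlock per
// chain. The result is the same regardless of the order fills are seen in,
// which matters because the earliest fill may pair with the latest deposit.
function tightestEndBlocks(fills: MissingDepositFill[]): Map<number, number> {
  const minEnd = new Map<number, number>();
  const update = (chainId: number, endBlock: number) => {
    const current = minEnd.get(chainId);
    if (current === undefined || endBlock < current) minEnd.set(chainId, endBlock);
  };
  for (const f of fills) {
    update(f.originChainId, f.originEndBlock);
    update(f.destinationChainId, f.destinationEndBlock);
  }
  return minEnd;
}
```

The trade-off pxrl describes still applies: with only tens of events, a simple per-event loop is likely just as cheap.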

Member commented:

This is a good point: the latest deposit might match with the earliest fill, and vice versa.

Comment on lines +510 to +513
const previousDeposit = originSpokePoolClient
  .getDepositsForDestinationChain(destinationChainId)
  .filter((deposit: DepositWithBlock) => deposit.blockNumber < blockNumber)
  .at(-1);
Contributor commented:

What if the deposit is from a gap between spoke pool deployments or something? So there actually is no deposit for this id? Will this always return undefined in that case, which would mean no change?

If there are edge cases like that that are handled in a subtle way here, we may want to add a brief comment explaining them.

@pxrl (Contributor, Author) commented Jan 10, 2024:

The SpokePoolClient only has the ability to handle a single SpokePool deployment, so in the case that we don't find any preceding deposit then previousDeposit resolves to undefined. In this case, the updateEndBlock() helper detects that the updated endBlock is undefined and implements a soft-pause of the chain by proposing over the previous endBlock (as we do for Boba).

I'll update the comment on line 509 to clarify this.

There's arguably a scenario where the first deposit on a new chain goes missing. This would implicitly result in proposing over the range [0,0] for that chain because the HubPoolClient supplies previousEndBlock 0 in that scenario. I think this is an extremely remote probability, and activating a new chain requires multiple responsible adults to test the deployment and monitor the proposal, so I'm probably not inclined to handle it explicitly in the code. wdyt?
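The soft-pause behaviour described in this reply can be sketched as below. The real `updateEndBlock()` helper lives in the PR's Dataworker changes; this standalone shape is an assumption for illustration only.

```typescript
// If no updated endBlock can be computed (e.g. previousDeposit resolved to
// undefined), soft-pause the chain by proposing a zero-width range over the
// previous bundle's endBlock. Otherwise, propose from the block after the
// previous endBlock up to the updated endBlock.
function updateEndBlock(
  previousEndBlock: number,
  updatedEndBlock: number | undefined
): [number, number] {
  if (updatedEndBlock === undefined) {
    return [previousEndBlock, previousEndBlock]; // soft-pause
  }
  return [previousEndBlock + 1, updatedEndBlock];
}
```

This mirrors how hard-paused chains (disabled via the ConfigStore) are handled, which is why the invariant check later in the thread compares against `originalStartBlock - 1`.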

Member commented:

what would a bundle block range look like for a new spoke pool address? whether it be a new chain or an updated spoke pool address for an existing chain?

@pxrl (Contributor, Author) commented Jan 17, 2024:

The bundleEndBlock is sourced from HubPoolClient.getLatestBundleEndBlockForChain(), so we inherit its behaviour. For a new deployment on an existing chain, we're bound to continue from the previous bundleEndBlock. This should work as expected.

For a new chain, the chainId index isn't found in the previous bundle, so we default to proposing from 0. This was also the case for the activation of zkSync and Base, so we inherit the existing behaviour.

The only change that I can foresee here is that if that initial proposal contains invalid fills due to missing deposits on the new chain, or fills for missing deposits on another chain, then we'd revert to proposing over [0,0]. I'm not sure whether that's a problem in itself, but it's a pretty extreme edge case because the number of deposits and fills for the new chain in the initial proposal are likely to be very low, and we'd detect it immediately because activating a new chain implies manual review of the proposal block ranges.

Contributor commented:

I think the 0,0 case is remote and doesn't cause an obvious problem, so I wouldn't worry about handling it explicitly.

This all makes sense. I think a comment that says something like:

This can return undefined if there is no known preceding deposit in range.
That will result in the chain being soft-paused by setting the endBlock equal to the previous endBlock, making this bundle cover no blocks on that chain.

Would be really helpful to the reader.

Suggested by Matt.

Co-authored-by: Matt Rice <[email protected]>
if (blockNumber > updatedBlockRanges[originChainId][1]) {
return false; // Fill event is already out of scope due to previous narrowing.
}
return allValidFills.find(
Contributor commented:

Can we use a spoke pool function or map lookup to get the fill for a deposit rather than looping through all fills on each deposit?

We store by a special hash in the client to avoid this n^2 loop (which has caused speed issues in the past).

@pxrl (Contributor, Author) commented:

I have backed that entire section out (for now) per this thread with Nick: #1139 (comment).

Here's the change: 093d80b

I did consider accessing these via a map, but tentatively decided against it because I don't think this will actually use much CPU time in practice. Arriving here is conditional on at least one deposit being detected as filled despite no known FilledRelay event. Then, we iterate over the allValidFills array once per instance of a "filled-but-missing" event.

So in the normal case it costs nothing, but it does cost CPU in the bad case where a chain/RPC is misbehaving. However, because we've already filtered on the missing deposits and have narrowed the block range accordingly, we should skip over most of those missing fill events anyway. So in practice I think the impact of this would be fairly limited, even though it is technically inefficient to search the allValidFills array in this way. Does that make sense?
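For reference, the constant-time lookup alluded to in the earlier review comment might look like the sketch below. The key format and names here are hypothetical; the actual client stores fills by its own hash.

```typescript
interface ValidFill {
  depositId: number;
  originChainId: number;
  blockNumber: number;
}

// Build a lookup once, then resolve each filled-but-missing deposit in O(1)
// instead of scanning the whole allValidFills array per deposit.
function buildFillIndex(allValidFills: ValidFill[]): Map<string, ValidFill> {
  const index = new Map<string, ValidFill>();
  for (const fill of allValidFills) {
    index.set(`${fill.originChainId}-${fill.depositId}`, fill);
  }
  return index;
}
```

As the thread concludes, this optimisation only pays off when the "filled-but-missing" path is hit often, which should be rare.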

Contributor commented:

It does, it sounds like you're saying this loop will be run rarely so optimization isn't a concern. I wasn't aware of that when I initially read the code, but will take another pass.

const [originalStartBlock] = blockRanges[idx];
return (
(endBlock > startBlock && startBlock === originalStartBlock) ||
(startBlock === endBlock && startBlock === originalStartBlock - 1) // soft-pause
Contributor commented:

Why would the new start block be one behind the original start block in this case? And is it okay to leave it that way?

@pxrl (Contributor, Author) commented Jan 10, 2024:

originalStartBlock is normally previousEndBlock + 1, so if startBlock === endBlock is true then we're implementing a soft-pause on the chain, and will propose over the range of [previousEndBlock, previousEndBlock]. This is consistent with the way that chains are "hard-paused" (i.e. disabled via the ConfigStore), so this check is simply verifying the invariant.

To be more correct, rather than assuming that previousEndBlock === originalStartBlock - 1, this check should resolve the previous ending block via HubPoolClient.getLatestBundleEndBlockForChain().

Contributor commented:

Understood. That makes sense.

pxrl and others added 4 commits January 10, 2024 07:08
Proceeding with the initial "invalid fills" change alone. The subsequent
change can be re-introduced later.
@pxrl pxrl requested review from nicholaspai and mrice32 January 17, 2024 11:12
const destSpokePoolClient = spokePoolClients[destinationChainId];
[startBlock, endBlock] = updatedBlockRanges[destinationChainId];

if (blockNumber <= endBlock) {
Member commented:

nit: wouldn't blockNumbers for invalid fills always be within the bundle block range by definition when calling loadData?

@pxrl (Contributor, Author) commented:

Not necessarily, because we source endBlock from updatedBlockRanges, so it is subject to being iteratively updated within this loop. So if we are examining a fill where the blockNumber has already been excluded by narrowing then we should just skip over that.

@pxrl (Contributor, Author) commented:

Clarified here: 7559948

@nicholaspai (Member) left a comment:

I think the new implementation is much clearer, so I left more targeted questions.

pxrl and others added 3 commits January 17, 2024 19:47
@pxrl pxrl requested a review from nicholaspai January 17, 2024 19:47
@nicholaspai (Member) left a comment:

I'm on board with merging this and closely monitoring the dataworker proposals for a few days,

OR

adding some more unit tests before we activate this.

We are probably safe to launch this; in the worst case, we'll self-dispute, since the disputer does not have this narrowBlockRanges logic. And AFAICT the disputer's logic to construct the block ranges from getWidestPossibleBlockRanges hasn't changed.

@mrice32 (Contributor) left a comment:

A few more comments! Sorry, this logic is pretty nuanced.

@@ -289,7 +295,8 @@ export class BundleDataClient {
if (historicalDeposit.found) {
addRefundForValidFill(fill, historicalDeposit.deposit, blockRangeForChain);
} else {
allInvalidFills.push(fill);
assert(historicalDeposit.found === false); // Help tsc to narrow the discriminated union type.
Contributor commented:

This is annoying; TypeScript really shouldn't require runtime code to convince it that this must be that type. It won't compile without this line?

Contributor commented:

OOC @pxrl what is the error that is thrown?


// Find the previous known deposit. This may resolve a deposit before the immediately preceding depositId.
const previousDeposit = originSpokePoolClient
  .getDepositsForDestinationChain(destinationChainId)
Contributor commented:

I don't think these are guaranteed to be sorted, which might mean this deposit would be too early, right?


Comment on lines +487 to +490
assert(
endBlock < currentEndBlock,
`Invalid block range update for chain ${chainId}: block ${endBlock} >= ${currentEndBlock}`
);
Contributor commented:

Since these updates can come in out of order, I would expect this case to happen, no?

I find a missing deposit at block 7, then later I see a missing deposit at block 12.

Maybe I'm missing the logic that prevents this.

@@ -263,10 +266,10 @@ export class Dataworker {
* of log level
* @returns Array of blocks ranges to propose for next bundle.
*/
_getNextProposalBlockRanges(
async _getNextProposalBlockRanges(
Contributor commented:

It looks like this function only has one await, and it's on the very last line. Thoughts on removing the async and returning the promise generated by the call to this.narrowProposalBlockRanges(blockRangesForProposal, spokePoolClients)?
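The suggestion here is a general TypeScript idiom: when a function's only await is on its final return value, the async wrapper can be dropped and the promise returned directly. The function names below are hypothetical stand-ins for the real methods.

```typescript
// Stand-in for this.narrowProposalBlockRanges(...).
async function narrow(x: number): Promise<number> {
  return x - 1;
}

// Before: async wrapper that only awaits the last call.
async function nextRangesAwaited(x: number): Promise<number> {
  return await narrow(x);
}

// After: return the promise directly; callers still see Promise<number>.
function nextRangesDirect(x: number): Promise<number> {
  return narrow(x);
}
```

One caveat worth noting: inside a try/catch, `return await` behaves differently from `return`, since only the awaited form lets the catch block observe a rejection.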

blockRanges: number[][],
spokePoolClients: SpokePoolClientsByChain,
logData = true,
isUBA = false
Contributor commented:

Will this need to be changed for the UBA removal changes?

@james-a-morris (Contributor) left a comment:

Left a few comments
