Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry pick batcher fixes #344

Merged
merged 10 commits into from
Mar 6, 2025
Merged

Conversation

karlb
Copy link

@karlb karlb commented Mar 4, 2025

Cherry-pick the fixes from op-batcher/v1.11.2, op-batcher/v1.11.3, op-batcher/v1.11.4.

I don't think we actually need the changes, but they were easy enough to cherry-pick and I don't think they break anything.

Replaces #338
Fixes https://github.com/celo-org/celo-blockchain-planning/issues/927

@karlb karlb force-pushed the karlb/cherry-pick-batcher-fixes branch from 8c8da81 to d815d81 Compare March 4, 2025 13:42
@karlb karlb marked this pull request as ready for review March 4, 2025 14:14
@seolaoh
Copy link

seolaoh commented Mar 4, 2025

Could you take a look at this release as well? It's released 4 hours ago and also related to batcher bug :(

@palango
Copy link
Collaborator

palango commented Mar 4, 2025

@karlb karlb force-pushed the karlb/cherry-pick-batcher-fixes branch from d815d81 to a8b1dcb Compare March 4, 2025 15:09
@karlb
Copy link
Author

karlb commented Mar 5, 2025

I changed this to include the changes from the last three batcher releases. Our TestBatcher_FailoverToEthDA_FallbackToAltDA test fails at the moment. Apart from that, it looks fine.

@palango
Copy link
Collaborator

palango commented Mar 5, 2025

What do we do about the failing test?

@karlb
Copy link
Author

karlb commented Mar 5, 2025

Bisecting says 070ab3d is the problem. I'll still have to find out why. EDIT: This is misleading, the failure is flaky. After bisecting with a higher number of reruns, f463477 looks like the more likely culprit.

OlgaYang and others added 10 commits March 5, 2025 18:20
…esPerChannel` (ethereum-optimism#14310)

* fix op-batcher pack over MaxRLPBytesChannel

* add test cases from different CompressionAlgo

* add fresh compression logic and corresponding comments

* refactor: enhance MaxRLPBytesPerChannel test

* refactor: rename variable and add required messages
…ch publishing into separate goroutines (ethereum-optimism#14244)

* op-batcher: overhaul throttling, reading and writing loops

* no longer evaluate throttling conditions on a ticker
* break main loop into "reading" and "writing"
* reading loop signals to throttling and writing when ALL blocks are loaded (this could be done in a more fine grained way, even when each block is loaded)
* these run concurrently

* improvements

* readloop unblocks writing loop once, and then writing loop goes forever (pausing when it runs out of data)

tests pass but timeout due to bad shutdown

* rename to prevent shadowing

* allow for receipt handling when receipt and err are both nil

* split shutdownCtx into producers and consumers

* throttling loop ranges over a channel, does not take a context

* processReceiptsLoop ranges over a channel, does not take a context

* unify wait groups

* unify contexts

* writeLoop uses a ticker instead of a sleep

* WIP: add throughput test for batcher

* make txQueue local, not global state

* reduce diff

* reduce diff further

* make pendingBytesUpdated a local, not global

* read and write loop communicate with a channel

no more ticker required

* rename

* rename

* abstract promptLoop

* rename

* update readme

* add diagram to readme

* remove test

I'm not sure it is adding any value. May return to it in future

* sendToThrottlingLoop uses mutex

* harmonize "*Loop returning" logs

* tidy

* remove TODO

* rip out ThrottleInterval config var

* add activeSequencerChanged channel and wire it up to onActiveSequencerChanged hook in active rollup provider

* set callback at runtime, not in constructor

* reinstate startup order

* tidy

* remove dead code

* remove unintentional change

* rename readLoop to blockLoadingLoop and writeLoop to publishingLoop

* improve diagram

* replace ThrottleInterval with ThrottleThreshold as enabling var

* remove onActiveProviderChanged arg from constructors

* protect callback invocation with nullity check

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <[email protected]>

* remove ThrottelInterval from flags

* docs: mention active sequencer signal

* only attach callback if throttling is enabled

* increase buffer of activeSequencerChanged

means that we buffer a single event if the throttlingLoop is busy, even when using a try-send

* add buffer to pendingBytesUpdated

see previous commit

* reduce indentation

* remove dangling ref

* pass killCtx to publishingLoop and avoid accessing this context globally

* remove continue and always signal publishing loop

* remove unecessary mutex wrangle

* replace signalPublishingLoop method with trySignal fn

* extend TestRollupProvider_FailoverOnInactiveSequencer to cover callback fn

* use retryTimer in throttlingLoop

if RPC calls fail, they will be retried after retryInterval (or before, if another event triggers updateParams)

* op-batcher: improve active-seq-changed signalling setup

* remove unused state var

* rename blocksLoaded channel to publishSignal channel

---------

Co-authored-by: Sebastian Stammler <[email protected]>
…river.sendTx` (ethereum-optimism#14517)

* op-batcher: add TestBatchSubmitter_sendTx_FloorDataGas

* op-batcher: use floorDataGas for transactions if greater than intrinsicGas

* Update op-batcher/batcher/driver.go

Co-authored-by: Sebastian Stammler <[email protected]>

* log error instead of ignoring

---------

Co-authored-by: Sebastian Stammler <[email protected]>
…essBlocks()` (ethereum-optimism#14520)

* op-batcher: always updateCursorAndMetrics when returning from processBlocks()

* Update op-batcher/batcher/channel_manager.go

Co-authored-by: Sebastian Stammler <[email protected]>

---------

Co-authored-by: Sebastian Stammler <[email protected]>
…hannel timeout log (ethereum-optimism#14553)

* op-batcher: remove ChannelManager.CheckExpectedProgress

* op-batcher: add warning log when a channel times out on chain
…d()` (ethereum-optimism#14561)

* op-batcher: correctly track block metrics in handleChannelInvalidated

Includes test which fails without the fix.

* op-batcher: log out channels which are dropped during handleChannelInvalidated()

* change to warn log for dropped channels
…#14563)

* improve computeSyncActions logging

* fixup test and make sure all cases run (!)

* use more friendly format for structured logger
…imism#14587)

* op-batcher: introduce PREFER_LOCAL_SAFE_L2 config var

* lint

* Apply suggestions from code review

Co-authored-by: Sebastian Stammler <[email protected]>

* lint

---------

Co-authored-by: Sebastian Stammler <[email protected]>
@karlb karlb force-pushed the karlb/cherry-pick-batcher-fixes branch from 286bfae to 3dc7177 Compare March 5, 2025 17:20
@karlb
Copy link
Author

karlb commented Mar 5, 2025

The upstream code didn't cleanly shut down when AltDA is used. I submitted a PR to upstream in addition to pushing the fix here: ethereum-optimism#14654

@palango palango requested review from seolaoh and palango March 6, 2025 10:42
@palango palango merged commit 49de8e6 into celo-rebase-12 Mar 6, 2025
32 checks passed
@palango palango deleted the karlb/cherry-pick-batcher-fixes branch March 6, 2025 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants