Make txArriveTimeout Configurable w/ CLI Flag #734
…c to handler to fetcher. TODO verify default, expose var to api
…default) settings
LGTM with few comments.
… to indicate the new default value. updated tx_fetcher_test to test using the new default value
…flags to duration type. Tested on live both w/o flag set (default) and w/ flag set
… if txArrivalWait < txGatherSlack
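Added a check that sets the f.txArrivalWait value to the txGatherSlack value if f.txArrivalWait < txGatherSlack. The rationale is due to the context of the usage of txGatherSlack by the fetcher: if f.txArrivalWait is less than txGatherSlack, the fetcher's wait check can never return false, which renders the configured value meaningless. (The exception to the above is if the node operator has modified the hardcoded gatherSlack variable.)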
@JekaMas Wanted to make sure you saw the change to add in a min value for the txArrivalWait value.
@JekaMas Hey - I believe I erroneously removed @temaniarpit27 and @0xsharma from the "review" position - could you re-add them?
Also, update all the TOML configuration files for consistency.
…ArrivalWaitRaw string for toml and hcl value inputs
Thank you for your interest, @ajb! The Polygon dev team welcomes all opinions and proposals.
@pratikspatil024 @temaniarpit27 @0xsharma I am OK with that change; it does not affect the network, although it will simplify the research. Let's merge it, if you are OK too.
Sure, I will participate. Please let me know.
@ajb great! By the way, could you give us a way to reproduce your experiments and data? For example, the steps, setup, etc.
What do you mean? PR #627 is not mine. I would be happy to do some experiments regarding the PR I authored (#292), but I'm unsure how to really advocate for that change, since with these types of changes you would need to make the change on the network first, then wait for bots to change their behavior accordingly.
A slightly different take - Ethereum had spam issues too, although not as much as the low-fee chains. The introduction of mev-geth eliminated most of that. mev-bor already exists as an alternative to bor, and it can be updated to feature full block templates like mev-boost. Allowing client diversity may be a simpler solution to the spam issue. @JekaMas we would be interested in joining the group as well!
This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.
@thogard785 we would like to go ahead with your PR. Can you please pull in the latest changes and resolve conflicts?
I had a concern regarding this PR that I think could be solved with a minor addition. There's a danger of misconfiguration of this parameter by a validator. Validator nodes rely on the TxFetcher in order to receive all transactions, since the small number of peers means most transactions will not be directly propagated. If a validator were to inadvertently set this value too high, it could result in the validator being unable to process most, or all, of the transactions during their sprint. In the case of a normal validator, a small minority of transactions might still make it through with direct propagation, and with PFL validators only those transactions that win a PFL auction might make it through.
This would result in a pretty bad degradation in user experience, and would be difficult to debug by either the validator or the Polygon team. It's hard to predict what kind of effects these long periods of no transaction processing would have on the network, so it's certainly possible this would have network-level effects on other participants.
Could I suggest putting a cap on this value, enforced inside bor? I think the old value of 0.5 seconds would make a lot of sense. I can understand why a node operator might want to lower the value, but it seems unlikely that they would want to set this value higher than the old default. If there's a concern over resource load, I'd suggest other settings to be considered, as this parameter would have almost no effect on resources given how little the TxFetcher is used by nodes with any reasonable number of peers. In the PFL whitepaper, the number one design objective is that the system "Incurs no transaction delay beyond the limit of what is already possible." The suggested value would certainly help to achieve this.
There might be some perverse incentive for PFL / validators to want to delay transactions with this parameter in order to extract more MEV; this kind of thing being supported opens up a lot of other questions / issues that would need to be carefully considered.
@CitizenSearcher that's good feedback, and you're right about our objective of not delaying regular users past the "default" of 500ms. I'm 100% ok with a 500ms ceiling. @temaniarpit27 thoughts on adding a cap?
@CitizenSearcher thanks a lot for the feedback
…longer durations than maxTxArrivalWait will default to the maxTxArrivalWait duration
1d73d4e to 1e926cf
@temaniarpit27 I updated the branch to resolve conflicts:
@thogard785 can you please fix the failing CI?
Hey @temaniarpit27 - sorry for the delay on this. I was sidelined for health reasons, but I'm back to 100% now. These changes should fix the cuddling issues the linter was having w/ the codebase. The core/tx_list.go change seemed to be totally unrelated to what I was working on. It looks like that cuddle issue was fixed separately in the develop branch, so I overwrote my own fix w/ the one from the develop branch when I updated this branch w/ the changes to develop.
Looks like merging w/ dev again added some new checks to fail - will fix on Monday.
Ya will need to fix testcases as well
Codecov Report
Patch coverage:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #734      +/-   ##
===========================================
- Coverage    56.64%   56.60%   -0.04%
===========================================
  Files          611      611
  Lines        72238    72447     +209
===========================================
+ Hits         40919    41012      +93
- Misses       27823    27926     +103
- Partials      3496     3509      +13
☔ View full report in Codecov by Sentry.
Updated the fuzzer. @temaniarpit27 codecov/project is still failing but all the other tests are passing. I'm unsure how to resolve the codecov/project fail - any advice?
Description
Following up on:
txArriveTimeout to txArrivalWait to more accurately describe the purpose of the argument - this is a variable that controls how long to wait before requesting a transaction; there's no action/request/response that has been 'timed out'.
txarrivalwait to allow node operators to set their own maximum wait period before explicitly requesting an announced transaction.
txarrivalwait period to 500ms (EDIT: updated from 100ms per comment from jekamas). While the motives of the author of PR 627 are unclear, his analysis and data collection are very thorough. Extremely thorough for somebody's first PR...
I suspect that the optimal txarrivalwait will vary per node - some nodes may prioritize a decreased load or may be in higher-latency locations, while others may want a very rapid response. By allowing the setting to be configurable, we give node operators the chance to choose a setting that best fits their need.
Note that this will also give validators the ability to customize their interactions with the FastLane MEV system, as referenced in PR 707.
Changes
Nodes audience
This
Checklist
Testing
Additional comments
I do not have high conviction either for or against the default value of 100ms and am open to changing it. The rationale for the selection of 100ms as default is solely due to PR 627 being accepted. EDIT: changed to 500ms per comments from JekaMas.
EDIT: Added a minimum txArrivalWait value. Added a check that sets the f.txArrivalWait value to the txGatherSlack value if f.txArrivalWait < txGatherSlack.
The rationale is due to the context of the usage of txGatherSlack by the fetcher:
if time.Duration(f.clock.Now()-instance)+txGatherSlack > f.txArrivalWait {...
(found on lines 451 and 488)
If f.txArrivalWait is less than txGatherSlack, it is impossible for the above conditional to return false, rendering the value of f.txArrivalWait meaningless and potentially leading to node operators misunderstanding the impact of the txArrivalWait value they set.
(The exception to the above is if the node operator has modified the hardcoded gatherSlack variable (default: 100ms) to be less than the txGatherSlack variable (default: 20ms).) Comparing f.txArrivalWait against these two values is txArrivalWait's only hot path usage in bor.
d56370b