chore: Reduce number of solutions that fail after calling /settle
#2007
Comments
Since we have to have way higher throughput and should overall have the lowest latency possible on prod, I consider this issue as blocking.
Considering that this is blocking prod, I would be OK with (2), but I would immediately make a follow-up plan for what to do in a cleanup: probably communicating with external solvers about this, implementing (1), and potentially reducing the solving time to further reduce the chance of the solution being invalid.
# Description

Fixes #2007. The preferred solution described in that issue was not straightforward to implement. It expected the `autopilot` to call `/reveal` on solutions from the highest scoring to the lowest and simulate them, in order to avoid cases where a solver wins the auction with a solution that would revert by now. The problem is that this requires the `autopilot` to know the address of every solver, which would be a significant change.

Instead, this PR implements a strategy on the `driver` side that a rational actor would be expected to follow anyway. All solvers know the latest time by which they have to return a solution. Because a solution can be more accurate the closer it is computed to the time it is supposed to be executed, it makes sense for all solvers to delay their response as much as possible (something might change in the meantime that enables an even better solution). This deadline gets propagated to the `solver` engine, so ideally it takes as much time as possible computing the optimal solution. If the solver returns earlier (as some currently do), it still makes sense to assume that other solvers will submit their solutions later. In that case the only reasonable action for the `driver` is to wait until the deadline approaches and continuously re-simulate the computed solution whenever a new block gets detected. If the solution would start to revert in the meantime, the `driver` withholds it so it doesn't accidentally win the auction and get slashed for not submitting it on-chain.

# Changes

The `driver` waits until the deadline before returning the solution on `/solve` and checks whether it is still viable on every new block. It also updates the score in case the solution still simulates but becomes better or worse (gas usage might change). Also slightly adjusted `Ethereum::new()` to panic on any init error, since we can't handle those errors anyway: the type is essential to the program.
## How to test

This can be tested with a new e2e test, which I would like to implement in a follow-up PR once the existing e2e tests no longer break.
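The driver-side behavior described above (wait until the deadline, re-check the solution on every new block, withhold it if it starts reverting, otherwise keep the score fresh) can be sketched roughly as follows. This is a minimal illustration, not the actual driver code: `Outcome`, `settle_or_withhold`, and the idea of modeling block notifications as an iterator are all hypothetical simplifications.

```rust
// Hypothetical, simplified stand-ins for the driver's real types.
#[derive(Clone, Debug, PartialEq)]
enum Outcome {
    /// Solution still simulates; score may have changed (e.g. gas usage).
    Valid { score: u64 },
    /// Solution would revert on the current block.
    Reverts,
}

/// Re-check the solution whenever a new block arrives, until the deadline.
/// `blocks` stands in for the stream of per-block simulation results seen
/// between computing the solution and the `/solve` deadline.
fn settle_or_withhold<I>(initial_score: u64, blocks: I) -> Option<u64>
where
    I: IntoIterator<Item = Outcome>,
{
    let mut score = initial_score;
    for outcome in blocks {
        match outcome {
            // Keep the reported score up to date with the latest simulation.
            Outcome::Valid { score: new_score } => score = new_score,
            // Withhold the solution rather than risk winning with a
            // solution that can no longer be settled (a slashable offense).
            Outcome::Reverts => return None,
        }
    }
    // Deadline reached and the solution still simulates: return it.
    Some(score)
}
```

In the real driver this loop would be asynchronous, driven by block notifications, but the decision logic is the same: `None` means the driver answers `/solve` with no solution.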
Background
Similar to #1999, this issue aims to reduce the time we waste due to incomplete data in the `autopilot`. For example, it can happen that some solver is very quick to respond with a solution for a given auction. But if other solvers take all the time they are allowed before reporting a solution, they will return solutions based on more up-to-date data, whereas the quick-to-respond solver would be unaware that a new block appeared that made its solution revert.
Details
I see 2 ways to handle this:

1. The `driver` keeps requesting new solutions from the matching engine until the time runs out. This is actually the most rational behavior and would produce the most recent data the solver can provide, at the cost of more compute resources on the external solver's end.
2. The `autopilot` doesn't just pick the highest score and declare it the winner. Instead, it calls `/reveal` on the individual drivers from the highest to the lowest score and simulates their solutions, and only calls `/settle` for the highest-ranking solution that still simulates. This would be totally fine for as long as we are running all the drivers, but it would undermine a pretty important aspect of the redesigned system (improved market maker support that only expects you to produce call data if you actually won).

We could of course go for both solutions (or some other solution I didn't consider), but since just doing 2 would result in a system that is effectively identical to our legacy system (together with the mentioned PR) and is easy to implement, I would go for that until we figure out a better way to avoid wasting precious run loops.

Acceptance criteria
Some change that results in more winning solutions actually ending up on-chain.
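For illustration, the ranking step from option (2) boils down to: sort revealed solutions by score, then settle the best one that still simulates. A minimal sketch, with entirely hypothetical names (`Ranked`, `pick_winner`) standing in for whatever the `autopilot` would actually use:

```rust
// Hypothetical representation of a solver's revealed, simulated solution.
#[derive(Clone, Debug)]
struct Ranked {
    solver: &'static str,
    score: u64,
    /// Whether the call data obtained via /reveal still simulates.
    simulates: bool,
}

/// Returns the winning solver, if any solution still simulates.
fn pick_winner(mut solutions: Vec<Ranked>) -> Option<&'static str> {
    // Highest score first.
    solutions.sort_by(|a, b| b.score.cmp(&a.score));
    // First (i.e. highest-scoring) solution that still simulates wins;
    // only that driver would then receive the /settle call.
    solutions.into_iter().find(|s| s.simulates).map(|s| s.solver)
}
```

Note how a higher-scoring but now-reverting solution is simply skipped instead of winning and later failing `/settle`, which is exactly the failure mode this issue is about.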