
chore: Investigate auction propagation delay between driver <> solver engine #2133

Closed
fleupold opened this issue Dec 7, 2023 · 2 comments · Fixed by #2134

fleupold commented Dec 7, 2023

Background

Similar to #2095, there is another 1.7s propagation delay between the driver and the solver engine (cf. https://production-6de61f.kb.eu-central-1.aws.cloud.es.io/app/r/s/U1XTl)

2023-12-07T08:38:34.078Z DEBUG request{id="17245"}:/solve:auction{id=8122282}: shared::http_solver: http solver instance name is 2023-12-07_08:38:34.078506830_UTC_Ethereum___Mainnet_1_8122282
2023-12-07T08:38:32.337Z DEBUG request{id="17245"}:/solve{solver=otex-solve auction_id=8122282}: driver::infra::api::routes::solve: auction task execution time elapsed=556.865172ms auction_id=8122282

It would be good to see if we can reduce this as well, since this delay eats into the time external solvers could actually spend finding a solution.

@fleupold fleupold added the E:3.1 Driver Colocation See https://github.com/cowprotocol/pm/issues/14 for details label Dec 7, 2023

MartinquaXD commented Dec 7, 2023

There are only two async operations happening in the measured code path:

  1. fetching the current gas price
  2. fetching token metadata (symbol, decimals, contract balance)

Operation 2 is heavily cached, but if we encounter a token that is not cached, all solvers would send 3 RPC requests per uncached token. If this is the culprit, we could use the RequestSharing component here to ensure they don't send duplicate requests (see the sketch below).

Edit: It looks like the culprit could actually be lock contention. If you configure enough solvers, you can reproduce slow times even without cache misses.
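
For illustration, here is a minimal sketch of what such request sharing could look like. The names and signatures are hypothetical, not the actual `RequestSharing` API from the `shared` crate:

```rust
use std::{collections::HashMap, hash::Hash, sync::Mutex};

use futures::future::{BoxFuture, FutureExt, Shared};

/// Sketch of in-flight request sharing: concurrent callers asking for the
/// same key await one shared future instead of each firing their own RPC
/// request. (Completed entries are never evicted here; a real implementation
/// would clean them up.)
struct InFlightRequests<K, V> {
    in_flight: Mutex<HashMap<K, Shared<BoxFuture<'static, V>>>>,
}

impl<K, V> InFlightRequests<K, V>
where
    K: Eq + Hash,
    V: Clone,
{
    fn new() -> Self {
        Self { in_flight: Mutex::new(HashMap::new()) }
    }

    /// Returns the future already in flight for `key`, or registers `fetch`
    /// as the single shared request for it.
    fn shared(
        &self,
        key: K,
        fetch: impl std::future::Future<Output = V> + Send + 'static,
    ) -> Shared<BoxFuture<'static, V>> {
        let mut in_flight = self.in_flight.lock().unwrap();
        in_flight
            .entry(key)
            .or_insert_with(|| fetch.boxed().shared())
            .clone()
    }
}
```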

@MartinquaXD MartinquaXD self-assigned this Dec 7, 2023
MartinquaXD added a commit that referenced this issue Dec 7, 2023
# Description
We've noticed weird delays between the autopilot sending an auction and
the driver being ready to start working on it.
The linked issue suggests that we spend a lot of time converting the
auction and enriching it with token metadata.
This should be super fast because all the data should be cached at all
times.
However, there was one token that caused it all to break down: `0xeeeee...`
(the address we use to denote the chain's native token).

This is roughly what happened:
1. every driver receives the auction
2. every driver collects all the addresses for which we don't have cached metadata
3. this always yielded `0xeeee...` (it's not a real token, so we never get data when calling ERC20 functions on it)
4. every driver wasted some time sending RPC requests for `0xeeee`
5. every driver tried to take an exclusive lock to write no data (an empty vec) to the cache

Overall this wasted time on unnecessary network requests and on every driver serially taking an exclusive lock.
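
For illustration, a minimal sketch of the problematic flow, using hypothetical types (`MetadataCache`, `Metadata`, `fetch_metadata`) rather than the actual driver code:

```rust
use std::{collections::HashMap, sync::RwLock};

type Address = String;

#[derive(Clone)]
struct Metadata {
    symbol: Option<String>,
    decimals: Option<u8>,
}

struct MetadataCache(RwLock<HashMap<Address, Metadata>>);

impl MetadataCache {
    /// Old flow: the native token placeholder can never be cached, so every
    /// auction it shows up as "missing", triggers RPC calls that return
    /// nothing, and every driver then serially takes the exclusive write
    /// lock just to store an empty result.
    async fn enrich(&self, tokens: &[Address]) {
        let missing: Vec<Address> = tokens
            .iter()
            .filter(|token| !self.0.read().unwrap().contains_key(*token))
            .cloned()
            .collect();
        // For the native token placeholder this always comes back empty.
        let fetched = fetch_metadata(&missing).await;
        // The exclusive lock is taken even when there is nothing to write.
        let mut cache = self.0.write().unwrap();
        for (token, metadata) in fetched {
            cache.insert(token, metadata);
        }
    }
}

async fn fetch_metadata(_tokens: &[Address]) -> Vec<(Address, Metadata)> {
    // Stand-in for the ERC20 symbol/decimals/balance RPC calls.
    Vec::new()
}
```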

# Changes
* most importantly, filter out `0xeee` (see the sketch after this list)
* avoid sending duplicate RPC requests by using `RequestSharing`
* don't take an exclusive lock if some other driver cached the same data in the meantime
* move logging of the token balance update task out of the critical section
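
Reusing the hypothetical types from the sketch above (again, not the actual driver code), the fixed flow looks roughly like this:

```rust
impl MetadataCache {
    /// New flow: skip the native token placeholder entirely, only hit the
    /// network for tokens that are really missing, and re-check the cache
    /// under the write lock so data another driver cached in the meantime
    /// is kept and the exclusive lock does as little work as possible.
    async fn enrich_fixed(&self, tokens: &[Address]) {
        // Placeholder address used to denote the chain's native token; it is
        // not a real ERC20 contract, so querying it never yields metadata.
        const NATIVE_TOKEN: &str = "0xeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee";

        let missing: Vec<Address> = tokens
            .iter()
            .filter(|token| token.as_str() != NATIVE_TOKEN)
            .filter(|token| !self.0.read().unwrap().contains_key(*token))
            .cloned()
            .collect();
        if missing.is_empty() {
            return;
        }

        // In the actual change, duplicate RPC requests are additionally
        // deduplicated across solvers via `RequestSharing`.
        let fetched = fetch_metadata(&missing).await;
        if fetched.is_empty() {
            // Nothing to write, so don't touch the exclusive lock at all.
            return;
        }

        let mut cache = self.0.write().unwrap();
        for (token, metadata) in fetched {
            // Keep whatever another driver cached while we were fetching.
            cache.entry(token).or_insert(metadata);
        }
    }
}
```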

All changes combined result in us now spending a couple of microseconds on
getting cached balances instead of seconds.
And because this was the bulk of the work captured by the `auction task
execution time` log, we reduced that time to ~11ms.

## How to test
e2e tests should still pass, and a manual test confirmed the reduced latency.

Fixes #2133
MartinquaXD commented

Follow-up on this: each driver now spends 2-3 ms on this part most of the time.
However, there can still be unfortunate timings when we see a new block right before the auction or even during the auction.
For example, here one driver was able to finish the task in 3 ms while all subsequent drivers needed several hundred milliseconds because we saw a new block.
But I think there is nothing that can reasonably be done about this.
