
Faster mapping execution #856

Closed
Jannis opened this issue Apr 4, 2019 · 4 comments

@Jannis
Contributor

Jannis commented Apr 4, 2019

No description provided.

@leoyvens
Collaborator

leoyvens commented Apr 9, 2019

I assume this issue is about improving indexing time. First we should try to answer questions such as:

  • What subgraphs are representative of the use pattern we want to speed up? How complex are their mappings? Do they call contracts or IPFS? How many events are there per block? How many entities are there per block?
  • What speedup are we targeting?
  • Are we considering a warm database with blocks loaded (probably from indexing a previous version of the subgraph) or cold?
  • Do we want to speed up indexing from genesis or indexing the newest block in a subgraph that's already synced?

Case studies

I went ahead and did three case studies: Compound, ENS and Uniswap. One case I didn't cover is IPFS, which none of these use.

Compound makes 3-4 contract calls per event. The time to make a contract call depends on the latency to the eth node, so I can't measure it on my machine; we'd need production timings. But I'd suspect that the contract calls dominate the indexing time.

ENS does no contract or IPFS calls and has very simple mappings. From a cold DB, fetching blocks from the eth node dominates the indexing time. From a warm DB, it gets more subtle. The three major steps to process a block (and my local measurements for ENS) are:

  1. Get block from the block stream: 3-10ms
  2. Process events in the runtime: ~3ms per event in the block
  3. Apply operations to the DB: 10-20ms, proportional to the number of operations
  4. Repeat.

Steps 1 and 3 take a while because they involve the DB, while step 2 is the interaction with wasm.

Uniswap, like ENS, doesn't do contract or IPFS calls. However, it has more complex mappings that take ~15ms per event, making the runtime the slowest step in many blocks.

Possible solutions

From a cold DB, I don't have a better idea than making sure there's good networking to the eth node, though there are also bigger ideas such as local contract execution. From a warm DB, there are a few things we can do:

For Compound, caching contract calls might help, or running eth nodes in the local cluster to reduce latency.
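
Roughly, such a cache could look something like the sketch below: results keyed by contract address, encoded call data and block hash. The names and types here are made up for illustration, not our actual interfaces.

```rust
use std::collections::HashMap;

// A call result is only reusable if the contract, the ABI-encoded call data
// and the block the call is pinned to all match.
#[derive(Hash, PartialEq, Eq, Clone)]
struct CallCacheKey {
    contract: [u8; 20],   // contract address
    call_data: Vec<u8>,   // function selector + encoded arguments
    block_hash: [u8; 32], // block the call is made against
}

// Minimal in-memory cache for contract call results (illustrative only).
struct CallCache {
    entries: HashMap<CallCacheKey, Vec<u8>>,
}

impl CallCache {
    fn new() -> Self {
        CallCache { entries: HashMap::new() }
    }

    // Return the cached result if we have one, otherwise do the eth_call
    // round trip via `make_call` and remember the result for later events.
    fn get_or_call<F>(&mut self, key: CallCacheKey, make_call: F) -> Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        self.entries.entry(key).or_insert_with(make_call).clone()
    }
}
```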

For ENS, step 3 is significant: applying the operations to the DB seems to be the bottleneck when the mappings are simple and each block has only a few events. We could pipeline operations to the DB so that we start processing the next block while the previous one is being applied. This would require architectural changes, since currently the block stream cannot proceed until the previous block has been fully processed. It could be up to a 2x speedup for some blocks.
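
As a rough sketch of the pipelining idea (not our actual architecture), the block processor could hand each finished batch of operations to a dedicated writer over a channel and immediately move on to the next block, with the channel preserving commit order:

```rust
use std::sync::mpsc;
use std::thread;

// Placeholder for the batch of entity operations produced by one block.
struct BlockOperations {
    block_number: u64,
    // ... entity set/remove operations would go here
}

fn main() {
    let (tx, rx) = mpsc::channel::<BlockOperations>();

    // Dedicated writer: applies each batch to the database, in order.
    let writer = thread::spawn(move || {
        for ops in rx {
            // The 10-20ms DB transaction would run here.
            println!("applying operations for block {}", ops.block_number);
        }
    });

    // Block processing loop: hand the finished batch to the writer and
    // start on the next block instead of waiting for the commit.
    for block_number in 0..5 {
        let ops = BlockOperations { block_number };
        tx.send(ops).expect("writer thread is alive");
    }

    drop(tx); // close the channel so the writer can finish
    writer.join().unwrap();
}
```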

For Uniswap, step 2, the runtime, becomes the bottleneck. I see two things we can improve. First, being smarter with allocations and pre-allocating memory when passing objects to AS; I believe this would give a 2x speedup. Second, swapping out wasmi for something more performant, such as Cranelift + wasmtime or wasmer. This is more work but should give at least a 10x speedup in the runtime.
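
On the host side, the "smarter allocations" idea could look roughly like this: instead of asking the wasm module's allocator for memory once per field when marshalling an entity, size the whole batch first and allocate once. This is only an illustrative sketch; `GuestAllocator` and `write_fields_preallocated` are made-up names, and real AssemblyScript objects have headers and class layouts this ignores.

```rust
// Illustrative host-side interface to the wasm module's allocator.
trait GuestAllocator {
    /// Ask the wasm module to allocate `size` bytes; returns a guest pointer.
    fn allocate(&mut self, size: u32) -> u32;
    /// Copy bytes into guest memory at `ptr`.
    fn write(&mut self, ptr: u32, bytes: &[u8]);
}

/// Marshal a list of field values with one allocation instead of one per field.
fn write_fields_preallocated<A: GuestAllocator>(alloc: &mut A, fields: &[Vec<u8>]) -> Vec<u32> {
    // One pass to size everything, one guest allocation for the whole batch.
    let total: u32 = fields.iter().map(|f| f.len() as u32).sum();
    let base = alloc.allocate(total);

    // Second pass: copy each field into its slice of the block and record
    // the guest pointer the mapping will see.
    let mut offset = 0u32;
    let mut ptrs = Vec::with_capacity(fields.len());
    for field in fields {
        let ptr = base + offset;
        alloc.write(ptr, field);
        ptrs.push(ptr);
        offset += field.len() as u32;
    }
    ptrs
}
```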

@leoyvens
Collaborator

leoyvens commented May 4, 2019

As part of this, I've been observing the timings of the Decentraland subgraph in production. Observations:

  • store.get times vary a lot, from 2-3ms to 40-60ms for the same operation. Possible reasons:

    • Varying DB load, but the DB has a lot more resources than it uses right now.
    • Entities of the same type having drastically different sizes; I don't think this is the case for this subgraph.
    • Contention in the DB connection pool in the indexing node.
      • We should check for this.
    • Varying DB performance for some other reason.
  • Factoring out store.get, event processing time still varies a lot. Possible reasons:

    • Contention in the Tokio thread pool in the indexing node.
      • We should check; DB operations in tasks could cause this. We should also log the size of the pool.
    • Varying CPU performance for some reason, even though we have more CPU than we use and it's not spiking.

@leoyvens
Collaborator

Looking at logs in the hosted service, DB contention doesn't look like a big issue, so I don't have new or confirmed ideas for why store.get performance varies.

On general event processing time, thread pool contention is clearly a factor; #926 will give clear confirmation of this. My plan there would be:

  1. Set the tokio threadpool size to 100; this should comfortably allow 20 subgraphs to run on a single node.
  2. Solve "Move slow db interactions to tokio blocking pool" (#905) by moving all store operations into a separate, blocking threadpool and having them return futures. This is not hard, but it's laborious and will produce a big diff.
  3. Try going back to the default threadpool size.

We should do 1 now, and if it works leave 2-3 in the technical debt bucket.
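
For item 2, a rough sketch of the shape of the change is below. It uses today's tokio API (tokio::task::spawn_blocking) rather than the tokio 0.1 blocking pool in use at the time, and `load_entity`/`get_entity` are illustrative stand-ins, not our actual store interface.

```rust
use tokio::task;

// Stand-in for a synchronous store operation, e.g. a Diesel query.
fn load_entity(id: &str) -> Result<String, String> {
    Ok(format!("entity {}", id))
}

// Wrap the blocking call so callers get a future and the async executor's
// threads stay free for event processing.
async fn get_entity(id: String) -> Result<String, String> {
    task::spawn_blocking(move || load_entity(&id))
        .await
        .map_err(|e| e.to_string())?
}

#[tokio::main]
async fn main() {
    match get_entity("0x123".to_string()).await {
        Ok(entity) => println!("loaded {}", entity),
        Err(e) => eprintln!("store error: {}", e),
    }
}
```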

@leoyvens
Collaborator

Old; also, we switched to wasmtime.
