
feat: trie cache factory to allow variable cache sizes #7022

Merged: 11 commits merged into near:master from trie-cache-factory on Jun 15, 2022

Conversation

Longarithm
Member

@Longarithm Longarithm commented Jun 13, 2022

In the upcoming release, we want to have variable sizes for trie caches, because shard 3 is going to get increased load. To do so, I introduce a TrieCacheFactory initialized from the store config data, and move the trie cache creation logic there.

It's not clear whether we can create all caches from the very beginning: from what I remember, new caches have to be created when the shard split logic is triggered.
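As a rough sketch of the factory approach (all names and types here are simplified stand-ins, not the actual nearcore API):

```rust
use std::collections::HashMap;

// Hypothetical simplified stand-in for nearcore's ShardUId: (version, shard_id).
type ShardUId = (u32, u32);

#[derive(Clone)]
struct TrieCache {
    capacity: usize,
}

/// Creates per-shard trie caches, using a configured per-shard override
/// when present and a default capacity otherwise.
struct TrieCacheFactory {
    default_capacity: usize,
    overrides: HashMap<ShardUId, usize>,
}

impl TrieCacheFactory {
    fn create_cache(&self, shard_uid: &ShardUId) -> TrieCache {
        let capacity = *self.overrides.get(shard_uid).unwrap_or(&self.default_capacity);
        TrieCache { capacity }
    }
}

fn main() {
    let factory = TrieCacheFactory {
        default_capacity: 50_000,
        // shard 3 gets a bigger cache in this illustration
        overrides: HashMap::from([((1, 3), 2_000_000)]),
    };
    assert_eq!(factory.create_cache(&(1, 3)).capacity, 2_000_000);
    assert_eq!(factory.create_cache(&(1, 0)).capacity, 50_000);
}
```

The real factory also has to handle shard layout versions and resharding; this only illustrates the capacity-override lookup.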

We also reduce TRIE_LIMIT_CACHED_VALUE_SIZE to 1000 for two reasons:

  • the most frequently occurring nodes have size < 1000; e.g. branch nodes use ~32 * 16 = 512 bytes
  • this makes the RAM increase smaller: the increase drops from 1.6 GB to 0.4 GB, for a total of 2 + 0.4 = 2.4 GB.
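The admission check behind TRIE_LIMIT_CACHED_VALUE_SIZE can be pictured like this (a hypothetical sketch; only the 1000-byte constant comes from this PR, while the function name and the exact comparison are illustrative):

```rust
// Illustrative sketch of a cache-admission check: values at or above
// the limit are not kept in the trie cache.
const TRIE_LIMIT_CACHED_VALUE_SIZE: usize = 1000;

fn should_cache_value(value: &[u8]) -> bool {
    value.len() < TRIE_LIMIT_CACHED_VALUE_SIZE
}

fn main() {
    // A branch node holds up to 16 child hashes of 32 bytes each: ~512 bytes.
    let branch_estimate = 32 * 16;
    assert!(should_cache_value(&vec![0u8; branch_estimate])); // 512 < 1000: cached
    assert!(!should_cache_value(&vec![0u8; 4096])); // large value: skipped
}
```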

I partially reuse work from #7027 due to the urgency of this change; we want to try adding it to the next release.

Testing

Manual state-viewer run on shard 2:

$ ./target/release/neard --unsafe-fast-startup view_state --readwrite apply_range --start-index 91713500 --end-index 91713700 --shard-id 2 --sequential
Applying chunks in the range 91713500..=91713700 for shard_id 2
Printing results including outcomes of applying receipts

Processed 50 blocks, 100443 ms passed, 0.4978 blocks per second (0 skipped), 303.34 secs remaining 3 empty blocks 204.21 avg gas per non-empty block
Processed 100 blocks, 72478 ms passed, 0.6899 blocks per second (0 skipped), 146.41 secs remaining 4 empty blocks 199.30 avg gas per non-empty block
Processed 150 blocks, 60252 ms passed, 0.8298 blocks per second (0 skipped), 61.46 secs remaining 17 empty blocks 252.45 avg gas per non-empty block
Processed 200 blocks, 65808 ms passed, 0.7598 blocks per second (0 skipped), 1.32 secs remaining 9 empty blocks 206.71 avg gas per non-empty block

@Longarithm Longarithm force-pushed the trie-cache-factory branch from 8757766 to 46b09f9 Compare June 13, 2022 20:17
@Longarithm Longarithm changed the title implement trie cache factory feat: trie cache factory to allow variable cache sizes Jun 13, 2022
@Longarithm Longarithm force-pushed the trie-cache-factory branch from dfd2705 to 6da2d66 Compare June 13, 2022 20:49
@Longarithm Longarithm marked this pull request as ready for review June 13, 2022 20:50
@Longarithm Longarithm requested a review from a team as a code owner June 13, 2022 20:50
@Longarithm Longarithm requested a review from matklad June 13, 2022 20:50
@Longarithm Longarithm requested review from firatNEAR and akhi3030 June 13, 2022 21:00
@Longarithm Longarithm self-assigned this Jun 13, 2022
@Longarithm Longarithm added the T-core Team: issues relevant to the core team label Jun 13, 2022
Contributor

@firatNEAR firatNEAR left a comment


LGTM, we need to wait to find out which mainnet shard id is going to get increased load before we merge.

Collaborator

@akhi3030 akhi3030 left a comment


LGTM! A couple of minor comments. It would be good to get reviews from someone more seasoned in the code base as well, though.

core/store/src/config.rs Outdated Show resolved Hide resolved
@@ -67,8 +69,9 @@ impl Store {
/// Caller must hold the temporary directory returned as first element of
/// the tuple while the store is open.
pub fn tmp_opener() -> (tempfile::TempDir, StoreOpener<'static>) {
static CONFIG: Lazy<StoreConfig> = Lazy::new(StoreConfig::test_config);
Collaborator


I am not sure I understand why we need to cache the config here. Can we not always call test_config? Is calling test_config a very expensive operation or can it return different values when called again and again?

Contributor


cc @mina86, this is the kind of side effect which made me avoid lifetime parameters by default (echoing back #6973 (review)).

Member Author


This is one of the changes I took from #7027, and I don't fully understand the reasoning either. Is it fine to proceed with this PR and discuss this point in #7027?

core/store/src/trie/shard_tries.rs Outdated Show resolved Hide resolved
core/store/src/config.rs Outdated Show resolved Hide resolved
core/store/src/trie/shard_tries.rs Outdated Show resolved Hide resolved
@matklad
Contributor

matklad commented Jun 14, 2022

With the or_insert -> or_insert_with change, this looks good to me as a quick fix.
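For context, the difference matters because `or_insert` evaluates its argument eagerly even when the entry already exists, while `or_insert_with` takes a closure that only runs on a miss. A minimal illustration (generic Rust, not nearcore code):

```rust
use std::collections::HashMap;

fn expensive_cache(counter: &mut u32) -> Vec<u8> {
    *counter += 1; // count how many times the "cache" is actually built
    vec![0u8; 1024] // stand-in for an expensive TrieCache allocation
}

fn main() {
    let mut built = 0;
    let mut caches: HashMap<u64, Vec<u8>> = HashMap::new();

    // or_insert_with: the closure runs only on a cache miss.
    caches.entry(7).or_insert_with(|| expensive_cache(&mut built));
    caches.entry(7).or_insert_with(|| expensive_cache(&mut built));
    assert_eq!(built, 1); // second lookup hit the existing entry

    // or_insert: its argument is evaluated eagerly, even on a hit.
    caches.entry(7).or_insert(expensive_cache(&mut built));
    assert_eq!(built, 2); // a throwaway cache was constructed anyway
}
```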

However, the overall code does feel a bit "bolted on". I would like to fix the following things in a follow-up:

  • make sure that default and overridden cache capacity is handled in the same way. It doesn't make sense that the former is hard-coded, but the latter is overridable

  • Clean up defaults in StoreConfig; it feels messy that the source of truth is a set of awkward functions which exist solely for serde

  • Replace TrieFactory with just TrieConfig -- we don't need a doer object here, just a bag of values

  • revisit the format of the config:

    "trie_cache_capacities": [
      [
        {
          "version": 1,
          "shard_id": 2
        },
        2000000
      ]
    ]
    

    doesn't look like a good config format; something like { "1:2": "2000000" } would be much more user-friendly.
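For illustration, the proposed "version:shard_id" key could be parsed with a few lines of plain Rust (a hypothetical sketch, not actual nearcore code):

```rust
// Illustrative parser for a "version:shard_id" config key like "1:2";
// the function name and return type are made up for this sketch.
fn parse_shard_key(key: &str) -> Option<(u32, u32)> {
    let (version, shard_id) = key.split_once(':')?;
    Some((version.parse().ok()?, shard_id.parse().ok()?))
}

fn main() {
    assert_eq!(parse_shard_key("1:2"), Some((1, 2)));
    assert_eq!(parse_shard_key("bad"), None);
}
```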

Longarithm and others added 2 commits June 14, 2022 15:22
@Longarithm Longarithm marked this pull request as draft June 14, 2022 11:32
@matklad
Contributor

matklad commented Jun 14, 2022

However, the overall code does feel a bit "bolted on". I would like to fix the following things in the follow up:

Note: I think we shouldn't push all that into this PR immediately :)

@Longarithm Longarithm force-pushed the trie-cache-factory branch from e0d7d12 to c60e8c5 Compare June 14, 2022 12:19
@Longarithm Longarithm marked this pull request as ready for review June 14, 2022 12:46
@Longarithm
Member Author

Great points @matklad, I will use the moment to ask some questions:

make sure that default and overridden cache capacity is handled in the same way. It doesn't make sense that the former is hard-coded, but the latter is overridable

I think that putting all capacities into the config leads to lots of repeated data in the config, especially if we have more shards.

Replace TrieFactory with just TrieConfig -- we don't need doer object here, just a bag of values

I would like to, but logically we create the trie cache objects before creating the ShardTries object, so we need some doer. Having a create_cache method on TrieConfig looks weird to me.

@Longarithm Longarithm requested a review from matklad June 14, 2022 12:58
@matklad
Contributor

matklad commented Jun 14, 2022

I think that putting all capacities into the config leads to lots of repeated data in the config, especially if we have more shards.

So I think that users would want to put nothing in their config in either case -- we are relying on default values making sense. The problem isn't how the data in the config looks, but rather the fact that the actual size of the thing at runtime is determined by two completely separate sources of information:

  • a hard-coded constant in the code
  • a value from the config object

I think it would be beneficial, purely from the code quality point of view, to make sure that there's a single place in the code which ultimately determines the size of the cache. Given that we want to make some aspect of this configurable, it makes sense for the config to be that source of truth. Other than that, I think even operationally it would be useful to configure the size of the cache for all shards without explicitly overriding each shard. But, again, this is not something we should be tackling in this PR: if there's time pressure, we can ship a minimal correct diff and work on making the API nicer separately.

I would like to, but logically we create trie caches' objects before creating ShardTries object, so we need some doer.

We need some bag of parameters, but we don't need this bag to have doer semantics, I would think; an inert "plain old data" object would do. I'd imagine something like this would work:

pub struct ShardTriesParams {
    pub shard_version: ShardVersion,
    pub num_shards: NumShards,

    pub default_shard_capacity: usize,
    pub shard_capacities: HashMap<ShardUId, usize>,
}

impl ShardTriesParams {
    fn shard_capacity(&self, shard_uid: ShardUId) -> usize {
        *self.shard_capacities.get(&shard_uid).unwrap_or(&self.default_shard_capacity)
    }
}

...
            let mut caches = caches_to_use.write().expect(POISONED_LOCK_ERR);
            caches
                .entry(shard_uid)
                .or_insert_with(|| TrieCache::with_capacity(self.0.params.shard_capacity(shard_uid)))
                .clone()
...

Why I think this would be better:

  • when designing an API, it's always beneficial to think from the call-site perspective. At the call-site, we don't care that the thing is used internally to create caches; at that level of abstraction we only care that there's a bunch of knobs on the tries, and we want to set those knobs to particular values.
  • we don't actually need a factory object here: there's no meaningful state the factory would hold. A simple fn create_shard_cache(params: ShardTriesParams, shard: ShardId) -> TrieCache would do.
  • Between "object with state and behavior" and "plain old data", data is usually the simpler of the two, and should be the default choice.

But, again, that's probably better to be left for future refactor.
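The ShardTriesParams sketch above can be made self-contained with simplified stand-in types (ShardUId reduced to a (version, shard_id) tuple; field names kept from the sketch) to show the capacity-fallback behavior:

```rust
use std::collections::HashMap;

// Simplified stand-in: nearcore's ShardUId is a struct, here a tuple.
type ShardUId = (u32, u32);

pub struct ShardTriesParams {
    pub default_shard_capacity: usize,
    pub shard_capacities: HashMap<ShardUId, usize>,
}

impl ShardTriesParams {
    /// Per-shard override if configured, otherwise the default capacity.
    fn shard_capacity(&self, shard_uid: ShardUId) -> usize {
        *self.shard_capacities.get(&shard_uid).unwrap_or(&self.default_shard_capacity)
    }
}

fn main() {
    let params = ShardTriesParams {
        default_shard_capacity: 50_000,
        shard_capacities: HashMap::from([((1, 2), 2_000_000)]),
    };
    assert_eq!(params.shard_capacity((1, 2)), 2_000_000); // overridden shard
    assert_eq!(params.shard_capacity((1, 0)), 50_000); // falls back to default
}
```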

@matklad
Contributor

matklad commented Jun 14, 2022

Should this have the auto-merge label? It has collected all of the approvals, I think?

@matklad
Contributor

matklad commented Jun 14, 2022

LGTM, we need to wait to find out which mainnet shard id is going to get increased load before we merge.

Guess I've found the answer: not yet

@Longarithm Longarithm changed the title feat: trie cache factory to allow variable cache sizes [do not merge] feat: trie cache factory to allow variable cache sizes Jun 14, 2022
@Longarithm Longarithm changed the title [do not merge] feat: trie cache factory to allow variable cache sizes feat: trie cache factory to allow variable cache sizes Jun 15, 2022
@near-bulldozer near-bulldozer bot merged commit fc16eb2 into near:master Jun 15, 2022
nikurt added a commit that referenced this pull request Jun 20, 2022
near-bulldozer bot pushed a commit that referenced this pull request Jun 24, 2022
Revert default value (to 50K) because after #7022 we got more evidence that it doesn't help to speed up storage ops.

## Testing

Existing tests.
mina86 pushed a commit to mina86/nearcore that referenced this pull request Jun 24, 2022
Revert default value (to 50K) because after near#7022 we got more evidence that it doesn't help to speed up storage ops.

## Testing

Existing tests.