avoid memory allocations and copies when loading states #937

arnetheduck · 2020-04-27T10:35:30Z

rolls back some of the ref changes
adds utility to calculate stack sizes

arnetheduck · 2020-04-27T10:50:44Z

incidentally, this also fixes most new warnings

zah · 2020-04-27T10:59:29Z

beacon_chain/beacon_chain_db.nim

+  proc decode(data: openArray[byte]) =
+    try:
+      # TODO can't write to output directly..
+      outputAddr[] = SSZ.decode(data, BeaconState)


It's easy to create a helper that can write to a var directly. I'll look into it after the PR is merged.

the serialization library would have to be rewritten to not use exceptions and to assume a blank slate (empty seqs etc.), which would complicate it somewhat. On the plus side, it could reuse seq memory instead of allocating new seqs. In general though, it's uglier and less safe, especially in the presence of exceptions. BeaconChainDB acts as an exception barrier, so it can do tricks like this somewhat safely as long as it has the rollback workaround.
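A minimal sketch of the exception-barrier idea, assuming a toy store of serialized states. `ToyState`, `ToyDb`, `decodeToy` and this `getState` signature are hypothetical stand-ins, not the actual BeaconChainDB or SSZ API. The loader writes into caller-owned memory, turns decoding errors into `false`, and calls a rollback proc so the caller can restore the target.

```nim
# Hypothetical sketch: ToyState, ToyDb, decodeToy and getState are
# illustrative stand-ins, not the real BeaconState/BeaconChainDB/SSZ API.
import std/tables

type
  ToyState = object
    slot: uint64
    balances: seq[uint64]

  ToyDb = object
    store: Table[string, seq[byte]]  # key -> serialized state

proc noRollback(v: var ToyState) = discard

proc decodeToy(data: openArray[byte]): ToyState =
  ## Toy decoder: byte 0 is the slot, the remaining bytes are balances.
  ## Raises ValueError on empty input, standing in for a decoding error.
  if data.len == 0:
    raise newException(ValueError, "empty state")
  result.slot = uint64(data[0])
  for i in 1 ..< data.len:
    result.balances.add uint64(data[i])

proc getState(db: ToyDb, key: string, output: var ToyState,
              rollback: proc (v: var ToyState)): bool =
  ## Loads into caller-owned memory instead of allocating a new ref.
  ## Acts as an exception barrier: a decoding error is turned into `false`,
  ## and `rollback` gets a chance to restore `output` to a known state.
  result = false
  if key notin db.store:
    return                   # not found: `output` is left untouched
  try:
    output = decodeToy(db.store[key])
    result = true
  except ValueError, KeyError:
    rollback(output)
```

Passing `noRollback` mirrors the calls in this PR; a caller that cannot tolerate a half-updated target would pass a proc that resets or restores it instead.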

the bigger issue here is the lifetime of outputAddr; I wish there were a safe construct for this.

Note that if BeaconState becomes a ref, the compiler might move it somewhere else on the heap and invalidate the address.

well, I put unsafeAddr here so that we can grep for it easily; it should be safe though, since the scope is local.

besides, Nim's GC isn't fancy enough to do compaction, is it?
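For context on why `outputAddr` exists at all: Nim closures may not capture `var` parameters, so a callback-based store lookup has to capture the address of the output instead. The sketch assumes a hypothetical `withBytes` callback API standing in for the kv store, and uses `addr`, which for a `var` parameter yields the same pointer as `unsafeAddr`. It is only safe because the callback runs before `getState` returns.

```nim
# Hypothetical sketch: withBytes stands in for a kv-store lookup that hands
# the raw value to a callback; ToyState stands in for BeaconState.
type
  ToyState = object
    slot: uint64

proc withBytes(raw: seq[byte], onData: proc (data: openArray[byte])) =
  ## Calls `onData` synchronously, like a store `get` with a callback.
  if raw.len > 0:
    onData(raw)

proc getState(raw: seq[byte], output: var ToyState): bool =
  # A closure may not capture the `var` parameter `output`, so the callback
  # captures its address instead. This relies on `decode` running before
  # getState returns; keeping the closure around for later would leave it
  # holding a dangling pointer.
  let outputAddr = addr output
  var found = false
  proc decode(data: openArray[byte]) =
    outputAddr[].slot = uint64(data[0])
    found = true
  withBytes(raw, decode)
  found
```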

zah · 2020-04-27T10:59:47Z

beacon_chain/beacon_node.nim

@@ -122,11 +122,15 @@ proc getStateFromSnapshot(conf: BeaconNodeConf): NilableBeaconStateRef =
      quit 1

  try:
-    result = SSZ.decode(snapshotContents, BeaconStateRef)
+    let res = BeaconStateRef()


Why did you consider it necessary to rewrite the code here?

to have only one code path for loading BeaconState, avoiding subtle differences and bugs. The most important use case is loading states efficiently in the BlockPool, which manages state rewinds; those are used all over the place in security-critical contexts, so this use case guides the tradeoffs in the API.

I don't follow. My question was why did you decide to rewrite:

result = SSZ.decode(snapshotContents, BeaconStateRef)

into

let res = BeaconStateRef()
res[] = SSZ.decode(snapshotContents, BeaconState)
result = res

I see no practical benefits of doing this.

this way, we don't hit any ref code paths in SSZ that implicitly allocate references. It means we use only one set of code paths to load BeaconState across the whole codebase: allocate, then set.

I find this argument unconvincing. The ref code paths are already used at will in the test suite. One can argue they are better tested :)

well, not really: there are no refs inside any SSZ objects any more, so the ref support in SSZ could be taken away, simplifying it. In fact, this was one of the points raised on the eth1 branch: loading refs inside SSZ is better avoided, since it's not a major use case and the code would be simpler without it. Loading refs with value semantics, as currently implemented, is questionable: they're no longer refs then, but values that behave like refs, which is weird and unexpected.

"Simplify" is an overstatement here. Since SSZ is based on generic functions, you'll compile the ref support only if you actually use it. And if you use it, it will produce just a single additional function that will have exactly the same body as the 3 lines written here.

Since writing 1 line is simpler than writing 3 lines, I'd say having the ref support simplifies things. I'm not sure why you seem to be fighting the idea that we could employ data sharing between the SSZ objects, but this would require supporting refs.

I'd flip the question and ask why it was changed to begin with. This wasn't a necessary change, and it broke away from what the other code was already doing. As of now, hiding allocations inside SSZ doesn't bring any benefits, while it does add costs. There was code designed specifically to avoid it, so instead of having two styles, it seems simpler to have one made out of simpler pieces.

we can change this one back, but ideally we should be moving towards HashedBeaconState/StateData consistently, to reduce the number of options floating around.

Elegance? I wouldn't insist if you feel so strongly about it, but the 3 line code is definitely triggering my OCD tendencies :)

ok, I pushed a newClone update which should satisfy the 1-line OCD, keeps the ref allocation visible and separate from SSZ, and exploits RVO at the same time. I view the hidden memory allocation as a risk and would prefer that it be removed from SSZ as far as possible, until there's a motivation that gives a return for that risk. How's that for a compromise?
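A sketch of what a `newClone`-style helper can look like, assuming a template so that the argument expression is expanded at the assignment site. The body is an assumption and may differ from the version pushed in this PR; `ToyState` and `decodeToy` are hypothetical stand-ins for BeaconState and `SSZ.decode`.

```nim
# Sketch of a newClone-style helper; the name follows the discussion, but
# this body is an assumption, not necessarily the one in nimbus-eth2.
type
  ToyState = object            # stands in for BeaconState
    slot: uint64
    balances: seq[uint64]

template newClone(x: typed): untyped =
  ## Allocates a fresh ref of x's type and assigns `x` into it, so the heap
  ## allocation stays visible at the call site instead of hidden in SSZ.
  ## `typeof` does not evaluate `x`; it is evaluated once, in `res[] = x`.
  let res = new typeof(x)
  res[] = x
  res

proc decodeToy(data: openArray[byte]): ToyState =
  ## Toy stand-in for SSZ.decode(data, BeaconState)
  result.slot = uint64(data.len)

let state = newClone(decodeToy([byte 1, 2, 3]))  # one line, ref is explicit
```

Using a template instead of a proc means `res[] = x` sees the original call expression, so the compiler may write the decoded value straight into the heap object rather than copying it from a temporary.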

If so, I'd go ahead and remove some of the ref changes that were introduced with eth1 to further de-risk SSZ, again until there's an actual benefit that can be evaluated on its own rather than forced in as part of an unrelated feature.

zah · 2020-04-27T11:02:54Z

beacon_chain/block_pool.nim

-  doAssert stateOpt.isSome, "failed to obtain latest state. database corrupt?"
-  let tmpState = stateOpt.get
+  let tmpState = BeaconStateRef()
+  let stateOpt = db.getState(


Since this usage pattern is repeated a few times, maybe a helper like db.getNewState() would be useful.

It can have the added bonus that it avoids the allocation in the "not found" case.
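A minimal sketch of the suggested helper, assuming it is layered on the existing var-based loader so there is still a single loading path. `getNewState`, `ToyDb` and `ToyState` are hypothetical names, not existing APIs.

```nim
# Hypothetical sketch of the suggested getNewState helper; ToyDb and
# ToyState stand in for BeaconChainDB and BeaconState.
import std/tables

type
  ToyState = object
    slot: uint64
  ToyStateRef = ref ToyState
  ToyDb = object
    store: Table[string, ToyState]

proc getState(db: ToyDb, key: string, output: var ToyState): bool =
  ## Existing-style loader: writes into caller-owned memory.
  if key notin db.store:
    return false
  output = db.store[key]
  result = true

proc getNewState(db: ToyDb, key: string): ToyStateRef =
  ## Checks for the key first, so the "not found" case returns nil without
  ## allocating; on success it allocates once and reuses `getState`.
  if key notin db.store:
    return nil
  result = ToyStateRef()
  if not db.getState(key, result[]):
    result = nil
```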

this is init: it crashes if it doesn't find the state (see above about code paths). That said, it could be rewritten to load the state directly into the result, but I'm hesitant to do so until more of the BeaconChainDB interface has been refactored. In particular, it should save the state and its state root index atomically, and likewise for loading; that would make init a lot clearer.

I've flagged 4 usages of this potential API. 3 of them are in non-testing code.

the intent is that the BlockPool and BeaconChainDB don't work with BeaconState directly, because this causes some non-atomic behaviour: a BeaconState should always be saved together with its state root. I've updated the code to reflect this more clearly, but that refactoring is slightly out of scope here: it would need transaction support in the kv store, which is still a TODO.

Added getStateRef to the db tests, which simplifies them a bit but sticks with the same loading path as the block pool.

zah · 2020-04-27T11:14:49Z

ncli/ncli_hash_tree_root.nim

@@ -8,16 +8,17 @@ import
 cli do(kind: string, file: string):

  template printit(t: untyped) {.dirty.} =
-      let v =
-        if cmpIgnoreCase(ext, ".ssz") == 0:
-          SSZ.loadFile(file, t)


I think the code was better before. Why do you insist on avoiding printit(BeaconStateRef)?

same reason as elsewhere: one way of decoding states

zah · 2020-04-27T11:15:10Z

ncli/ncli_pretty.nim

-        else:
-          echo "Unknown file type: ", ext
-          quit 1
+    let v = new t


kills the warning, has the same practical effect

Hmm, you can kill the warnings with printit(NilableBeaconStateRef) too.

zah · 2020-04-27T11:27:50Z

ncli/ncli_transition.nim

    blckX = SSZ.loadFile(blck, SignedBeaconBlock)
    flags = if verifyStateRoot: {skipStateRootValidation} else: {}

-  var stateY = HashedBeaconState(data: stateX, root: hash_tree_root(stateX))
-  if not state_transition(stateY, blckX, flags, noRollback):
+  stateY.data = SSZ.loadFile(pre, BeaconState)


This can be fixed as well. It could be SSZ.loadFile(pre, stateY.data) (var version)

On second thought, RVO is already doing the var transformation, so the only gain is type inference.

indeed, which is why I think it's not really worth it.

well, there is one more reason: RVO in Nim is buggy and slow. But writing it correctly is hard, so it'll likely be buggy anyway.
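To make the two loader shapes concrete, here is a sketch with a hypothetical `loadToy` standing in for `SSZ.loadFile` and `ToyHashedState` for HashedBeaconState. The var overload decodes into an existing field such as `stateY.data`; the value overload is built on top of it and relies on RVO to avoid an extra copy.

```nim
# Hypothetical sketch: loadToy stands in for SSZ.loadFile and
# ToyHashedState for HashedBeaconState.
type
  ToyState = object
    slot: uint64
    balances: seq[uint64]
  ToyHashedState = object
    data: ToyState
    root: uint64

proc loadToy(bytes: openArray[byte], output: var ToyState) =
  ## "var version": decodes straight into the caller's storage.
  ## setLen(0) clears the seq without necessarily releasing its buffer.
  output.slot = if bytes.len > 0: uint64(bytes[0]) else: 0'u64
  output.balances.setLen(0)
  for i in 1 ..< bytes.len:
    output.balances.add uint64(bytes[i])

proc loadToy(bytes: openArray[byte]): ToyState =
  ## Value-returning version built on the var version; whether `result`
  ## is written in place at the call site depends on the compiler's RVO.
  loadToy(bytes, result)

var stateY: ToyHashedState
loadToy([byte 3, 10, 20], stateY.data)   # var version: target named explicitly
let stateX = loadToy([byte 3, 10, 20])   # value version: type is inferred
```

The only difference at the call site is that the value version lets `let` infer the type, which matches the observation that type inference is the main gain.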

zah · 2020-04-27T11:29:34Z

nfuzz/libnfuzz.nim

@@ -10,23 +10,23 @@ import

 type
  AttestationInput = object
-    state: BeaconStateRef
+    state: BeaconState


I don't see the benefit of this change. We are just creating more and more types that are unsafe to use as stack variables.

ditto: one code path for loading states, to maintain the same behaviour when fuzzing and when running rewinds.

the way I view it is that we consistently give the caller the chance to control the allocation instead of making that choice for them. In the 90% case, I'd agree that the caller should not be bothered, but here we've identified a particular bottleneck that warrants more careful allocation.
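To put numbers on the stack-safety concern, this sketch compares an input type that embeds a large state by value with one that holds a ref. `BigState` is a hypothetical 512 KiB stand-in; the real BeaconState is typically larger, and its size depends on the preset.

```nim
# Illustrative only: BigState stands in for BeaconState.
type
  BigState = object
    balances: array[65536, uint64]   # 512 KiB stored inline
  InputByValue = object              # like `state: BeaconState`
    state: BigState
    index: int
  InputByRef = object                # like `state: BeaconStateRef`
    state: ref BigState
    index: int

# Embedding the state makes every `var x: InputByValue` local reserve the
# whole payload on the stack; the ref version stores only a pointer and
# keeps the payload on the heap, at the cost of an allocation.
echo "InputByValue: ", sizeof(InputByValue), " bytes"
echo "InputByRef:   ", sizeof(InputByRef), " bytes"
```

The by-value layout leaves the allocation choice to the caller (it can still place the whole input on the heap with `new InputByValue`), which is the trade-off argued for here; the ref layout makes that choice inside the type.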

zah · 2020-04-27T11:34:15Z

tests/test_beacon_chain_db.nim

-      else:
-        # TODO re-check crash here in mainnet
-        true
+      not db.getState(Eth2Digest(), tmpState[], noRollback)


Another usage of db.getNewState

I'd prefer not to add test-only helpers to the prod code; it seems better to stress the code that's used in real scenarios. Is there a motivation to add it to testutils instead? A helper there could call the prod code.

zah · 2020-04-27T12:19:49Z

beacon_chain/beacon_node.nim

-      if state.isSome:
-        return jsonResult(state.get)
+      let tmp = BeaconStateRef() # TODO use tmpState - but load the entire StateData!
+      let state = node.db.getState(root.get, tmp[], noRollback)


Another usage of db.getNewState()

the idea is not to allocate here but rather to reuse an existing instance like tmpState. That requires more rewriting though, which was planned for a later PR.

mratsim · 2020-04-27T12:36:04Z

beacon_chain/beacon_chain_db.nim

+  proc decode(data: openArray[byte]) =
+    try:
+      # TODO can't write to output directly..
+      outputAddr[] = SSZ.decode(data, BeaconState)


Note that if BeaconState becomes a ref, the compiler might move it somewhere else on the heap and invalidate the address.

arnetheduck force-pushed the stateload branch from 09671fd to 310c304 April 27, 2020 10:39

arnetheduck requested review from tersec and zah April 27, 2020 10:45

arnetheduck force-pushed the stateload branch from 310c304 to c1f38c2 April 27, 2020 10:50

zah reviewed Apr 27, 2020

tersec approved these changes Apr 27, 2020

zah reviewed Apr 27, 2020

tersec mentioned this pull request Apr 27, 2020

remove a pointless hash_tree_root(BeaconState) per node per proposed block #933

Merged

mratsim reviewed Apr 27, 2020

arnetheduck added 2 commits April 27, 2020 20:57

avoid memory allocations and copies when loading states

98bb1b1

* rolls back some of the ref changes * adds utility to calculate stack sizes * works around bugs in nim exception handling and rvo

simplify init

b80caf3

fixes #707

arnetheduck force-pushed the stateload branch from 9bf1298 to b80caf3 April 27, 2020 18:57

restore 1-line loading

10830aa

This was referenced Apr 28, 2020

avoid memory allocations and copies when loading states #942

Merged

ssz: move ref support outside #943

Merged

arnetheduck closed this Apr 28, 2020

arnetheduck deleted the stateload branch May 5, 2020 08:35
