
Limit number of tokens which can be replayed (reparsed) #1620

Merged · 3 commits into master · May 12, 2022

Conversation

turbolent (Member)

Description

The parser may reparse parts of the program in cases of ambiguity.

Limit this to a sensible number of tokens. We may want to adjust this limit.

I validated this limit by parsing all Mainnet contracts and all of them remain parsable.


  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

@turbolent turbolent self-assigned this May 4, 2022
@turbolent turbolent requested review from SupunS and dsainati1 as code owners May 4, 2022 20:59
codecov bot commented May 4, 2022

Codecov Report

Merging #1620 (76e031f) into master (f615bea) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1620      +/-   ##
==========================================
+ Coverage   74.73%   74.77%   +0.04%     
==========================================
  Files         288      288              
  Lines       55340    55408      +68     
==========================================
+ Hits        41357    41433      +76     
+ Misses      12489    12482       -7     
+ Partials     1494     1493       -1     
Flag       Coverage Δ
unittests  74.77% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                 Coverage Δ
runtime/parser2/parser.go                      90.64% <100.00%> (+0.39%) ⬆️
runtime/parser2/expression.go                  92.80% <0.00%> (-0.01%) ⬇️
runtime/interpreter/value.go                   63.66% <0.00%> (+0.03%) ⬆️
runtime/interpreter/interpreter_expression.go  84.70% <0.00%> (+0.08%) ⬆️
runtime/runtime.go                             86.98% <0.00%> (+0.10%) ⬆️
runtime/interpreter/interpreter.go             88.88% <0.00%> (+0.17%) ⬆️
runtime/parser2/lexer/lexer.go                 96.15% <0.00%> (+0.20%) ⬆️
runtime/interpreter/storage.go                 72.78% <0.00%> (+1.36%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f615bea...76e031f.

github-actions bot commented May 4, 2022

Cadence Benchstat comparison

This branch was compared with the base branch onflow:master, commit 909fd42.
The command `for i in {1..N}; do go test ./... -run=XXX -bench=. -benchmem -shuffle=on; done` was used.
Bench tests were run a total of 7 times on each branch.

Results

(old = base branch, from old.txt; new = this branch, from new.txt)

name                                              old time/op    new time/op    delta
CheckContractInterfaceFungibleTokenConformance-2  149µs ± 3%     150µs ± 8%     ~        (p=0.534 n=6+7)
ContractInterfaceFungibleToken-2                  44.3µs ± 2%    44.2µs ± 5%    ~        (p=0.945 n=6+7)
InterpretRecursionFib-2                           2.87ms ± 7%    2.87ms ± 7%    ~        (p=0.902 n=7+7)
NewInterpreter/new_interpreter-2                  1.25µs ± 4%    1.32µs ± 3%    +5.09%   (p=0.005 n=7+6)
NewInterpreter/new_sub-interpreter-2              2.51µs ± 7%    2.51µs ±10%    ~        (p=0.869 n=7+7)
ParseArray-2                                      8.34ms ± 8%    8.43ms ± 8%    ~        (p=0.535 n=7+7)
ParseDeploy/byte_array-2                          12.6ms ± 7%    12.8ms ±11%    ~        (p=0.620 n=7+7)
ParseDeploy/decode_hex-2                          1.30ms ± 6%    1.32ms ± 6%    ~        (p=0.456 n=7+7)
ParseFungibleToken-2                              152µs ± 8%     159µs ± 8%     ~        (p=0.097 n=7+7)
ParseInfix-2                                      7.31µs ± 2%    7.25µs ± 7%    ~        (p=0.657 n=6+7)
QualifiedIdentifierCreation/One_level-2           2.52ns ± 7%    2.89ns ± 9%    +14.48%  (p=0.001 n=7+7)
QualifiedIdentifierCreation/Three_levels-2        156ns ± 6%     156ns ± 8%     ~        (p=0.902 n=7+7)
RuntimeFungibleTokenTransfer-2                    1.29ms ±25%    1.30ms ±23%    ~        (p=0.902 n=7+7)
RuntimeResourceDictionaryValues-2                 6.76ms ± 7%    7.17ms ± 9%    +6.15%   (p=0.008 n=6+7)
Transfer-2                                        91.1ns ± 3%    94.0ns ± 7%    ~        (p=0.234 n=6+7)

name                                              old alloc/op   new alloc/op   delta
CheckContractInterfaceFungibleTokenConformance-2  66.3kB ± 0%    66.3kB ± 0%    ~        (p=0.972 n=6+7)
ContractInterfaceFungibleToken-2                  26.7kB ± 0%    26.7kB ± 0%    ~        (p=1.000 n=7+7)
InterpretRecursionFib-2                           1.14MB ± 0%    1.14MB ± 0%    ~        (p=0.538 n=7+6)
NewInterpreter/new_interpreter-2                  848B ± 0%      848B ± 0%      ~        (all equal)
NewInterpreter/new_sub-interpreter-2              1.34kB ± 0%    1.34kB ± 0%    ~        (all equal)
ParseArray-2                                      2.94MB ± 2%    2.93MB ± 2%    ~        (p=0.805 n=7+7)
ParseDeploy/byte_array-2                          4.38MB ± 0%    4.31MB ± 3%    ~        (p=1.000 n=7+7)
ParseDeploy/decode_hex-2                          213kB ± 0%     213kB ± 0%     +0.04%   (p=0.003 n=7+6)
ParseFungibleToken-2                              36.2kB ± 0%    36.3kB ± 0%    +0.05%   (p=0.023 n=7+7)
ParseInfix-2                                      2.10kB ± 0%    2.12kB ± 0%    +0.77%   (p=0.001 n=7+7)
QualifiedIdentifierCreation/One_level-2           0.00B          0.00B          ~        (all equal)
QualifiedIdentifierCreation/Three_levels-2        64.0B ± 0%     64.0B ± 0%     ~        (all equal)
RuntimeFungibleTokenTransfer-2                    234kB ± 0%     234kB ± 0%     ~        (p=1.000 n=7+7)
RuntimeResourceDictionaryValues-2                 2.24MB ± 0%    2.24MB ± 0%    ~        (p=0.710 n=7+7)
Transfer-2                                        48.0B ± 0%     48.0B ± 0%     ~        (all equal)

name                                              old allocs/op  new allocs/op  delta
CheckContractInterfaceFungibleTokenConformance-2  1.07k ± 0%     1.07k ± 0%     ~        (all equal)
ContractInterfaceFungibleToken-2                  460 ± 0%       460 ± 0%       ~        (all equal)
InterpretRecursionFib-2                           23.8k ± 0%     23.8k ± 0%     ~        (all equal)
NewInterpreter/new_interpreter-2                  13.0 ± 0%      13.0 ± 0%      ~        (all equal)
NewInterpreter/new_sub-interpreter-2              40.0 ± 0%      40.0 ± 0%      ~        (all equal)
ParseArray-2                                      70.0k ± 0%     70.0k ± 0%     ~        (p=1.000 n=7+7)
ParseDeploy/byte_array-2                          105k ± 0%      105k ± 0%      ~        (p=0.417 n=7+6)
ParseDeploy/decode_hex-2                          79.0 ± 0%      79.0 ± 0%      ~        (all equal)
ParseFungibleToken-2                              1.06k ± 0%     1.06k ± 0%     ~        (all equal)
ParseInfix-2                                      66.0 ± 0%      66.0 ± 0%      ~        (all equal)
QualifiedIdentifierCreation/One_level-2           0.00           0.00           ~        (all equal)
QualifiedIdentifierCreation/Three_levels-2        2.00 ± 0%      2.00 ± 0%      ~        (all equal)
RuntimeFungibleTokenTransfer-2                    4.57k ± 0%     4.57k ± 0%     ~        (p=0.735 n=7+7)
RuntimeResourceDictionaryValues-2                 37.6k ± 0%     37.6k ± 0%     ~        (p=0.773 n=7+7)
Transfer-2                                        1.00 ± 0%      1.00 ± 0%      ~        (all equal)

@@ -229,13 +231,27 @@ func (p *parser) acceptBuffered() {
}
}

// replayLimit is a sensible limit for how many tokens may be replayed
// while parsing a program
const replayLimit = 2 << 12
Contributor

How did we come up with this value?

It seems like this is setting a total limit on the amount of backtracking for the whole program. Would it not be better to instead impose a (smaller) limit on the amount of backtracking that can happen at once? I.e. impose a limit for each backtracking cursor on the stack instead of a total limit for the stack as a whole?

This has the benefit of being able to localize errors for users should they encounter them; instead of complaining that the entire program is too ambiguous at some arbitrary location where we pass the total limit, we can error at the actual location that has too much backtracking.

Member Author

I determined it mostly through experimentation; there is no perfect value. We need a limit that rejects invalid programs as quickly as possible without rejecting valid programs.

There are problems with both local and global limits. With a global limit, the user may hit the limit in a locally small ambiguity, because the parser has already re-parsed too many tokens elsewhere. With a local limit, finding a sensible value is hard: a small value rejects too many valid programs, while a slightly larger value no longer rejects invalid programs quickly.

Maybe we can have both local and global limits to balance user experience and security.

Member Author

Another idea: maybe we can enforce the limit locally, for each top-level ambiguity, instead of for the whole program? We could reset replayedCount in acceptBuffered.

Member Author

Made the token replay limit "local" to each top-level reparse in f2c81d9. Nested replays count towards the top-most buffering, so errors stay local.

dsainati1 (Contributor) left a comment

Nice! Thanks for keeping the errors local.

@turbolent turbolent force-pushed the bastian/limit-reparsing branch from 2321e92 to 76e031f Compare May 12, 2022 17:55
@turbolent turbolent merged commit e5bc3ad into master May 12, 2022
@turbolent turbolent deleted the bastian/limit-reparsing branch May 12, 2022 18:32
3 participants