
Limit number of tokens which can be replayed (reparsed) #1620

Merged · 3 commits into master · May 12, 2022

Conversation

turbolent (Member)

Description

The parser may reparse parts of the program in cases of ambiguity.

Limit this to a sensible number of tokens. We may want to adjust this limit.

I validated this limit by parsing all Mainnet contracts and all of them remain parsable.


  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Code follows the standards mentioned here
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

@turbolent turbolent self-assigned this May 4, 2022
@turbolent turbolent requested review from SupunS and dsainati1 as code owners May 4, 2022 20:59
codecov bot commented May 4, 2022

Codecov Report

Merging #1620 (76e031f) into master (f615bea) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1620      +/-   ##
==========================================
+ Coverage   74.73%   74.77%   +0.04%     
==========================================
  Files         288      288              
  Lines       55340    55408      +68     
==========================================
+ Hits        41357    41433      +76     
+ Misses      12489    12482       -7     
+ Partials     1494     1493       -1     
Flag       Coverage Δ
unittests  74.77% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files                                 Coverage Δ
runtime/parser2/parser.go                      90.64% <100.00%> (+0.39%) ⬆️
runtime/parser2/expression.go                  92.80% <0.00%> (-0.01%) ⬇️
runtime/interpreter/value.go                   63.66% <0.00%> (+0.03%) ⬆️
runtime/interpreter/interpreter_expression.go  84.70% <0.00%> (+0.08%) ⬆️
runtime/runtime.go                             86.98% <0.00%> (+0.10%) ⬆️
runtime/interpreter/interpreter.go             88.88% <0.00%> (+0.17%) ⬆️
runtime/parser2/lexer/lexer.go                 96.15% <0.00%> (+0.20%) ⬆️
runtime/interpreter/storage.go                 72.78% <0.00%> (+1.36%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f615bea...76e031f.

github-actions bot commented May 4, 2022

Cadence Benchstat comparison

This branch was compared with the base branch onflow:master, commit 909fd42.
The command `for i in {1..N}; do go test ./... -run=XXX -bench=. -benchmem -shuffle=on; done` was used.
Bench tests were run a total of 7 times on each branch.

Results

(old = base branch, from old.txt; new = this branch, from new.txt)

name                                              old time/op    new time/op    delta
CheckContractInterfaceFungibleTokenConformance-2  149µs ± 3%     150µs ± 8%     ~        (p=0.534 n=6+7)
ContractInterfaceFungibleToken-2                  44.3µs ± 2%    44.2µs ± 5%    ~        (p=0.945 n=6+7)
InterpretRecursionFib-2                           2.87ms ± 7%    2.87ms ± 7%    ~        (p=0.902 n=7+7)
NewInterpreter/new_interpreter-2                  1.25µs ± 4%    1.32µs ± 3%    +5.09%   (p=0.005 n=7+6)
NewInterpreter/new_sub-interpreter-2              2.51µs ± 7%    2.51µs ±10%    ~        (p=0.869 n=7+7)
ParseArray-2                                      8.34ms ± 8%    8.43ms ± 8%    ~        (p=0.535 n=7+7)
ParseDeploy/byte_array-2                          12.6ms ± 7%    12.8ms ±11%    ~        (p=0.620 n=7+7)
ParseDeploy/decode_hex-2                          1.30ms ± 6%    1.32ms ± 6%    ~        (p=0.456 n=7+7)
ParseFungibleToken-2                              152µs ± 8%     159µs ± 8%     ~        (p=0.097 n=7+7)
ParseInfix-2                                      7.31µs ± 2%    7.25µs ± 7%    ~        (p=0.657 n=6+7)
QualifiedIdentifierCreation/One_level-2           2.52ns ± 7%    2.89ns ± 9%    +14.48%  (p=0.001 n=7+7)
QualifiedIdentifierCreation/Three_levels-2        156ns ± 6%     156ns ± 8%     ~        (p=0.902 n=7+7)
RuntimeFungibleTokenTransfer-2                    1.29ms ±25%    1.30ms ±23%    ~        (p=0.902 n=7+7)
RuntimeResourceDictionaryValues-2                 6.76ms ± 7%    7.17ms ± 9%    +6.15%   (p=0.008 n=6+7)
Transfer-2                                        91.1ns ± 3%    94.0ns ± 7%    ~        (p=0.234 n=6+7)

name                                              old alloc/op   new alloc/op   delta
CheckContractInterfaceFungibleTokenConformance-2  66.3kB ± 0%    66.3kB ± 0%    ~        (p=0.972 n=6+7)
ContractInterfaceFungibleToken-2                  26.7kB ± 0%    26.7kB ± 0%    ~        (p=1.000 n=7+7)
InterpretRecursionFib-2                           1.14MB ± 0%    1.14MB ± 0%    ~        (p=0.538 n=7+6)
NewInterpreter/new_interpreter-2                  848B ± 0%      848B ± 0%      ~        (all equal)
NewInterpreter/new_sub-interpreter-2              1.34kB ± 0%    1.34kB ± 0%    ~        (all equal)
ParseArray-2                                      2.94MB ± 2%    2.93MB ± 2%    ~        (p=0.805 n=7+7)
ParseDeploy/byte_array-2                          4.38MB ± 0%    4.31MB ± 3%    ~        (p=1.000 n=7+7)
ParseDeploy/decode_hex-2                          213kB ± 0%     213kB ± 0%     +0.04%   (p=0.003 n=7+6)
ParseFungibleToken-2                              36.2kB ± 0%    36.3kB ± 0%    +0.05%   (p=0.023 n=7+7)
ParseInfix-2                                      2.10kB ± 0%    2.12kB ± 0%    +0.77%   (p=0.001 n=7+7)
QualifiedIdentifierCreation/One_level-2           0.00B          0.00B          ~        (all equal)
QualifiedIdentifierCreation/Three_levels-2        64.0B ± 0%     64.0B ± 0%     ~        (all equal)
RuntimeFungibleTokenTransfer-2                    234kB ± 0%     234kB ± 0%     ~        (p=1.000 n=7+7)
RuntimeResourceDictionaryValues-2                 2.24MB ± 0%    2.24MB ± 0%    ~        (p=0.710 n=7+7)
Transfer-2                                        48.0B ± 0%     48.0B ± 0%     ~        (all equal)

name                                              old allocs/op  new allocs/op  delta
CheckContractInterfaceFungibleTokenConformance-2  1.07k ± 0%     1.07k ± 0%     ~        (all equal)
ContractInterfaceFungibleToken-2                  460 ± 0%       460 ± 0%       ~        (all equal)
InterpretRecursionFib-2                           23.8k ± 0%     23.8k ± 0%     ~        (all equal)
NewInterpreter/new_interpreter-2                  13.0 ± 0%      13.0 ± 0%      ~        (all equal)
NewInterpreter/new_sub-interpreter-2              40.0 ± 0%      40.0 ± 0%      ~        (all equal)
ParseArray-2                                      70.0k ± 0%     70.0k ± 0%     ~        (p=1.000 n=7+7)
ParseDeploy/byte_array-2                          105k ± 0%      105k ± 0%      ~        (p=0.417 n=7+6)
ParseDeploy/decode_hex-2                          79.0 ± 0%      79.0 ± 0%      ~        (all equal)
ParseFungibleToken-2                              1.06k ± 0%     1.06k ± 0%     ~        (all equal)
ParseInfix-2                                      66.0 ± 0%      66.0 ± 0%      ~        (all equal)
QualifiedIdentifierCreation/One_level-2           0.00           0.00           ~        (all equal)
QualifiedIdentifierCreation/Three_levels-2        2.00 ± 0%      2.00 ± 0%      ~        (all equal)
RuntimeFungibleTokenTransfer-2                    4.57k ± 0%     4.57k ± 0%     ~        (p=0.735 n=7+7)
RuntimeResourceDictionaryValues-2                 37.6k ± 0%     37.6k ± 0%     ~        (p=0.773 n=7+7)
Transfer-2                                        1.00 ± 0%      1.00 ± 0%      ~        (all equal)

@@ -229,13 +231,27 @@ func (p *parser) acceptBuffered() {
}
}

// replayLimit is a sensible limit for how many tokens may be replayed
// while parsing a program
const replayLimit = 2 << 12
Contributor

How did we come up with this value?

It seems like this is setting a total limit on the amount of backtracking for the whole program. Would it not be better to instead impose a (smaller) limit on the amount of backtracking that can happen at once? I.e. impose a limit for each backtracking cursor on the stack instead of a total limit for the stack as a whole?

This has the benefit of being able to localize errors for users should they encounter them; instead of complaining that the entire program is too ambiguous at some arbitrary location where we pass the total limit, we can error at the actual location that has too much backtracking.

Member Author

I determined it mostly through experimentation; there is no perfect value. We need a limit that rejects invalid programs as quickly as possible without rejecting valid programs.

There are problems with both local and global limits. With a global limit, the user may hit the limit in a locally small ambiguity, because the parser has already re-parsed too many tokens elsewhere. With a local limit, finding a sensible value is hard: a small value rejects too many valid programs, while a slightly larger value no longer rejects invalid programs quickly.

Maybe we can have both local and global limits to balance user experience and security.

Member Author

Another idea: maybe we can enforce the limit locally, for each top-level ambiguity, instead of for the whole program? We could reset replayedCount in acceptBuffered.

Member Author

Made the token replay limit "local" to each top-level reparse in f2c81d9. Nested replays count towards the top-most buffering, so errors stay local.

dsainati1 (Contributor) left a comment

Nice! Thanks for keeping the errors local.

@turbolent turbolent force-pushed the bastian/limit-reparsing branch from 2321e92 to 76e031f Compare May 12, 2022 17:55
@turbolent turbolent merged commit e5bc3ad into master May 12, 2022
@turbolent turbolent deleted the bastian/limit-reparsing branch May 12, 2022 18:32
3 participants