infer_language_version
api
#1265
base: main
This parser is trying to stick to npm's flavor of semver, which Solidity also uses for its version pragma
…g common syntax forms, and they are all passing. The biggest thing that's missing right now is handling Solidity's specific semver quirks and edge cases, which are definitely not handled.
…nts of a Solidity source file as input and shallowly parses it to pull out any version pragmas. It then compares all of the pragmas to the full list of available versions and returns a list of all the versions which pass.
…urce files against the actual version used to compile them according to Sanctuary. I'm marking two kinds of version errors here: one for when we can't infer a version from a source file, and one for when the actual version is not included in the inferred versions list.
…d of Sanctuary testing. At first this was extracting the version pragmas by just going line-by-line and finding lines that start with `pragma solidity`. This is obviously an overly simplistic way to do this. One glaring issue here is that if a source file has a version pragma inside a block comment, then that would still be included in the list of pragmas that must be matched. This update uses slang's parser instead to extract all of the real version pragmas, and then the custom semver parser to do the rest. After making this change we have 0 version inference errors from Sanctuary.
…ators (`^` and `~`) in hyphen ranges. These two things don't make sense together, and a correct parser would most likely fail to parse such a range. However, Solidity ignores (and has historically ignored) these operators in a hyphen range, so we will do that too.
I copied the semver parsing/matching test cases from solc and recreated them here. From there I fixed any weird behaviors that weren't already handled. Most importantly, solc allows users to specify operators in front of partial version ranges, which leads to some unintuitive behavior.
…rly in some circumstances * Refactored the parser code to move the types and base implementations to `semver/mod.rs`, and all the parsing code to `semver/parser.rs`.
…ype of thing is getting parsed. Also trying to simplify/consolidate some of the higher-level parsing bits.
…otes are included
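The block-comment problem described in the commit notes above can be illustrated with a minimal sketch. `naive_pragmas` and `COMMENTED_OUT_EXAMPLE` are hypothetical names made up for this illustration; this is not the code that was in the PR, only a guess at what a line-by-line scan of that kind looks like.

```rust
/// Hypothetical Solidity source: the first pragma is commented out,
/// so only `^0.8.0` is a real version requirement.
const COMMENTED_OUT_EXAMPLE: &str =
    "/*\npragma solidity ^0.4.0;\n*/\npragma solidity ^0.8.0;\n";

/// Naive extraction: collect every line that starts with `pragma solidity`.
/// It has no notion of comments, so it also returns pragmas that sit
/// inside a `/* ... */` block.
fn naive_pragmas(src: &str) -> Vec<&str> {
    src.lines()
        .map(str::trim)
        .filter(|line| line.starts_with("pragma solidity"))
        .collect()
}
```

Calling `naive_pragmas(COMMENTED_OUT_EXAMPLE)` yields both pragma lines, although only the second one is live code. No single version satisfies `^0.4.0` and `^0.8.0` at once, so a scan like this would report that no version can be inferred for such a file.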
Left a couple of suggestions, but the implementation looks good in general. I have two questions though:
- Re: the internal version types, wouldn't it be possible to reuse the semver crate types? It's not too much code, but it seems redundant. What is preventing you from reusing them?
- The parsing code also somewhat duplicates the main language parser, and IIUC, there are some cases accepted by this new parser that the language will not parse. Maybe we should not try to parse versions from the language definition at all. Or, on the opposite side, try to build the version ranges from the parsed CST nodes. WDYT?
use metaslang_cst::text_index::TextIndex;

use crate::cst::NonterminalKind;
use crate::generated::utils::LanguageFacts;
This should probably read `crate::utils::LanguageFacts`. The `generated` module is an implementation detail which should be hidden by the `#[path]` attribute in the `utils` module.
/// versions that can fulfill those requirements.
pub fn infer_language_version(src: &str) -> Vec<semver::Version> {
    let parser = crate::parser::Parser::create(LanguageFacts::LATEST_VERSION).unwrap();
    let output = parser.parse_nonterminal(NonterminalKind::SourceUnit, src);
Maybe better to use `parse_file_contents` instead.
    let parser = crate::parser::Parser::create(LanguageFacts::LATEST_VERSION).unwrap();
    let output = parser.parse_nonterminal(NonterminalKind::SourceUnit, src);

    let mut cursor = output.tree.create_cursor(TextIndex::ZERO);
Maybe better to use `output.create_tree_cursor()`
let examples = [
    "\"0.8\"",
    "\"0.\" 8",
    "\"0\" .8",
    "0 . \"8\"",
    "0 '.' 8",
    "'0.8'",
    "\"0\".\"8\"",
];
I don't think these are accepted by the language parser. Should we try to fix the language definition, or maybe just drop any attempts to parse the pragma version there? (cc @OmarTawfik)
This is accepted by solc: ethereum/solidity#14826
fn empty_version() {
    let range = parse("");

    println!("Range: {range}");
Left-over debugging print?
fn major_wildcard_concat() {
    let range = parse("x.1.0 >0.5.0");

    println!("Range: {range}");
Print debugging left over?
// Version::new(0, 0, 0),
// Version::new(0, 1, 0),
Should these match? It's unclear what this test's range covers.
These should match, I had commented them out while debugging something and forgot to turn them back on.
@@ -48,6 +48,9 @@ struct TestCommand {
    /// Enables checking bindings for each contract, failing if any symbol cannot be resolved.
    #[arg(long, default_value_t = false)]
    check_bindings: bool,

    #[arg(long, default_value_t = false)]
    check_infer_version: bool,
You may want to add an input parameter to the GitHub workflow as well.
pub no_version: u64,
pub wrong_version: u64,
Not sure it's worth it to tally these failures separately. Maybe adding them to `failed` should be good enough?
    return Ok(());
} else if !inferred_versions.contains(&version) {
    events.version_inference_error(format!(
        "[{version}] Did not find correct version for {path}",
- "[{version}] Did not find correct version for {path}",
+ "[{version}] Inferred versions do not match with reported for {path}",
* Add `has_known_version_mismatch` check to the sanctuary tests so that we skip files where the version in sanctuary is known to not match the version pragmas
… our purposes they're interchangeable, all of the custom semver parsing/handling happens in `PartialVersion`, `Comparator`, and `ComparatorSet` - once we reduce the versions to concrete versions then we don't need anything special. This results in a bit less custom code.
Added a new API which generates a list of valid language versions for a Solidity source file by parsing the `pragma solidity` declarations.
Here's a basic example of how this API might be used:
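The snippet from the PR description is not preserved in this text. As a rough stand-in, here is a self-contained sketch of the idea behind the API, written without the slang crate: keep only the versions that satisfy every pragma in the file. All names here (`Comparator`, `satisfies`, `caret_upper_bound`, `infer_versions`) and the tuple `Version` type are assumptions made for illustration; they are not the types from this PR. The caret rule follows npm's documented behavior and the sketch ignores partial versions, `||` alternatives, and solc's quirks.

```rust
/// (major, minor, patch); tuples compare lexicographically, which is
/// the ordering needed for release versions without pre-release tags.
type Version = (u64, u64, u64);

/// A deliberately small subset of semver comparators (illustrative only).
enum Comparator {
    GreaterEq(Version),
    Less(Version),
    Caret(Version),
}

/// npm caret rule: changes are allowed that keep the left-most
/// non-zero component fixed.
fn caret_upper_bound((major, minor, patch): Version) -> Version {
    if major > 0 {
        (major + 1, 0, 0)
    } else if minor > 0 {
        (0, minor + 1, 0)
    } else {
        (0, 0, patch + 1)
    }
}

fn satisfies(comparator: &Comparator, v: Version) -> bool {
    match comparator {
        Comparator::GreaterEq(bound) => v >= *bound,
        Comparator::Less(bound) => v < *bound,
        Comparator::Caret(base) => v >= *base && v < caret_upper_bound(*base),
    }
}

/// Each inner Vec is one pragma (its comparators are AND-ed together).
/// A version is kept only if it satisfies every pragma in the file.
fn infer_versions(pragmas: &[Vec<Comparator>], all_versions: &[Version]) -> Vec<Version> {
    all_versions
        .iter()
        .copied()
        .filter(|v| pragmas.iter().all(|set| set.iter().all(|c| satisfies(c, *v))))
        .collect()
}
```

For example, with a made-up version list containing 0.8.0 and 0.8.19, the pragmas `^0.8.0` and `>=0.8.1 <0.9.0` together leave only 0.8.19, since 0.8.0 fails the second pragma.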
As stated in the Solidity documentation, the semver parsing is based on npm's semver implementation. However, solc's implementation has, both currently and in the past, had several bugs and undocumented features which we need to support.
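One such quirk, mentioned in the commit notes, is that `^` and `~` are ignored when they appear inside a hyphen range. A minimal sketch of that behavior, assuming full `major.minor.patch` bounds only (partial versions are not modeled, and may behave differently in solc). The helper names are hypothetical and this is not the parser from this PR.

```rust
type Version = (u64, u64, u64);

/// Parses exactly three dot-separated numeric components.
fn parse_full_version(text: &str) -> Option<Version> {
    let mut parts = text.split('.').map(|p| p.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(version)
}

/// Drops any leading `^` or `~` from a hyphen-range bound, mirroring the
/// "just ignore the operator" behavior described for solc.
fn strip_operator(bound: &str) -> &str {
    bound.trim().trim_start_matches(|c: char| c == '^' || c == '~')
}

/// "A - B" is read as ">=A <=B" (the npm rule for full versions).
fn parse_hyphen_range(text: &str) -> Option<(Version, Version)> {
    let (lower, upper) = text.split_once(" - ")?;
    Some((
        parse_full_version(strip_operator(lower))?,
        parse_full_version(strip_operator(upper))?,
    ))
}

fn in_range(v: Version, (lower, upper): (Version, Version)) -> bool {
    lower <= v && v <= upper
}
```

Under these assumptions, `^0.5.0 - ~0.6.2` is treated the same as `0.5.0 - 0.6.2`: 0.6.0 is inside the range, while 0.4.26 and 0.6.3 are outside.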
I've added 30 unit tests to the parser, each of which tests various positive and negative cases for different types of semvers. I've also included some of solc's test cases from here to ensure that we're passing/failing the same matches that they are.