`infer_language_version` api #1265

mjoerussell · 2025-03-04T23:37:52Z

Added a new API which generates a list of valid language versions for a Solidity source file by parsing the pragma solidity declarations.

Here's a basic example of how this API might be used:

let solidity_source = get_source();

let valid_versions = Parser::infer_language_version(&solidity_source);

// Use the latest supported version to parse the file
let latest = valid_versions.last().expect("no version satisfies the pragmas").clone();
let parser = Parser::create(latest).unwrap();

As stated in the Solidity documentation, the semver parsing is based on npm's semver implementation. However, solc's implementation has, both currently and in the past, had several bugs and undocumented features that we need to support:

Handling quotes within semvers (e.g. `"0"."8"` == `"0.8"`).
Users can add operators to semvers inside hyphen ranges. These get ignored.
Users can add operators to partial version ranges. These are handled differently depending on the operator and how the partial version range was defined.
Parsing is overall a lot more permissive than a standard semver parser, and we need to be able to skip over some invalid inputs.
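To make the quote quirk concrete, here is a minimal sketch of one way a single quoted version literal could be normalized before parsing. The helper name `normalize_version_literal` is hypothetical and this is not the PR's implementation, which presumably handles quotes inside its own tokenizer. It also only makes sense for a single version, since it drops all whitespace.

```rust
/// Hypothetical helper (not part of the PR): removes quote characters and
/// whitespace from a single version literal, so that `"0"."8"`, `0 '.' 8`
/// and `'0.8'` all become `0.8`.
///
/// Assumption: the input is one version only. A full range such as
/// `>=0.5.0 <0.6.0` relies on whitespace as a separator and must not be
/// passed through this function.
fn normalize_version_literal(raw: &str) -> String {
    raw.chars()
        .filter(|c| *c != '"' && *c != '\'' && !c.is_whitespace())
        .collect()
}
```

The inputs used in the quote-handling test of this PR (for example `"0." 8` or `0  . "8"`) would all reduce to `0.8` under this sketch.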

I've added 30 unit tests to the parser, each of which tests various positive and negative cases for different types of semvers. I've also included some of solc's test cases from here to ensure that we're passing/failing the same matches that they are.

…nts of a Solidity source file as input and shallowly parses it to pull out any version pragmas. It then compares all of the pragmas to the full list of available versions and returns a list of all the versions which pass.
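The filtering step described here can be sketched roughly as follows. This is a simplified model, not the PR's code: `Version` stands in for `semver::Version`, the `Comparator` enum here is a reduced stand-in with only two operators, and each pragma is assumed to be a plain conjunction of comparators (no `||` alternatives).

```rust
/// Stand-in for `semver::Version`: a concrete `major.minor.patch` triple.
/// Deriving `Ord` gives the usual lexicographic version ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version(u64, u64, u64);

/// Reduced, illustrative comparator with only two operators.
#[derive(Debug, Clone, Copy)]
enum Comparator {
    GreaterEq(Version),
    Less(Version),
}

impl Comparator {
    fn matches(&self, v: Version) -> bool {
        match *self {
            Comparator::GreaterEq(bound) => v >= bound,
            Comparator::Less(bound) => v < bound,
        }
    }
}

/// Assumption: one pragma is a list of comparators that must all hold.
type Pragma = Vec<Comparator>;

/// Keeps every known version that satisfies all pragmas, sorted ascending
/// so that the last element is the latest matching version.
fn infer_versions(all_versions: &[Version], pragmas: &[Pragma]) -> Vec<Version> {
    let mut matching: Vec<Version> = all_versions
        .iter()
        .copied()
        .filter(|v| pragmas.iter().all(|p| p.iter().all(|c| c.matches(*v))))
        .collect();
    matching.sort();
    matching
}
```

With this model, a file without any version pragma matches every known version, because there is nothing to filter on.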

…urce files against the actual version used to compile them according to Sanctuary. I'm marking two kinds of version errors here: one for when we can't infer a version from a source file, and one for when the actual version is not included in the inferred versions list.
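The two error kinds could be distinguished along these lines. This is a sketch only: the enum and function names are hypothetical, and versions are modeled as plain `(major, minor, patch)` tuples rather than the `semver::Version` type the test code actually uses.

```rust
/// Hypothetical classification of the two failure modes described above.
#[derive(Debug, PartialEq, Eq)]
enum VersionInferenceError {
    /// No version at all could be inferred from the source file.
    NoVersionInferred,
    /// Versions were inferred, but the version actually used is not among them.
    ActualVersionNotInferred,
}

/// Compares the inferred versions against the version reported by the dataset.
fn check_inferred(
    inferred: &[(u64, u64, u64)],
    actual: (u64, u64, u64),
) -> Result<(), VersionInferenceError> {
    if inferred.is_empty() {
        Err(VersionInferenceError::NoVersionInferred)
    } else if !inferred.contains(&actual) {
        Err(VersionInferenceError::ActualVersionNotInferred)
    } else {
        Ok(())
    }
}
```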

…d of Sanctuary testing. At first this was extracting the version pragmas by going line-by-line and finding lines that start with `pragma solidity`, which is overly simplistic. One glaring issue is that a version pragma inside a block comment would still be included in the list of pragmas that must be matched. This update uses slang's parser instead to extract all of the real version pragmas, and then the custom semver parser to do the rest. After making this change we have 0 version inference errors from Sanctuary.
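To illustrate the block-comment problem, here is a reconstruction of what such a line-by-line extraction might look like. The function name is hypothetical and the original code may have differed in detail (for instance, whether lines were trimmed first).

```rust
/// Naive approach (illustrative reconstruction): collect every line that
/// starts with `pragma solidity`, without any awareness of comments.
fn naive_version_pragmas(source: &str) -> Vec<&str> {
    source
        .lines()
        .map(str::trim)
        .filter(|line| line.starts_with("pragma solidity"))
        .collect()
}
```

For a source such as `/*\npragma solidity ^0.4.0;\n*/\npragma solidity ^0.8.0;`, this returns both lines, so the commented-out `^0.4.0` requirement would wrongly take part in matching. Extracting pragmas from the parsed tree avoids this because comments are not version pragma nodes.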

…ators (`^` and `~`) in hyphen ranges. These two things don't make sense together, and a correct parser would most likely fail to parse such a range. However, Solidity ignores (and has historically ignored) these operators in a hyphen range, so we will do that too.
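A minimal sketch of that behavior, assuming the two bounds are separated by `" - "` as in npm-style hyphen ranges. Both function names are hypothetical, and the sketch returns the bounds as plain strings instead of parsed versions.

```rust
/// Drops any leading `^` or `~` from one bound of a hyphen range.
fn strip_operators(bound: &str) -> &str {
    bound
        .trim()
        .trim_start_matches(|c: char| c == '^' || c == '~')
}

/// Splits a hyphen range such as `^0.5.0 - ~0.8.0` into its lower and upper
/// bounds, ignoring the operators. Returns `None` if the input is not a
/// hyphen range (assumption: the separator is exactly " - ").
fn hyphen_range_bounds(range: &str) -> Option<(&str, &str)> {
    let (lower, upper) = range.split_once(" - ")?;
    Some((strip_operators(lower), strip_operators(upper)))
}
```

Under this sketch `^0.5.0 - ~0.8.0` yields the same bounds as `0.5.0 - 0.8.0`.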

I copied the semver parsing/matching test cases from solc and recreated them here. From there I fixed any weird behaviors that weren't already handled. Most importantly, solc allows users to specify operators in front of partial version ranges, which leads to some unintuitive behavior.

changeset-bot · 2025-03-04T23:37:55Z

⚠️ No Changeset found

Latest commit: 0d24a25

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

ggiraldez

Left a couple of suggestions, but the implementation looks good in general. I have two questions though:

Re: the internal version types, wouldn't it be possible to reuse the semver crate types? It's not too much code, but it seems redundant. What is preventing you from reusing them?
The parsing code also somewhat duplicates the main language parser, and IIUC, there are some cases accepted by this new parser that the language will not parse. Maybe we should not try to parse versions from the language definition at all. Or, on the opposite side, try to build the version ranges from the parsed CST nodes. WDYT?

ggiraldez · 2025-03-05T15:33:15Z

crates/solidity/outputs/cargo/crate/src/extensions/semver/mod.rs

+use metaslang_cst::text_index::TextIndex;
+
+use crate::cst::NonterminalKind;
+use crate::generated::utils::LanguageFacts;


This should probably read crate::utils::LanguageFacts. The generated module is an implementation detail which should be hidden by the #[path] attribute in the utils module.

ggiraldez · 2025-03-05T15:35:45Z

crates/solidity/outputs/cargo/crate/src/extensions/semver/mod.rs

+/// versions that can fulfill those requirements.
+pub fn infer_language_version(src: &str) -> Vec<semver::Version> {
+    let parser = crate::parser::Parser::create(LanguageFacts::LATEST_VERSION).unwrap();
+    let output = parser.parse_nonterminal(NonterminalKind::SourceUnit, src);


Maybe better to use parse_file_contents instead.

ggiraldez · 2025-03-05T15:37:46Z

crates/solidity/outputs/cargo/crate/src/extensions/semver/mod.rs

+    let parser = crate::parser::Parser::create(LanguageFacts::LATEST_VERSION).unwrap();
+    let output = parser.parse_nonterminal(NonterminalKind::SourceUnit, src);
+
+    let mut cursor = output.tree.create_cursor(TextIndex::ZERO);


Maybe better to use output.create_tree_cursor()

ggiraldez · 2025-03-05T16:36:10Z

crates/solidity/outputs/cargo/crate/src/extensions/semver/parser.rs

+    let examples = [
+        "\"0.8\"",
+        "\"0.\" 8",
+        "\"0\" .8",
+        "0  . \"8\"",
+        "0 '.' 8",
+        "'0.8'",
+        "\"0\".\"8\"",
+    ];


I don't think these are accepted by the language parser. Should we try to fix the language definition, or maybe just drop any attempts to parse the pragma version there? (cc @OmarTawfik)

This is accepted by solc: ethereum/solidity#14826

ggiraldez · 2025-03-05T16:38:31Z

crates/solidity/outputs/cargo/crate/src/extensions/semver/parser.rs

+fn empty_version() {
+    let range = parse("");
+
+    println!("Range: {range}");


Left-over debugging print?

ggiraldez · 2025-03-05T16:44:09Z

crates/solidity/outputs/cargo/crate/src/extensions/semver/parser.rs

+fn major_wildcard_concat() {
+    let range = parse("x.1.0 >0.5.0");
+
+    println!("Range: {range}");


Print debugging left over?

ggiraldez · 2025-03-05T16:45:21Z

crates/solidity/outputs/cargo/crate/src/extensions/semver/parser.rs

+            // Version::new(0, 0, 0),
+            // Version::new(0, 1, 0),


Should these match? It's unclear what this test's range covers.

These should match, I had commented them out while debugging something and forgot to turn them back on.

ggiraldez · 2025-03-05T16:51:46Z

crates/solidity/testing/sanctuary/src/main.rs

@@ -48,6 +48,9 @@ struct TestCommand {
    /// Enables checking bindings for each contract, failing if any symbol cannot be resolved.
    #[arg(long, default_value_t = false)]
    check_bindings: bool,
+
+    #[arg(long, default_value_t = false)]
+    check_infer_version: bool,


You may want to add an input parameter to the GitHub workflow as well.

ggiraldez · 2025-03-05T16:53:25Z

crates/solidity/testing/sanctuary/src/results.rs

+    pub no_version: u64,
+    pub wrong_version: u64,


Not sure it's worth it to tally these failures separately. Maybe adding them to failed should be good enough?

ggiraldez · 2025-03-05T16:54:58Z

crates/solidity/testing/sanctuary/src/tests.rs

+            return Ok(());
+        } else if !inferred_versions.contains(&version) {
+            events.version_inference_error(format!(
+                "[{version}] Did not find correct version for {path}",


Suggested change

-                "[{version}] Did not find correct version for {path}",
+                "[{version}] Inferred versions do not match with reported for {path}",

… our purposes they're interchangeable: all of the custom semver parsing/handling happens in `PartialVersion`, `Comparator`, and `ComparatorSet`, and once we reduce the versions to concrete versions we don't need anything special. This results in a bit less custom code.

github-actions added 13 commits February 26, 2025 11:28

Initial version of semver parsing.

e54b782

This parser is trying to stick to npm's flavor of semver, which Solidity also uses for its version pragma

Semver parsing is basically working now. I have ~10 test cases testing common syntax forms, and they are all passing. The biggest thing that's missing right now is handling Solidity's specific semver quirks and edge cases, which are definitely not handled.

5f36dc1

Refactoring/cleaning up parsing code, adding some documentation

f9b46a0

* Fixed an issue where parse errors could not be recovered from properly in some circumstances
* Refactored the parser code to move the types and base implementations to `semver/mod.rs`, and all the parsing code to `semver/parser.rs`.

ce37974

Run infra ci/formatter

ae1386d

Sort found language versions before returning them

0ae9872

Add some more test cases for the semver parser

dc333c0

Refactoring parse code to handle errors the same way no matter what type of thing is getting parsed. Also trying to simplify/consolidate some of the higher-level parsing bits.

ddc3341

mjoerussell requested review from a team as code owners March 4, 2025 23:37

stefanoban requested review from OmarTawfik and ggiraldez March 5, 2025 12:21

Small refactor of is_whitespace and is_non_whitespace, explain why quotes are included

f46a73d

ggiraldez reviewed Mar 5, 2025

github-actions added 4 commits March 5, 2025 11:37

Cleanup code in infer_language_version

1632f7c

Move test code to a separate file

ca2d28e

* Add "check_infer_version" option to the sanctuary github workflows

282cf1d

* Add `has_known_version_mismatch` check to the sanctuary tests so that we skip files where the version in sanctuary is known to not match the version pragmas
