infer_language_version api #1265

Open: wants to merge 18 commits into main

mjoerussell (Contributor)

Added a new API which generates a list of valid language versions for a Solidity source file by parsing the pragma solidity declarations.

Here's a basic example of how this API might be used:

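```rust
let solidity_source = get_source();

let valid_versions = Parser::infer_language_version(&solidity_source);

// Use the latest supported version to parse the file.
// `last()` returns `Option<&Version>`, so clone it out before passing it on.
let latest = valid_versions.last().cloned().expect("no version satisfies the pragmas");
let parser = Parser::create(latest).unwrap();
```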

As stated in the Solidity documentation, the semver parsing is based on npm's semver implementation. However, solc's implementation has several bugs and undocumented features, both current and historical, that we need to support:

  • Handling quotes within semvers (e.g. "0".8"" == "0.8")
  • Users can add operators (such as ^ and ~) to the versions inside hyphen ranges; these are ignored.
  • Users can add operators to partial version ranges. These are handled differently depending on the operator and how the partial version range was defined.
  • Parsing is generally much more permissive than in a standard semver parser, so we need to be able to skip over some invalid inputs.
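To make the first two quirks concrete, here is a minimal sketch that assumes the version text has already been pulled out of the pragma. `strip_quotes` and `hyphen_bounds` are hypothetical helpers, not code from this PR. The sketch does not handle whitespace between quoted fragments (such as "0" .8), and it assumes npm's spaced " - " form for hyphen ranges.

```rust
/// Hypothetical helper: drop every `"` and `'` so quoted fragments such as
/// `"0"."8"` or `'0.8'` collapse to `0.8`. Whitespace is left untouched.
fn strip_quotes(pragma: &str) -> String {
    pragma.chars().filter(|&c| c != '"' && c != '\'').collect()
}

/// Hypothetical helper: split an npm-style hyphen range `A - B` into its bounds,
/// discarding any leading `^` or `~` on either side, since solc ignores them there.
/// Returns `None` when the text is not a hyphen range.
fn hyphen_bounds(range: &str) -> Option<(String, String)> {
    let (lower, upper) = range.split_once(" - ")?;
    let clean = |s: &str| {
        s.trim()
            .trim_start_matches(|c: char| c == '^' || c == '~')
            .to_string()
    };
    Some((clean(lower), clean(upper)))
}
```

With these, `"0".8""` also reduces to `0.8`, and `^0.5.0 - ~0.6.0` yields the plain bounds `0.5.0` and `0.6.0`.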

I've added 30 unit tests to the parser, each of which tests various positive and negative cases for different types of semvers. I've also included some of solc's test cases from here to ensure that we're passing/failing the same matches that they are.

github-actions added 13 commits February 26, 2025 11:28
This parser is trying to stick to npm's flavor of semver, which Solidity also uses for its version pragma
…g common syntax forms, and they are all passing. The biggest gap right now is Solidity's specific semver quirks and edge cases, which are not handled yet.
…nts of a Solidity source file as input and shallowly parses it to pull out any version pragmas. It then compares all of the pragmas to the full list of available versions and returns a list of all the versions which pass.
…urce files against the actual version used to compile them according to Sanctuary. I'm marking two kinds of version errors here: one for when we can't infer a version from a source file, and one for when the actual version is not included in the inferred versions list.
…d of Sanctuary testing.

At first this was extracting the version pragmas by just going line-by-line and finding lines that start with `pragma solidity`. This is obviously an overly simplistic approach. One glaring issue is that if a source file has a version pragma inside a block comment, it would still be included in the list of pragmas that must be matched.

This update uses slang's parser instead to extract all of the real version pragmas, and then the custom semver parser to do the rest. After making this change we have 0 version inference errors from Sanctuary.
…ators (`^` and `~`) in hyphen ranges. These two things don't make sense together, and a strict parser would most likely reject such a range. However, Solidity does/has just ignored these operators in the case of a hyphen range, so we will do that too.
I copied the semver parsing/matching test cases from solc and recreated them here. From there I fixed any weird behaviors that weren't already handled. Most importantly, solc allows users to specify operators in front of partial version ranges, which leads to some unintuitive behavior.
…rly in some circumstances

* Refactored the parser code to move the types and base implementations to `semver/mod.rs`, and all the parsing code in `semver/parser.rs`.
…ype of thing is getting parsed. Also trying to simplify/consolidate some of the higher-level parsing bits.
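As background for the partial-version commits above, npm's baseline rule is that a comparison operator in front of a partial version expands to a bound at the edge of the range the partial covers: for example, >0.5 means >=0.6.0 and <=0.5 means <0.6.0. A minimal sketch of those npm rules follows, ignoring prerelease tags and the ^/~ operators. The commits note that solc deviates from this in some circumstances, so treat it as a baseline rather than solc's behavior; `Partial`, `desugar`, and `contains` are hypothetical names.

```rust
/// A concrete version as (major, minor, patch); tuples compare lexicographically.
type V = (u64, u64, u64);

/// Hypothetical partial version: missing or wildcard components are `None`.
#[derive(Clone, Copy, Debug)]
struct Partial {
    major: u64,
    minor: Option<u64>,
    patch: Option<u64>,
}

/// Half-open interval [lower, upper); `None` means unbounded on that side.
type Bounds = (Option<V>, Option<V>);

/// Expand `op partial` using npm's X-range rules (prerelease tags ignored).
fn desugar(op: &str, p: Partial) -> Bounds {
    // Smallest version the partial covers, e.g. 0.5 -> 0.5.0.
    let first = (p.major, p.minor.unwrap_or(0), p.patch.unwrap_or(0));
    // First version past the partial, e.g. 0.5 -> 0.6.0, 0.5.2 -> 0.5.3.
    let next = match (p.minor, p.patch) {
        (None, _) => (p.major + 1, 0, 0),
        (Some(minor), None) => (p.major, minor + 1, 0),
        (Some(minor), Some(patch)) => (p.major, minor, patch + 1),
    };
    match op {
        ">" => (Some(next), None),
        ">=" => (Some(first), None),
        "<" => (None, Some(first)),
        "<=" => (None, Some(next)),
        // `=` or no operator: every version the partial covers.
        "" | "=" => (Some(first), Some(next)),
        other => panic!("operator {other:?} is outside this sketch"),
    }
}

/// Check whether a concrete version falls inside the bounds.
fn contains(bounds: Bounds, v: V) -> bool {
    bounds.0.map_or(true, |lo| v >= lo) && bounds.1.map_or(true, |hi| v < hi)
}
```

Under these rules, <=0.5 accepts 0.5.9 while >0.5 does not, and a bare 0.5 covers 0.5.3 but not 0.6.0.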
mjoerussell requested review from a team as code owners March 4, 2025 23:37

changeset-bot commented Mar 4, 2025

⚠️ No Changeset found

Latest commit: 0d24a25

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets.

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.

ggiraldez (Contributor) left a comment

Left a couple of suggestions, but the implementation looks good in general. I have two questions though:

  1. Re: the internal version types, wouldn't it be possible to reuse the semver crate types? It's not too much code, but it seems redundant. What is preventing you from reusing them?
  2. The parsing code also somewhat duplicates the main language parser, and IIUC, this new parser accepts some cases that the language parser rejects. Maybe we should not try to parse versions in the language definition at all. Or, going the other way, we could build the version ranges from the parsed CST nodes. WDYT?

use metaslang_cst::text_index::TextIndex;

use crate::cst::NonterminalKind;
use crate::generated::utils::LanguageFacts;
Contributor

This should probably read crate::utils::LanguageFacts. The generated module is an implementation detail which should be hidden by the #[path] attribute in the utils module.

/// versions that can fulfill those requirements.
pub fn infer_language_version(src: &str) -> Vec<semver::Version> {
let parser = crate::parser::Parser::create(LanguageFacts::LATEST_VERSION).unwrap();
let output = parser.parse_nonterminal(NonterminalKind::SourceUnit, src);
Contributor

Maybe better to use parse_file_contents instead.

let parser = crate::parser::Parser::create(LanguageFacts::LATEST_VERSION).unwrap();
let output = parser.parse_nonterminal(NonterminalKind::SourceUnit, src);

let mut cursor = output.tree.create_cursor(TextIndex::ZERO);
Contributor

Maybe better to use output.create_tree_cursor()

Comment on lines 814 to 822
let examples = [
"\"0.8\"",
"\"0.\" 8",
"\"0\" .8",
"0 . \"8\"",
"0 '.' 8",
"'0.8'",
"\"0\".\"8\"",
];
Contributor

I don't think these are accepted by the language parser. Should we try to fix the language definition, or maybe just drop any attempts to parse the pragma version there? (cc @OmarTawfik)

Contributor Author

This is accepted by solc: ethereum/solidity#14826

fn empty_version() {
let range = parse("");

println!("Range: {range}");
Contributor

Left-over debugging print?

fn major_wildcard_concat() {
let range = parse("x.1.0 >0.5.0");

println!("Range: {range}");
Contributor

Print debugging left over?

Comment on lines 857 to 858
// Version::new(0, 0, 0),
// Version::new(0, 1, 0),
Contributor

Should these match? It's unclear what this test's range covers.

Contributor Author

These should match; I had commented them out while debugging something and forgot to re-enable them.

@@ -48,6 +48,9 @@ struct TestCommand {
/// Enables checking bindings for each contract, failing if any symbol cannot be resolved.
#[arg(long, default_value_t = false)]
check_bindings: bool,

#[arg(long, default_value_t = false)]
check_infer_version: bool,
Contributor

You may want to add an input parameter to the GitHub workflow as well.

Comment on lines +15 to +16
pub no_version: u64,
pub wrong_version: u64,
Contributor

Not sure it's worth tallying these failures separately. Maybe counting them under failed would be good enough?

return Ok(());
} else if !inferred_versions.contains(&version) {
events.version_inference_error(format!(
"[{version}] Did not find correct version for {path}",
Contributor

Suggested change
"[{version}] Did not find correct version for {path}",
"[{version}] Inferred versions do not match with reported for {path}",

github-actions added 4 commits March 5, 2025 11:37
* Add `has_known_version_mismatch` check to the sanctuary tests so that we skip files where the version in sanctuary is known to not match the version pragmas
… our purposes they're interchangeable; all of the custom semver parsing/handling happens in `PartialVersion`, `Comparator`, and `ComparatorSet`, and once we reduce the versions to concrete versions we don't need anything special. This results in a bit less custom code.
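The commit above describes reducing everything to concrete versions, after which matching is plain version comparison. A minimal sketch of that last step, under stated assumptions: versions are (major, minor, patch) tuples, each pragma is a ||-separated list of comparator sets, and a known version is kept only if every pragma in the file accepts it. The types borrow the commit's names but are hypothetical shapes, not the PR's definitions, and `filter_versions` is an illustrative name.

```rust
use std::cmp::Ordering;

/// A concrete version as (major, minor, patch).
type Version = (u64, u64, u64);

#[derive(Clone, Copy)]
enum Op { Lt, Le, Eq, Ge, Gt }

/// Hypothetical: one comparator whose version has already been made concrete.
struct Comparator { op: Op, version: Version }

impl Comparator {
    fn matches(&self, v: Version) -> bool {
        let ord = v.cmp(&self.version);
        match self.op {
            Op::Lt => ord == Ordering::Less,
            Op::Le => ord != Ordering::Greater,
            Op::Eq => ord == Ordering::Equal,
            Op::Ge => ord != Ordering::Less,
            Op::Gt => ord == Ordering::Greater,
        }
    }
}

/// Hypothetical: comparators that must all hold (space-separated in a pragma).
struct ComparatorSet(Vec<Comparator>);

impl ComparatorSet {
    fn matches(&self, v: Version) -> bool {
        self.0.iter().all(|c| c.matches(v))
    }
}

/// Keep each known version that every pragma accepts, where a pragma accepts
/// a version if any of its `||`-separated comparator sets matches it.
fn filter_versions(known: &[Version], pragmas: &[Vec<ComparatorSet>]) -> Vec<Version> {
    known
        .iter()
        .copied()
        .filter(|&v| pragmas.iter().all(|sets| sets.iter().any(|s| s.matches(v))))
        .collect()
}
```

For example, with known versions 0.4.26, 0.5.0, 0.7.6, 0.8.0, and 0.8.19, a single pragma >=0.5.0 <0.8.0 leaves only 0.5.0 and 0.7.6.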