Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement the file functions from the WDL standard library. #254

Merged
merged 33 commits into from
Nov 19, 2024

Conversation

peterhuene
Copy link
Collaborator

@peterhuene peterhuene commented Nov 18, 2024

This PR implements the following functions from the WDL standard library:

  • basename
  • join_paths
  • glob
  • size
  • stdout
  • stderr
  • read_string
  • read_int
  • read_float
  • read_boolean
  • read_lines
  • write_lines
  • read_tsv
  • write_tsv
  • read_map
  • write_map
  • read_json
  • write_json
  • read_object
  • read_objects
  • write_object
  • write_objects

It also corrects some issues in the definitions of the functions in wdl-analysis and constraining certain generic function parameter types to structs with only primitive members.

The function call implementation in wdl-engine has been refactored so that the current working directory and temp directory and available to the call context; consequently, the function tests in wdl-engine have been refactored to make it easier to create files in a temporary directory.

The function implementations themselves have been refactored into their own modules so as to make the implementations/tests isolated from one another.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have added yourself or the appropriate individual as the assignee.
  • You have added at least one relevant code reviewer to the PR.
  • Your code builds clean without any errors or warnings.
  • You have added tests (when appropriate).
  • You have updated the README or other documentation to account for these
    changes (when appropriate).
  • You have added an entry to the relevant CHANGELOG.md (see
    "keep a changelog" for more information).
  • Your commit messages follow the conventional commit style.

This commit implements the `basename` function from the WDL standard library.
This commit implements the `join_paths` function from the WDL standard library.

It also exposes some of the well-known types from `STDLIB` in `wdl-analysis`
for use in `wdl-engine`.
This commit implements the `glob` function from the WDL standard library.

It also refactors out the arguments to the function callbacks into a
`CallContext` struct so that the callbacks always receive one argument; the
argument coercion has also been simplified in each function's implementation.
This commit implements the `size` function in the WDL standard library.

Also fixes a bug in the signatures of `size` to allow passing a literal `None`
without an ambiguity.

Also fixes an incorrect panic in `as_string` implementations.
This was a regression when adding support for URLs to the CLI.
This commit implements the `stdout` and `stderr` functions from the WDL
standard library.
This commit implements the `read_string` function from the WDL standard
library.
This commit implements the `read_int` function from the WDL standard library.
This commit implements the `read_float` function from the WDL standard library.
This commit implements the `read_boolean` function from the WDL standard
library.
This commit implements the `read_lines` function from the WDL standard library.
This commit implements the `write_lines` function from the WDL standard
library.

Additionally, this refactors the evaluation tests to make it easier to create
temporary files and use a specified temporary directory for functions that
create files.
This commit implements the `read_tsv` function from the WDL standard library.
This commit refactors the standard library implementation such that each
function's implementation gets its own module.
This commit implements the `write_tsv` function from the WDL standard library.

Additionally, it fixes some overloads of `read_tsv` and `write_tsv` that should
be marked as minimum version 1.2.

The constraints on the file functions that take arrays of structs should also
be further constrained on the requirement that the structs only contain
primitive members.
This commit implements the `read_map` function from the WDL standard library.
This commit implements the `write_map` function from the WDL standard library.
This commit implements the `read_json` function from the WDL standard library.

It also moves `is_ident` into `wdl-grammar`.
This commit implements the `write_json` function from the WDL standard library.

Additionally, it fixes two issues in `wdl-analysis`: the JSON-serializable
constraint should allow any map key type that is coercible to string and object
literal expressions should type check the member expressions of the literal.

This also implements serializing values to JSON.
This commit replaces the implementation of `Value::from_json` with a new
`deserialize` method that will work for any self-describing serde deserializer.

It also makes the API consistent with serialization.
This commit implements the `read_object` function from the WDL standard
library.
This commit implements the `read_objects` function from the WDL standard
library.

It also changes the storage of array values to be simply `Arc<Vec<Value>>`.
This commit implements the `write_object` function from the WDL standard
library.

It also corrects other TSV writing functions to ensure headers and values do
not themselves contain tab characters.
This commit implements the `write_objects` function from the WDL standard
library.

It also refactors writing TSV values into a helper function.
@peterhuene
Copy link
Collaborator Author

The tests have some Windows-path-specific logic to them that I haven't tested on Windows yet, so the Windows CI might fail; I'll investigate if it does.

@peterhuene
Copy link
Collaborator Author

peterhuene commented Nov 18, 2024

So gauntlet is actually missing a diagnostic that should have been previously emitted.

The problematic line is:

disk_size          = size(flatten(hifi_reads), "GB") + 10

Where disk_size is expected to be an Int, but size returns Float and thus the right hand side expression is Float (which doesn't coerce to Int).

This was actually fixed in the upstream repository by wrapping the size call with a ceil call.

Looking into which specific change in this PR caused gauntlet to fail (in a good way).

@peterhuene
Copy link
Collaborator Author

The windows failures are due to path separator issues; I'll get that cleaned up.

@peterhuene
Copy link
Collaborator Author

After determining that none of these changes should have affected the gauntlet diagnostic, I can confirm that I'm seeing the missing diagnostic in main as well. I'm not sure why the diagnostic is missing; I'll push up a refresh of gauntlet.

Additionally, this changes gauntlet's output to not duplicate the duration of
analysis on each file, since it's the same for every file in the repo (all
files are analyzed as one).

The duration is now printed out with the footer.
wdl-engine/src/stdlib/glob.rs Outdated Show resolved Hide resolved
wdl-engine/src/stdlib/join_paths.rs Outdated Show resolved Hide resolved
wdl-engine/src/stdlib/read_boolean.rs Outdated Show resolved Hide resolved
wdl-engine/src/stdlib/read_float.rs Outdated Show resolved Hide resolved
wdl-engine/src/stdlib/read_json.rs Outdated Show resolved Hide resolved
wdl-engine/src/stdlib/read_lines.rs Outdated Show resolved Hide resolved
wdl-engine/src/stdlib/read_map.rs Outdated Show resolved Hide resolved
Remove unnecessary `trim_end_matches` calls.
wdl-engine/src/stdlib/read_tsv.rs Outdated Show resolved Hide resolved
wdl-engine/src/stdlib/read_tsv.rs Show resolved Hide resolved
wdl-engine/src/stdlib/read_tsv.rs Outdated Show resolved Hide resolved
@peterhuene peterhuene changed the title feat: implement the file function from the WDL standard library. feat: implement the file functions from the WDL standard library. Nov 19, 2024
wdl-engine/src/stdlib/size.rs Show resolved Hide resolved
wdl-engine/src/stdlib/read_tsv.rs Show resolved Hide resolved
wdl-engine/src/stdlib/size.rs Show resolved Hide resolved
Copy link
Member

@a-frantz a-frantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've addressed everything I pointed out, so LGTM! Making good progress here 🚂

@peterhuene peterhuene merged commit 37f2a42 into stjude-rust-labs:main Nov 19, 2024
16 checks passed
@peterhuene peterhuene deleted the stdlib branch November 19, 2024 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants