-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modelling for Test Assertions #18787
Conversation
8fc578f
to
347639a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty cool.
return v | ||
|
||
|
||
has_line_line_description = """The full line of text to search for in the output.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intention of defining the description text separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the generated code would read more clean if the potentially large strings were defined on their own. Probably a pretty arbitrary choice either way.
|
||
has_line_max_description = """Maximum number (default: infinity), can be suffixed by ``(k|M|G|T|P|E)i?``""" | ||
|
||
has_line_negate_description = """A boolean that can be set to true to negate the outcome of the assertion.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description should be the same for all assertions that implement negate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs have a single source of truth in the parameters types - I don't think it is worth while to try to de-duplicate the strings here. Does that make sense?
<xs:group ref="TestAssertionsJson" minOccurs="0" maxOccurs="unbounded"/> | ||
<xs:group ref="TestAssertionsH5" minOccurs="0" maxOccurs="unbounded"/> | ||
<xs:group ref="TestAssertionsImage" minOccurs="0" maxOccurs="unbounded"/> | ||
<xs:group ref="TestAssertion"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to ensure that this part is not edited?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I can regenerated these with some comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we could put the definitions in their own file. That would probably be a better design in the abstract.
Edit: I don't know what ramifications this would have - it is nice to have a single URL to download for tooling. I'll add the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've regenerated the docs with a lot more warnings about things being auto-generated and not to modify them. I've also rearranged things so the XSD has comments about where modules are defined so the docs can be updated.
7032070
to
b93e8a9
Compare
2ca12f9
to
b7e9957
Compare
b7e9957
to
90d682f
Compare
Summary
The source of truth for the documentation around test assertions from the Galaxy Tool XSD file (including semantically important things like are parameters required and their xsd datatype) has been pulled out of the XSD and put into the Python modules via docstrings and
Annotated
parameters (stack overflow on how to use Annotated). This now centralized documentation and typing information is used to generate Pydantic models for assertion lists. The JSON versions of assertion lists is used by the Planemo test format (https://planemo.readthedocs.io/en/latest/test_format.html) and in experimental YAML tool definitions. Validating these is an important part of a hardened workflow tool chain.Why not XSD?
Out-of-sync
I've always been a little uncomfortable with the assertion documentation in
galaxy.xsd
. The plugins were meant to be defined as isolated Python modules. Spreading out the documentation in this other format is a bit problematic because the two sources of truth can easily become out of sync - and indeed had in some ways (see bug fixes below).Developer Familiarity
While I do think XSD is very clever and kind of wonderful to develop against - it is a bit of a dying format and is almost certainly less familiar to nearly anyone interested in developing test assertions than Python. I hope centralizing these things makes it easier for new developers to more rapidly develop and document test assertions. This point pairs well with the new test cases that make it really easy to test assertion parsing. The developer experience should be a lot better despite the assertions themselves being more generically useful.
Not Intrinsically Tied to XML
These test assertions are very readily ported to JSON and useful there. Indeed, the Pydantic models provide stronger validation and typing. For this reason, I think placing the "source of truth" documentation in XSD is a bit less than ideal.
But...
I am very confident this switch is the right choice, but I will admit there is a "but". We are losing some minor things by not using hand crafted XSD. The structured of the XSD itself and how XSD abstractions are used is a bit less than with the auto-generated code. The file is a bit bigger now and a bit more redundant. This is the nature of auto-generated code and I think is fine - the benefits of centralization described above are worth the cost and then some.
Typing Improvements
In addition to improving the documentation of the Python code, I also made a pass at making all the types more specific where it makes sense. For instance, various image assertion XSD attributes could have used the type
xs:nonNegativeInteger
and were instead usingxs:integer
. I also added validators to get similar specificity out of Pydantic. In some places, I was able to go even farther with Pydantic. One instance is thecenter_of_mass
parameter that has the format<float>, <float>
. It is trivial to validate this format in Pydantic but would likely be more difficult to validate in XSD. Additionally, the Pydantic version of validation compiles all embedded regular expressions to ensure they are correct. Despite the differences in validation quality between XML and JSON - all the declaration of typing used to do this is centralized in the module definitions themselves. Maintaining the single source of truth I would hope for from a "plugin".Tests & Bug Fixes
I've added a file that has positive and negative validation test cases for I think every assertion test case in both JSON (for the generated pydantic) and XML (for the generated XSD). These test cases are really trivial to write - one just needs to add new snippets to either positive or negative validation lists.
These test cases found a few issues around XSD. Recursive XML testing (from asserts/xml.py ) definitions were documented in the XSD and would work in tools but I think would not validate before. Only the archive assertion was setup correctly in the XSD for these recursive assertions.
Linting the XSD
In order to do automatic code generation for the XSD well, I've auto formatted it with
xmllint
. The project is various familiar with code formatting and the advantages it provides for Python and Typescript - I just wanted to note that the XSD is now being formatted in the some way. TheMakefile
target I added isformat-xsd
Applications
I've started work on Pydnatic models for the Planemo test format (as an alternative approach to galaxyproject/planemo#1417) at https://github.com/jmchilton/galaxy/pull/new/test_format. This PR doesn't include that work but it provides the infrastructure to do the hardest parts and ensure the test format stays in line and on par with the tool XML assertions over the long term. I think the validation that we do using Pydantic goes beyond what is done for XSD so I think the validation available via this approach will be stronger that that going down the XSD -> jsonschema route. JSON schema for these models can still be generated from the Pydantic using the following command:
Additionally, there are bits and pieces of progress toward dynamic tools in Galaxy (for workflows) and YAML tools. For the dynamic tools - we probably want to use validated YAML - so I think these models will be very important to ensure those things are hardened and can be maintained long term.
Resyncing
The XSD and Pydantic can be resync-d against the Python implementation of the assert plugins using.
These validation artifacts can be validated against the new tests using the pytest command:
How to test the changes?
(Select all options that apply)
License