Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test metadata #38

Merged
merged 19 commits into from
Dec 16, 2020
Merged

Test metadata #38

merged 19 commits into from
Dec 16, 2020

Conversation

simleo
Copy link
Collaborator

@simleo simleo commented Dec 7, 2020

This PR adds support for a test-oriented extension of Workflow RO-Crates called Workflow Testing RO-Crate, based on the "test" namespace recently added to RO-Terms.

It includes:

  • new model classes for the test-specific entities
  • a new test crate with workflow tests and new unit tests to check the model extension on it
  • an example that shows how to use the library to read test metadata and drive a simple test execution

This PR is meant to complement the new Workflow Testing RO-Crate spec with a concrete implementation and runnable tests and examples. Feedback on both this PR and the spec would be very useful. I've opened an issue on the Life Monitor repo for Feedback on the spec.

@simleo simleo requested a review from stain December 7, 2020 16:29
@simleo simleo marked this pull request as ready for review December 7, 2020 16:29
Copy link
Contributor

@stain stain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(summarizing discussion on Slack)

Agree on this approach and on https://github.com/crs4/life_monitor/wiki/Workflow-Testing-RO-Crate

My main two requests:

  • #planemo in one RO-Crate is not the same as in another crate, where they may have different versions etc. Yet the test engine has to somehow recognize that it is the "Planemo binding code" it needs to run, from some hardcoding. Would we also need a @type in the RO Terms with a corresponding "Supported test engines" table in the wiki page, say https://w3id.org/ro/terms/test#PlanemoEngine ?

  • Use of hasPart with a contextual non-File entity in a Dataset (#hash identifier) can be confusing as people would expect files. It is not breaking RO-Crate spec as such - just confusing. Perhaps use mentions at root Dataset level and/or about at test/ Dataset.

Also:
If there are multiple tests/workflows - how would the engine pick the correct workflow file? Add another mainEntity link from the TestSuite so we know the whole suite is about that particular workflow. (or can a suite cover multiple workflows? Move to TestInstance)

@simleo
Copy link
Collaborator Author

simleo commented Dec 9, 2020

Thanks for the feedback @stain. I've added 4113169 and da0d78f to address the second and third change you requested. The first one is more complicated.

The idea here was that the higher application layer (e.g., Life Monitor) would know what to do with the test definition based on the engine's @id and version (in this case: need to pip install 'planemo>=0.70' and call some planemo-specific library code to run the test). This was inspired by the "Supported Workflow Types" section of the Workflow RO-Crate spec, which basically tells the crate creator to add a specific pre-built ComputerLanguage entity to the crate to be chosen among the supported ones. In the test engine case, only the version should be variable, and the check would simply be something like engine.id == '#planemo'. The fact that the entity might have a different version in another crate should not be a problem. However, there might be a problem if the crate creator wanted to list different suites that run with different planemo versions in the same crate. I think this could be considered a bit of a corner case, but somebody might want this feature. This could be solved by adding a PlanemoEngine class as you suggested (or by having all ids for planemo engines start with '#planemo', though that starts to feel a bit hacky). The same could be done to characterize test service types.

Since the above change requires a pass through RO-Terms, if you are ok with the other changes I made, I propose you merge this PR now, then we can sort out the other bit separately.

@simleo
Copy link
Collaborator Author

simleo commented Dec 10, 2020

Update

I pushed a new commit (99b8198) where I use two new properties, testEngineType and testServiceType to characterize a test engine or service independently of their @id. The new properties are similar to the several *Type properties already present in schema.org, such as engineType. The upper layer now has to check for the value of these properties in order to find out how to handle the rest of the metadata. For instance, in the example included with this PR, the check changed as follows:

-    assert suite.definition.engine.id == "#planemo"
+    assert suite.definition.engine.testEngineType == "planemo"

In this way there's no need to create a separate type for each supported service or engine, so the test namespace does not have to be updated every time a new supported service or engine is added. The spec will just have to list the supported types, specifying the appropriate values for the above properties.

I had to add a new TestEngine entity type since testEngineType is not a standard schema.org property allowed for SoftwareApplication.

@stain please let me know if this looks good to you now. If so, we can merge the PR and I can update the test namespace in ro-terms to reflect these changes.

@stain
Copy link
Contributor

stain commented Dec 14, 2020

Well, I would prefer to have global identifiers for identifiers.. by having your own property to shortcodes you are delegating to another non-URI registry (the wiki page?) or it has to be explicitly freehand, e.g. permit "Galaxy", "galaxy" and "Galaxy Project". It's good to be clear which of these two routes it is.

http://schema.org/engineType for instance seems to permit both "petrol" and "gasoline" and "1580cc" - it is purely a text field, although it can also go to controlled vocabularies as well as to somewhat ordered QualitativeValue. Flexible but not predictable!

So in that sense the new suggestion achieves less than the old #planemo approach that at least let us expand with a {@id: #planemo}' node as a SoftwareApplicationthat I quite liked. It is not possible to expand/describe"planemo"the string (presumably you don't mean./planemoin the current directory and we can't add@vocabtricks without changing the@context) - but now at the cost of still needing an intermediate contextual entity that has that testEngineTypeproperty, and so in RO-Crate that would still need an@id`.

I can see the reason to use testEngineType as a property rather than a subclass @type - because it's something you use when running, rather than just mark/bring in particular metadata properties - it is data not metadata. But if it's "free for all" then you risk two competing users deciding what "shell" should mean without updating the wiki, one of them assuming "sh" on Linux and the other assuming CMD.EXE on Windows. This seem to be what you are asking for when you want to not want to update any registry.

We tried to make RO Terms be a CSV file so it is easy to update.. If it's not suitable as a registry, then we perhaps need an even easier approach, like a GitHub wiki page with #subheadings?

So how about:

{
    "@id": "test/test1/my-test.yml",
    "@type": [
        "File",
        "TestDefinition"
    ],
    "conformsTo": {"@id": "#planemo"}
},
{ "@id": "#planemo",
  "@type": "SoftwareApplication",
  "testEngineType": {"@id": "https://w3id.org/ro/terms/engine#planemo"},
  "name": "Planemo",
  "url": {"@id": "https://github.com/galaxyproject/planemo"},
  "version": ">=0.70"
}

and then that permalink takes you to a GitHub wiki page which just lists each engine like you already suggested. Note that there are no pull requests on GitHub wikis, if you permit a group (e.g. public) editing it will just be editable. It is revision tracked though.

(Why is testEngineType @id here a full URI? Well, the @context terms are only expanded in properties and under @type, so while @type: shell would have worked by adding shell to the @context, here we need full URI unless we also used @vocab in the @context definition of testEngineType)

From this I wonder why we need #planemo and can just as easily call that node @id: https://w3id.org/ro/terms/engine#planemo .. then not really need for testEngineType?

@simleo
Copy link
Collaborator Author

simleo commented Dec 15, 2020

From this I wonder why we need #planemo and can just as easily call that node @id: https://w3id.org/ro/terms/engine#planemo .. then not really need for testEngineType?

I had moved the engine type to its own property to support the use case where different tests in the same crate might need different versions of the same engine, so that the crate author could do something like:

{
  "@id": "#engine1",
  "testEngineType": "planemo",
  "version": "0.70",
  ...
},
{
  "@id": "#engine2",
  "testEngineType": "planemo",
  "version": "0.72",
  ...
}

I think we can still get rid of testEngineType and have a single unique id while solving the above problem by lifting version to TestDefinition (as engineVersion):

{
  "@id": "test/test1/my-test.yml",
  "@type": ["File", "TestDefinition"],
  "conformsTo": {"@id": "https://w3id.org/ro/terms/engine#planemo"},
  "engineVersion": ">=0.70"
},
{
  "@id": "https://w3id.org/ro/terms/engine#planemo",
  "@type": "SoftwareApplication",
  "name": "Planemo",
  "url": {"@id": "https://github.com/galaxyproject/planemo"},
}

Regarding the registry, I did not mean to imply that ro-terms is unsuitable. After reading your considerations, now I see that something would have to be updated anyway, whether it's the current CSV or a new setup based on subheadings. So it's better to go with the former since it's already in place, and integrate your original proposal of a PlanemoEngine type. The PlanemoEngine description in the ro-terms CSV can be something like "The Planemo workflow testing application: https://github.com/galaxyproject/planemo", or even just "https://github.com/galaxyproject/planemo". And we can have something similar for services.

Taking all of the above into account, the new test instances and definitions look like this:

{
    "@id": "#test1_1",
    "name": "Test 1 Instance 1",
    "@type": "TestInstance",
    "runsOn": {"@id": "https://w3id.org/ro/terms/test#JenkinsService"},
    "url": "https://example.org/jenkins",
    "resource": "job/tests/"
},
{
    "@id": "#test1_2",
    "name": "Test 1 Instance 2",
    "@type": "TestInstance",
    "runsOn": {"@id": "https://w3id.org/ro/terms/test#JenkinsService"},
    "url": "https://jenkins.example.com",
    "resource": "job/wftests/"
},
{
    "@id": "https://w3id.org/ro/terms/test#JenkinsService",
    "@type": "TestService",
    "name": "Jenkins",
    "url": {"@id": "https://www.jenkins.io"}
}

{
    "@id": "test/test1/my-test.yml",
    "@type": ["File", "TestDefinition"],
    "conformsTo": {"@id": "https://w3id.org/ro/terms/test#PlanemoEngine"},
    "engineVersion": "0.70"
},
{
    "@id": "test/test2/my-test.yml",
    "@type": ["File", "TestDefinition"],
    "conformsTo": {"@id": "https://w3id.org/ro/terms/test#PlanemoEngine"},
    "engineVersion": "0.72"
},
{
    "@id": "https://w3id.org/ro/terms/test#PlanemoEngine",
    "@type": "SoftwareApplication",
    "name": "Planemo",
    "url": {"@id": "https://github.com/galaxyproject/planemo"}
}

The full set of changes to this PR is in f23f6b8. Here is the corresponding update for the test namespace in ro-terms: ResearchObject/ro-terms#4.

How does this look?

One last note: at this point, the engine entity does not offer much added value, since it's just hardwired to be always the same thing for any given engine type. conformsTo in TestDefinition now points to a global unique ID, and info on what it's about can be found in the registry. So I think it can be simply removed. The same goes for test services. I've placed the corresponding code in a separate branch: https://github.com/ResearchObject/ro-crate-py/tree/test_metadata_no_engine_no_service. Would this be better or worse?

@stain stain merged commit 6664fe9 into master Dec 16, 2020
@stain
Copy link
Contributor

stain commented Dec 16, 2020

I like this simplified approach. RO-Crate philosophy is to documented in ro-crate-metadata.json any URIs that don't give a human-readable description - so I would for now say include the https://w3id.org/ro/terms/test#PlanemoEngine contextual node as well.

I am concerned if runsOn is meant to mean "runsOnSomethingOfThisType" or "runsOnThisInstance" - your example seems to be the first as a particular Jenkins installation would have a different URL - but presumably that URL would not need to be listed here?

@simleo
Copy link
Collaborator Author

simleo commented Dec 16, 2020

I am concerned if runsOn is meant to mean "runsOnSomethingOfThisType" or "runsOnThisInstance" - your example seems to be the first as a particular Jenkins installation would have a different URL - but presumably that URL would not need to be listed here?

Yes, it means "runsOnSomethingOfThisType".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants