cwltool --print-deps fails with workflows having namespaced location steps #1765

Open
jmfernandez opened this issue Nov 18, 2022 · 3 comments · May be fixed by #1766


@jmfernandez
Contributor

jmfernandez commented Nov 18, 2022

I have been trying to "print" the list of dependencies of several workflows that declare some of their steps outside their own workflow repository. I need this to capture the list of all the CWL URLs involved in a workflow.

Expected Behavior

For instance, if you try the following workflow:

git clone https://github.com/Sage-Bionetworks-Challenges/data-to-model-challenge-workflow
cd data-to-model-challenge-workflow
cwltool --print-deps --relative-deps primary workflow.cwl

it prints the list of both local and remote CWL dependencies.
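Since the goal here is to capture every CWL URL involved in a workflow, the JSON that --print-deps writes to stdout can be flattened into a plain list. This is a minimal sketch that assumes the output is a CWL File object whose nested secondaryFiles entries carry the dependencies; the sample document and the example.org URLs are made up for illustration, not captured from a real run.

```python
import json
from typing import Any, Iterator


def iter_locations(node: Any) -> Iterator[str]:
    """Yield every "location" in a File object and its nested secondaryFiles, depth-first."""
    if isinstance(node, dict):
        if "location" in node:
            yield node["location"]
        for child in node.get("secondaryFiles", []):
            yield from iter_locations(child)
    elif isinstance(node, list):
        for child in node:
            yield from iter_locations(child)


# Hypothetical --print-deps output, for illustration only (assumed shape, invented URLs).
sample = json.loads("""
{
  "class": "File",
  "location": "workflow.cwl",
  "secondaryFiles": [
    {"class": "File", "location": "steps/local_step.cwl"},
    {"class": "File", "location": "https://example.org/tools/remote_step.cwl",
     "secondaryFiles": [{"class": "File", "location": "https://example.org/tools/helper.cwl"}]}
  ]
}
""")

for loc in iter_locations(sample):
    print(loc)
```

In practice one would redirect the command's output (for example cwltool --print-deps --relative-deps primary workflow.cwl > deps.json) and pass the parsed JSON to iter_locations.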

Actual Behavior

But with workflows where the base location is declared as a namespace, and that namespace prefix is then used to declare each step location in a shorter form, the following command fails:

git clone https://github.com/pvanheus/lukasa/
cd lukasa
cwltool --print-deps --relative-deps primary protein_evidence_mapping.cwl
ERROR Tool definition failed validation:
Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl

Workflow Code

For an example of a workflow that uses namespaces to provide the "prefix" for locating its steps, see: https://github.com/pvanheus/lukasa/blob/main/protein_evidence_mapping.cwl
There are more examples in other GitHub repositories.

Full Traceback

INFO /tmp/testi/.v/bin/cwltool 3.1.20221109155812
INFO Resolved 'protein_evidence_mapping.cwl' to 'file:///tmp/testi/lukasa/protein_evidence_mapping.cwl'
ERROR Tool definition failed validation:
Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl
Traceback (most recent call last):
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 1117, in main
    printdeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 570, in printdeps
    deps = find_deps(obj, document_loader, uri, basedir=basedir, nestdirs=nestdirs)
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 617, in find_deps
    sfs = scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1339, in scandeps
    scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1339, in scandeps
    scandeps(
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/process.py", line 1302, in scandeps
    loadref(base, u2),
  File "/tmp/testi/.v/lib/python3.8/site-packages/cwltool/main.py", line 615, in loadref
    return document_loader.fetch(document_loader.fetcher.urljoin(base, uri))
  File "/tmp/testi/.v/lib/python3.8/site-packages/schema_salad/ref_resolver.py", line 995, in fetch
    text = self.fetch_text(url, content_types=content_types)
  File "/tmp/testi/.v/lib/python3.8/site-packages/schema_salad/fetcher.py", line 108, in fetch_text
    raise ValidationException(f"Unsupported scheme in url: {url}")
schema_salad.exceptions.ValidationException: Unsupported scheme in url: bio-cwl-tools:samtools/samtools_faidx.cwl
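The error message makes more sense once you look at how a generic URL parser reads the namespaced reference. This sketch uses only Python's standard urllib.parse, not schema-salad's own fetcher, so it illustrates the likely mechanism rather than the exact code path in the traceback: the text before the first colon is a syntactically valid URI scheme, and joining it against the file:// base leaves the reference untouched.

```python
from urllib.parse import urljoin, urlsplit

base = "file:///tmp/testi/lukasa/protein_evidence_mapping.cwl"
ref = "bio-cwl-tools:samtools/samtools_faidx.cwl"

# "bio-cwl-tools" consists only of letters and "-", so the standard parser
# accepts it as the URI scheme and treats the rest as the path.
parts = urlsplit(ref)
print(parts.scheme)  # bio-cwl-tools
print(parts.path)    # samtools/samtools_faidx.cwl

# Because the scheme differs from the base's "file" scheme, urljoin returns
# the reference unchanged instead of resolving it relative to the workflow.
print(urljoin(base, ref))  # bio-cwl-tools:samtools/samtools_faidx.cwl
```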

Your Environment

  • cwltool version:
    All tests were run with cwltool 3.1.20221109155812 on Linux.
  • Environment: cwltool was installed from PyPI with pip into a Python 3.8 virtual environment.
@kinow
Member

kinow commented Nov 18, 2022

Huh, that error message is really odd since the --validate option doesn't find any validation errors.

(cwltool):~/Development/python/workspace/lukasa$ cwltool --validate protein_evidence_mapping.cwl
INFO /home/bdepaula/mambaforge/envs/cwltool/bin/cwltool 3.1.20221109155812
INFO Resolved 'protein_evidence_mapping.cwl' to 'file:///home/bdepaula/Development/python/workspace/lukasa/protein_evidence_mapping.cwl'
protein_evidence_mapping.cwl is valid CWL.

@tetron
Member

tetron commented Nov 18, 2022

Oh wow, I never thought about using namespace for file references that way. That is really clever.

I believe what is happening is that the find_deps function is running on the document prior to having the full schema salad preprocessing applied (this is so that files brought in by $imports get recognized as dependencies). So the find_deps function needs to apply namespaces in the URI expansion itself.
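A minimal sketch of what applying namespaces during URI expansion could look like, assuming the document's $namespaces mapping is available to the dependency scanner. The helper name expand_uri and the namespace URL below are hypothetical (the real value declared in lukasa is not reproduced here), and the simple prefix concatenation only approximates schema-salad's behavior; it is not its actual implementation.

```python
from typing import Dict
from urllib.parse import urljoin


def expand_uri(base: str, ref: str, namespaces: Dict[str, str]) -> str:
    """Hypothetical helper: expand a "prefix:rest" reference, then resolve it against base."""
    prefix, sep, rest = ref.partition(":")
    # Only treat the text before ":" as a prefix if it is a declared namespace;
    # real schemes such as "https:" or "file:" pass through untouched.
    if sep and prefix in namespaces:
        ref = namespaces[prefix] + rest
    return urljoin(base, ref)


# Hypothetical namespace URL, for illustration only.
namespaces = {"bio-cwl-tools": "https://example.org/bio-cwl-tools/"}
base = "file:///tmp/testi/lukasa/protein_evidence_mapping.cwl"

print(expand_uri(base, "bio-cwl-tools:samtools/samtools_faidx.cwl", namespaces))
# https://example.org/bio-cwl-tools/samtools/samtools_faidx.cwl
print(expand_uri(base, "steps/local.cwl", namespaces))
# file:///tmp/testi/lukasa/steps/local.cwl
```

Checking the prefix against the declared namespaces, rather than against a list of known schemes, keeps ordinary absolute URLs and relative paths resolving exactly as before.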

@kinow
Member

kinow commented Nov 18, 2022


Ah! In that case I think we can just switch the order: load and validate the document before printdeps (and therefore find_deps) is called. I tested it and it worked; I'm just adding a test case and then will send a PR 👍

Thanks @tetron !

kinow linked a pull request on Nov 18, 2022 that will close this issue