Figuring out path forward on complicated output sources #126

Open
jmchilton opened this issue Nov 10, 2020 · 8 comments

@jmchilton

Consider cond-wf-003.cwl

class: Workflow
cwlVersion: v1.2
inputs:
  val: int
  def:
    type: string
    default: "Direct"

steps:

  step1:
    in:
      in1: val
      a_new_var: val
    run: foo.cwl
    when: $(inputs.a_new_var > 2)
    out: [out1]

outputs:
  out1:
    type: string
    outputSource:
      - step1/out1
      - def
    pickValue: first_non_null

requirements:
  InlineJavascriptRequirement: {}
  MultipleInputFeatureRequirement: {}

Currently we tie every workflow output to a particular step; Galaxy doesn't really have non-step processing of workflows.

Question:

  • Are there cases where the outputSource wouldn't correspond to a single input or step outside the context of pickValue? If yes, is there a simpler test case to work off of that doesn't depend on pickValue?
  • My first thought here is that we can convert this output into an expression tool with a single output (or even a no-op step of some sort with a single output). Is that something in cwl-utils (https://github.com/common-workflow-language/cwl-utils) or should it be?
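
For reference, here is a minimal Python sketch of what the pickValue methods do to a list of source values, as I read the CWL v1.2 spec. The function name `pick_value` and the placeholder string `"from step1"` are illustrative assumptions, not anything from Galaxy, cwltool, or cwl-utils.

```python
from typing import Any, List, Optional


def pick_value(values: List[Optional[Any]], method: str) -> Any:
    """Apply a CWL v1.2 pickValue method to a list of source values (sketch)."""
    non_null = [v for v in values if v is not None]
    if method == "first_non_null":
        if not non_null:
            raise ValueError("first_non_null: every source was null")
        return non_null[0]
    if method == "the_only_non_null":
        if len(non_null) != 1:
            raise ValueError("the_only_non_null: expected exactly one non-null source")
        return non_null[0]
    if method == "all_non_null":
        # An empty list is a valid result here.
        return non_null
    raise ValueError(f"unknown pickValue method: {method}")


# cond-wf-003.cwl with val = 1: step1 is skipped, so step1/out1 is null
# and the workflow input `def` wins.
skipped = pick_value([None, "Direct"], "first_non_null")

# With val = 3 the step runs; "from step1" is a placeholder for whatever
# foo.cwl actually produces.
ran = pick_value(["from step1", "Direct"], "first_non_null")
```

The point for Galaxy is that the picked value depends on which steps were skipped, so it cannot be tied to one step when the jobs are created.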
@jmchilton
Author

I found an interesting test case that uses pickValue on the result of one step.

class: Workflow
cwlVersion: v1.2
inputs:
  data:
    type: int[]
    default: [1, 2, 3, 4, 5, 6]
  test: boolean

steps:

  step1:
    in:
      in1: data
      a_new_var: test
    run: foo.cwl
    when: $(inputs.a_new_var)
    out: [out1]
    scatter: in1

outputs:
  out1:
    type: string[]
    outputSource: step1/out1
    pickValue: all_non_null

requirements:
  ScatterFeatureRequirement: {}

The problem here is very similar: we set the outputs when we create the jobs, not when the jobs are complete. So we don't have an extension point to pick those values and set a workflow output. The same trick could work here, though: add a step that takes an input, evaluates the picking, and then sets the output.

We could also extend the workflow output logic to encode the delayed picking in the database somehow and just pick when we need to fetch the values. That isn't super clean either.
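
A rough sketch of that delayed-picking idea, assuming a single (possibly scattered) source: record the source and the pickValue method when the jobs are created, and only pick when the value is fetched. Everything here (`DeferredPick`, `NotReady`, `PENDING`, the `lookup` callable) is hypothetical and not existing Galaxy code.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


class NotReady(Exception):
    """Raised when a source job has not finished yet (hypothetical)."""


# Sentinel for "job not finished", kept distinct from a null (None) result.
PENDING = object()


@dataclass
class DeferredPick:
    source: str  # e.g. "step1/out1", recorded when the jobs are created
    method: str  # the pickValue method from the workflow output

    def resolve(self, lookup: Callable[[str], List[Any]]) -> Any:
        """Pick at fetch time; `lookup` returns the scattered values so far."""
        values = lookup(self.source)
        if any(v is PENDING for v in values):
            raise NotReady(self.source)
        non_null = [v for v in values if v is not None]
        if self.method == "all_non_null":
            return non_null
        if self.method == "first_non_null":
            if not non_null:
                raise ValueError("every source value was null")
            return non_null[0]
        raise ValueError(f"unsupported pickValue method: {self.method}")


# Stand-in for the database: one scattered job is still running.
state = {"step1/out1": ["foo 1", PENDING, None]}
deferred = DeferredPick(source="step1/out1", method="all_non_null")

try:
    deferred.resolve(state.__getitem__)
    ready_before = True
except NotReady:
    ready_before = False

state["step1/out1"][1] = "foo 2"  # the remaining job finishes
picked = deferred.resolve(state.__getitem__)
```

The awkward part is visible in the sketch: every consumer of a workflow output has to cope with `NotReady`.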

@jmchilton
Author

jmchilton commented Nov 10, 2020

Now that I've looked through the rest of the test cases and thought through things a bit more - I think there are 4 potential paths forward.

  1. pickValue is syntactic sugar: just rewrite the workflow on import to add a step with an expression tool (implement in cwl-utils)
  2. pickValue is syntactic sugar: just rewrite the workflow on import to add a step with an expression tool (implement in Galaxy)
  3. Somehow implement delayed workflow outputs - the progress tracker would throw a Delay exception if the outputs aren't ready and the API would return some sort of incomplete value.
  4. Add a new workflow invocation state - "finalized". We could add workflow finalizer threads that monitor scheduled workflows and usher them into a finalized state once all the jobs are complete. This is a lot of work just to implement pickValue, but we could then implement post-workflow actions: e-mail notifications when invocations complete, a performant "cleanup non-terminal datasets" option, better job states in the API, publishing a PDF copy of the invocation report to an FTP directory, etc.
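
To make option 4 a bit more concrete, here is a minimal sketch of a single finalizer pass. The `Invocation` class, the state names, and the `finalize_ready` function are all assumptions for illustration; none of this is Galaxy's actual model or scheduler code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Assumed set of job states that count as "done" for this sketch.
TERMINAL_JOB_STATES = {"ok", "error", "skipped"}


@dataclass
class Invocation:
    job_states: Dict[str, str]
    state: str = "scheduled"
    log: List[str] = field(default_factory=list)


Finalizer = Callable[[Invocation], None]


def finalize_ready(invocations: Iterable[Invocation], finalizers: List[Finalizer]) -> int:
    """Move scheduled invocations whose jobs are all terminal to 'finalized'."""
    count = 0
    for inv in invocations:
        if inv.state != "scheduled":
            continue
        if all(s in TERMINAL_JOB_STATES for s in inv.job_states.values()):
            # pickValue resolution, e-mail notification, cleanup, etc. would hook in here.
            for finalizer in finalizers:
                finalizer(inv)
            inv.state = "finalized"
            count += 1
    return count


done = Invocation(job_states={"step1": "ok", "step2": "skipped"})
running = Invocation(job_states={"step1": "running"})
finalized_count = finalize_ready([done, running], [lambda inv: inv.log.append("picked outputs")])
```

A finalizer thread would call something like `finalize_ready` periodically; the pickValue logic becomes just one of several finalizers.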

3 ... seems gross. It feels like a lot of complexity that isn't buying us much beyond implementing pickValue for CWL unless we can come up with some other use cases. 4 seems like a good, noble project to improve Galaxy, but it is going to be "a thing". 1 seems... very nice; the only drawback is that it doesn't really translate to Galaxy Format 2 readily. 1 would seem to make more sense than 2, but 2 is worth considering in case we can come up with some cool syntax that makes sense for Galaxy and Format 2 that we could adapt the outputs to (i.e. a variant of this where you're picking inputs instead of picking outputs).
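
A sketch of what the rewrite in options 1 and 2 might look like, operating on the workflow as a plain dict (as loaded from YAML) and assuming `steps` and `outputs` use the map form shown above. The function name `rewrite_pick_value_output`, the generated step id `<output>_pick`, and the port names `values` and `result` are all made up for illustration; this is not the cwl-utils API.

```python
import copy
from typing import Any, Dict

# JavaScript bodies for the generated ExpressionTool; `v` holds the non-null values.
_PICK_JS = {
    "first_non_null": 'if (v.length === 0) { throw "all sources were null"; } return {"result": v[0]};',
    "the_only_non_null": 'if (v.length !== 1) { throw "expected exactly one non-null source"; } return {"result": v[0]};',
    "all_non_null": 'return {"result": v};',
}


def rewrite_pick_value_output(workflow: Dict[str, Any], output_id: str) -> Dict[str, Any]:
    """Return a copy of `workflow` where the output's pickValue becomes a step."""
    wf = copy.deepcopy(workflow)
    output = wf["outputs"][output_id]
    method = output.pop("pickValue")
    source = output["outputSource"]

    tool = {
        "class": "ExpressionTool",
        "requirements": {"InlineJavascriptRequirement": {}},
        "inputs": {"values": {"type": {"type": "array", "items": ["null", "Any"]}}},
        "outputs": {"result": {"type": output["type"]}},
        "expression": "${ var v = inputs.values.filter(function (x) { return x !== null; }); "
        + _PICK_JS[method]
        + " }",
    }

    step_in: Dict[str, Any] = {"source": source}
    if isinstance(source, list):
        # Several sources are gathered into one array for the tool.
        step_in["linkMerge"] = "merge_nested"

    step_id = f"{output_id}_pick"  # hypothetical naming scheme
    wf["steps"][step_id] = {"in": {"values": step_in}, "run": tool, "out": ["result"]}
    output["outputSource"] = f"{step_id}/result"
    return wf


# A trimmed-down dict version of cond-wf-003.cwl.
cond_wf_003 = {
    "class": "Workflow",
    "cwlVersion": "v1.2",
    "inputs": {"val": "int", "def": {"type": "string", "default": "Direct"}},
    "steps": {
        "step1": {
            "in": {"in1": "val", "a_new_var": "val"},
            "run": "foo.cwl",
            "when": "$(inputs.a_new_var > 2)",
            "out": ["out1"],
        }
    },
    "outputs": {
        "out1": {
            "type": "string",
            "outputSource": ["step1/out1", "def"],
            "pickValue": "first_non_null",
        }
    },
}

rewritten = rewrite_pick_value_output(cond_wf_003, "out1")
```

After the rewrite, the workflow output is tied to exactly one step (`out1_pick/result` in this sketch), which is the shape Galaxy already handles.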

@mr-c
Member

mr-c commented Nov 10, 2020 via email

@mr-c
Member

mr-c commented Nov 11, 2020

@jmchilton I've got a WIP on option 1. Question: is pickValue only a problem for you when it shows up in the outputs of a CWL Workflow, but okay if it appears as part of a step? Or is it always a problem?

@mr-c
Member

mr-c commented Nov 11, 2020

@jmchilton common-workflow-language/cwl-utils#43

@jmchilton
Author

Does it ever appear on a step? I couldn't find an example of this.

@mr-c
Member

mr-c commented Nov 11, 2020

@jmchilton It can; see https://www.commonwl.org/v1.2/Workflow.html#WorkflowStepInput. Imagine a conditional `when` in the "middle" of a workflow.
