Automate scheduling, triggering and monitoring of workflows on CAVATICA #29

Open
2 of 10 tasks
ByroneCole-SageBionetworks opened this issue Jan 11, 2023 · 5 comments
Assignees
Labels
Sage Sage Bionetworks task

Comments

@ByroneCole-SageBionetworks
ByroneCole-SageBionetworks commented Jan 11, 2023

  • CHOP RNASeq BixOps Automation. JIRA

    • Reach out to CHOP to get answers about their BixOps process to trigger the process manually. JIRA
    • Confirmed that we have enough information from CHOP to start work. JIRA
    • BONUS: Manually follow the CHOP RNASeq BixOps on a subset of existing datasets (Khor) and confirm the expected output. (IN PROGRESS)
    • BONUS: Create a Python/shell script to capture the CHOP BixOps RNASeq pipeline.
    • BONUS: Determine the feasibility of capturing the CHOP BixOps via Orca (Airflow DAG).
    • BONUS: Implement an Orca data pipeline (Airflow DAG) that will automate the process.
    • BONUS: Leverage Orca to launch and monitor CHOP workflows for RNASeq processing of new data.
  • BONUS: CHOP WGS BixOps Automation. JIRA

    • BONUS: Same checklist as above but for WGS
@thomasyu888
Contributor

We reached out to CHOP (Eric, Allison, Yuankun) with the questions we had about the CHOP BixOps. Here is the link to the Slack thread: https://teaminclude.slack.com/archives/C03K4BHD3QC/p1673973750582959

@thomasyu888 changed the title from "Road to automation of scheduling, triggering and monitoring of workflows on CAVATICA" to "Automate scheduling, triggering and monitoring of workflows on CAVATICA" on Jan 30, 2023
@thomasyu888
Contributor

On the datahops call on 3/23/2023, we stated that Sage has demonstrated the ability to launch and monitor workflows on Cavatica using Orca, so we are calling this portion done.

The rest of the bullet points are BONUS, but we will continue to try to tackle them.

@thomasyu888
Contributor

thomasyu888 commented Apr 4, 2023

For the V3 tech plan, I am including a draft SOP for automating data processing, covering V4 and the future of this project.

Data Processing with Orca

The genomic data processing for INCLUDE occurs within the Cavatica platform, developed and maintained by Velsera. The CHOP team has developed many bioinformatics operations (BixOps) pipelines that capture all the steps before and after processing on Cavatica. The Sage Bionetworks team is responsible for learning this process and automating as much of it as possible. The bulk of the work is in learning the BixOps and then attempting to automate each step.

This is a high-level flow chart of the steps required to leverage Orca for genomic data processing in INCLUDE. (Lucid link)

[image: flow chart]

Depending on the complexity and completeness of a given BixOps workflow, we estimate that it will take about one quarter to fully automate it, if automation is deemed feasible.

At a high level:

  1. Learn the CHOP BixOps of any particular workflow and manually trigger the processing.
    1. Manually set up Cavatica project with KF app and reference files
    2. Manually upload and prepare dataset for processing
    3. Process the dataset with given KF App
    4. Set up delivery Cavatica project and populate with workflow input / output
  2. Attempt to automate the process by stringing the steps together in a Python script
  3. If step 2 is feasible, the automated steps might look like:
    1. Notify CHOP of new dataset
    2. Each step in the CHOP BixOps
    3. Execute workflow on subset of data
    4. Validate output
    5. Execute workflow on production dataset
    6. Validate output
    7. Notify collaborators
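
As a rough illustration of step 3, here is a minimal Python sketch of how the automated steps might be strung together. Everything named here (`PipelineHooks`, `run_pipeline`, and the individual callables) is hypothetical and stands in for whatever the CHOP BixOps and Orca actually require; it is not the Orca or Cavatica API. The only design point it tries to show is the ordering: the workflow runs on a subset first, and the production run only starts if the subset output validates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineHooks:
    """Hypothetical callables, one per step in the list above."""
    notify_chop: Callable[[str], None]
    bixops_steps: List[Callable[[str], None]]   # each manual BixOps step, automated
    execute_workflow: Callable[[str], str]      # returns an identifier for the output
    validate_output: Callable[[str], bool]
    notify_collaborators: Callable[[str], None]


def run_pipeline(hooks: PipelineHooks, subset: str, production: str) -> List[str]:
    """Run the steps in order; stop at the first failed validation."""
    hooks.notify_chop(production)

    for step in hooks.bixops_steps:
        step(production)

    outputs = []
    # Subset first, then production. A failed validation raises before
    # the next (more expensive) run is launched.
    for dataset in (subset, production):
        output = hooks.execute_workflow(dataset)
        if not hooks.validate_output(output):
            raise RuntimeError(f"Validation failed for {dataset!r}")
        outputs.append(output)

    hooks.notify_collaborators(production)
    return outputs
```

In an Airflow DAG, each of these callables would presumably become its own task, so that a failed validation is visible as a failed task rather than an exception in a script.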

@briandoconnor
Only comment: think about capacity. Given our funding, how many "Execute Orca recipe on entire dataset" runs can we do in a given quarter?

@briandoconnor
briandoconnor commented Apr 13, 2023

Other thing... think about adding a flow step for "trigger notification of CHOP"... and adding status back to DFA as that's deployed for INCLUDE

Status: In Progress