
New Workflow: cp_process_singlecells ! #37

Merged
merged 42 commits into from
Apr 12, 2023


axiomcura
Member

@axiomcura axiomcura commented Apr 6, 2023

About

This PR introduces a new workflow based on @jenna-tomkinson's single-cell feature extraction analysis.

What’s new

  • New module cytotable_convert.smk
    • Converts SQLite files into Parquet files using CytoTable. For more information about CytoTable, please see its repository.
  • New workflow cp_process_singlecells, which contains 3 steps:
      • First, convert the SQLite files into Parquet files using the cytotable_convert.smk module
      • Next, normalize the dataset using pycytominer's normalization function within the normalize.smk module
      • Then, select features using pycytominer's feature selection function in the feature_selection.smk module
  • Introducing workflow config files
    • A workflow config contains all the parameters used within its workflow
    • This workflow's config is located in configs/wf_configs/cp_process_singlecells.yaml
      • NOTE: each workflow config file has the same name as its workflow
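To make the normalize and feature-selection steps more concrete, here is a minimal pure-Python sketch of the kind of transformation each applies to a per-cell feature table. It is not pycytominer's implementation: it assumes a plain z-score ("standardize") normalization and a simplified feature selection that only drops constant features, whereas pycytominer offers more methods (for example spherize, and a frequency-based variance_threshold). The column names are hypothetical, CellProfiler-style examples.

```python
from statistics import mean, pstdev


def standardize(rows, features):
    """Z-score each feature column: (x - mean) / population std."""
    stats = {f: (mean(r[f] for r in rows), pstdev([r[f] for r in rows])) for f in features}
    normalized = []
    for row in rows:
        new_row = dict(row)  # metadata columns are copied through untouched
        for f in features:
            mu, sd = stats[f]
            # A constant feature has sd == 0; map it to 0.0 instead of dividing by zero.
            new_row[f] = (row[f] - mu) / sd if sd > 0 else 0.0
        normalized.append(new_row)
    return normalized


def drop_constant_features(rows, features):
    """Simplified feature selection: keep only features whose values vary."""
    return [f for f in features if len({r[f] for r in rows}) > 1]


# Hypothetical single-cell profiles (one dict per cell).
cells = [
    {"Metadata_Well": "A01", "Cells_AreaShape_Area": 100.0, "Cells_Intensity_Mean": 0.5},
    {"Metadata_Well": "A01", "Cells_AreaShape_Area": 200.0, "Cells_Intensity_Mean": 0.5},
    {"Metadata_Well": "A02", "Cells_AreaShape_Area": 300.0, "Cells_Intensity_Mean": 0.5},
]
features = ["Cells_AreaShape_Area", "Cells_Intensity_Mean"]

# Same order as the workflow: normalize first, then select features.
normalized = standardize(cells, features)
selected = drop_constant_features(normalized, features)
print(selected)  # ['Cells_AreaShape_Area']
```

The constant intensity column carries no information after normalization, so the selection step removes it; real feature selection uses richer criteria, but the normalize-then-select ordering is the same.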

Usage

To use this workflow, first initialize the dataset:

cytosnake init -d *.sqlite -m metadata -b barcodes.txt

NOTE: Make sure to replace the file names with your own file names.

Where:

  • -d is the dataset; it can also be a list of datasets
  • -m refers to the metadata folder
  • -b refers to the barcodes file. This is optional: if your datasets do not come with barcodes, it defaults to None
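The argument semantics above (one or more datasets, a metadata folder, an optional barcodes file defaulting to None) can be sketched with Python's argparse. This is a hypothetical reconstruction for illustration only, not CytoSnake's actual parser; the long option names (--data, --metadata, --barcode) are assumptions. It also shows why -d accepts a list: the shell expands *.sqlite into separate file names before the program sees them.

```python
import argparse


def build_init_parser():
    # Hypothetical sketch of the `init` arguments described above;
    # CytoSnake's real parser may be structured differently.
    parser = argparse.ArgumentParser(prog="cytosnake init")
    parser.add_argument("-d", "--data", nargs="+", required=True,
                        help="one or more dataset files")
    parser.add_argument("-m", "--metadata", required=True,
                        help="path to the metadata folder")
    parser.add_argument("-b", "--barcode", default=None,
                        help="optional barcodes file; defaults to None")
    return parser


# Here the shell has already expanded *.sqlite into individual file names.
args = build_init_parser().parse_args(
    ["-d", "plate1.sqlite", "plate2.sqlite", "-m", "metadata"]
)
print(args.data, args.metadata, args.barcode)
```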

This makes CytoSnake recognize the current directory as the ProjectDirectory.

Once the initialization step is complete, you can use CytoSnake's run mode to execute the cp_process_singlecells workflow:

cytosnake run cp_process_singlecells

In your ProjectDirectory, a new directory named results will appear. That is where all the outputs will be saved.

  • NOTE: the converted datasets will be in the data folder

And that’s it!

Changing configurations

To change the cp_process_singlecells configs, edit the configs/wf_configs/cp_process_singlecells.yaml file.

The structure of the config is:

name: name_of_workflow

module1_configs:
  params:
    - convert: parquet
module2_configs:
  params:
    - method: spherize 
# and so on

To change the parameters, change the value of each keyword under the params section. Make sure the inputs are valid by referring to each software's documentation: CytoSnake cannot check whether the new parameters are valid before executing the workflow.
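Because CytoSnake does not validate parameters up front, a quick pre-flight check can catch typos before a long run. The sketch below is a hypothetical helper, not part of CytoSnake. It takes the config as a Python dict (the shape yaml.safe_load would return for a file like the example structure), flattens each module's params whether they are written as a mapping or as a list of single-key mappings, and checks any method value against an assumed list of pycytominer normalization methods; confirm the current list in pycytominer's documentation. The module names are the placeholders from the example structure.

```python
# Hypothetical pre-flight check; CytoSnake itself does not perform this step.
# Assumed list of pycytominer normalization methods; verify against its docs.
ALLOWED_NORMALIZE_METHODS = {"standardize", "robustize", "mad_robustize", "spherize"}


def flatten_params(params):
    """Accept either {"key": value} or [{"key": value}, ...] and return one dict."""
    if isinstance(params, dict):
        return dict(params)
    flat = {}
    for item in params:
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError(f"expected a single-key mapping, got {item!r}")
        flat.update(item)
    return flat


def check_workflow_config(config):
    """Return a list of human-readable problems found in a parsed workflow config."""
    problems = []
    for module, section in config.items():
        if module == "name":
            continue  # the workflow name is a plain string, not a module section
        if not isinstance(section, dict) or "params" not in section:
            problems.append(f"{module}: missing 'params' section")
            continue
        params = flatten_params(section["params"])
        # Simplification: any "method" key is treated as a normalization method.
        method = params.get("method")
        if method is not None and method not in ALLOWED_NORMALIZE_METHODS:
            problems.append(f"{module}: unknown normalization method {method!r}")
    return problems


config = {
    "name": "cp_process_singlecells",
    "module1_configs": {"params": [{"convert": "parquet"}]},
    "module2_configs": {"params": [{"method": "spherise"}]},  # deliberate typo
}
print(check_workflow_config(config))
```

Running it on the config above reports the misspelled method for module2_configs, which would otherwise only surface once the normalization step executes.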

@axiomcura axiomcura requested a review from d33bs April 6, 2023 17:11
@axiomcura
Member Author

Hello @axiomcura @jenna-tomkinson @d33bs

Hopefully I have addressed all your comments and suggestions. Ready for another review!

Member

@gwaybio gwaybio left a comment


Looks like this PR spawned several other issues!

Of the ones it spawned, which are highest priority in terms of handing this off to @jenna-tomkinson to play with?

cytosnake/utils/cyto_paths.py (resolved)
@axiomcura
Member Author

Looks like this PR spawned several other issues!

Of the ones it spawned, which are highest priority in terms of handing this off to @jenna-tomkinson to play with?

It's too hard to tell. My best bet is whichever one @jenna-tomkinson finds the most annoying. 😅

@gwaybio
Member

gwaybio commented Apr 10, 2023

oh gotcha! So you're saying after this is merged, she can have at it? (i.e. no other issues need addressing?)

@axiomcura
Member Author

I have been using the NF1 dataset to test cp_process_singlecells and it has been working for me. Hopefully everything goes smoothly for Jenna with the other datasets. hehe 😅

I think one issue would be trying to run CytoSnake with multiple metadata directories in one single run

@gwaybio
Member

gwaybio commented Apr 10, 2023

I think one issue would be trying to run CytoSnake with multiple metadata directories in one single run

Gotcha! Ok, perhaps this is the next priority to tackle, but it doesn't necessarily mean that Jenna has to wait.

I have been using the NF1 dataset to test cp_process_singlecells and it has been working for me.

Glad to hear it!! I wonder if you can contribute this directly then... what do you think?

@axiomcura
Member Author

axiomcura commented Apr 10, 2023

I think the plan right now is:

I send it to @jenna-tomkinson, she opens issues in the repo's issue tracker, and I address those as the first-priority issues to take care of.

Glad to hear it!! I wonder if you can contribute this directly then... what do you think?

Don't mind at all. @jenna-tomkinson and I can sit together and tinker with the config files to generate high-quality features :)!

Member

@d33bs d33bs left a comment


Nice work! Thank you for addressing the comments and for making many changes in addition to the new issues! I left a few minor comments + responses and overall felt this LGTM, respecting your decision on when it's best to merge.

@@ -1,7 +1,7 @@
 cytotable_convert:
   params:
     dest_datatype: parquet
-    source_datatype: null
+    source_datatype: sqlite
Member


Double checking: does this mean CytoSnake is tightly coupled to SQLite source data input for CytoTable?

Member Author


As of now, yes. In the future, CSV and NPZ support will be added to the cytotable_convert.yaml module.
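Given that SQLite is currently the only supported source, a conversion step could fail fast on unsupported inputs instead of erroring partway through. The following is a minimal sketch, not CytoSnake's code: check_convert_params is a hypothetical helper that receives the cytotable_convert params as a dict parsed from YAML, and the supported sets simply encode the current state described in this thread.

```python
# Hypothetical guard; the supported sets mirror this thread (SQLite in, Parquet out).
SUPPORTED_SOURCES = {"sqlite"}  # CSV and NPZ support is planned but not yet available
SUPPORTED_DESTS = {"parquet"}


def check_convert_params(params):
    """Return (source, dest) if both datatypes are supported, else raise ValueError."""
    source = params.get("source_datatype")
    dest = params.get("dest_datatype")
    if source not in SUPPORTED_SOURCES:
        raise ValueError(
            f"source_datatype={source!r} is not supported; "
            f"expected one of {sorted(SUPPORTED_SOURCES)}"
        )
    if dest not in SUPPORTED_DESTS:
        raise ValueError(
            f"dest_datatype={dest!r} is not supported; "
            f"expected one of {sorted(SUPPORTED_DESTS)}"
        )
    return source, dest


# A YAML `null` parses to None and is rejected; "sqlite" passes.
print(check_convert_params({"dest_datatype": "parquet", "source_datatype": "sqlite"}))
```

Once CSV and NPZ support lands, only the SUPPORTED_SOURCES set would need to grow.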

configs/configuration.yaml (resolved)
configs/wf_configs/cp_process_singlecells.yaml (outdated, resolved)
cytosnake/guards/path_guards.py (resolved)
workflows/rules/common.smk (outdated, resolved)
workflows/rules/common.smk (resolved)
workflows/scripts/convert.py (outdated, resolved)
workflows/scripts/feature_select.py (resolved)
@axiomcura
Member Author

@d33bs @gwaybio Hopefully I have addressed all comments and suggestions. If not, please let me know!

Thank you for your input! I opened some issues that I will take care of in the future!

@axiomcura axiomcura merged commit 13a46da into WayScience:main Apr 12, 2023
@axiomcura axiomcura deleted the cp-process-singlecells branch May 16, 2023 12:27

4 participants