New Workflow: cp_process_singlecells ! #37

Merged
merged 42 commits into from
Apr 12, 2023
Changes from 37 commits
Commits
74f9959 fixed minor pathing bugs (axiomcura, Mar 14, 2023)
fbd9ffa Merge branch 'WayScience:main' into main (axiomcura, Mar 28, 2023)
86e6f7d `cp_process_singlecells` init workflow (axiomcura, Mar 28, 2023)
d4f6794 created cytotable env (axiomcura, Mar 28, 2023)
91e4014 added cytotable config (axiomcura, Mar 29, 2023)
e90b18a added convert.py script (axiomcura, Mar 29, 2023)
fe16359 added parquet variable in common.smk (axiomcura, Mar 29, 2023)
54b2009 added converted parquet helper func (axiomcura, Mar 29, 2023)
24aa53c update common.smk (axiomcura, Mar 29, 2023)
febe581 added configs param (axiomcura, Mar 29, 2023)
6003d11 init cp_process_singlecells workflow (axiomcura, Mar 30, 2023)
6115dc5 Fixed typos (axiomcura, Mar 30, 2023)
1608de2 updated cytotable_convert.smk rule module (axiomcura, Mar 30, 2023)
8a36504 added new error (axiomcura, Mar 30, 2023)
a6dc5fb added new loader function for general configs (axiomcura, Mar 30, 2023)
c0b7460 added more guards (axiomcura, Mar 30, 2023)
ceb0094 update convert.py (axiomcura, Mar 30, 2023)
74fd4cb fixed extension bug (axiomcura, Mar 30, 2023)
c9e206c removed cytosnake imports see #18 (axiomcura, Mar 30, 2023)
1f71e1c update config (axiomcura, Mar 30, 2023)
fabe130 fixed parameters naming (axiomcura, Mar 31, 2023)
e87aa56 fixed bugs with convert and normalize scripts (axiomcura, Apr 4, 2023)
c22ba5b added default env manager (axiomcura, Apr 4, 2023)
d7f4157 update pathing and fix #35 (axiomcura, Apr 4, 2023)
a3104e8 update workflow added documentation (axiomcura, Apr 5, 2023)
be045dc workflow update (axiomcura, Apr 5, 2023)
b2e92e2 update configs for normalize (axiomcura, Apr 6, 2023)
8e0475c update feature select module (axiomcura, Apr 6, 2023)
bcd92de removed unused import (axiomcura, Apr 6, 2023)
fecfb6f added documentation (axiomcura, Apr 7, 2023)
df8a863 fixed typo (axiomcura, Apr 7, 2023)
1ebd569 added suggestions (axiomcura, Apr 7, 2023)
9d00a7a Update cytosnake/utils/cyto_paths.py (axiomcura, Apr 10, 2023)
3d6fbf4 Update workflows/scripts/convert.py (axiomcura, Apr 10, 2023)
8413984 added comments and suggestions changes (axiomcura, Apr 10, 2023)
7a1a22d added pre-commit tools (axiomcura, Apr 10, 2023)
c75ad28 pre-commit formatting (axiomcura, Apr 10, 2023)
4970d85 added documentation (axiomcura, Apr 11, 2023)
a0e42cd added snakemake input documentation (axiomcura, Apr 11, 2023)
69294d8 added documentation (axiomcura, Apr 11, 2023)
f2e9c12 removed new lines (axiomcura, Apr 11, 2023)
6f4a7f6 updated convert link and params (axiomcura, Apr 11, 2023)
23 changes: 19 additions & 4 deletions .pre-commit-config.yaml
@@ -1,19 +1,34 @@
---
repos:
# remove unused imports
- repo: https://github.com/hadialqattan/pycln.git
rev: v2.1.3
hooks:
- id: pycln

# import formatter with black configurations
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
args: ["--profile", "black", "--filter-files"]

# Code formatter for both python files and jupyter notebooks
# support pep 8 standards
- repo: https://github.com/psf/black
rev: 22.10.0
hooks:
- id: black-jupyter
- id: black
language_version: python3.10

# remove unused imports
- repo: https://github.com/hadialqattan/pycln.git
rev: v2.1.3
# AI based formatter to improve readability
- repo: https://github.com/sourcery-ai/sourcery
rev: v1.1.0
hooks:
- id: pycln
- id: sourcery
args: [--diff=git diff HEAD, --no-summary]

# snakemake formatting
- repo: https://github.com/snakemake/snakefmt
11 changes: 11 additions & 0 deletions configs/analysis_configs/cytotable_convert.yaml
@@ -0,0 +1,11 @@
cytotable_convert:
params:
dest_datatype: parquet
source_datatype: sqlite
Member: Double checking: does this mean CytoSnake is tightly coupled to SQLite source data input for CytoTable?

Member Author: As of now, yes. In the future, CSV and NPZ support will be added to the cytotable_convert.yaml module.

concat: True
join: True
Member: This is probably beyond the scope of CytoSnake, but can you specify non-canonical compartments (e.g., anything beyond Cells, Cytoplasm, and Nuclei)?

Member Author: This will depend on CytoTable's capability. The config files will accept any parameters that are valid in CytoTable's convert workflow, so if CytoTable supports non-canonical compartments, it should presumably work. Looking at the documentation, the join parameter only takes a boolean. Correct me if I am wrong @d33bs.

Member: @axiomcura - CytoTable.convert's join parameter decides whether the conversion performs a compartment merge (or solely concatenation, via concat). Additional or different compartments may be specified via the compartments parameter; see CytoTable's configuration presets for reference examples of the parameters it uses. Does the configuration found here for CytoSnake enable the use of the CytoTable compartments parameter?

infer_common_schema: True
drop_null: True
preset: cellprofiler_sqlite
log_level: ERROR
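The thread above notes that the config accepts anything as long as the parameters are valid for CytoTable's convert workflow. One way to catch a mistyped key before a run is to compare the YAML `params` keys against the target function's signature with Python's `inspect` module. This is a minimal sketch under stated assumptions: `example_convert` is a hypothetical stand-in whose keyword names are copied from this config (plus `compartments`, from the thread), not CytoTable's real signature, and `unknown_params` is likewise a hypothetical helper.

```python
from __future__ import annotations

import inspect


# Hypothetical stand-in for a converter entry point. Keyword names come from
# the cytotable_convert.yaml params above (plus `compartments`, mentioned in
# the review thread); this is NOT CytoTable's actual signature.
def example_convert(
    source_path: str,
    dest_path: str,
    dest_datatype: str,
    source_datatype: str | None = None,
    compartments: tuple[str, ...] | None = None,
    concat: bool = True,
    join: bool = True,
    infer_common_schema: bool = True,
    drop_null: bool = False,
    preset: str = "cellprofiler_csv",
    log_level: str = "ERROR",
) -> str:
    return dest_path


def unknown_params(func, params: dict) -> list[str]:
    """Return the config keys that `func` would reject as keyword arguments."""
    parameters = inspect.signature(func).parameters.values()
    # A function taking **kwargs accepts any key, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters):
        return []
    accepted = {p.name for p in parameters}
    return sorted(key for key in params if key not in accepted)


# The params block above, as it would look once loaded from YAML
params = {
    "dest_datatype": "parquet",
    "source_datatype": "sqlite",
    "concat": True,
    "join": True,
    "infer_common_schema": True,
    "drop_null": True,
    "preset": "cellprofiler_sqlite",
    "log_level": "ERROR",
}

print(unknown_params(example_convert, params))
# A typo for `compartments` gets flagged:
print(unknown_params(example_convert, {**params, "compartment": ["Cells"]}))
```

Note the design limit: if the real entry point accepts `**kwargs`, this check cannot flag anything, so it is only useful against an explicit signature.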

12 changes: 11 additions & 1 deletion configs/configuration.yaml
@@ -1,17 +1,27 @@
config_name: cytopipe_defaults
config_name:

env_manager: conda

# computation configs
analysis_configs:
preprocessing:
threads: 4

# data configurations
data_configs:
plate_data_format: sqlite

# Analysis configuration file paths
config_paths:
# general configs
general_configs: "configs/configuration.yaml"

# CellProfiler Specific analysis configurations
single_cell: "configs/analysis_configs/single_cell_configs.yaml"
normalize: "configs/analysis_configs/normalize_configs.yaml"
feature_select: "configs/analysis_configs/feature_select_configs.yaml"
aggregate: "configs/analysis_configs/aggregate_configs.yaml"
cytotable_config: "configs/analysis_configs/cytotable_convert.yaml"

# DeepProfiler Specific analysis configurations
dp_data: "configs/analysis_configs/dp_data_configs.yaml"
84 changes: 84 additions & 0 deletions configs/wf_configs/cp_process_singlecells.yaml
@@ -0,0 +1,84 @@
name: cp_process_singlecells_configs

# Documentation
docs: |
Description:
------------
Converts sqlite plate data into parquet and returns selected features in csv
format

Workflow Steps:
---------------
The workflow steps are listed below.

cytotable_convert:
Takes in a sqlite file and converts it into a parquet file.

Uses CytoTable's convert workflow which can be found in:
https://github.com/cytomining/CytoTable/blob/main/cytotable/convert.py

normalize_configs:
Normalizes single cell morphological features

Uses Pycytominer normalization module:
https://github.com/cytomining/pycytominer/blob/master/pycytominer/normalize.py


feature_select_configs:
Selects morphological features from the normalized dataset

Uses Pycytominer feature selection module:
https://github.com/cytomining/pycytominer/blob/master/pycytominer/feature_select.py


cytotable_convert:
params:
dest_datatype: parquet
source_datatype: null
concat: True
join: True
infer_common_schema: True
drop_null: True
preset: cellprofiler_sqlite
log_level: ERROR

normalize_configs:
params:
features: infer
image_features: False
meta_features: infer
samples: all
method: mad_robustize
compression_options:
method: gzip
mtime: 1
float_format: null
mad_robustize_epsilon: 1.0e-18
spherize_center: True
spherize_method: ZCA-cor
spherize_epsilon: 1.0e-6

feature_select_configs:
params:
features: infer
image_features: False
samples: all
operation:
- variance_threshold
- drop_na_columns
- correlation_threshold
- drop_outliers
- blocklist
na_cutoff: 0.05
corr_threshold: 0.9
corr_method: pearson
freq_cut: 0.05
unique_cut: 0.1
compression_options:
method: gzip
mtime: 1
float_format: null
blocklist_file: null
outlier_cutoff: 15
noise_removal_perturb_groups: null
noise_removal_stdev_cutoff: null
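The `normalize_configs` above select `method: mad_robustize` with `mad_robustize_epsilon: 1.0e-18`. As I understand pycytominer's approach, each feature column is centered on its median and divided by its median absolute deviation (MAD), with epsilon guarding against division by zero when a feature is constant. The sketch below is a plain-Python illustration for a single column, assuming the 1.4826-scaled MAD form; it is not pycytominer's implementation, so check the linked normalize.py for the exact behavior.

```python
from __future__ import annotations

import statistics


def mad_robustize(values: list[float], epsilon: float = 1.0e-18) -> list[float]:
    """Robust z-score of one feature column.

    Sketch of (x - median) / (1.4826 * MAD + epsilon); not pycytominer's code.
    """
    center = statistics.median(values)
    # Median absolute deviation from the median
    mad = statistics.median(abs(v - center) for v in values)
    # 1.4826 rescales MAD to estimate the standard deviation of normal data;
    # epsilon keeps the division finite when MAD is 0 (constant feature).
    return [(v - center) / (1.4826 * mad + epsilon) for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 100.0]  # one strong outlier
print([round(z, 3) for z in mad_robustize(feature)])
```

Because the median and MAD ignore the magnitude of the outlier, the four typical values stay close to zero while the outlier gets a large score, which is the usual motivation for choosing this method over a mean/standard-deviation z-score.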
2 changes: 1 addition & 1 deletion cytosnake/cli/cmd.py
@@ -5,8 +5,8 @@

Generates CLI interface in order to interact with CytoSnake.
"""
import sys
import logging
import sys
from pathlib import Path

# cytosnake imports
4 changes: 4 additions & 0 deletions cytosnake/common/errors.py
@@ -87,6 +87,10 @@ class ProjectExistsError(BaseFileExistsError):
that the current directory has already been set up for cytosnake analysis"""


class ExtensionError(BaseValueError):
"""Raised when invalid extensions are captured"""
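A minimal sketch of how a caller might raise the new `ExtensionError` when it requires parquet input. Assumptions, labeled: `BaseValueError` is a local stand-in (assumed to subclass `ValueError`, since its definition is not shown in this diff), and `require_parquet` is a hypothetical helper, not part of CytoSnake.

```python
from __future__ import annotations

import pathlib


class BaseValueError(ValueError):
    """Stand-in for cytosnake's BaseValueError (assumed ValueError subclass)."""


class ExtensionError(BaseValueError):
    """Raised when invalid extensions are captured"""


PARQUET_EXTS = {".parquet", ".parq", ".pq"}


def require_parquet(file_name: str | pathlib.Path) -> pathlib.Path:
    """Hypothetical helper: return a Path, or raise ExtensionError on a bad suffix."""
    path = pathlib.Path(file_name)
    if path.suffix not in PARQUET_EXTS:
        raise ExtensionError(
            f"expected one of {sorted(PARQUET_EXTS)}, got '{path.suffix}' ({path.name})"
        )
    return path


try:
    require_parquet("data/plate_1.sqlite")
except ExtensionError as err:
    print(f"rejected: {err}")
```

Subclassing a `ValueError` family keeps callers that already catch `ValueError` working, while still letting CytoSnake code catch `ExtensionError` specifically.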


# -----------------------
# Error handling functions
# -----------------------
50 changes: 50 additions & 0 deletions cytosnake/guards/ext_guards.py
@@ -0,0 +1,50 @@
"""
module: ext_guards.py

Checks if the correct extensions are provided
"""

import pathlib
from typing import TypeGuard

from cytosnake.guards.path_guards import is_valid_path


def has_parquet_ext(file_name: str | pathlib.Path) -> TypeGuard[str]:
    """Checks if the provided file path has a parquet file extension.

    Parameters
    ----------
    file_name : str | pathlib.Path
        path to file

    Returns
    -------
    TypeGuard[str]
        return True if it is a parquet file, else False
    """
    # wrap in pathlib.Path so that plain strings also expose `.suffix`
    return (
        pathlib.Path(file_name).suffix in [".parquet", ".parq", ".pq"]
        if is_valid_path(file_name)
        else False
    )


def has_sqlite_ext(file_name: str | pathlib.Path) -> TypeGuard[str]:
    """Checks if the provided file path has a SQLite file extension.

    Parameters
    ----------
    file_name : str | pathlib.Path
        path to file

    Returns
    -------
    TypeGuard[str]
        return True if it is a SQLite file, else False
    """
    # wrap in pathlib.Path so that plain strings also expose `.suffix`
    return (
        pathlib.Path(file_name).suffix in [".sqlite", ".sqlite3"]
        if is_valid_path(file_name)
        else False
    )
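The guards above compare `Path.suffix` against a list of extensions. Two properties of `suffix` are worth knowing: it reports only the final extension, and it preserves case, so the comparison is case-sensitive. The snippet below demonstrates both with pure path-string operations (no filesystem access); `has_ext` is a hypothetical case-insensitive variant, and whether uppercase extensions should be accepted is a design choice the PR does not address.

```python
import pathlib

# `Path.suffix` reports only the final extension and keeps its case.
print(pathlib.Path("plates/plate_1.sqlite").suffix)        # .sqlite
print(pathlib.Path("plates/plate_1.parquet.gz").suffix)    # .gz
print(pathlib.Path("plates/plate_1.parquet.gz").suffixes)  # ['.parquet', '.gz']
print(pathlib.Path("plates/PLATE_1.SQLITE").suffix)        # .SQLITE


def has_ext(file_name, extensions) -> bool:
    """Hypothetical case-insensitive suffix check (no filesystem access)."""
    return pathlib.Path(file_name).suffix.lower() in extensions


print(has_ext("plates/PLATE_1.SQLITE", {".sqlite", ".sqlite3"}))  # True
```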
15 changes: 7 additions & 8 deletions cytosnake/guards/path_guards.py
@@ -9,20 +9,19 @@
- valid path strings
"""

from pathlib import Path
import pathlib
from typing import TypeGuard


def is_valid_path(val: object) -> TypeGuard[Path]:
def is_valid_path(val: object) -> TypeGuard[pathlib.Path]:
"""checks if provided value is a valid path"""

# check if the val is valid type
# -- if string, convert to Path
accepted_types = (str, Path)
if not isinstance(val, accepted_types):
# type checking
if not isinstance(val, (str, pathlib.Path)):
return False
# convert to pathlib.Path
if isinstance(val, str):
val = Path(val)
val = pathlib.Path(val).resolve(strict=True)

# check if the path exists
return bool(val.exists())
return val.exists()
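In the updated `is_valid_path`, string inputs go through `pathlib.Path(val).resolve(strict=True)`. Per the pathlib documentation, strict resolution raises `FileNotFoundError` when the path does not exist, so a missing string path raises instead of returning `False`. If a plain boolean answer is the intent, a non-raising variant could look like the sketch below. `is_existing_path` is a hypothetical name, and it returns `bool` rather than `TypeGuard[pathlib.Path]`, since a `str` argument is not actually narrowed to a `Path`.

```python
from __future__ import annotations

import os
import pathlib
import tempfile


def is_existing_path(val: object) -> bool:
    """Hypothetical non-raising variant of is_valid_path."""
    if not isinstance(val, (str, pathlib.Path)):
        return False
    # resolve() without strict=True does not raise for missing paths
    return pathlib.Path(val).resolve().exists()


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.yaml")
    print(is_existing_path(tmp))      # True
    print(is_existing_path(missing))  # False
    print(is_existing_path(42))       # False
    try:
        pathlib.Path(missing).resolve(strict=True)
    except FileNotFoundError:
        print("strict resolve raised FileNotFoundError")
```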
27 changes: 22 additions & 5 deletions cytosnake/helpers/helper_funcs.py
@@ -6,14 +6,17 @@
"""


from pathlib import Path
from typing import Optional

from snakemake.io import expand
from pathlib import Path
from cytosnake.utils.config_utils import load_meta_path_configs

from cytosnake.guards.path_guards import is_valid_path
from cytosnake.utils.config_utils import load_general_configs, load_meta_path_configs

# loading in config as global variables
PATHS = load_meta_path_configs()
CYTOSNAKE_CONFIGS = load_general_configs()


# ------------------------------
@@ -151,7 +154,6 @@ def annotated_output() -> str:


def normalized_output() -> str:

"""Generates output path for normalized dataset

Returns
@@ -169,7 +171,6 @@ def normalized_output() -> str:


def selected_features_output() -> str:

"""Generates output path for selected features dataset

Returns
@@ -187,7 +188,6 @@ def selected_features_output() -> str:


def consensus_output() -> str:

"""Generates output path for consensus dataset

Returns
@@ -204,6 +204,23 @@ def consensus_output() -> str:
return str(results_path / f"{output_name}.{ext}")


def parquet_output() -> str:
"""Generates output path for parquet profiles

Returns
-------
str
path to generated parquet files

"""
data_path = Path(PATHS["project_dir_path"]) / "data"
output_name = "{file_name}"
Member: Should this be updated somehow? I wasn't certain how this would affect the return value of this function (would it produce ".../{file_name}.parquet" every time?).

Member Author: In retrospect, the helper functions are not well developed; they contain too much repeated code. I will be creating a PR for this issue: #32

ext = "parquet"

# constructing file output string
return str(data_path / f"{output_name}.{ext}")
Member: Consider using only an f-string to format this string to help increase readability.

Member Author: This will be updated in the future! #32
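On the question in this thread: yes, `parquet_output()` returns the same string on every call, with `{file_name}` left as a literal placeholder; in Snakemake, braces in an input or output pattern mark a wildcard that is filled in per job. The sketch below shows the single-f-string form the reviewer suggested. Assumptions: a plain `project_dir` argument replaces `PATHS["project_dir_path"]`, `parquet_output_template` is a hypothetical name, and `str.format` stands in for Snakemake's wildcard substitution purely for illustration. Note the doubled braces: inside an f-string, `{{file_name}}` is needed to keep the placeholder literal.

```python
from pathlib import Path


def parquet_output_template(project_dir: str, ext: str = "parquet") -> str:
    """Sketch of parquet_output() written with a single f-string.

    Doubled braces keep `{file_name}` as a literal placeholder in the result.
    """
    return str(Path(project_dir) / "data" / f"{{file_name}}.{ext}")


template = parquet_output_template("project")
print(template)  # project/data/{file_name}.parquet  (POSIX separators)
# Every call yields the same template; the placeholder is filled in later.
print(template.format(file_name="plate_1"))  # project/data/plate_1.parquet
```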



# ------------------------------
# Formatting I/O functions
# ------------------------------
16 changes: 13 additions & 3 deletions cytosnake/utils/config_utils.py
@@ -30,9 +30,7 @@ def load_configs(config_path: str | Path) -> dict:
if not is_valid_path(config_path):
raise FileNotFoundError("Invalid config path provided")
if isinstance(config_path, str):
config_path = Path(config_path).absolute()
if not config_path.is_absolute():
config_path = config_path.absolute()
config_path = Path(config_path).resolve(strict=True)

# loading in config_path
with open(config_path, "r") as yaml_contents:
@@ -41,6 +39,18 @@ def load_configs(config_path: str | Path) -> dict:
return loaded_configs


def load_general_configs() -> dict:
"""Loads cytosnake's general configurations

Returns
-------
dict
dictionary containing the cytosnake general configs
"""
config_dir_path = cp.get_config_dir_path() / "configuration.yaml"
Member: If you change the name of this file above, make sure to propagate the change here as well.

Member Author: Correct. Any name changes that occur will also have to be reflected here. I am still thinking about ways to avoid hardcoding file names and am open to suggestions.

Member: I think it is OK to hardcode some things; ultimately, they will need to be hardcoded somewhere!

You may consider having a single function where these names are hardcoded, and calling that function wherever the hardcoded names are used. Just don't update the corresponding dictionary keys!

I think what you have now is fine.

Member Author: I was thinking of making a class that handles all the file-pathing function calls. It's an idea.

return load_configs(config_dir_path)
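The reviewer suggests keeping hardcoded file names in a single function (the author considers a class instead). A minimal sketch of the function-based option follows. `CONFIG_FILE_NAMES` and `config_file_path` are hypothetical names; the keys mirror the `config_paths` keys in configuration.yaml, and the values are the file names shown there relative to the `configs/` directory. With something like this, `load_general_configs` could build its path via `config_file_path(cp.get_config_dir_path(), "general_configs")` instead of repeating `"configuration.yaml"`.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical single place where config file names are hardcoded.
# Callers use the keys; only the values should ever change.
CONFIG_FILE_NAMES = {
    "general_configs": "configuration.yaml",
    "cytotable_config": "analysis_configs/cytotable_convert.yaml",
}


def config_file_path(config_dir: str | Path, key: str) -> Path:
    """Build a config file path from its key instead of repeating the name."""
    if key not in CONFIG_FILE_NAMES:
        raise KeyError(f"unknown config key: {key!r}")
    return Path(config_dir) / CONFIG_FILE_NAMES[key]


print(config_file_path("configs", "general_configs"))
```

This matches the reviewer's caveat: renaming a file only touches the mapping's value, while the dictionary keys used throughout the code stay fixed.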


def load_meta_path_configs() -> dict:
"""Loads the metadata path from `.cytosnake/_paths.yaml` file
