New Workflow: cp_process_singlecells
#37
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
cytotable_convert: | ||
params: | ||
dest_datatype: parquet | ||
source_datatype: sqlite | ||
concat: True | ||
join: True | ||
This is probably beyond scope of
This will depend on Looking at the documentation, the Correct me if I am wrong @d33bs.
@axiomcura - |
||
infer_common_schema: True | ||
drop_null: True | ||
preset: cellprofiler_sqlite | ||
log_level: ERROR | ||
|
axiomcura marked this conversation as resolved.
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
name: cp_process_singlecells_configs | ||
|
||
# Documentation | ||
docs: | | ||
|
||
Description: | ||
------------ | ||
Converts sqlite plate data into parquet and returns selected features in csv | ||
format | ||
|
||
Workflow Steps: | ||
--------------- | ||
The workflow steps are described below, one block per step. | ||
|
||
cytotable_convert: | ||
Takes in sqlite file and converts it into a parquet file. | ||
|
||
Uses CytoTable's convert workflow which can be found in: | ||
https://github.com/cytomining/CytoTable/blob/main/cytotable/convert.py | ||
|
||
|
||
normalize_configs: | ||
Normalizes single-cell morphological features | ||
|
||
Uses Pycytominer normalization module: | ||
https://github.com/cytomining/pycytominer/blob/master/pycytominer/normalize.py | ||
|
||
|
||
feature_select_configs: | ||
Selects morphological features from normalized dataset | ||
|
||
Uses Pycytominer feature selection module: | ||
https://github.com/cytomining/pycytominer/blob/master/pycytominer/feature_select.py | ||
|
||
|
||
cytotable_convert: | ||
params: | ||
dest_datatype: parquet | ||
source_datatype: null | ||
concat: True | ||
join: True | ||
infer_common_schema: True | ||
drop_null: True | ||
preset: cellprofiler_sqlite | ||
log_level: ERROR | ||
|
||
normalize_configs: | ||
params: | ||
features: infer | ||
image_features: False | ||
meta_features: infer | ||
samples: all | ||
method: mad_robustize | ||
compression_options: | ||
method: gzip | ||
mtime: 1 | ||
float_format: null | ||
mad_robustize_epsilon: 1.0e-18 | ||
spherize_center: True | ||
spherize_method: ZCA-cor | ||
spherize_epsilon: 1.0e-6 | ||
|
||
feature_select_configs: | ||
params: | ||
features: infer | ||
image_features: False | ||
samples: all | ||
operation: | ||
- variance_threshold | ||
- drop_na_columns | ||
- correlation_threshold | ||
- drop_outliers | ||
- blocklist | ||
na_cutoff: 0.05 | ||
corr_threshold: 0.9 | ||
corr_method: pearson | ||
freq_cut: 0.05 | ||
unique_cut: 0.1 | ||
compression_options: | ||
method: gzip | ||
mtime: 1 | ||
float_format: null | ||
blocklist_file: null | ||
outlier_cutoff: 15 | ||
noise_removal_perturb_groups: null | ||
noise_removal_stdev_cutoff: null |
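The `normalize_configs` block above selects `method: mad_robustize` with `mad_robustize_epsilon: 1.0e-18`. As a rough illustration of what that pairing is for, here is a minimal pure-Python sketch of median/MAD scaling. It is not Pycytominer's code: the function name `mad_robustize`, the 1.4826 consistency constant, and the sample values are assumptions made for this example.

```python
from statistics import median


def mad_robustize(values, epsilon=1e-18):
    """Scale values by (x - median) / (scaled MAD + epsilon).

    Hypothetical helper for illustration only, not Pycytominer's API.
    """
    center = median(values)
    # median absolute deviation around the median
    mad = median([abs(v - center) for v in values])
    # 1.4826 is the usual constant that makes MAD comparable to a
    # standard deviation for normal data (assumed here)
    scaled_mad = mad * 1.4826
    # epsilon keeps the division defined when a feature is constant
    return [(v - center) / (scaled_mad + epsilon) for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 100.0]
print(mad_robustize(feature))

# A constant feature has MAD == 0; epsilon avoids a ZeroDivisionError
print(mad_robustize([5.0, 5.0, 5.0]))
```

The outlier (100.0) barely affects the center or the scale, which is the usual reason to prefer this method over mean/standard-deviation scaling for single-cell features.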
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
""" | ||
module: ext_guards.py | ||
|
||
|
||
Checks if the correct extensions are provided | ||
""" | ||
|
||
import pathlib | ||
from typing import TypeGuard | ||
|
||
from cytosnake.guards.path_guards import is_valid_path | ||
|
||
|
||
def has_parquet_ext(file_name: str | pathlib.Path) -> TypeGuard[str]: | ||
"""Checks if the provided file path has a parquet file extension. | ||
Parameters | ||
---------- | ||
file_name : str | pathlib.Path | ||
path to file | ||
|
||
Returns | ||
------- | ||
TypeGuard[str] | ||
return True if it is a parquet file, else False | ||
""" | ||
return ( | ||
pathlib.Path(file_name).suffix in [".parquet", ".parq", ".pq"] | ||
if is_valid_path(file_name) | ||
else False | ||
) | ||
|
||
|
||
def has_sqlite_ext(file_name: str | pathlib.Path) -> TypeGuard[str]: | ||
"""Checks if the provided file path has a sqlite file extension. | ||
|
||
Parameters | ||
---------- | ||
file_name : str | pathlib.Path | ||
path to file | ||
|
||
Returns | ||
------- | ||
TypeGuard[str] | ||
return True if it is a sqlite file, else False | ||
""" | ||
|
||
return ( | ||
pathlib.Path(file_name).suffix in [".sqlite", ".sqlite3"] | ||
if is_valid_path(file_name) | ||
else False | ||
) |
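The two guards above differ only in the list of accepted suffixes, and both are annotated to accept either `str` or `pathlib.Path`. A minimal sketch of a shared helper that handles both input types is shown below. The names `has_ext`, `PARQUET_EXTS`, and `SQLITE_EXTS` are hypothetical, and the sketch deliberately skips the `is_valid_path` existence check so it can run on its own.

```python
from __future__ import annotations

import pathlib

# Accepted suffixes, copied from the guards in this diff
PARQUET_EXTS = {".parquet", ".parq", ".pq"}
SQLITE_EXTS = {".sqlite", ".sqlite3"}


def has_ext(file_name: str | pathlib.Path, allowed: set[str]) -> bool:
    """Return True if file_name ends with one of the allowed suffixes.

    Hypothetical helper: it only inspects the name, not the file system.
    """
    # Wrapping in Path() means a plain str also gets a .suffix attribute
    return pathlib.Path(file_name).suffix in allowed


print(has_ext("plate1.sqlite", SQLITE_EXTS))
print(has_ext(pathlib.Path("data/plate1.parquet"), PARQUET_EXTS))
print(has_ext("plate1.csv", PARQUET_EXTS))
```

Calling `.suffix` directly on a `str` raises `AttributeError`, so converting to `Path` first is what lets one function serve both annotated input types.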
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,14 +6,17 @@ | |
""" | ||
|
||
|
||
from pathlib import Path | ||
from typing import Optional | ||
|
||
from snakemake.io import expand | ||
from pathlib import Path | ||
from cytosnake.utils.config_utils import load_meta_path_configs | ||
|
||
from cytosnake.guards.path_guards import is_valid_path | ||
from cytosnake.utils.config_utils import load_general_configs, load_meta_path_configs | ||
|
||
# loading in config as global variables | ||
PATHS = load_meta_path_configs() | ||
CYTOSNAKE_CONFIGS = load_general_configs() | ||
|
||
|
||
# ------------------------------ | ||
|
@@ -151,7 +154,6 @@ def annotated_output() -> str: | |
|
||
|
||
def normalized_output() -> str: | ||
|
||
"""Generates output path for normalized dataset | ||
|
||
Returns | ||
|
@@ -169,7 +171,6 @@ def normalized_output() -> str: | |
|
||
|
||
def selected_features_output() -> str: | ||
|
||
"""Generates output path for selected features dataset | ||
|
||
Returns | ||
|
@@ -187,7 +188,6 @@ def selected_features_output() -> str: | |
|
||
|
||
def consensus_output() -> str: | ||
|
||
"""Generates output path for consensus dataset | ||
|
||
Returns | ||
|
@@ -204,6 +204,23 @@ def consensus_output() -> str: | |
return str(results_path / f"{output_name}.{ext}") | ||
|
||
|
||
def parquet_output() -> str: | ||
|
||
"""Generates output path for parquet profiles | ||
|
||
Returns | ||
------- | ||
str | ||
path to generated parquet files | ||
|
||
""" | ||
data_path = Path(PATHS["project_dir_path"]) / "data" | ||
output_name = "{file_name}" | ||
Should this be updated somehow? I wasn't certain how this would affect the return of this function (would it product
Yeah, looking at this in retrospect, the helper functions are not well developed, as they contain too much repeated code. Will be creating a PR for this issue: #32 |
||
ext = "parquet" | ||
|
||
# constructing file output string | ||
return str(data_path / f"{output_name}.{ext}") | ||
Consider using only an f-string to format this string to help increase readability.
This will be updated in the future! #32 |
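On the question of what `parquet_output()` returns: because `output_name` is the literal text `{file_name}`, the braces survive into the returned string, which then acts as a template (Snakemake reads such brace names as wildcards). The sketch below mimics that with plain `str.format`. The function name `parquet_output_template`, the project directory, and the plate names are assumptions for illustration.

```python
from pathlib import Path


def parquet_output_template(project_dir: str) -> str:
    """Build an output path that still contains a {file_name} placeholder.

    Hypothetical stand-in for parquet_output(); the project directory is
    passed in here instead of being read from the PATHS config.
    """
    data_path = Path(project_dir) / "data"
    # The braces are kept literally, so the result is a template string
    return str(data_path / "{file_name}.parquet")


template = parquet_output_template("my_project")
print(template)

# Filling the placeholder, as a wildcard expansion would do
concrete = [template.format(file_name=name) for name in ["plate_a", "plate_b"]]
print(concrete)
```

So the function yields one template, not one path per plate; the concrete paths appear only once the placeholder is filled in.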
||
|
||
|
||
# ------------------------------ | ||
# Formatting I/O functions | ||
# ------------------------------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -30,9 +30,7 @@ def load_configs(config_path: str | Path) -> dict: | |
if not is_valid_path(config_path): | ||
raise FileNotFoundError("Invalid config path provided") | ||
if isinstance(config_path, str): | ||
config_path = Path(config_path).absolute() | ||
if not config_path.is_absolute(): | ||
config_path = config_path.absolute() | ||
config_path = Path(config_path).resolve(strict=True) | ||
|
||
# loading in config_path | ||
with open(config_path, "r") as yaml_contents: | ||
|
@@ -41,6 +39,18 @@ def load_configs(config_path: str | Path) -> dict: | |
return loaded_configs | ||
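The hunk above replaces the `Path.absolute()` calls with `Path(config_path).resolve(strict=True)`. A small sketch of the behavioral difference, using a throwaway temporary directory (the directory layout and file contents here are made up for the example):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "configs" / "configuration.yaml"
    config_file.parent.mkdir()
    config_file.write_text("log_level: ERROR\n")

    # A path to the same file that takes a detour through ".."
    messy = Path(tmp) / "configs" / ".." / "configs" / "configuration.yaml"

    # absolute() only anchors the path; it leaves ".." in place
    absolute_path = messy.absolute()
    # resolve() collapses ".." and follows symlinks
    resolved_path = messy.resolve(strict=True)

    # With strict=True, a path that does not exist raises immediately
    try:
        (Path(tmp) / "missing.yaml").resolve(strict=True)
        missing_rejected = False
    except FileNotFoundError:
        missing_rejected = True

print(".." in absolute_path.parts)
print(".." in resolved_path.parts)
print(missing_rejected)
```

With `strict=True`, a missing config file fails at path resolution rather than later at `open()`, which complements the `is_valid_path` check earlier in the function.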
|
||
|
||
def load_general_configs() -> dict: | ||
"""Loads cytosnake's general configurations | ||
|
||
Returns: | ||
------- | ||
dict | ||
dictionary containing the cytosnake general configs | ||
""" | ||
config_dir_path = cp.get_config_dir_path() / "configuration.yaml" | ||
If you change the name of this file above, also make sure to propagate the change here.
Correct. Any name changes that occur will also have to be reflected here. Still thinking of ideas on how to prevent hardcoding file names. I am open to any suggestions.
I think it is ok to hardcode some things - ultimately, they will need to be hardcoded somewhere! You may consider having a single function where these names are hardcoded, and calling the function wherever hardcodes are used. Just don't update the corresponding dictionary keys! I think what you have now is fine.
I was thinking of making a |
||
return load_configs(config_dir_path) | ||
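The review suggestion above (keep the hardcoded file names in a single function and call it wherever they are needed) could look roughly like the sketch below. The names `config_file_names` and `config_file_path` and the dictionary keys are hypothetical; only the two file names are taken from this diff.

```python
from __future__ import annotations

from pathlib import Path


def config_file_names() -> dict[str, str]:
    """Single place where config file names are hardcoded (hypothetical).

    Callers depend on the keys, so a file can be renamed here without
    touching any call site.
    """
    return {
        "general": "configuration.yaml",
        "paths": "_paths.yaml",
    }


def config_file_path(config_dir: str | Path, key: str) -> Path:
    """Join a config directory with the file name registered under key."""
    return Path(config_dir) / config_file_names()[key]


print(config_file_path("configs", "general"))
print(config_file_path(".cytosnake", "paths"))
```

An unknown key raises `KeyError`, so a typo at a call site fails loudly instead of silently pointing at a file that does not exist.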
|
||
|
||
def load_meta_path_configs() -> dict: | ||
"""Loads the metadata path from `.cytosnake/_paths.yaml` file | ||
|
||
|
Double checking: does this mean CytoSnake is tightly coupled to SQLite source data input for CytoTable?
As of now, yes. But in the future, CSV and NPZ support will be added to the cytotable_convert.yaml module.