feat: enhance analysis and visualization capabilities for scRNA-seq data
- Update service description for comprehensive single-cell RNA sequencing analysis.
- Add new input parameters: max_features, resolution, species, min_pct, logfc_threshold, gsea_min_size, gsea_max_size, and category.
- Implement GF-ICF for single-cell GSEA and integrate pathway analysis.
- Enhance output files to include detailed cluster information and pathway scores.
- Introduce docker-compose setup for RStudio environment.
- Optimize Dockerfile with streamlined package installations using `pak`.
mihirsamdarshi committed Aug 13, 2024
1 parent 110f47c commit c9df03d
Showing 4 changed files with 266 additions and 210 deletions.
74 changes: 69 additions & 5 deletions .osparc/metadata.yml
@@ -3,7 +3,8 @@ key: simcore/services/comp/osparc-differential-expression
type: computational
integration-version: 1.0.0
version: 0.1.0
description: Easily generate differential expression results from OSparc data
description: |
Easily generate differential expression results from OSparc data. This service performs comprehensive analysis of single-cell RNA sequencing data, including data normalization, clustering, dimensionality reduction, and pathway activity scoring. It provides visualizations such as t-SNE and UMAP plots for both gene expression and pathway activity, along with detailed cluster information and pathway scores. The service is designed to work with standard input formats and offers flexibility in analysis parameters.
contact: [email protected]
thumbnail: https://github.com/ITISFoundation/osparc-assets/blob/cb43207b6be2f4311c93cd963538d5718b41a023/assets/default-thumbnail-cookiecutter-osparc-service.png?raw=true
authors:
@@ -16,11 +17,12 @@ inputs:
label: Input Folder
description: Folder containing scRNA-seq data (matrix.mtx, features.tsv, barcodes.tsv)
type: data:*/*
output_prefix:
name:
displayOrder: 2
label: Output Prefix
description: Prefix for output files (optional)
label: Project Name
description: The name of the dataset being analyzed
type: string
defaultValue: sPARcRNA
min_cells:
displayOrder: 3
label: Minimum Cells
@@ -33,11 +35,73 @@ inputs:
description: Minimum number of features (genes) per cell
type: integer
defaultValue: 200
max_features:
displayOrder: 5
label: Maximum Features
description: Maximum number of features (genes) per cell
type: integer
defaultValue: 2500
resolution:
displayOrder: 6
label: Resolution
description: Resolution parameter for clustering
type: number
defaultValue: 0.8
species:
displayOrder: 7
label: Species
description: Species for GSEA (e.g., "Homo sapiens" or "Mus musculus")
type: string
defaultValue: Homo sapiens
min_pct:
displayOrder: 8
label: Minimum Percentage
description: Minimum percentage for FindAllMarkers
type: number
defaultValue: 0.25
logfc_threshold:
displayOrder: 9
label: Log Fold-Change Threshold
description: Log fold-change threshold for FindAllMarkers
type: number
defaultValue: 0.25
gsea_min_size:
displayOrder: 10
label: GSEA Minimum Size
description: Minimum gene set size for GSEA
type: integer
defaultValue: 15
gsea_max_size:
displayOrder: 11
label: GSEA Maximum Size
description: Maximum gene set size for GSEA
type: integer
defaultValue: 500
category:
displayOrder: 12
label: MSigDB Category
description: MSigDB category for GSEA (e.g., "H" for hallmark gene sets)
type: string
defaultValue: H
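The inputs above pair up in ways that are easy to misconfigure (for example, a feature minimum above the feature maximum). The following is a minimal sketch of a sanity check over those parameters, not the service's actual validation. The `check_params` helper is hypothetical, the defaults are copied from the declarations above, and the assumption that `min_pct` is a fraction between 0 and 1 comes from its default of 0.25 rather than from anything stated in the metadata.

```python
# Defaults copied from the inputs declared above. min_cells is omitted
# because its default value is not visible in this diff.
DEFAULTS = {
    "name": "sPARcRNA",
    "min_features": 200,
    "max_features": 2500,
    "resolution": 0.8,
    "species": "Homo sapiens",
    "min_pct": 0.25,
    "logfc_threshold": 0.25,
    "gsea_min_size": 15,
    "gsea_max_size": 500,
    "category": "H",
}


def check_params(overrides=None):
    """Hypothetical helper: merge overrides onto the defaults and
    return (params, problems), where problems is a list of messages."""
    params = {**DEFAULTS, **(overrides or {})}
    problems = []
    if params["min_features"] >= params["max_features"]:
        problems.append("min_features must be smaller than max_features")
    if params["gsea_min_size"] > params["gsea_max_size"]:
        problems.append("gsea_min_size must not exceed gsea_max_size")
    if params["resolution"] <= 0:
        problems.append("resolution must be positive")
    # Assumption: min_pct is a fraction of cells, not a 0-100 percentage.
    if not 0 <= params["min_pct"] <= 1:
        problems.append("min_pct must lie between 0 and 1")
    return params, problems


if __name__ == "__main__":
    _, problems = check_params({"min_features": 3000})
    print(problems)
```

Returning a list of problems instead of raising on the first one lets a caller report every misconfiguration in a single pass.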
outputs:
output_file:
displayOrder: 1
label: Processed Data
description: Zipped file containing processed scRNA-seq data (AnnData objects) and metadata
description: |
Zipped file containing processed scRNA-seq data and analysis results. The zip file includes:
- seurat_object.rds: Serialized R object containing the complete Seurat analysis
- tsne_plot.png: t-SNE plot of cell clusters based on gene expression
- dim_reduction_data.csv: CSV file with UMAP and t-SNE coordinates for both gene expression and pathway activity, along with cluster assignments
- pathway_scores.csv: CSV file with pathway activity scores for each cell
- cluster_info_genes.csv: CSV file with cluster information based on gene expression, including centroids and top pathways
- cluster_info_pathways.csv: CSV file with cluster information based on pathway activity, including centroids and top pathways
- outputs.json: JSON file containing summary statistics and file paths, including:
  - initial_cell_count: Number of cells in the input data
  - final_cell_count: Number of cells after filtering
  - gene_count: Number of genes in the analysis
  - project_name: Name of the analyzed dataset
  - cluster_count_genes: Number of clusters based on gene expression
  - cluster_count_pathways: Number of clusters based on pathway activity
type: data:application/zip
fileToKeyMap:
final_output.zip: output_file
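The output description above can be consumed downstream roughly as follows. This is a sketch under assumptions, not a documented interface: it assumes `outputs.json` is a flat JSON object whose top-level keys are the six names listed in the description, and it locates the file by base name because the layout inside the zip is not stated. `read_summary` and `retained_fraction` are hypothetical helpers, and the numbers in the demo are placeholders, not real results.

```python
import io
import json
import zipfile

# Key names taken from the output description; a flat layout is assumed.
SUMMARY_KEYS = (
    "initial_cell_count",
    "final_cell_count",
    "gene_count",
    "project_name",
    "cluster_count_genes",
    "cluster_count_pathways",
)


def read_summary(zip_source):
    """Return the parsed outputs.json from a results zip (path or file object)."""
    with zipfile.ZipFile(zip_source) as archive:
        # Match on the base name since the directory layout is unknown.
        member = next(
            (n for n in archive.namelist() if n.rsplit("/", 1)[-1] == "outputs.json"),
            None,
        )
        if member is None:
            raise FileNotFoundError("outputs.json not found in archive")
        summary = json.loads(archive.read(member))
    missing = [key for key in SUMMARY_KEYS if key not in summary]
    if missing:
        raise KeyError(f"outputs.json is missing keys: {missing}")
    return summary


def retained_fraction(summary):
    """Fraction of input cells that survived filtering."""
    return summary["final_cell_count"] / summary["initial_cell_count"]


def make_demo_zip():
    """Build an in-memory zip with placeholder values, for illustration only."""
    buffer = io.BytesIO()
    payload = {
        "initial_cell_count": 1000,
        "final_cell_count": 800,
        "gene_count": 2000,
        "project_name": "sPARcRNA",
        "cluster_count_genes": 5,
        "cluster_count_pathways": 4,
    }
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("outputs.json", json.dumps(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(retained_fraction(read_summary(make_demo_zip())))
```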
18 changes: 18 additions & 0 deletions docker-compose-rstudio.yaml
@@ -0,0 +1,18 @@
services:
rstudio:
image: rocker/rstudio:4.2.0
container_name: rstudio_instance
environment:
- PASSWORD=yourpassword
- USER=scu
- INPUT_FOLDER=/input
- OUTPUT_FOLDER=/output
volumes:
- ./src/pipeline:/home/scu/pipeline
- ./src/astro:/home/scu/astro
- ./input:/input
- ./output:/output
ports:
- "8787:8787"
user: "root"
command: ["--server-user", "scu"]
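The compose file exposes the data locations through `INPUT_FOLDER` and `OUTPUT_FOLDER`, and the metadata names three files expected in the input folder. A small pre-flight check along these lines can catch a bad mount before starting an analysis. This is only a sketch: `resolve_folders` and `missing_inputs` are hypothetical helpers, the fallback paths mirror the values set in the compose file, and gzip-compressed variants of the input files are deliberately not handled.

```python
import os
from pathlib import Path

# File names listed in the Input Folder description of metadata.yml.
REQUIRED_FILES = ("matrix.mtx", "features.tsv", "barcodes.tsv")


def resolve_folders(env=None):
    """Return (input_dir, output_dir) from the environment.

    Falls back to /input and /output, the values set in the compose file.
    """
    env = os.environ if env is None else env
    return (
        Path(env.get("INPUT_FOLDER", "/input")),
        Path(env.get("OUTPUT_FOLDER", "/output")),
    )


def missing_inputs(input_dir):
    """List the required files that are absent from input_dir."""
    input_dir = Path(input_dir)
    return [name for name in REQUIRED_FILES if not (input_dir / name).is_file()]


if __name__ == "__main__":
    input_dir, output_dir = resolve_folders()
    absent = missing_inputs(input_dir)
    if absent:
        print(f"Missing from {input_dir}: {', '.join(absent)}")
    else:
        print(f"Inputs found; results would be written to {output_dir}")
```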
12 changes: 4 additions & 8 deletions docker/Dockerfile
@@ -12,7 +12,7 @@ ENV NODE_VERSION="22.x" \
# Install system packages and R
RUN apt-get update \
&& apt-get -y install --no-install-recommends \
curl jq zip adduser apt-transport-https ca-certificates gnupg \
curl jq zip adduser apt-transport-https ca-certificates gnupg libgsl-dev \
libcurl4-openssl-dev libssl-dev libxml2-dev libhdf5-dev python3 python3-pip \
libfontconfig1-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev \
&& echo "deb https://cloud.r-project.org/bin/linux/ubuntu noble-cran40/" >> /etc/apt/sources.list \
@@ -38,13 +38,9 @@ RUN adduser --uid ${SC_USER_ID} --disabled-password --gecos "" --shell /bin/sh -

# ------------------------------------------------------------------------------------
# Install Bioconductor
RUN R -e "install.packages('BiocManager', repos='http://cran.rstudio.com/')"
# Install required R packages available on CRAN
RUN R -e "install.packages(c('Seurat', 'jsonlite', 'optparse', 'ggplot2', 'devtools', 'dplyr', 'msigdbr'), repos='http://cran.rstudio.com/')"
# Install required R packages available on Bioconductor
RUN R -e "BiocManager::install('fgsea')"
# Install presto for speed-ups to Seurat
RUN R -e "library(devtools); devtools::install_github('immunogenomics/presto')"

RUN R -e "install.packages(c('pak', 'devtools', 'BiocManager'), repos='http://cran.rstudio.com/')"
RUN R -e "pak::pkg_install(c('Seurat', 'jsonlite', 'optparse', 'ggplot2', 'devtools', 'dplyr', 'msigdbr', 'BiocManager', 'sva', 'edgeR', 'fgsea', 'immunogenomics/presto', 'zdebruine/RcppML', 'gambalab/gficf'))"

# Remove the Python warning about system installs since we're in a docker image
RUN rm /usr/lib/python*/EXTERNALLY-MANAGED