edited the readme
EliHei2 committed Sep 12, 2024
1 parent e24f997 commit b103e2d
Showing 1 changed file with 61 additions and 42 deletions.
README.md: 103 changes (61 additions, 42 deletions)
@@ -1,75 +1,94 @@
# Segger
# 🍳 Welcome to segger

**segger** is a cutting-edge tool for **cell segmentation** in **single-molecule spatial omics** datasets. By leveraging **graph neural networks (GNNs)** and heterogeneous graphs, segger offers unmatched accuracy and scalability.

*Segger* is a cell segmentation model for single-molecule-resolved datasets that addresses the challenge of accurate and fast single-cell segmentation in imaging-based spatial omics. By leveraging the co-occurrence of nuclear and cytoplasmic molecules (e.g., transcripts), Segger builds a heterogeneous graph that integrates fixed-radius nearest-neighbor graphs for nuclei and molecules, with edges connecting transcripts to nuclei based on spatial proximity. A graph neural network (GNN) propagates information across these edges to learn molecule-to-nucleus associations, which are used to refine cell borders after training. Benchmarks on 10x Genomics Xenium and MERSCOPE data show that Segger is more accurate and efficient than existing methods such as Baysor and Cellpose, with faster training and straightforward adaptation to different datasets and technologies.

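The graph construction described above can be sketched with SciPy's `cKDTree`. This is a minimal illustration, not segger's implementation. It assumes transcripts and nuclei are given as 2D coordinates (nuclei reduced to centroids), and it borrows the `k_tx`/`dist_tx` and `k_nc`/`dist_nc` names from the `create_data.py` flags on the unconfirmed assumption that they mean "at most k neighbors within a given distance". Both function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def radius_knn_edges(src_xy, dst_xy, k, max_dist, exclude_self=False):
    """Link each source point to at most k destination points within max_dist.

    Returns a (2, E) integer array of [source_index, destination_index] pairs.
    Set exclude_self=True only when src_xy and dst_xy are the same point set.
    """
    tree = cKDTree(dst_xy)
    # Query one extra neighbor for self-graphs, since each point finds itself.
    k_query = k + 1 if exclude_self else k
    dist, idx = tree.query(src_xy, k=k_query, distance_upper_bound=max_dist)
    n_src = len(src_xy)
    dist = np.asarray(dist).reshape(n_src, k_query)
    idx = np.asarray(idx).reshape(n_src, k_query)
    src = np.broadcast_to(np.arange(n_src)[:, None], idx.shape)
    # Neighbors beyond max_dist come back as dist=inf and idx=len(dst_xy).
    valid = np.isfinite(dist)
    if exclude_self:
        valid &= idx != src
    # Keep at most k neighbors per source point.
    valid &= np.cumsum(valid, axis=1) <= k
    return np.stack([src[valid], idx[valid]])


def build_hetero_edges(tx_xy, nuc_xy, k_tx, dist_tx, k_nc, dist_nc):
    """Transcript-transcript and transcript-to-nucleus edge sets (illustrative only)."""
    return {
        "tx-tx": radius_knn_edges(tx_xy, tx_xy, k_tx, dist_tx, exclude_self=True),
        "tx-nc": radius_knn_edges(tx_xy, nuc_xy, k_nc, dist_nc),
    }
```

Each edge set uses the `(2, E)` `[source, target]` layout that PyTorch Geometric expects for `edge_index`, so the arrays could be attached to a heterogeneous graph object as separate edge types.
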
# How segger Works

![Segger Model](docs/images/Segger_model_08_2024.png)

---

## Installation
# Quick Links

To install Segger, clone this repository and install the required dependencies:
- 💾 **[Installation Guide](https://elihei2.github.io/segger_dev/installation/)**
Get started with installing segger on your machine.

```bash
git clone https://github.com/EliHei2/segger_dev.git
cd segger_dev
pip install -r requirements.txt
```
- 📖 **[User Guide](https://elihei2.github.io/segger_dev/user_guide/)**
Learn how to use segger for cell segmentation tasks.

Alternatively, you can create a conda environment using the provided `environment.yml` file:
- 💻 **[Command-Line Interface (CLI)](https://elihei2.github.io/segger_dev/cli/)**
Explore the CLI options for working with segger.

```bash
conda env create -f environment.yml
conda activate segger
```
- 📚 **[API Reference](https://elihei2.github.io/segger_dev/api/)**
Dive into the detailed API documentation for advanced usage.

## Download Pancreas Dataset
---

Download the Pancreas dataset from 10x Genomics:
# Why segger?

1. Go to the [Xenium Human Pancreatic Dataset Explorer](https://www.10xgenomics.com/products/xenium-human-pancreatic-dataset-explorer).
2. Download the `transcripts.csv.gz` and `nucleus_boundaries.csv.gz` files.
3. Place these files in a directory, e.g., `data_raw/pancreas`.
- ⚙️ **Highly parallelizable** – Optimized for multi-GPU environments
- **Fast and efficient** – Trains in a fraction of the time compared to alternatives
- 🔄 **Transfer learning** – Easily adaptable to new datasets and technologies
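
Before creating the dataset, it can help to confirm that both downloaded files are where the commands expect them. A small sketch, assuming the `data_raw/pancreas` layout from step 3; the helper name `missing_raw_inputs` is hypothetical:

```python
from pathlib import Path

# The two files named in the download steps.
REQUIRED_FILES = ("transcripts.csv.gz", "nucleus_boundaries.csv.gz")


def missing_raw_inputs(raw_dir="data_raw/pancreas"):
    """Return the required files that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    return [name for name in REQUIRED_FILES if not (raw_dir / name).is_file()]
```

An empty list means both files are in place and the paths passed to `--transcripts_path` and `--nuclei_path` should resolve.
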

## Creating Dataset
### Challenges in Segmentation

To create a dataset for Segger, use the `create_data.py` script. The script takes several arguments to customize the dataset creation process.
Spatial omics segmentation faces issues like:

```bash
python create_data.py --transcripts_path data_raw/pancreas/transcripts.csv.gz --nuclei_path data_raw/pancreas/nucleus_boundaries.csv.gz --output_dir data_tidy/pyg_datasets/pancreas --d_x 180 --d_y 180 --x_size 200 --y_size 200 --r 3 --val_prob 0.1 --test_prob 0.1 --k_nc 3 --dist_nc 10 --k_tx 5 --dist_tx 3 --compute_labels True --sampling_rate 1
```
- **Over/Under-segmentation**
- **Transcript contamination**
- **Scalability limitations**

This command will process the Pancreas dataset and save the processed data in the specified output directory.
segger tackles these with a **graph-based approach**, achieving superior segmentation accuracy.
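
The `create_data.py` flags suggest that the sample is cut into tiles and that tiles are split into training, validation and test sets. The sketch below shows one plausible reading of those flags and is an assumption, not the script's actual logic: tiles of `x_size` by `y_size` whose origins advance by `d_x`/`d_y` (so `--x_size 200 --d_x 180` would overlap neighboring tiles by 20 units), with each tile sent to `test` with probability `test_prob`, to `val` with probability `val_prob`, and to `train` otherwise. Both function names are hypothetical.

```python
import numpy as np


def tile_bounds(x_min, x_max, y_min, y_max, x_size, y_size, d_x, d_y):
    """(x0, y0, x1, y1) bounds of tiles whose origins step by d_x and d_y.

    With d_x < x_size, horizontally adjacent tiles overlap by x_size - d_x
    (likewise in y). Tiles at the far edge may extend past the bounding box.
    """
    xs = np.arange(x_min, x_max, d_x)
    ys = np.arange(y_min, y_max, d_y)
    return [
        (float(x), float(y), float(x + x_size), float(y + y_size))
        for x in xs
        for y in ys
    ]


def assign_splits(n_tiles, val_prob, test_prob, seed=0):
    """Label each tile 'test' (prob. test_prob), 'val' (prob. val_prob) or 'train'."""
    rng = np.random.default_rng(seed)
    u = rng.random(n_tiles)
    return np.where(u < test_prob, "test",
                    np.where(u < test_prob + val_prob, "val", "train"))
```

With `--val_prob 0.1 --test_prob 0.1`, roughly 80% of tiles would land in the training set under this reading.
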

## Training
---
## Installation Options

To train the Segger model, use the `train.py` script. The script takes several arguments to customize the training process.
Choose the installation method that best suits your needs.

### Micromamba Installation

```bash
python train.py --train_dir data_tidy/pyg_datasets/pancreas/train_tiles/processed --val_dir data_tidy/pyg_datasets/pancreas/val_tiles/processed --test_dir data_tidy/pyg_datasets/pancreas/test_tiles/processed --epochs 100 --batch_size_train 4 --batch_size_val 4 --learning_rate 1e-3 --init_emb 8 --hidden_channels 64 --out_channels 16 --heads 4 --aggr sum --accelerator cuda --strategy auto --precision 16-mixed --devices 4 --default_root_dir ./models/pancreas
micromamba create -n segger-rapids --channel-priority 1 \
-c rapidsai -c conda-forge -c nvidia -c pytorch -c pyg \
rapids=24.08 python=3.* 'cuda-version>=11.4,<=11.8' jupyterlab \
'pytorch=*=*cuda*' 'pyg=*=*cu118' pyg-lib pytorch-sparse
micromamba install -n segger-rapids --channel-priority 1 --file mamba_environment.yml
micromamba run -n segger-rapids pip install --no-deps ./
```

This command will train the Segger model on the processed Pancreas dataset and save the trained model in the specified output directory.
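
The `--accelerator`, `--strategy`, `--precision 16-mixed`, `--devices` and `--default_root_dir` flags, like the `lightning_logs/version_0/checkpoints/...` path passed to `predict.py`, match PyTorch Lightning's `Trainer` conventions. Assuming `train.py` forwards these flags to a `Trainer` (this README does not confirm it), the mapping might look like the hypothetical helpers below; data and model flags such as `--heads` or `--hidden_channels` are accepted but ignored here.

```python
import argparse


def parse_train_args(argv=None):
    """Parse the train.py flags that look like Lightning Trainer settings (hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of train.py trainer flags")
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--accelerator", default="cuda")
    parser.add_argument("--strategy", default="auto")
    parser.add_argument("--precision", default="16-mixed")
    parser.add_argument("--devices", type=int, default=1)
    parser.add_argument("--default_root_dir", default="./models")
    # Flags not declared above (--train_dir, --heads, ...) end up in _rest.
    args, _rest = parser.parse_known_args(argv)
    return args


def trainer_kwargs(args):
    """Translate parsed flags into keyword arguments accepted by Lightning's Trainer."""
    return {
        "max_epochs": args.epochs,
        "accelerator": args.accelerator,
        "strategy": args.strategy,
        "precision": args.precision,
        "devices": args.devices,
        "default_root_dir": args.default_root_dir,
    }
```

With Lightning installed, `lightning.pytorch.Trainer(**trainer_kwargs(parse_train_args()))` would build the corresponding trainer.
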

## Prediction

To make predictions using a trained Segger model, use the `predict.py` script. The script takes several arguments to customize the prediction process.
### GitHub Installation

```bash
python predict.py --train_dir data_tidy/pyg_datasets/pancreas/train_tiles/processed --val_dir data_tidy/pyg_datasets/pancreas/val_tiles/processed --test_dir data_tidy/pyg_datasets/pancreas/test_tiles/processed --checkpoint_path ./models/pancreas/lightning_logs/version_0/checkpoints/epoch=99-step=100.ckpt --batch_size 1 --init_emb 8 --hidden_channels 64 --out_channels 16 --heads 4 --aggr sum --accelerator cuda --devices 1 --default_root_dir ./log_final --score_cut 0.5 --k_nc 4 --dist_nc 20 --k_tx 5 --dist_tx 10
git clone https://github.com/EliHei2/segger_dev.git
cd segger_dev
pip install .
```
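
The `predict.py` command passes `--score_cut 0.5`. A minimal sketch of how such a cut-off could turn transcript-to-nucleus scores into assignments, assuming the scores are probabilities (for example, sigmoid outputs of the GNN) laid out as a dense transcripts-by-nuclei matrix with `NaN` for pairs that had no candidate edge. The function name and the "best-scoring nucleus wins" rule are assumptions, not segger's documented behavior.

```python
import numpy as np


def assign_transcripts(scores, score_cut=0.5):
    """Index of each transcript's best-scoring nucleus, or -1 if below score_cut.

    scores: (n_transcripts, n_nuclei) array of probabilities in [0, 1];
    NaN marks transcript-nucleus pairs without a candidate edge.
    Requires at least one nucleus column.
    """
    scores = np.asarray(scores, dtype=float)
    # Treat missing candidate edges as impossible assignments.
    filled = np.where(np.isnan(scores), -np.inf, scores)
    best = filled.argmax(axis=1)
    best_score = filled[np.arange(len(filled)), best]
    return np.where(best_score >= score_cut, best, -1)
```

Transcripts labeled `-1` stay unassigned; raising `score_cut` trades coverage for confidence.
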
---



---

# Powered by

This command will use the trained Segger model to make predictions on the Pancreas dataset and save the predictions in the specified output directory.
- **PyTorch Lightning & PyTorch Geometric**: Enables fast, efficient graph neural network (GNN) implementation for heterogeneous graphs.
- ⚙️ **Dask**: Scalable parallel processing and distributed task scheduling, ideal for handling large transcriptomic datasets.
- 🗺️ **Shapely & Geopandas**: Utilized for spatial operations such as polygon creation, scaling, and spatial relationship computations.
- 🖥️ **RAPIDS**: Provides GPU-accelerated computation for tasks like k-nearest neighbors (KNN) graph construction.
- 📊 **AnnData & Scanpy**: Efficient processing for single-cell datasets.
- 📐 **SciPy**: Facilitates spatial graph construction, including distance metrics and convex hull calculations for transcript clustering.
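
To check which of the libraries listed above are importable in an environment created by either installation route, a short script can query `importlib` without importing the heavy packages. The module names are assumptions: RAPIDS is represented by `cudf` and `cuml`, and PyTorch Lightning may be installed as `lightning` or `pytorch_lightning`.

```python
from importlib.util import find_spec

# Assumed import names for the stack above; a label counts as available
# if any of its candidate module names can be found.
STACK = {
    "PyTorch": ("torch",),
    "PyTorch Lightning": ("lightning", "pytorch_lightning"),
    "PyTorch Geometric": ("torch_geometric",),
    "Dask": ("dask",),
    "Shapely": ("shapely",),
    "Geopandas": ("geopandas",),
    "RAPIDS cuDF": ("cudf",),
    "RAPIDS cuML": ("cuml",),
    "AnnData": ("anndata",),
    "Scanpy": ("scanpy",),
    "SciPy": ("scipy",),
}


def check_stack(modules=None):
    """Map each label to True if one of its module names is importable, else False."""
    modules = STACK if modules is None else modules
    # find_spec locates a top-level module without executing it.
    return {
        label: any(find_spec(name) is not None for name in names)
        for label, names in modules.items()
    }


if __name__ == "__main__":
    for label, ok in check_stack().items():
        print(f"{label:20s} {'ok' if ok else 'missing'}")
```

Any label reported as `missing` points to a package that the chosen installation route did not provide.
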

## Benchmarking
---

Benchmarking utilities are provided to evaluate the performance of the Segger model. You can find these utilities in the `benchmark` directory.
# Contributions

## Visualization
segger is **open-source** and welcomes contributions. Join us in advancing spatial omics segmentation!

Visualization scripts are also provided to help visualize the results. You can find these scripts in the `benchmark` directory.
- 🛠️ **Source Code**
[GitHub](https://github.com/EliHei2/segger_dev)

## License
- 🐞 **Bug Tracker**
[Report Issues](https://github.com/EliHei2/segger_dev/issues)

This project is licensed under the MIT License - see the LICENSE file for details.
- 📚 **Full Documentation**
[API Reference](https://elihei2.github.io/segger_dev/api/)
