Chargestate - dev container - dev guide (#59)
- Chargestate prediction data, models, and other resources
- VSCode dev container for containerized development
- Guide for development
omsh authored Feb 2, 2025
2 parents 109dc86 + 2576044 commit 54a0983
Showing 23 changed files with 6,687 additions and 71 deletions.
241 changes: 241 additions & 0 deletions .devcontainer/DEVELOPMENT_GUIDE.md
@@ -0,0 +1,241 @@
# Development Guide for DLOmix PyTorch Implementations

This file provides guidelines for contributing PyTorch implementations to the dlomix project, a deep learning framework for proteomics.

Based on your environment, please follow the respective setup guide:

- [Dev Containers in VSCode](#dev-containers-in-vscode): Recommended if you would like to isolate everything in a Docker container. On Apple Silicon, local development is the better option, since the container is configured for the `linux/amd64` platform and therefore runs under emulation.
- [Local Development Guide](#local-development-guide): Recommended if you are comfortable managing your own Python virtual environments and dependencies.
- [Google Colab Development Guide](#google-colab-development-guide): Suited to exploration, but does not give you full control over the development environment (terminal, etc.).

For other options, such as GitHub Codespaces or similar environments, please follow the local development guide.

For contributing, please follow our [implementation guidelines](#implementation-guidelines).


## Dev Containers in VSCode

### Steps

1. Ensure you have Docker installed on your system and the Docker daemon is running. To validate, run the following command and check that it prints a table header (CONTAINER ID, IMAGE, etc.) rather than an error:
```bash
docker ps
```
2. If you don't have Docker installed yet, follow these instructions (written for Ubuntu; the Docker documentation also covers other platforms): https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository

3. To run Docker without `sudo` (required by VSCode), follow these Linux post-installation steps: https://docs.docker.com/engine/install/linux-postinstall/

4. Open VSCode and install the Dev Containers extension from the Extensions tab.

5. Clone the forked DLOmix GitHub repository: https://github.com/omsh/dlomix

6. Open the repository in a dev container by clicking the arrows in the bottom-left corner and choosing "Reopen in Container".

![Reopen in Container option in VSCode](vscode-screenshot.png)

7. The first time, building the container takes a while; VSCode then connects to the running container. Once it is connected, run the following command in the VSCode terminal to install DLOmix with the development packages in editable mode:

```bash
make install-dev
```
8. You are now ready to make changes to the source code. Because DLOmix is installed in editable mode inside the dev container, your changes take effect immediately without reinstalling the package.

- VSCode official tutorial: https://code.visualstudio.com/docs/devcontainers/tutorial
- VSCode documentation for dev containers: https://code.visualstudio.com/docs/devcontainers/containers


## Local Development Guide

### Environment Setup

#### Option 1: Using venv (Recommended)

1a. Create and activate a virtual environment:
```bash
python -m venv venv
# On Windows
.\venv\Scripts\activate
# On Unix or MacOS
source venv/bin/activate
```

#### Option 2: Using conda

1b. Create and activate a conda environment:
```bash
conda create -n dlomix python=3.9
conda activate dlomix
```

2. Clone the repository and `cd` into the directory of the cloned repo:
```bash
git clone https://github.com/omsh/dlomix.git
cd dlomix
```

3. Install the development dependencies. Make sure any torch-related packages you need are listed in this file; otherwise, add them:
```bash
pip install -r ./.devcontainer/dev-requirements.txt
```
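After installing, you may want to confirm which PyTorch build ended up in your environment. `dev-requirements.txt` points pip at the CPU-only index unless you comment out its `--index-url` line. The helper below is a small sketch; the function name is illustrative and not part of DLOmix. It relies only on `torch.__version__`, `torch.version.cuda` (which is `None` for CPU-only builds), and `torch.cuda.is_available()`.

```python
import importlib.util
from typing import Any, Dict


def torch_build_info() -> Dict[str, Any]:
    """Summarize the installed PyTorch build (illustrative helper, not part of DLOmix)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment
        return {"installed": False, "version": None, "cuda_version": None, "cuda_available": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # None for CPU-only builds, e.g. wheels from the .../whl/cpu index
        "cuda_version": torch.version.cuda,
        # True only if a CUDA build is installed and a GPU is visible
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_build_info())
```

A `cuda_version` of `None` means the CPU-only build was installed. In that case, comment out the `--index-url` line and reinstall if you need GPU support.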

### DLOmix Editable installation

Install the package in editable mode with the `dev` extras (the quotes keep shells such as zsh from expanding the brackets):
```bash
pip install -e ".[dev]"
```
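To double-check the setup, you can ask Python which environment it is running in and where `dlomix` is imported from. With an editable install, the package location should be inside your cloned repository rather than `site-packages`. This is a minimal, standard-library-only sketch: the helper names are illustrative, and conda detection assumes the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```python
import importlib.util
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def active_environment(environ: Optional[Mapping[str, str]] = None) -> str:
    """Describe the environment of the running interpreter (venv, conda, or neither)."""
    environ = os.environ if environ is None else environ
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix points at the base install
    if sys.prefix != sys.base_prefix:
        return f"venv at {sys.prefix}"
    # `conda activate` exports the name of the active environment
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return f"conda environment '{conda_env}'"
    return "no virtual environment detected"


def package_location(name: str) -> Optional[Path]:
    """Return the directory a top-level package is imported from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


if __name__ == "__main__":
    print(active_environment())
    # For an editable install, this should point into your cloned repository
    print(package_location("dlomix"))
```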


## Google Colab Development Guide

### Initial Setup

1. Create a new Colab notebook and mount your Google Drive:
```python
from google.colab import drive
drive.mount('/content/drive')
```

2. Clone the forked dlomix repository:
```bash
!git clone https://github.com/omsh/dlomix.git
```

3. Install the development dependencies. Make sure any torch-related packages you need are listed in this file; otherwise, add them:
```bash
!pip install -r ./dlomix/.devcontainer/dev-requirements.txt
```

4. Install the package in development mode:
```bash
!pip install -e "./dlomix[dev]"
```


## Implementation Guidelines

1. Add PyTorch implementations following the current project structure:
```
dlomix/
├── models/
│ ├── pytorch/
│ │ ├── __init__.py
│ │ └── model.py
│ └── existing_models/
```

2. Ensure compatibility with existing APIs:
```python
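# dlomix/models/pytorch/model.py
import torch
import torch.nn as nn


# Example of maintaining a consistent API
class PrositRTPyTorch(nn.Module):
    """PyTorch implementation of the Prosit retention time model."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        # PyTorch implementation here: define layers mirroring the TensorFlow model

    def forward(self, sequences: torch.Tensor) -> torch.Tensor:
        # Maintain the same input/output structure as the TensorFlow version:
        # integer-encoded sequences in, one retention time per sequence out
        raise NotImplementedError("Implement the forward pass and return the retention times")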
```

3. Add corresponding tests:
```python
# tests/test_pytorch_models.py
import numpy as np
import torch

from dlomix.models import PrositRetentionTimePredictor  # existing TF implementation
from dlomix.models.pytorch import PrositRTPyTorch

SEQ_LENGTH = 30  # match the existing input dimensions


def test_model_compatibility():
    tf_model = PrositRetentionTimePredictor()
    pt_model = PrositRTPyTorch()

    # Feed both models the same integer-encoded input (batch of 1)
    encoded_sequence = np.random.randint(1, 21, size=(1, SEQ_LENGTH))
    tf_output = tf_model.predict(encoded_sequence)
    pt_output = pt_model(torch.tensor(encoded_sequence))

    assert tf_output.shape == pt_output.detach().numpy().shape


def test_model_forward_pass():
    model = PrositRTPyTorch()
    batch_size = 128
    expected_shape = (batch_size, 1)

    x = torch.randint(1, 21, (batch_size, SEQ_LENGTH))  # integer-encoded sequences
    output = model(x)
    assert output.shape == expected_shape
```

4. Add a usage example of the new PyTorch implementation, preferably in a notebook under `./notebooks`.
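The following tiny, hypothetical model makes the shape contract from the template and tests concrete. It is *not* the Prosit architecture. It only shows how integer-encoded sequences of shape `(batch, 30)` can map to one retention time per sequence, `(batch, 1)`. The vocabulary size, the layer sizes, the choice of a GRU, and index 0 as padding are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class ToyRTModel(nn.Module):
    """Hypothetical minimal RT regressor: integer-encoded sequences -> one value per sequence."""

    def __init__(self, vocab_size: int = 22, embedding_dim: int = 16, hidden_dim: int = 32):
        super().__init__()
        # Index 0 is treated as padding, so its embedding is not updated during training
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        # batch_first=True keeps the (batch, seq_len) layout used on the Keras side
        self.encoder = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.regressor = nn.Linear(hidden_dim, 1)

    def forward(self, sequences: torch.Tensor) -> torch.Tensor:
        # sequences: (batch, seq_len) integer tensor
        embedded = self.embedding(sequences.long())   # (batch, seq_len, embedding_dim)
        _, last_hidden = self.encoder(embedded)        # (1, batch, hidden_dim)
        return self.regressor(last_hidden.squeeze(0))  # (batch, 1)


if __name__ == "__main__":
    model = ToyRTModel()
    x = torch.randint(1, 21, (128, 30))
    print(model(x).shape)
```

PyTorch recurrent layers default to a `(seq_len, batch, features)` layout. Setting `batch_first=True` makes them match the batch-major layout of Keras, which makes it easier to share encoded inputs between the two implementations.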


### Development Workflow

#### (Optional, but recommended) Pre-commit hooks
We use a few simple pre-commit hooks to keep file and code formatting consistent. To use them:
- Install pre-commit with `pip install pre-commit`.
- Install the hooks by running `pre-commit install` in the root directory of the repo.
- Optionally, run the checks manually with `pre-commit run` after staging your changes but before committing.

1. Create a new branch:
```bash
git checkout -b feature/FEATURE_NAME
```

2. Add your implementation.

3. Write tests under `./tests` to ensure your code runs as expected.

4. Run the test suite using make:
```bash
make test-local
```

In Google Colab, run the tests from inside the cloned repository:
```bash
!cd dlomix && python -m pytest tests/
```

5. Format your code using the project's style guidelines:
```bash
make format
```

6. Create a pull request with:
- Clear description of changes
- Any new dependencies added
- A pointer to the usage example under `./notebooks`


### General Considerations

1. Sequence data:
- Assume the same sequence encoding schemes as the existing implementations

2. Model architecture:
- Closely mimic the existing Keras implementations or the original implementations from the papers
- Maintain similar model inputs and outputs (data type, shape, etc.)
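Both considerations can be checked mechanically: encode a peptide once, then compare what the two implementations return. The sketch below uses a hypothetical alphabet (the 20 standard amino acids mapped to 1..20, with 0 for padding) purely for illustration. In practice, reuse the encoding DLOmix already defines so that both models see identical integers. The helper names are illustrative, and only NumPy is required.

```python
from typing import Dict

import numpy as np

# Hypothetical alphabet for illustration only: 20 standard amino acids -> 1..20, 0 = padding
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET: Dict[str, int] = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence: str, seq_length: int = 30, alphabet: Dict[str, int] = ALPHABET) -> np.ndarray:
    """Integer-encode a peptide and right-pad it with zeros to a fixed length."""
    if len(sequence) > seq_length:
        raise ValueError(f"'{sequence}' is longer than seq_length={seq_length}")
    encoded = np.zeros(seq_length, dtype=np.int64)
    encoded[: len(sequence)] = [alphabet[aa] for aa in sequence]
    return encoded


def assert_outputs_compatible(
    reference: np.ndarray, candidate: np.ndarray, check_values: bool = False, atol: float = 1e-5
) -> None:
    """Check that a candidate output matches a reference in shape, dtype, and optionally values."""
    if reference.shape != candidate.shape:
        raise AssertionError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    if reference.dtype != candidate.dtype:
        raise AssertionError(f"dtype mismatch: {reference.dtype} vs {candidate.dtype}")
    # Value agreement is only meaningful once weights have been ported between frameworks
    if check_values and not np.allclose(reference, candidate, atol=atol):
        raise AssertionError("outputs differ beyond tolerance")
```

For example, `np.stack([encode_sequence(s) for s in ["PEPTIDE", "ACDK"]])` builds a `(2, 30)` batch that can be fed to both models. You could then call `assert_outputs_compatible(tf_output, pt_output.detach().numpy())` on the results.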

### Resources

#### PyTorch
- PyTorch installation: https://pytorch.org/get-started/locally/
- PyTorch documentation (make sure the version selector in the top-left corner matches your installed version): https://pytorch.org/docs/stable/index.html

#### Keras and TensorFlow
- TensorFlow API documentation 2.15 (version 2.16 made Keras 3 the default, which introduced breaking changes): https://www.tensorflow.org/versions/r2.15/api_docs/python/tf
- TensorFlow Keras guide: https://www.tensorflow.org/guide/keras

#### HuggingFace Datasets

- The PROSPECT PTMs datasets for retention time, fragment ion intensity, and charge state prediction are available on HuggingFace: https://huggingface.co/collections/Wilhelmlab/prospect-ptms-665db48431a7e844634660ba


#### Python Environments
- If you would like to use conda, try Miniforge: https://github.com/conda-forge/miniforge
31 changes: 31 additions & 0 deletions .devcontainer/Dockerfile
@@ -0,0 +1,31 @@
FROM python:3.9-slim

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Install extra development dependencies
COPY dev-requirements.txt /tmp/dev-requirements.txt
RUN pip install --upgrade --no-cache-dir pip && \
    pip install --no-cache-dir -r /tmp/dev-requirements.txt

# Install system dependencies
RUN set -ex && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        bash-completion \
        git \
        openssh-client \
        ca-certificates \
        rsync \
        vim \
        nano \
        wget \
        curl \
    && update-ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Set the working directory
WORKDIR /workspaces/dlomix

USER root
5 changes: 5 additions & 0 deletions .devcontainer/dev-requirements.txt
@@ -0,0 +1,5 @@
# Comment out the following line to install PyTorch with CUDA support; if it is left uncommented, the CPU-only build is installed
--index-url https://download.pytorch.org/whl/cpu

torch
#wandb >= 0.15 # enable this line to install wandb
34 changes: 34 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,34 @@
{
  "name": "DLOmix Dev Container",
  "build": {
    "dockerfile": "Dockerfile"
  },
  "runArgs": [
    "--platform=linux/amd64"
  ],
  "remoteUser": "root",
  "containerUser": "root",
  "workspaceFolder": "/workspaces/dlomix",
  "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/dlomix,type=bind,consistency=cached",
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-azuretools.vscode-docker",
        "eamodio.gitlens",
        "ms-python.python",
        "ms-python.black-formatter",
        "ms-python.vscode-pylance",
        "tamasfe.even-better-toml",
        "ms-toolsai.jupyter"
      ],
      "settings": {
        "git.path": "/usr/bin/git",
        "python.defaultInterpreterPath": "/usr/local/bin/python",
        "[python]": {
          "editor.defaultFormatter": "ms-python.black-formatter"
        }
      }
    }
  },
  "postCreateCommand": "pip install -e .[dev]"
}
Binary file added .devcontainer/vscode-screenshot.png
2 changes: 1 addition & 1 deletion .gitignore
@@ -163,7 +163,7 @@ run_scripts/*.index
run_scripts/*.data-*
run_scripts/*.csv
run_scripts/*.pkl

run_scripts/output/

# test assets (will be downloaded the first time tests are run and then ignored by git)
assets/