Chargestate - dev container - dev guide (#59)

- Chargestate prediction data, models, and other resources
- VSCode dev container for containerized development
- Guide for development
wilhelm-lab · Feb 2, 2025 · 54a0983
2 parents 109dc86 + 2576044
commit 54a0983
Showing 23 changed files with 6,687 additions and 71 deletions.
diff --git a/.devcontainer/DEVELOPMENT_GUIDE.md b/.devcontainer/DEVELOPMENT_GUIDE.md
@@ -0,0 +1,241 @@
+# Development Guides for dlomix PyTorch Implementation
+
+This file provides guidelines for contributing PyTorch implementations to the dlomix project, a deep learning framework for proteomics.
+
+Based on your environment, please follow the respective setup guide:
+
+- [Dev Containers in VSCode](#dev-containers-in-vscode): Recommended if you would like to isolate everything in a Docker container. If you are on Apple Silicon, local development is the better option.
+- [Local Development Guide](#local-development-guide): Recommended if you are comfortable managing Python virtual environments, dependencies, etc.
+- [Google Colab Development Guide](#google-colab-development-guide): More exploratory; it does not provide full control over the development environment (terminal, etc.).
+
+Other options: for GitHub Codespaces or similar, please follow the local development guide.
+
+For contributing, please follow our [implementation guidelines](#implementation-guidelines).
+
+
+## Dev Containers in VSCode
+
+### Steps
+
+1. Ensure you have Docker installed on your system and the Docker daemon is running. To validate, please run the following command and ensure you get a listing with CONTAINER ID and other column headers rather than an error:
+```bash
+docker ps
+```
+2. If you don't yet have Docker installed, follow these instructions: https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
+
+3. To run Docker without sudo (VSCode requirement), follow these post-installation steps: https://docs.docker.com/engine/install/linux-postinstall/
+
+4. Open VSCode and install the Dev Containers extension from the extensions tab
+
+5. Clone the forked GitHub repository of DLOmix https://github.com/omsh/dlomix
+
+6. Open the repository in a DevContainer by clicking on the arrows in the bottom left corner, and choosing "Reopen in Container".
+
+![alt text](vscode-screenshot.png)
+
+7. The first time you do this, the container build will take some time, and then VSCode will connect to the running container. Once it is done, please run the following command in the VSCode Terminal to install DLOmix with development packages in editable mode:
+
+```bash
+make install-dev
+```
+8. You are now ready to make changes to the source code and see the impact directly. Once you make the changes, they should be reflected in the editable install inside your dev container.
+
+VSCode Official Tutorial: https://code.visualstudio.com/docs/devcontainers/tutorial
+VSCode documentation for DevContainers: https://code.visualstudio.com/docs/devcontainers/containers
+
+
+## Local Development Guide
+
+### Environment Setup
+
+#### Option 1: Using venv (Recommended)
+
+1a. Create and activate a virtual environment:
+```bash
+python -m venv venv
+# On Windows
+.\venv\Scripts\activate
+# On Unix or MacOS
+source venv/bin/activate
+```
+
+#### Option 2: Using conda
+
+1b. Create and activate a conda environment:
+```bash
+conda create -n dlomix python=3.9
+conda activate dlomix
+```
+
+2. Clone the repository and `cd` into the directory of the cloned repo:
+```bash
+git clone https://github.com/omsh/dlomix.git
+cd dlomix
+```
+
+3. Install the development dependencies. Check that the torch-related packages you need are listed in this file; if not, add them:
+```bash
+pip install -r ./.devcontainer/dev-requirements.txt
+```
+
+### DLOmix Editable installation
+
+Install the package with the dev option and in editable mode:
+```bash
+pip install -e ".[dev]"
+```
+
+
+## Google Colab Development Guide
+
+### Initial Setup
+
+1. Create a new Colab notebook and mount your Google Drive:
+```python
+from google.colab import drive
+drive.mount('/content/drive')
+```
+
+2. Clone the forked dlomix repository:
+```bash
+!git clone https://github.com/omsh/dlomix.git
+```
+
+3. Install the development dependencies. Check that the torch-related packages you need are listed in this file; if not, add them:
+```bash
+!pip install -r ./dlomix/.devcontainer/dev-requirements.txt
+```
+
+4. Install the package in development mode:
+```bash
+!pip install -e "./dlomix[dev]"
+```
+
+
+## Implementation Guidelines
+
+1. Add PyTorch implementations following the current project structure:
+```
+dlomix/
+├── models/
+│   ├── pytorch/
+│   │   ├── __init__.py
+│   │   └── model.py
+│   └── existing_models/
+```
+
+2. Ensure compatibility with existing APIs:
+```python
+# dlomix/models/pytorch/model.py
+import torch
+import torch.nn as nn
+
+# Example of maintaining consistent API
+class PrositRTPyTorch(nn.Module):
+    """PyTorch implementation of Prosit retention time model"""
+
+    def __init__(self, *args, **kwargs):
+        super().__init__()
+        # PyTorch implementation here
+
+    def forward(self, sequences):
+        # Maintain same input/output structure as TensorFlow version
+        retention_times = ...  # placeholder: compute predictions from `sequences` here
+        return retention_times
+```
+
+3. Add corresponding tests:
+```python
+# tests/test_pytorch_models.py
+import torch
+import pytest
+from dlomix.models.pytorch import PrositRTPyTorch
+
+def test_model_compatibility():
+    tf_model = PrositRT()  # Existing TF implementation (placeholder name; import the actual class)
+    pt_model = PrositRTPyTorch()
+
+    # Test with same input; `encode` is a placeholder for the project's sequence encoder
+    sequence_input = "PEPTIDE"
+    tf_output = tf_model.predict(sequence_input)
+    pt_output = pt_model(torch.tensor(encode(sequence_input)))
+
+    assert tf_output.shape == pt_output.detach().numpy().shape
+
+def test_model_forward_pass():
+    model = PrositRTPyTorch()
+    expected_shape = (128, 1)
+    input_size = 30
+
+    x = torch.randn(128, input_size)  # Match existing input dimensions
+    output = model(x)
+    assert output.shape == expected_shape
+```
+
+4. Add a usage example of the new PyTorch implementation, preferably in a notebook under `./notebooks`
+
+
+### Development Workflow
+
+#### (Optional, but recommended) Pre-commit hooks
+We use some simple pre-commit hooks to ensure consistency in file and code formatting. To use pre-commit hooks:
+- Install pre-commit with `pip install pre-commit`.
+- Add the hooks by running `pre-commit install` in the root directory of the repo.
+- If you like, you can manually run the checks after staging but before committing: `pre-commit run` runs the hooks against your staged changes.
+
+1. Create a new branch:
+```bash
+git checkout -b feature/FEATURE_NAME
+```
+
+2. Add your implementation
+
+3. Write tests under `./tests` to ensure your code runs as expected.
+
+4. Run the test suite using make:
+```bash
+make test-local
+```
+
+For Google Colab, you can run:
+```bash
+!python -m pytest tests/
+```
+
+5. Format your code using the project's style guidelines:
+```bash
+make format
+```
+
+6. Create a pull request with:
+- Clear description of changes
+- Any new dependencies added
+- Mention the usage example under `./notebooks`
+
+
+### General Considerations
+
+1. Sequence Data:
+   - Assume the same sequence encoding schemes
+
+2. Model Architecture:
+   - Closely mimic the existing Keras implementations or the original implementations of papers
+   - Maintain similar model inputs and outputs (datatype, shape, etc.)
+
+### Resources
+
+#### PyTorch
+- PyTorch Installation https://pytorch.org/get-started/locally/
+- PyTorch Documentation (please always ensure you have selected the right version in the top left corner) https://pytorch.org/docs/stable/index.html
+
+#### Keras and TensorFlow
+- TensorFlow API Documentation 2.15 (Version 2.16 introduced some breaking changes with respect to Keras) https://www.tensorflow.org/versions/r2.15/api_docs/python/tf
+- TensorFlow Keras Guide https://www.tensorflow.org/guide/keras
+
+#### HuggingFace Datasets
+
+- PROSPECT PTMs is available on HuggingFace for Retention time, Fragment ion intensity, and Charge state prediction https://huggingface.co/collections/Wilhelmlab/prospect-ptms-665db48431a7e844634660ba
+
+
+#### Python Environments
+- If you like to use conda, try out miniforge https://github.com/conda-forge/miniforge
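
Note on the guide's test sketch: it uses the peptide string `"PEPTIDE"` as input while the PyTorch model is fed a tensor, and the forward-pass test assumes an input size of 30. The snippet below is a minimal, stdlib-only sketch of what an encoding step between the two could look like. The alphabet, the padding index 0, and the names `ALPHABET` and `encode_sequence` are assumptions made for illustration, not DLOmix's actual encoding scheme; real contributions should reuse the project's existing encoder, as the guide asks.

```python
from typing import Dict, List

# Hypothetical alphabet: the 20 standard amino acids, indexed from 1.
# Index 0 is reserved for padding. This is NOT DLOmix's actual scheme.
ALPHABET: Dict[str, int] = {
    aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)
}


def encode_sequence(sequence: str, max_len: int = 30) -> List[int]:
    """Map each residue to an integer and right-pad with zeros to max_len."""
    if len(sequence) > max_len:
        raise ValueError(f"sequence longer than {max_len} residues")
    try:
        ids = [ALPHABET[aa] for aa in sequence]
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]!r}") from err
    return ids + [0] * (max_len - len(ids))


encoded = encode_sequence("PEPTIDE")
print(len(encoded))   # 30
print(encoded[:7])    # [13, 4, 13, 17, 8, 3, 4]
```

A fixed-length integer list like this can be wrapped with `torch.tensor(...)` before being passed to a model, which keeps the input shape identical across samples.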
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
@@ -0,0 +1,31 @@
+FROM python:3.9-slim
+
+ENV PYTHONUNBUFFERED=1
+ENV PYTHONDONTWRITEBYTECODE=1
+
+# Install extra development dependencies
+COPY dev-requirements.txt /tmp/dev-requirements.txt
+RUN pip install --upgrade --no-cache-dir pip && \
+    pip install --no-cache-dir -r /tmp/dev-requirements.txt
+
+# Install system dependencies
+RUN set -ex && \
+    apt-get update && \
+    apt-get install -y --no-install-recommends \
+    build-essential \
+    bash-completion \
+    git \
+    openssh-client \
+    ca-certificates \
+    rsync \
+    vim \
+    nano \
+    wget \
+    curl \
+    && update-ca-certificates \
+    && rm -rf /var/lib/apt/lists/*
+
+# Set the working directory
+WORKDIR /workspaces/dlomix
+
+USER root
diff --git a/.devcontainer/dev-requirements.txt b/.devcontainer/dev-requirements.txt
@@ -0,0 +1,5 @@
+# Comment out the following line to install PyTorch with CUDA support; if left uncommented, the CPU-only version is installed
+--index-url https://download.pytorch.org/whl/cpu
+
+torch
+#wandb >= 0.15 # enable this line to install wandb
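
The guide asks contributors to check that the torch-related packages they need are listed in this file. A small sketch of how that check could be scripted follows. The helper name `listed_packages` is hypothetical, and the parsing is deliberately simplified: it skips pip options and comments, drops version specifiers, and does not cover every requirements-file feature.

```python
import re
from typing import List


def listed_packages(requirements_text: str) -> List[str]:
    """Return the lower-cased package names that are active in a requirements file."""
    names = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()    # drop comments
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0).lower())
    return names


# Sample text mirroring the layout of the file above.
sample = """\
# a comment line
--index-url https://download.pytorch.org/whl/cpu

torch
#wandb >= 0.15 # enable this line to install wandb
"""

print(listed_packages(sample))             # ['torch']
print("torch" in listed_packages(sample))  # True
```

In practice the text would come from reading `.devcontainer/dev-requirements.txt`; commented-out entries such as `wandb` are intentionally reported as absent.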
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -0,0 +1,34 @@
+{
+    "name": "DLOmix Dev Container",
+    "build": {
+        "dockerfile": "Dockerfile"
+    },
+    "runArgs": [
+        "--platform=linux/amd64"
+    ],
+    "remoteUser": "root",
+    "containerUser": "root",
+    "workspaceFolder": "/workspaces/dlomix",
+    "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/dlomix,type=bind,consistency=cached",
+    "customizations": {
+        "vscode": {
+            "extensions": [
+                "ms-azuretools.vscode-docker",
+                "eamodio.gitlens",
+                "ms-python.python",
+                "ms-python.black-formatter",
+                "ms-python.vscode-pylance",
+                "tamasfe.even-better-toml",
+                "ms-toolsai.jupyter"
+            ],
+            "settings": {
+                "git.path": "/usr/bin/git",
+                "[python]": {
+                    "python.pythonPath": "/usr/local/bin/python",
+                    "editor.defaultFormatter": "ms-python.black-formatter"
+                }
+            }
+        }
+    },
+    "postCreateCommand": "pip install -e .[dev]"
+}
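
In the Dev Containers specification, lifecycle commands such as `postCreateCommand` are top-level properties of `devcontainer.json`, while the `customizations.vscode` block holds editor-specific `extensions` and `settings`. The sketch below shows one way to check where a key sits in a parsed config. The helper `find_key_paths` and the inline sample config are hypothetical, and the sketch assumes the file is plain JSON without comments.

```python
import json
from typing import Any, List


def find_key_paths(node: Any, key: str, prefix: str = "") -> List[str]:
    """Return the dotted path of every occurrence of `key` in nested JSON data."""
    paths: List[str] = []
    if isinstance(node, dict):
        for name, value in node.items():
            path = f"{prefix}.{name}" if prefix else name
            if name == key:
                paths.append(path)
            paths.extend(find_key_paths(value, key, path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            paths.extend(find_key_paths(item, key, f"{prefix}[{index}]"))
    return paths


# Hypothetical config with the command nested under the editor customizations.
config = json.loads(
    '{"name": "demo", "customizations": {"vscode": {"postCreateCommand": "echo hi"}}}'
)

print(find_key_paths(config, "postCreateCommand"))
# ['customizations.vscode.postCreateCommand']
```

A result of exactly `['postCreateCommand']` would indicate the command is defined at the top level, where the container tooling looks for it.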
diff --git a/.devcontainer/vscode-screenshot.png b/.devcontainer/vscode-screenshot.png
diff --git a/.gitignore b/.gitignore
@@ -163,7 +163,7 @@ run_scripts/*.index
 run_scripts/*.data-*
 run_scripts/*.csv
 run_scripts/*.pkl
-
+run_scripts/output/
 
 # test assets (will be downloaded the first time tests are run and then ignore by git)
 assets/