

Neardata FL for Transcriptomics



Environment setup

Compile the requirements.in file into requirements.txt (requires pip-tools):

pip-compile requirements.in

Create and activate your virtual environment:

python -m venv venv
source venv/bin/activate

Now install the packages from the requirements.txt file:

pip install -r requirements.txt

Also install neardata-fl-for-transcriptomics from source so that the scripts can be run from the repository root:

pip install .

We use NeptuneAI as the MLOps tool. The API token should be stored in a .env file so that the python-dotenv package can load it as an environment variable. The .env file should contain the following entry, where 'xxx' is the API token:

NEPTUNE_API_TOKEN=xxx
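For reference, here is a minimal stdlib sketch of what python-dotenv's load_dotenv() does with this entry. The function name load_env_file is hypothetical; in the project itself, python-dotenv performs this step:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal stand-in for python-dotenv's load_dotenv():
    parse KEY=VALUE lines and export them as environment variables."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # skip blank lines, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # like load_dotenv(), do not overwrite variables already set
            os.environ.setdefault(key.strip(), value.strip())
```

In the scripts themselves, calling load_dotenv() before reading os.environ["NEPTUNE_API_TOKEN"] is all that is needed.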

Running flower tutorial example

Running locally

First, run the data split service, an HTTP server that assigns samples to each client (example for a dataset with 60000 samples):

python3 fl/flower_tutorial/scripts/run_data_split_service.py --service-ip localhost --n-samples=60000 --n-splits 3 --manual-seed 1
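Conceptually, the service partitions the sample indices deterministically given the seed, so every client receives a disjoint, reproducible subset. The sketch below only illustrates that idea, not the service's actual algorithm (split_samples is a hypothetical name):

```python
import random

def split_samples(n_samples: int, n_splits: int, seed: int):
    """Shuffle sample indices with a fixed seed, then cut them
    into n_splits disjoint, near-equal chunks."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    base, extra = divmod(n_samples, n_splits)
    splits, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)  # spread the remainder
        splits.append(indices[start:start + size])
        start += size
    return splits
```

With --n-samples 60000 and --n-splits 3, each client would receive 20000 disjoint indices, reproducibly for a given --manual-seed.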

You can instantiate the FL server by running:

python3 fl/flower_tutorial/scripts/run_server.py --num-clients 3 --server-ip localhost --num-rounds 50 --num-local-epochs 1

The --num-clients argument is the minimum number of clients required before a federated learning round can start. The --server-ip argument is the address of the server. A client can be instantiated by running:

python3 fl/flower_tutorial/scripts/run_client.py --server-ip localhost --data-split-service-ip localhost 

The clients and the server communicate on port 8081.
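If a run fails to start, something else may already be bound to that port. A quick, generic check (not part of the repository's tooling):

```python
import socket

def port_in_use(host: str, port: int) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    i.e. something is already listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

# e.g. check port_in_use("localhost", 8081) before starting the server
```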

Running through SLURM

Setting up the environment

This script will create a virtual environment that can be used to run the tutorial example.

sbatch run_configure_venv.sh

To run an FL workflow, you can use:

sbatch -n 4 run_flower_tutorial.sh 3

This script will run a parallel job (one server plus 3 clients) on 4 nodes.

Running through Docker Compose

You can run the workflow using Docker Compose. First, start the data split service:

docker compose -f ./docker/flower_tutorial/docker-compose.yml up -d data-split-service

Second, instantiate the server container:

docker compose -f ./docker/flower_tutorial/docker-compose.yml up --build -d server

Then, you can build and start client containers:

docker compose -f ./docker/flower_tutorial/docker-compose.yml up --build -d --scale client=4

The --scale flag lets you instantiate the desired number of client containers, in this case 4.

Running genotypes use case

Running locally

First, run the data split service, an HTTP server that assigns samples to each client (example for a dataset with 400 samples):

python3 fl/genotypes/scripts/run_data_split_service.py --service-ip localhost --n-samples=400 --n-splits 2 --manual-seed 1

You can instantiate the FL server by running:

python3 fl/genotypes/scripts/run_server.py --num-clients 2 --server-ip localhost --num-rounds 2 --num-local-epochs 1

The --num-clients argument is the minimum number of clients required before a federated learning round can start. The --server-ip argument is the address of the server. A client can be instantiated by running:

python3 fl/genotypes/scripts/run_client.py --server-ip localhost --data-split-service-ip localhost 

The clients and the server communicate on port 8081.

Running through Docker Compose

You can run the workflow using Docker Compose. First, start the data split service:

docker compose -f ./docker/genotypes/docker-compose.yml up -d data-split-service

Second, instantiate the server container:

docker compose -f ./docker/genotypes/docker-compose.yml up -d server

Then, you can build and start client containers:

docker compose -f ./docker/genotypes/docker-compose.yml up -d --scale client=2

The --scale flag lets you instantiate the desired number of client containers, in this case 2.

License

This project is licensed under the MIT License.

About

FL system for transcriptomics use case in the Neardata project
