Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5th sub-PR of #117] update setup process for CC #179

Merged
merged 1 commit into from
Aug 21, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 46 additions & 0 deletions docs/setup/setup-cc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Setup Capreolus on Compute Canada

This page contains instructions to set up Capreolus on Compute Canada (CC).
Please follow [this guide](https://github.com/castorini/onboarding/blob/master/docs/cc-guide.md) to create an account on Compute Canada.

This instruction assume the users have anaconda or miniconda installed.

## Capreolus Installation
To setup, clone the repo and run the following scripts under the top-level capreolus:
```bash
git clone https://github.com/capreolus-ir/capreolus && cd capreolus

module load java/11
module load python/3.7
module load scipy-stack

ENVDIR=$HOME/venv/capreolus-env
virtualenv --no-download $ENVDIR
source $ENVDIR/bin/activate

pip install tf-models-official==2.5 tensorflow-ranking==0.4.2
cat requirements.txt | cut -d '#' -f 1 | grep "\S" | xargs -n 1 -i sh -c 'pip install --no-index {} || pip install {}'
pip install --no-index torch==1.9.0 spacy==2.2.2
```

## Pre-download Huggingface models
In case the server has no internet access, you can use the script `./scripts/download_model.sh` to pre-download huggingface models:
```bash
sh ./scripts/download_model.sh $model_name
```
The model will then be downloaded to the current directory.
You can then pass the model directory to Capreolus via:
```
task.reranker.pretrained=/path/to/model
task.reranker.extractor.tokenizer.pretrained=/path/to/model`
```

## Pre-download MS MARCO Passage Dataset
**After** specifying the `$CAPREOLUS_CACHE` and `$CAPREOLUS_RESULT`
(For CC users, they should be set under `/scratch/your_user_name` since the cache and results can take a huge amount of space),
run `sh download_data.sh` to pre-download the needed data for MS MARCO Passage dataset.
```bash
export CAPREOLUS_CACHE=/scratch/your_username/.capreolus/cache
export CAPREOLUS_RESULTS=/scratch/your_username/.capreolus/results
sh ./scripts/download_data.sh
```