
move the location of sample_slurm_scripts
crystina-z committed Aug 22, 2021
1 parent dfdcdf4 commit 569276e
Showing 3 changed files with 45 additions and 32 deletions.
16 changes: 13 additions & 3 deletions docs/reproduction/MS_MARCO.md
@@ -10,7 +10,7 @@ Once the environment is set, you can verify the installation with [these instruc
## Running MS MARCO
This requires GPU(s) with a combined 48GB of memory (e.g., 3 V100s or an RTX 8000) or a TPU.
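To check whether a node meets this requirement, one option is to sum the memory that `nvidia-smi` reports for each visible GPU. A minimal sketch, assuming NVIDIA GPUs with `nvidia-smi` on the `PATH`; `total_gpu_mem_mib` is a hypothetical helper, not part of capreolus. `nvidia-smi` reports MiB, and cards often report slightly less than their nominal capacity, so treat the total as approximate.

```bash
# Hypothetical helper: sum per-GPU total memory (MiB) read from stdin,
# one number per line. Prints 0 if there is no input.
total_gpu_mem_mib() {
  awk '{ s += $1 } END { print s + 0 }'
}

# Usage on a GPU node (requires nvidia-smi):
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | total_gpu_mem_mib
```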
1. Make sure you are in the top-level `capreolus` directory;
2. Use the following script to quickly test if everything is working.
2. Use the following script to run a "mini" version of the MS MARCO fine-tuning and check that everything is working.
```bash
python -m capreolus.run rerank.train with file=docs/reproduction/config_msmarco.txt
```
@@ -20,14 +20,15 @@ This requires GPU(s) with 48GB memory (e.g. 3 V100 or a RTX 8000) or a TPU.

3. Once the above is done, fine-tune the full version on MS MARCO Passage using the following script:
```bash
# values to adjust as needed
niters=10
batch_size=16
validatefreq=$niters # run validation only once, at the end of training
decayiters=$niters # either the same as $niters or 0
threshold=1000 # number of top-ranked documents to rerank
python -m capreolus.run rerank.train with \
file=docs/reproduction/config_msmarco.txt \
threshold=$threshold \
reranker.trainer.niters=$niters \
reranker.trainer.batch=$batch_size \
reranker.trainer.decayiters=$decayiters \
@@ -38,7 +39,16 @@ This requires GPU(s) with 48GB memory (e.g. 3 V100 or a RTX 8000) or a TPU.
Once the data is prepared, training takes roughly 4 to 6 hours and inference 6 to 10 hours on *4 V100s* for BERT-base.
This should achieve `MRR@10=0.35+`.
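The comments in the full-version script tie `validatefreq` and `decayiters` to `niters`: validation should run only at the end of training, and `decayiters` should be either equal to `niters` or 0. A minimal sketch of a pre-launch check in bash, assuming these are the only constraints that matter; `check_msmarco_params` is a hypothetical helper, not part of capreolus.

```bash
# Hypothetical helper: returns 0 if the three values follow the comments in
# the sample script, otherwise prints a message to stderr and returns 1.
check_msmarco_params() {
  local niters=$1 validatefreq=$2 decayiters=$3
  # validatefreq=$niters means validation runs only once, after the last iteration.
  if [ "$validatefreq" -ne "$niters" ]; then
    echo "validatefreq ($validatefreq) should equal niters ($niters)" >&2
    return 1
  fi
  # decayiters is expected to be either niters or 0.
  if [ "$decayiters" -ne "$niters" ] && [ "$decayiters" -ne 0 ]; then
    echo "decayiters ($decayiters) should be $niters or 0" >&2
    return 1
  fi
  return 0
}

# Usage, with the values from the full-version script:
#   niters=10
#   check_msmarco_params "$niters" "$niters" "$niters" && echo "parameters look consistent"
```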

### For Compute Canada (CC) Slurm users:
If you are new to [Slurm](https://slurm.schedmd.com/documentation.html), a sample Slurm script for the *full version* fine-tuning can be found at `docs/reproduction/sample_slurm_script.sh`.
After setting `--account` to your own allocation, it should work directly on `cedar`. Submit it from the top-level `capreolus` directory with `sbatch docs/reproduction/sample_slurm_script.sh`, since the script uses relative paths.
To adapt it to the "mini" version, change the GPU count and requested time to:
```
#SBATCH --gres=gpu:v100l:1
#SBATCH --time=24:00:00
```
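These two `#SBATCH` changes can also be applied mechanically. A minimal sketch using `sed`, assuming the `#SBATCH` lines in your copy match the sample script exactly; `make_mini_slurm_script` and `mini_slurm_script.sh` are hypothetical names, and all other lines of the script are copied unchanged.

```bash
# Hypothetical helper: write a copy of the full-version Slurm script with the
# mini-version resources (1 GPU, 24 hours). Only exact matches are rewritten.
make_mini_slurm_script() {
  local src=$1 dst=$2
  sed -e 's/^#SBATCH --gres=gpu:v100l:4$/#SBATCH --gres=gpu:v100l:1/' \
      -e 's/^#SBATCH --time=72:00:00$/#SBATCH --time=24:00:00/' \
      "$src" > "$dst"
}

# Usage, from the top-level capreolus directory:
#   make_mini_slurm_script docs/reproduction/sample_slurm_script.sh mini_slurm_script.sh
```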

## Replication Logs
+ Results (with hyperparameter-0) replicated by [@crystina-z](https://github.com/crystina-z) on 2020-12-06 (commit [`6c3759f`](https://github.com/crystina-z/capreolus-1/commit/6c3759fe620f18f8939670176a18c744752bc9240)) (Tesla V100 on Compute Canada)
+ Results (with hyperparameter-6) replicated by [@Dahlia-Chehata](https://github.com/Dahlia-Chehata) on 2021-03-29 (commit [`7915aad`](https://github.com/capreolus-ir/capreolus/commit/7915aad75406527a3b88498926cff85259808696)) (Tesla V100 on Compute Canada)
+ Results (MRR@10=0.356) replicated by [@andrewyguo](https://github.com/andrewyguo) on 2021-05-29 (commit [`1ce71d9`](https://github.com/capreolus-ir/capreolus/commit/1ce71d93ab5473b40d4ae02768fd053261b27320)) (Tesla V100 on Compute Canada)
32 changes: 32 additions & 0 deletions docs/reproduction/sample_slurm_script.sh
@@ -0,0 +1,32 @@
#!/bin/bash
#SBATCH --job-name=msmarcopsg
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100l:4
#SBATCH --ntasks-per-node=1
#SBATCH --mem=120GB
#SBATCH --time=72:00:00
#SBATCH --account=your_slurm_account
#SBATCH --cpus-per-task=16

#SBATCH -o ./msmarco-psg-output.log


# Modify the following lines to match your environment setup
module load arch/avx512 StdEnv/2018.3 java/11 python/3.7 scipy-stack
ENVDIR=$HOME/venv/capreolus-env
source $ENVDIR/bin/activate

niters=10
batch_size=16
validatefreq=$niters # run validation only once, at the end of training
decayiters=$niters # either the same as $niters or 0
threshold=1000 # number of top-ranked documents to rerank

python -m capreolus.run rerank.train with \
file=docs/reproduction/config_msmarco.txt \
threshold=$threshold \
reranker.trainer.niters=$niters \
reranker.trainer.batch=$batch_size \
reranker.trainer.decayiters=$decayiters \
reranker.trainer.validatefreq=$validatefreq \
fold=s1
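Because the sample ships with the placeholder `--account=your_slurm_account`, a small guard can prevent submitting it unedited. A minimal sketch, assuming bash and a working `sbatch`; `submit_if_configured` is a hypothetical helper, not part of capreolus or Slurm. Slurm runs a batch job in the directory it was submitted from by default, so call it from the top-level `capreolus` directory for the relative `docs/reproduction/config_msmarco.txt` path to resolve.

```bash
# Hypothetical helper: refuse to submit while the placeholder account line is
# still present; otherwise hand the script to sbatch.
submit_if_configured() {
  local script=$1
  if grep -q '^#SBATCH --account=your_slurm_account$' "$script"; then
    echo "Set --account in $script before submitting" >&2
    return 1
  fi
  sbatch "$script"
}

# Usage, from the top-level capreolus directory:
#   submit_if_configured docs/reproduction/sample_slurm_script.sh
```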
29 changes: 0 additions & 29 deletions docs/setup/scripts/sample_slurm_script.sh

This file was deleted.
