Merge pull request #709 from argonne-lcf/feature/updates_for_SN_sambaflow_1.23.5

Feature/updates for sn sambaflow 1.23.5
felker authored Feb 5, 2025
2 parents ce9d46e + 2178559 commit 36c4029
Showing 5 changed files with 22 additions and 22 deletions.
10 changes: 5 additions & 5 deletions docs/ai-testbed/sambanova/example-modelzoo-programs.md
@@ -33,7 +33,7 @@ cd ~/sambanova/modelzoo
export TARGET_SAMBAFLOW_VERSION=$((rpm -q sambanova-runtime 2>/dev/null || dpkg -s sambanova-runtime 2>/dev/null) | egrep -m 1 -o "[0-9]+\.[0-9]+\.[0-9]+")
echo $TARGET_SAMBAFLOW_VERSION
# should be of the form 1.19.1
./start_container.sh -b /data/ANL/openwebtext/hdf5/hdf5:/opt/datasets/openweb_hdf54096/ -b /software:/software / /software/sambanova/singularity/images/llm-modelzoo/Modelzoo/ModelzooDevbox_1.sif
./start_container.sh -b /data/ANL/openwebtext/hdf5/hdf5:/opt/datasets/openweb_hdf54096/ -b /software:/software / /software/sambanova/singularity/images/llm-modelzoo/Modelzoo/ModelzooDevbox_0.2.0.sif
```
Container startup output should look like:
```
@@ -102,7 +102,7 @@ python ./modelzoo/examples/nlp/text_generation/rdu_generate_text.py \
command=compile \
checkpoint.model_name_or_path=/software/models/Llama-2-7b-hf/ \
samba_compile.output_folder=/home/$(whoami)/sambanova/out_generation \
+samba_compile.target_sambaflow_version=$TARGET_SAMBAFLOW_VERSION # =1.19.1
+samba_compile.target_sambaflow_version=LATEST
```

Note: each compile will add a new subdirectory to the output folder (`/home/$(whoami)/sambanova/out_generation`), containing compile artifacts. The folder can be deleted when testing is complete;
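A minimal cleanup sketch, assuming no other runs still need the compile artifacts:

```bash
# Delete the entire output folder and its compile artifacts once testing is done
rm -rf /home/$(whoami)/sambanova/out_generation
```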
@@ -184,9 +184,9 @@ python ./modelzoo/examples/nlp/training/utils/convert_ultrachat.py -src ultracha
mv ~/sambanova/ultrachat_processed.jsonl ~/sambanova/ultrachat_processed_full.jsonl
head -1000 ~/sambanova/ultrachat_processed_full.jsonl > ~/sambanova/ultrachat_processed.jsonl
# This step makes a directory of hdf5 files from the single jsonl file
export TOKENIZER="./Llama-2-7b-hf"
export TOKENIZER="meta-llama/Llama-2-7b-hf"
export MAX_SEQ_LENGTH=4096
python -m generative_data_prep pipeline --input_file_path=./ultrachat_processed.jsonl --output_path=./ultrachat_dialogue --pretrained_tokenizer=${TOKENIZER} --max_seq_length=${MAX_SEQ_LEN}
python -m generative_data_prep pipeline --input_file_path=./ultrachat_processed.jsonl --output_path=./ultrachat_dialogue --pretrained_tokenizer=${TOKENIZER} --max_seq_length=${MAX_SEQ_LENGTH}
deactivate
```
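As a quick sanity check (hypothetical, but the path matches the pipeline's `--output_path` above), list the output directory; it should now hold the generated HDF5 files:

```bash
# List the HDF5 dataset directory produced by generative_data_prep
ls ~/sambanova/ultrachat_dialogue
```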

@@ -227,7 +227,7 @@ python modelzoo/examples/nlp/training/rdu_train_llm.py \
training.batch_size=${BATCH_SIZE} \
samba_compile.arch=${ARCH} \
samba_compile.output_folder=/home/$(whoami)/sambanova/out_train \
+samba_compile.target_sambaflow_version=$TARGET_SAMBAFLOW_VERSION
+samba_compile.target_sambaflow_version=LATEST
```

Note: each compile will add a new subdirectory to the output folder (`/home/$(whoami)/sambanova/out_train`), containing compile artifacts. The folder can be deleted when testing is complete;
2 changes: 1 addition & 1 deletion docs/ai-testbed/sambanova/example-multi-node-programs.md
@@ -96,7 +96,7 @@ cd ~/nlp-multiNodetest
### Create and run Gpt1.5B_compile.sh and Gpt1.5B_run.sh

Create the files **Gpt1.5B_compile.sh** and **Gpt1.5B_run.sh** in the current directory.
Copy the contents of [Gpt1.5B_compile.sh](./files/Gpt1.5B_compile.sh) and [Gpt1.5B_run.sh](./files/Gpt1.5B_run.sh). Alternatively, the files can be accessed at `/data/ANL/scripts/Gpt1.5B_compile.sh` and `/data/ANL/scripts/Gpt1.5B_run.sh` on any of the compute node and can be copied over to the working directory.
Copy the contents of [Gpt1.5B_compile.sh](./files/Gpt1.5B_compile.sh) and [Gpt1.5B_run.sh](./files/Gpt1.5B_run.sh). Alternatively, the files can be accessed at `/data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_compile.sh` and `/data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_run.sh` on any of the compute nodes and can be copied over to the working directory, as sketched below.
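A minimal sketch of that copy step, assuming the working directory is `~/nlp-multiNodetest` from the `cd` above:

```bash
# Copy the reference compile/run scripts into the working directory
cd ~/nlp-multiNodetest
cp /data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_compile.sh .
cp /data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_run.sh .
```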

### Compile and Run

14 changes: 9 additions & 5 deletions docs/ai-testbed/sambanova/example-programs.md
@@ -260,6 +260,10 @@ Run these commands for training (compile + train):
The `compile` and `run` arguments of the script can only be run with the number of instances equal to 1, indicating that this is a simple 4-tile run without the data-parallel framework.
For an image size of 256x256 and a batch size of 256 when running just 1 instance, the commands are as follows.

!!! note

The compilation runs for over 30 minutes.

```bash
./Unet2d.sh compile 256 256 1 unet2d_single_compile
./Unet2d.sh run 256 256 1 unet2d_single_run
@@ -284,10 +288,10 @@ The performance data is located at the bottom of the log file.
inner train loop time : 374.6789753437042 for 10 epochs, number of global steps: 130, e2e samples_per_sec: 88.82270474202953
```
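Since that throughput line sits at the bottom of the log, one way to pull it out is a simple `grep`; the log file name here is an assumption based on the run label used above:

```bash
# Extract the end-to-end throughput line (log file name is hypothetical)
grep "e2e samples_per_sec" unet2d_single_run*.log
```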

## Gpt 1.5B
## GPT 1.5B

The Gpt 1.5B application example is provided in the the path : `/opt/sambaflow/apps/nlp/transformers_on_rdu/`.
The scripts containing the `compile` and `run` commands for Gpt1.5B model can be accessed at the path `/data/ANL/scripts/Gpt1.5B_base_single_compile.sh` and `/data/ANL/scripts/Gpt1.5B_base_single_run.sh` on any SN30 compute node. This script is compiled and run for only 1 instance and the model fits on 4 tiles or half of a RDU. The scripts are provided for reference.
The GPT 1.5B application example is provided at the path `/opt/sambaflow/apps/nlp/transformers_on_rdu/`.
The scripts containing the `compile` and `run` commands for the GPT 1.5B model can be accessed at `/data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_base_single_compile.sh` and `/data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_base_single_run.sh` on any SN30 compute node. These scripts compile and run only 1 instance, and the model fits on 4 tiles, or half of an RDU. The scripts are provided for reference.

Change directory and copy files.

@@ -303,8 +307,8 @@ to a file with the same names into the current directory using your favorite editor
or copy the contents from `/data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_base_single_compile.sh` and `/data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_base_single_run.sh`.

```bash
cp /data/ANL/scripts/Gpt1.5B_base_single_compile.sh ~/apps/nlp/Gpt1.5B_single/
cp /data/ANL/scripts/Gpt1.5B_base_single_run.sh ~/apps/nlp/Gpt1.5B_single/
cp /data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_base_single_compile.sh ~/apps/nlp/Gpt1.5B_single/
cp /data/ANL/scripts/1.23.5-46/legacy_models/Gpt1.5B_base_single_run.sh ~/apps/nlp/Gpt1.5B_single/
```

Run the scripts with the batch size as an argument (shown below with an example of 32).
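A plausible invocation, assuming the batch size is the scripts' only positional argument:

```bash
# Compile, then run, with a batch size of 32 (usage assumed from the text above)
./Gpt1.5B_base_single_compile.sh 32
./Gpt1.5B_base_single_run.sh 32
```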
12 changes: 5 additions & 7 deletions docs/ai-testbed/sambanova/files/Unet2d.sh
@@ -44,6 +44,8 @@ export OMP_NUM_THREADS=16
#fi
if [ -e /opt/sambaflow/apps/image/unet ] ; then
UNET=/opt/sambaflow/apps/image/unet
elif [ -e /opt/sambaflow/apps/vision/segmentation/compile.py ] ; then
UNET=/opt/sambaflow/apps/vision/segmentation/
elif [ -e /opt/sambaflow/apps/image/segmentation ] ; then
UNET=/opt/sambaflow/apps/image/segmentation/
else
@@ -72,9 +74,7 @@ if [ "${1}" == "compile" ] ; then
rm ${OUTDIR}/unet_train_${BS}_${2}_single/unet_train_${BS}_${2}_single_${NUM_TILES}.pef
fi
if [ -e ${UNET}/compile.py ] ; then
COMMAND="python ${UNET}/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_single_${NUM_TILES} --output-folder=${OUTDIR}"
#1.15 python ${UNET}/compile.py compile -b ${BS} --num-classes 2 --num-flexible-classes -1 --in-channels=3 --init-features 32 --in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_depth2colb.json --enable-stoc-rounding --num-tiles ${NUM_TILES} --pef-name="unet_train_${BS}_${2}_single_${NUM_TILES}" > compile_${BS}_${2}_single_${NUM_TILES}.log 2>&1

COMMAND="python ${UNET}/compile.py compile --init-features 32 --in-channels=3 --num-classes 2 --num-flexible-classes 1 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --mac-human-decision /opt/sambaflow/apps/vision/segmentation/jsons/hd_files/hd_unet_256_depth2colb.json --compiler-configs-file /opt/sambaflow/apps/vision/segmentation/jsons/compiler_configs/unet_compiler_configs_depth2colb.json --enable-stoc-rounding --num-tiles=4 --pef-name=unet_train_${BS}_${2}_single_${NUM_TILES} --output-folder=${OUTDIR}"
else
#old
COMMAND="python ${UNET}/unet.py compile -b ${BS} --in-channels=${NUM_WORKERS} --in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_tgm.json --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --pef-name="unet_train_${BS}_${2}_single" > compile_${BS}_${2}_single.log 2>&1"
@@ -90,9 +90,7 @@ elif [ "${1}" == "pcompile" ] ; then
rm ${OUTDIR}/unet_train_${BS}_${2}_NP_${NUM_TILES}/unet_train_${BS}_${2}_NP_${NUM_TILES}.pef
fi
if [ -e ${UNET}/hook.py ] ; then
#python ${UNET}/compile.py compile -b ${BS} --num-classes 2 --num-flexible-classes -1 --in-channels=3 --init-features 32 --in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_depth2colb.json --enable-stoc-rounding --num-tiles ${NUM_TILES} --pef-name="unet_train_${BS}_${2}_NP_${NUM_TILES}" --data-parallel -ws 2 > compile_${BS}_${2}_NP_${NUM_TILES}.log 2>&1
#1.16.2
COMMAND="python /opt/sambaflow/apps/image/segmentation/compile.py compile --mac-v2 --in-channels=3 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --num-tiles=4 --pef-name=unet_train_${BS}_${2}_NP_${NUM_TILES} --data-parallel -ws 2 --output-folder=${OUTDIR}"
COMMAND="python /opt/sambaflow/apps/vision/segmentation/compile.py compile --init-features 32 --in-channels=3 --num-classes 2 --num-flexible-classes 1 --in-width=${2} --in-height=${2} --batch-size=${BS} --enable-conv-tiling --mac-human-decision /opt/sambaflow/apps/vision/segmentation/jsons/hd_files/hd_unet_256_depth2colb.json --compiler-configs-file /opt/sambaflow/apps/vision/segmentation/jsons/compiler_configs/unet_compiler_configs_depth2colb.json --num-tiles=4 --pef-name=unet_train_${BS}_${2}_NP_${NUM_TILES} --data-parallel -ws 2 --output-folder=${OUTDIR}"
else
COMMAND="python ${UNET}/unet.py compile -b ${BS} --in-channels=${NUM_WORKERS} --in-width=${2} --in-height=${2} --enable-conv-tiling --mac-v2 --mac-human-decision ${UNET}/jsons/hd_files/hd_unet_${HD}_tgm.json --compiler-configs-file ${UNET}/jsons/compiler_configs/unet_compiler_configs_no_inst.json --pef-name=unet_train_${BS}_${2}_NP --data-parallel -ws 2 --output-folder=${OUTDIR}"
fi
@@ -108,7 +106,7 @@ elif [ "${1}" == "run" ] ; then
export SF_RNT_DMA_POLL_BUSY_WAIT=1
#run single
if [ -e ${UNET}/hook.py ] ; then
COMMAND="srun --nodelist $(hostname) python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR} --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling --min-throughput 395 --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 --batch-size=${BS} --max-epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${BS}_single_${NUM_TILES} --pef=${OUTDIR}/unet_train_${BS}_${2}_single_${NUM_TILES}/unet_train_${BS}_${2}_single_${NUM_TILES}.pef"
COMMAND="srun --nodelist $(hostname) python /opt/sambaflow/apps/vision/segmentation//hook.py run --data-cache=${CACHE_DIR} --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling --min-throughput 395 --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 --batch-size=${BS} --max-epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${BS}_single_${NUM_TILES} --pef=${OUTDIR}/unet_train_${BS}_${2}_single_${NUM_TILES}/unet_train_${BS}_${2}_single_${NUM_TILES}.pef"

else
COMMAND="srun --nodelist $(hostname) python ${UNET}/unet_hook.py run --num-workers=${NUM_WORKERS} --do-train --in-channels=3 --in-width=${2} --in-height=${2} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${2}_${3} --pef=${OUTDIR}/unet_train_${BS}_${2}_single/unet_train_${BS}_${2}_single.pef --use-sambaloader"
6 changes: 2 additions & 4 deletions docs/ai-testbed/sambanova/files/unet_batch.sh
@@ -38,7 +38,7 @@ export OMP_NUM_THREADS=16
if [ -e /opt/sambaflow/apps/image/unet ] ; then
UNET=/opt/sambaflow/apps/image/unet
elif [ -e /opt/sambaflow/apps/image/segmentation ] ; then
UNET=/opt/sambaflow/apps/image/segmentation/
UNET=/opt/sambaflow/apps/vision/segmentation/
else
echo "Cannot find UNET"
exit
@@ -61,9 +61,7 @@ echo "Time: " $(date +%H:%M) >> ${OUTPUT_PATH} 2>&1
export SF_RNT_DMA_POLL_BUSY_WAIT=1
rm -rf log_dir_unet_${NP}_train_kaggle
if [ -e ${UNET}/hook.py ] ; then
#orig srun --mpi=pmi2 python ${UNET}/hook.py run --data-cache-dir ${CACHE_DIR} --num-workers=${NUM_WORKERS} --mode train --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --pef=$(pwd)/out/unet_train_${BS}_${IM}_NP/unet_train_${BS}_${IM}_NP.pef --data-parallel --reduce-on-rdu --use-sambaloader > run_unet_${BS}_${IM}_${NP}.log 2>&1
#1.15.2 srun --mpi=pmi2 python ${UNET}/hook.py run --data-in-memory --data-cache=${CACHE_DIR} --num-workers=${NUM_WORKERS} --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --data-parallel --reduce-on-rdu --pef=$(pwd)/out/unet_train_${BS}_${IM}_NP_4/unet_train_${BS}_${IM}_NP_4.pef > run_unet_${BS}_${IM}_${NP}_4.log 2>&1
COMMAND="srun --mpi=pmi2 python /opt/sambaflow/apps/image/segmentation//hook.py run --data-cache=${CACHE_DIR} --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling --min-throughput 395 --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --max-epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --data-parallel --reduce-on-rdu --pef=${OUTDIR}/unet_train_${BS}_${IM}_NP_4/unet_train_${BS}_${IM}_NP_4.pef"
COMMAND="srun --mpi=pmi2 python /opt/sambaflow/apps/vision/segmentation/hook.py run --data-cache=${CACHE_DIR} --data-in-memory --num-workers=${NUM_WORKERS} --enable-tiling --min-throughput 395 --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --max-epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --data-parallel --reduce-on-rdu --pef=${OUTDIR}/unet_train_${BS}_${IM}_NP_4/unet_train_${BS}_${IM}_NP_4.pef"
else
COMMAND="srun --mpi=pmi2 python ${UNET}/unet_hook.py run --data-cache-dir ${CACHE_DIR} --num-workers=${NUM_WORKERS} --do-train --in-channels=3 --in-width=${IM} --in-height=${IM} --init-features 32 --batch-size=${BS} --epochs 10 --data-dir ${DS} --log-dir log_dir_unet_${IM}_${BS}_${NP} --pef=$(pwd)/out/unet_train_${BS}_${IM}_NP/unet_train_${BS}_${IM}_NP.pef --data-parallel --reduce-on-rdu --use-sambaloader > run_unet_${BS}_${IM}_${NP}.log 2>&1"
fi
