* add smollm2 pipeline
* update readme
Showing 6 changed files with 210 additions and 0 deletions.
@@ -0,0 +1,28 @@
# Instructions to train SmolLM2-1.7B-Instruct

We build the [SmolLM2-Instruct](https://huggingface.co/collections/HuggingFaceTB/smollm2-6723884218bcda64b34d7db9) models by performing SFT on [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) and then DPO on [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).

## Setup

Follow the installation instructions at https://github.com/huggingface/alignment-handbook/tree/main?tab=readme-ov-file#installation-instructions
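As a rough guide, that setup typically looks like the sketch below. This is an assumption-laden summary (a conda environment, PyTorch, an install of the handbook, and flash-attn, which the SFT configs rely on via `use_flash_attention_2`); the linked instructions and their pinned versions take precedence.

```shell
# Sketch only -- see the linked installation instructions for the exact pinned versions.
conda create -n handbook python=3.10 -y
conda activate handbook
pip install torch                                        # the handbook pins a specific PyTorch build
git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook
python -m pip install .
python -m pip install flash-attn --no-build-isolation    # the SFT configs set use_flash_attention_2: true
```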
## Training

We train the 1.7B model on 8 GPUs using the following commands:
```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config.yaml
```
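If you have a different number of GPUs, the process count can be overridden on the command line. This is a sketch under the assumption that the DeepSpeed accelerate config defaults to 8 processes; note that changing the GPU count changes the effective batch size unless you rescale `gradient_accumulation_steps` in the training config.

```shell
# Hypothetical 4-GPU launch of the SFT stage: override the process count, and
# double gradient_accumulation_steps in recipes/smollm2/sft/config.yaml if you
# want to keep the same effective batch size as the 8-GPU run.
ACCELERATE_LOG_LEVEL=info accelerate launch --num_processes 4 \
  --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
  scripts/run_sft.py recipes/smollm2/sft/config.yaml
```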
For the 135M and 360M models, we use the [smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) dataset for SFT and UltraFeedback for DPO:
```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config_smol.yaml
```
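The configs below set `push_to_hub: true` (with `hub_private_repo: true` for the DPO runs) and report to both TensorBoard and Weights & Biases, so you will likely need to authenticate before launching. A minimal sketch, assuming the standard Hugging Face Hub and W&B CLIs are installed:

```shell
huggingface-cli login   # required because the configs set push_to_hub: true
wandb login             # required because report_to includes wandb
```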
@@ -0,0 +1,43 @@
# Model arguments
model_name_or_path: loubnabnl/smollm2-1.7B-sft
torch_dtype: bfloat16

# Data training arguments
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0

dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 0.5
do_eval: true
hub_private_repo: true
eval_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: smollm2-1.7B-dpo
learning_rate: 1.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 3
optim: adamw_torch
output_dir: data/smollm2-1.7B-dpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
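As a sanity check on the scale of this run: assuming the same 8-GPU setup the README describes for SFT (the DPO GPU count is not restated), the effective batch size implied by this config works out as follows.

```shell
# per_device_train_batch_size * gradient_accumulation_steps * num_gpus
echo $((2 * 8 * 8))   # -> 128 preference pairs per optimizer step (assuming 8 GPUs)
```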
@@ -0,0 +1,43 @@
# Model arguments
model_name_or_path: loubnabnl/smollm2-360M-sft # we use this script for the 135M model too
torch_dtype: bfloat16

# Data training arguments
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0

dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 0.5
do_eval: true
hub_private_repo: true
eval_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: smollm2-360M-dpo
learning_rate: 1.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 2
optim: adamw_torch
output_dir: data/smollm2-360M-dpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
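The comment in this config notes that the same file is reused for the 135M model, but the diff does not show how the ids are swapped. A hypothetical way to derive a 135M variant (the resulting `loubnabnl/smollm2-135M-sft` checkpoint name and the output file name are assumptions) would be:

```shell
# Hypothetical: copy this config for the 135M model by swapping the size in the
# model, hub, and output ids; review the generated file before launching.
sed 's/smollm2-360M/smollm2-135M/g' recipes/smollm2/dpo/config_smol.yaml \
  > recipes/smollm2/dpo/config_smol_135M.yaml
```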
@@ -0,0 +1,49 @@
# Model arguments
model_name_or_path: HuggingFaceTB/SmolLM2-1.7B
model_revision: main
tokenizer_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
torch_dtype: bfloat16
use_flash_attention_2: true

# Data training arguments
dataset_mixer:
  HuggingFaceTB/smoltalk: 1.0

dataset_configs:
- all

dataset_splits:
- train
- test
preprocessing_num_workers: 36

# SFT trainer config
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: smollm2-1.7B-sft
hub_strategy: every_save
learning_rate: 3.0e-04
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 8192
max_steps: -1
num_train_epochs: 2
output_dir: data/smollm2-1.7B-sft
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
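Since this config reports to TensorBoard (as well as W&B), you can watch the SFT run locally while it trains. The log directory below is an assumption based on the Trainer default of writing TensorBoard events under `output_dir/runs`.

```shell
# Assumes the default Trainer logging_dir of <output_dir>/runs
tensorboard --logdir data/smollm2-1.7B-sft/runs --port 6006
```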
@@ -0,0 +1,46 @@
# Model arguments
model_name_or_path: HuggingFaceTB/SmolLM2-360M # we use this script for the 135M model too
model_revision: main
tokenizer_name_or_path: HuggingFaceTB/SmolLM2-360M-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
torch_dtype: bfloat16
use_flash_attention_2: true

# Data training arguments
dataset_mixer:
  HuggingFaceTB/smol-smoltalk: 1.0

dataset_splits:
- train
- test
preprocessing_num_workers: 36

# SFT trainer config
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: smollm2-360M-sft
hub_strategy: every_save
learning_rate: 1.0e-03 # 3e-4
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 8192
max_steps: -1
num_train_epochs: 2
output_dir: data/smollm2-360M-sft
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1