diff --git a/README.md b/README.md
index 6cc35dc6..ce625c6b 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ However, we know from the [InstructGPT](https://huggingface.co/papers/2203.02155
 The Alignment Handbook aims to fill that gap by providing the community with a series of robust training recipes that span the whole pipeline.
 
 ## News 🗞️
+* **November 21, 2024**: We release the [recipe](recipes/smollm2/README.md) for fine-tuning SmolLM2-Instruct.
 * **August 18, 2024**: We release SmolLM-Instruct v0.2, along with the [recipe](recipes/smollm/README.md) to fine-tuning small LLMs 💻
 * **April 12, 2024**: We release Zephyr 141B (A35B), in collaboration with Argilla and Kaist AI, along with the recipe to fine-tune Mixtral 8x22B with ORPO 🪁
 * **March 12, 2024:** We release StarChat2 15B, along with the recipe to train capable coding assistants 🌟
diff --git a/recipes/smollm2/README.md b/recipes/smollm2/README.md
new file mode 100644
index 00000000..2afc8844
--- /dev/null
+++ b/recipes/smollm2/README.md
@@ -0,0 +1,28 @@
+
+# Instructions to train SmolLM2-1.7B-Instruct
+
+We build the [SmolLM2-Instruct](https://huggingface.co/collections/HuggingFaceTB/smollm2-6723884218bcda64b34d7db9) models by running SFT on [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) and then DPO on [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
+
+## Setup
+
+Follow the installation instructions at https://github.com/huggingface/alignment-handbook/tree/main?tab=readme-ov-file#installation-instructions
+
+## Training
+We train the 1.7B model on 8 GPUs using the following commands:
+
+```shell
+# SFT
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config.yaml
+
+# DPO
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config.yaml
+```
+
+For the 135M and 360M models, we use the [smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) dataset for SFT and UltraFeedback for DPO:
+```shell
+# SFT
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml
+
+# DPO
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config_smol.yaml
+```
\ No newline at end of file
diff --git a/recipes/smollm2/dpo/config.yaml b/recipes/smollm2/dpo/config.yaml
new file mode 100644
index 00000000..1f35f8dc
--- /dev/null
+++ b/recipes/smollm2/dpo/config.yaml
@@ -0,0 +1,43 @@
+# Model arguments
+model_name_or_path: loubnabnl/smollm2-1.7B-sft
+torch_dtype: bfloat16
+
+# Data training arguments
+dataset_mixer:
+  HuggingFaceH4/ultrafeedback_binarized: 1.0
+
+dataset_splits:
+- train_prefs
+- test_prefs
+preprocessing_num_workers: 12
+
+# DPOTrainer arguments
+bf16: true
+beta: 0.5
+do_eval: true
+hub_private_repo: true
+eval_strategy: steps
+eval_steps: 100
+gradient_accumulation_steps: 8
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: False
+hub_model_id: smollm2-1.7B-dpo
+learning_rate: 1.0e-6
+log_level: info
+logging_steps: 10
+lr_scheduler_type: cosine
+max_length: 1024
+max_prompt_length: 512
+num_train_epochs: 3
+optim: adamw_torch
+output_dir: data/smollm2-1.7B-dpo
+per_device_train_batch_size: 2
+per_device_eval_batch_size: 4
+push_to_hub: true
+report_to:
+- tensorboard
+- wandb
+save_strategy: "no"
+seed: 42
+warmup_ratio: 0.1
\ No newline at end of file
diff --git a/recipes/smollm2/dpo/config_smol.yaml b/recipes/smollm2/dpo/config_smol.yaml
new file mode 100644
index 00000000..b629bc3a
--- /dev/null
+++ b/recipes/smollm2/dpo/config_smol.yaml
@@ -0,0 +1,43 @@
+# Model arguments
+model_name_or_path: loubnabnl/smollm2-360M-sft # we use this config for the 135M model too
+torch_dtype: bfloat16
+
+# Data training arguments
+dataset_mixer:
+  HuggingFaceH4/ultrafeedback_binarized: 1.0
+
+dataset_splits:
+- train_prefs
+- test_prefs
+preprocessing_num_workers: 12
+
+# DPOTrainer arguments
+bf16: true
+beta: 0.5
+do_eval: true
+hub_private_repo: true
+eval_strategy: steps
+eval_steps: 100
+gradient_accumulation_steps: 8
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: False
+hub_model_id: smollm2-360M-dpo
+learning_rate: 1.0e-6
+log_level: info
+logging_steps: 10
+lr_scheduler_type: cosine
+max_length: 1024
+max_prompt_length: 512
+num_train_epochs: 2
+optim: adamw_torch
+output_dir: data/smollm2-360M-dpo
+per_device_train_batch_size: 2
+per_device_eval_batch_size: 4
+push_to_hub: true
+report_to:
+- tensorboard
+- wandb
+save_strategy: "no"
+seed: 42
+warmup_ratio: 0.1
\ No newline at end of file
diff --git a/recipes/smollm2/sft/config.yaml b/recipes/smollm2/sft/config.yaml
new file mode 100644
index 00000000..6f6cd516
--- /dev/null
+++ b/recipes/smollm2/sft/config.yaml
@@ -0,0 +1,49 @@
+# Model arguments
+model_name_or_path: HuggingFaceTB/SmolLM2-1.7B
+model_revision: main
+tokenizer_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
+torch_dtype: bfloat16
+use_flash_attention_2: true
+
+# Data training arguments
+dataset_mixer:
+  HuggingFaceTB/smoltalk: 1.0
+
+dataset_configs:
+- all
+
+dataset_splits:
+- train
+- test
+preprocessing_num_workers: 36
+
+# SFT trainer config
+bf16: true
+do_eval: true
+evaluation_strategy: epoch
+gradient_accumulation_steps: 4
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+hub_model_id: smollm2-1.7B-sft
+hub_strategy: every_save
+learning_rate: 3.0e-04
+log_level: info
+logging_steps: 5
+logging_strategy: steps
+lr_scheduler_type: cosine
+max_seq_length: 8192
+max_steps: -1
+num_train_epochs: 2
+output_dir: data/smollm2-1.7B-sft
+overwrite_output_dir: true
+per_device_eval_batch_size: 4
+per_device_train_batch_size: 4
+push_to_hub: true
+remove_unused_columns: true
+report_to:
+- tensorboard
+- wandb
+save_strategy: "no"
+seed: 42
+warmup_ratio: 0.1
\ No newline at end of file
diff --git a/recipes/smollm2/sft/config_smol.yaml b/recipes/smollm2/sft/config_smol.yaml
new file mode 100644
index 00000000..70be48cc
--- /dev/null
+++ b/recipes/smollm2/sft/config_smol.yaml
@@ -0,0 +1,46 @@
+# Model arguments
+model_name_or_path: HuggingFaceTB/SmolLM2-360M # we use this config for the 135M model too
+model_revision: main
+tokenizer_name_or_path: HuggingFaceTB/SmolLM2-360M-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
+torch_dtype: bfloat16
+use_flash_attention_2: true
+
+# Data training arguments
+dataset_mixer:
+  HuggingFaceTB/smol-smoltalk: 1.0
+
+dataset_splits:
+- train
+- test
+preprocessing_num_workers: 36
+
+# SFT trainer config
+bf16: true
+do_eval: true
+evaluation_strategy: epoch
+gradient_accumulation_steps: 4
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+hub_model_id: smollm2-360M-sft
+hub_strategy: every_save
+learning_rate: 1.0e-03 # 3e-4
+log_level: info
+logging_steps: 5
+logging_strategy: steps
+lr_scheduler_type: cosine
+max_seq_length: 8192
+max_steps: -1
+num_train_epochs: 2
+output_dir: data/smollm2-360M-sft
+overwrite_output_dir: true
+per_device_eval_batch_size: 4
+per_device_train_batch_size: 4
+push_to_hub: true
+remove_unused_columns: true
+report_to:
+- tensorboard
+- wandb
+save_strategy: "no"
+seed: 42
+warmup_ratio: 0.1
\ No newline at end of file
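As a sanity check on the configs in this diff, the number of examples consumed per optimizer step is `per_device_train_batch_size × gradient_accumulation_steps × number of GPUs`. The sketch below assumes plain data parallelism over the 8 GPUs mentioned in the recipe README (under DeepSpeed ZeRO-3 each process still consumes its own micro-batches); the helper `global_batch_size` is hypothetical and not part of the handbook.

```python
def global_batch_size(per_device_train_batch_size: int,
                      gradient_accumulation_steps: int,
                      num_gpus: int) -> int:
    """Sequences (SFT) or preference pairs (DPO) per optimizer step.

    Assumes one data-parallel process per GPU, each processing its own
    micro-batch and accumulating gradients before the optimizer step.
    """
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus


NUM_GPUS = 8  # assumption taken from the recipe README ("on 8 GPUs")

# (per_device_train_batch_size, gradient_accumulation_steps) copied from the YAML files
SFT_1_7B = global_batch_size(4, 4, NUM_GPUS)  # recipes/smollm2/sft/config.yaml
DPO_1_7B = global_batch_size(2, 8, NUM_GPUS)  # recipes/smollm2/dpo/config.yaml

if __name__ == "__main__":
    print(f"SFT global batch size: {SFT_1_7B}")  # 128
    print(f"DPO global batch size: {DPO_1_7B}")  # 128
```

The `config_smol.yaml` files use the same per-device and accumulation values, so on 8 GPUs they would also step on 128 examples at a time; on fewer GPUs the global batch shrinks proportionally unless `gradient_accumulation_steps` is raised.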
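Both DPO configs set `beta: 0.5` and do not set a `loss_type`. Assuming TRL's `DPOTrainer` default (the sigmoid loss from the DPO paper), `beta` scales how far the policy's log-probability margin between chosen and rejected responses has moved relative to the reference (SFT) model before the log-sigmoid is applied. A minimal per-pair sketch in plain Python, with summed sequence log-probabilities passed in as floats; the function name `dpo_sigmoid_loss` is hypothetical:

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp: float,
                     policy_rejected_logp: float,
                     ref_chosen_logp: float,
                     ref_rejected_logp: float,
                     beta: float = 0.5) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).

    Each *_logp is the summed log-probability of a full response under either
    the policy being trained or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == log(1 + exp(-z)); branch on the sign to avoid overflow
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Policy identical to the reference: z = 0, so the loss is log(2) ~= 0.693
    print(dpo_sigmoid_loss(-42.0, -40.0, -42.0, -40.0))
    # Policy raised the chosen response by 2 nats relative to the reference:
    # z = 0.5 * 2 = 1, so the loss drops to log(1 + e^-1) ~= 0.313
    print(dpo_sigmoid_loss(-40.0, -40.0, -42.0, -40.0))
```

A larger `beta` sharpens the sigmoid, so the same log-ratio gap produces a larger change in loss and the policy is penalized more strongly for drifting from the reference.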