axolotl example #2784

Merged · 7 commits · Nov 15, 2023
48 changes: 48 additions & 0 deletions llm/axolotl/axolotl-spot.yaml
@@ -0,0 +1,48 @@
# Usage:
#
# Unmanaged spot (no auto-recovery; for debugging):
# HF_TOKEN=abc BUCKET=<unique-name> sky launch -c axolotl-spot axolotl-spot.yaml --env HF_TOKEN --env BUCKET -i30 --down
#
# Managed spot (auto-recovery; for full runs):
# HF_TOKEN=abc BUCKET=<unique-name> sky spot launch -n axolotl-spot axolotl-spot.yaml --env HF_TOKEN --env BUCKET

name: axolotl

resources:
accelerators: A100:1
cloud: gcp # optional
use_spot: True

workdir: mistral

file_mounts:
/sky-notebook:
name: ${BUCKET}
mode: MOUNT

setup: |
docker pull winglian/axolotl:main-py3.10-cu118-2.0.1

run: |
docker run --gpus all \
-v ~/sky_workdir:/sky_workdir \
-v /root/.cache:/root/.cache \
winglian/axolotl:main-py3.10-cu118-2.0.1 \
huggingface-cli login --token ${HF_TOKEN}

docker run --gpus all \
-v ~/sky_workdir:/sky_workdir \
-v /root/.cache:/root/.cache \
-v /sky-notebook:/sky-notebook \
winglian/axolotl:main-py3.10-cu118-2.0.1 \
accelerate launch -m axolotl.cli.train /sky_workdir/qlora-checkpoint.yaml

envs:
HF_TOKEN: <your-huggingface-token> # TODO: Replace with huggingface token
BUCKET: <a-unique-bucket-name-to-use>
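A note on the two `docker run` steps in the `run` section above: they share state through the host's `/root/.cache` mount. `huggingface-cli login` in the first container writes the token under `~/.cache/huggingface/` (the default location for recent `huggingface_hub` versions), and the training container then reads it from the same mounted directory. A single-container equivalent, shown here only as a sketch with the same image and mounts, would chain the two commands:

```
# Sketch: the same login + train flow in one container instead of two.
# Works because both commands see the same filesystem for the token cache.
docker run --gpus all \
  -v ~/sky_workdir:/sky_workdir \
  -v /root/.cache:/root/.cache \
  -v /sky-notebook:/sky-notebook \
  winglian/axolotl:main-py3.10-cu118-2.0.1 \
  bash -c "huggingface-cli login --token ${HF_TOKEN} && \
    accelerate launch -m axolotl.cli.train /sky_workdir/qlora-checkpoint.yaml"
```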






35 changes: 35 additions & 0 deletions llm/axolotl/axolotl.yaml
@@ -0,0 +1,35 @@
# Usage:
# HF_TOKEN=abc sky launch -c axolotl axolotl.yaml --env HF_TOKEN -y -i30 --down

name: axolotl

resources:
accelerators: L4:1
cloud: gcp # optional

workdir: mistral

setup: |
docker pull winglian/axolotl:main-py3.10-cu118-2.0.1

run: |
docker run --gpus all \
-v ~/sky_workdir:/sky_workdir \
-v /root/.cache:/root/.cache \
winglian/axolotl:main-py3.10-cu118-2.0.1 \
huggingface-cli login --token ${HF_TOKEN}

docker run --gpus all \
-v ~/sky_workdir:/sky_workdir \
-v /root/.cache:/root/.cache \
winglian/axolotl:main-py3.10-cu118-2.0.1 \
accelerate launch -m axolotl.cli.train /sky_workdir/qlora.yaml

envs:
HF_TOKEN: <your-huggingface-token> # TODO: Replace with huggingface token






84 changes: 84 additions & 0 deletions llm/axolotl/mistral/qlora-checkpoint.yaml
@@ -0,0 +1,84 @@
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
- path: mhenrichsen/alpaca_2k_test
type: alpaca

dataset_prepared_path: /sky-notebook/alpaca_2k_test/last_run_prepared
val_set_size: 0.05
output_dir: /sky-notebook/alpaca_2k_test

# hub_model_id: manishiitg/mistral-alpaca_2k_test # TODO: Replace with hub model id
# hf_use_auth_token: false # TODO: push as private or public model

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: true ## checkpoint resume is controlled here
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
eval_table_size:
eval_table_max_new_tokens: 128
save_steps: 2 ## increase based on your dataset
save_strategy: steps
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
bos_token: "<s>"
eos_token: "</s>"
unk_token: "<unk>"
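Two knobs in this config matter for spot recovery. First, the batch math: `micro_batch_size: 2` × `gradient_accumulation_steps: 4` gives an effective batch of 8 sequences per optimizer step on the single A100. Second, because `output_dir` sits on the mounted bucket with `auto_resume_from_checkpoints: true` and `save_steps: 2`, a checkpoint lands in object storage every 2 optimizer steps, so a preempted instance can resume from there; as the inline comment says, raise `save_steps` for larger datasets. A quick sanity check from the cluster, assuming the usual Hugging Face Trainer `checkpoint-<step>` directory naming:

```
# List checkpoints persisted to the bucket (assumes the standard
# Hugging Face Trainer checkpoint-<global_step> layout):
ls -d /sky-notebook/alpaca_2k_test/checkpoint-*
```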
81 changes: 81 additions & 0 deletions llm/axolotl/mistral/qlora.yaml
@@ -0,0 +1,81 @@
base_model: mistralai/Mistral-7B-v0.1
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
- path: mhenrichsen/alpaca_2k_test
type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./qlora-out

# hub_model_id: manishiitg/mhenrichsen-alpaca_2k_test # TODO: Replace with hub model id
# hf_use_auth_token: false # TODO: push as private or public model

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
eval_table_size:
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
bos_token: "<s>"
eos_token: "</s>"
unk_token: "<unk>"
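Unlike the checkpoint variant, this config writes to a local `./qlora-out` and does not set `auto_resume_from_checkpoints`, which suits the plain on-demand launch. Note that `./qlora-out` is relative to the training container's working directory, not a mounted path, so the adapter should be copied out before tearing the cluster down. A sketch, with the caveat that the in-container path is an assumption about the `winglian/axolotl` image (check `docker ps -a` and adjust):

```
# On the cluster: copy the LoRA output out of the finished training container.
# The /workspace/axolotl path is an assumption about the image's workdir.
CONTAINER=$(docker ps -a -q \
  --filter ancestor=winglian/axolotl:main-py3.10-cu118-2.0.1 | head -n 1)
docker cp "$CONTAINER":/workspace/axolotl/qlora-out ~/qlora-out
```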
24 changes: 24 additions & 0 deletions llm/axolotl/readme.md
@@ -0,0 +1,24 @@
This example uses the same configs as those in https://github.com/OpenAccess-AI-Collective/axolotl/tree/main/examples/mistral.

Simple training example:
```
HF_TOKEN=abc sky launch -c axolotl axolotl.yaml --env HF_TOKEN -y -i30 --down
ssh -L 8888:localhost:8888 axolotl
sky down axolotl -y
```
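The `-i30 --down` flags autostop and tear down the cluster after 30 idle minutes, and `-y` skips the confirmation prompt. While the job runs, it can be followed from your machine with standard SkyPilot commands:

```
sky logs axolotl       # stream the run's output
sky status --refresh   # refresh and show cluster state, including autostop
```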



To launch an unmanaged spot instance (no auto-recovery; good for debugging):
```
HF_TOKEN=abc BUCKET=<unique-name> sky launch -c axolotl-spot axolotl-spot.yaml --env HF_TOKEN --env BUCKET -i30 --down
ssh -L 8888:localhost:8888 axolotl-spot
```
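An unmanaged spot cluster is not restarted after a preemption; if GCP reclaims the VM, relaunch it by hand. Because `qlora-checkpoint.yaml` saves to the bucket and sets `auto_resume_from_checkpoints: true`, the rerun picks up from the last saved checkpoint:

```
# After a preemption, re-issue the same launch; training resumes from the
# latest checkpoint in the bucket rather than starting over.
HF_TOKEN=abc BUCKET=<unique-name> sky launch -c axolotl-spot axolotl-spot.yaml \
  --env HF_TOKEN --env BUCKET -i30 --down
```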


Launch managed spot instances (auto-recovery; for full runs):
```
HF_TOKEN=abc BUCKET=<unique-name> sky spot launch -n axolotl-spot axolotl-spot.yaml --env HF_TOKEN --env BUCKET
```
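With the managed variant, SkyPilot's spot controller handles recovery automatically, and the job is tracked by name rather than by cluster. Monitoring goes through the `sky spot` subcommands (names as of this SkyPilot release; see `sky spot --help` for your version):

```
sky spot queue                    # list managed spot jobs and their IDs
sky spot logs <job-id>            # stream logs for a job from the queue
sky spot cancel -n axolotl-spot   # cancel the job by name when done
```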