
Commit

update training pipeline.
shibing624 committed Jun 15, 2023
1 parent c03a853 commit abbfa91
Showing 9 changed files with 883 additions and 923 deletions.
15 changes: 8 additions & 7 deletions README.md
@@ -76,13 +76,14 @@ python gradio_demo.py --model_type base_model_type --base_model path_to_llama_hf

Training Stage:

| Stage | Introduction | Open In Colab | Python script | Shell script |
|:---|:---|:---|:---|:---|
| Stage 1: Continue Pretraining | Continued pretraining | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_pretraining.ipynb) | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) |
| Stage 2: Supervised Fine-tuning | Supervised fine-tuning | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_supervised_finetuning.ipynb) | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |
| Stage 3: Reward Modeling | Reward modeling | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_reward_modeling.ipynb) | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) |
| Stage 4: Reinforcement Learning | Reinforcement learning | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_rl_training.ipynb) | [rl_training.py](https://github.com/shibing624/MedicalGPT/blob/main/rl_training.py) | [run_rl.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rl.sh) |

| Stage | Introduction | Python script | Shell script |
|:---|:---|:---|:---|
| Stage 1: Continue Pretraining | Continued pretraining | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) |
| Stage 2: Supervised Fine-tuning | Supervised fine-tuning | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |
| Stage 3: Reward Modeling | Reward modeling | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) |
| Stage 4: Reinforcement Learning | Reinforcement learning | [rl_training.py](https://github.com/shibing624/MedicalGPT/blob/main/rl_training.py) | [run_rl.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rl.sh) |

A complete pipeline that chains all four training stages is provided in [run_training_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_pipeline.ipynb); the corresponding Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_pipeline.ipynb)
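
For readers who want to script the chain outside the notebook, here is a minimal sketch in Python. Invoking the stage shell scripts this way, and the assumption that each script is configured to consume the previous stage's output model, are illustrative rather than the repo's documented workflow.

```python
import subprocess

# Hypothetical driver: run_training_pipeline.ipynb is the supported way to chain
# the stages; this simply runs each stage's shell script in order and stops on failure.
STAGE_SCRIPTS = ["run_pt.sh", "run_sft.sh", "run_rm.sh", "run_rl.sh"]

for script in STAGE_SCRIPTS:
    # Each script is assumed to point at the previous stage's output model
    # through the arguments configured inside it.
    subprocess.run(["bash", script], check=True)
```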

[Training parameter documentation (wiki)](https://github.com/shibing624/MedicalGPT/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82%E8%AF%B4%E6%98%8E)

1 change: 1 addition & 0 deletions requirements.txt
@@ -1,5 +1,6 @@
loguru
transformers>=4.30.1
sentencepiece
datasets
tensorboard
tqdm>=4.47.0
12 changes: 6 additions & 6 deletions reward_modeling.py
@@ -514,7 +514,7 @@ def main():
logger.info(f"Raw datasets: {raw_datasets}")

# Preprocessing the datasets
max_length = data_args.max_source_length + data_args.max_target_length
full_max_length = data_args.max_source_length + data_args.max_target_length

def preprocess_reward_function(examples):
"""
@@ -560,8 +560,8 @@ def preprocess_reward_function(examples):
desc="Running tokenizer on dataset",
)
train_dataset = tokenized_dataset.filter(
lambda x: 0 < len(x['input_ids_rejected']) <= max_length and 0 < len(
x['input_ids_chosen']) <= max_length
lambda x: 0 < len(x['input_ids_rejected']) <= full_max_length and 0 < len(
x['input_ids_chosen']) <= full_max_length
)
logger.debug(f"Num train_samples: {len(train_dataset)}")
logger.debug("Tokenized training example:")
@@ -588,8 +588,8 @@ def preprocess_reward_function(examples):
desc="Running tokenizer on dataset",
)
eval_dataset = tokenized_dataset.filter(
lambda x: 0 < len(x['input_ids_rejected']) <= max_length and 0 < len(
x['input_ids_chosen']) <= max_length
lambda x: 0 < len(x['input_ids_rejected']) <= full_max_length and 0 < len(
x['input_ids_chosen']) <= full_max_length
)
logger.debug(f"Num eval_samples: {len(eval_dataset)}")
logger.debug("Tokenized eval example:")
@@ -614,7 +614,7 @@ def preprocess_reward_function(examples):
tokenizer=tokenizer,
compute_metrics=compute_metrics,
data_collator=RewardDataCollatorWithPadding(
tokenizer=tokenizer, max_length=max_length, padding="max_length"
tokenizer=tokenizer, max_length=full_max_length, padding="max_length"
),
)
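
The renamed `full_max_length` presumably reflects that the reward model scores the prompt and response concatenated, so the two length budgets are summed. A small hedged illustration with made-up values (the real budgets come from `data_args` on the command line):

```python
# Illustrative values only; the actual budgets are command-line arguments.
max_source_length = 256   # prompt budget (hypothetical)
max_target_length = 256   # response budget (hypothetical)
full_max_length = max_source_length + max_target_length  # 512 tokens

# A chosen/rejected pair is kept only when both sequences tokenize to between
# 1 and full_max_length tokens; the collator then pads each kept sequence
# up to exactly full_max_length.
def keep(n_chosen: int, n_rejected: int) -> bool:
    return 0 < n_rejected <= full_max_length and 0 < n_chosen <= full_max_length

print(keep(300, 480), keep(10, 600))  # True False
```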

6 changes: 3 additions & 3 deletions rl_training.py
@@ -35,13 +35,13 @@
"llama": (LlamaForCausalLM, LlamaTokenizer),
}


PROMPT_TEMPLATE = (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Response: "
)


@dataclass
class ScriptArguments:
"""
@@ -169,7 +169,6 @@ def __post_init__(self):
raise ValueError("You must specify a valid reward_model_name_or_path to run training.")



def print_trainable_parameters(model):
"""
Prints the number of trainable parameters in the model.
@@ -202,6 +201,8 @@ def main():
logger.warning(f"Parse args: {args}")

model_class, tokenizer_class = MODEL_CLASSES[args.model_type]
if args.model_type == 'bloom':
args.use_fast_tokenizer = True
# Load tokenizer
tokenizer_kwargs = {
"cache_dir": args.cache_dir,
@@ -359,7 +360,6 @@ def preprocess_function(examples):
logger.debug("Tokenized training example:")
# logger.debug(tokenizer.decode(train_dataset[0]['input_ids']))


def collator(data):
return dict((key, [d[key] for d in data]) for key in data[0])
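
The one-line `collator` above simply transposes a list of per-example dicts into a dict of lists, which is presumably the batch layout the RL trainer later in the script expects. A quick self-contained sketch of its effect, with made-up keys:

```python
def collator(data):
    return dict((key, [d[key] for d in data]) for key in data[0])

batch = [
    {"query": "q1", "input_ids": [1, 2, 3]},
    {"query": "q2", "input_ids": [4, 5]},
]
print(collator(batch))
# {'query': ['q1', 'q2'], 'input_ids': [[1, 2, 3], [4, 5]]}
```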

235 changes: 0 additions & 235 deletions run_pretraining.ipynb

This file was deleted.
