From 030779456bd5c8a3c5a4593ab650013afdfb3eaa Mon Sep 17 00:00:00 2001
From: XU Ke
Date: Wed, 17 Apr 2024 22:14:43 +0800
Subject: [PATCH] 1. Add scikit-learn to requirements; 2. Update deprecated API of peft; 3. Set CUDA_VISIBLE_DEVICES=0 in the PPO part of the demo ipynb so that users with multiple CUDA devices can run it smoothly; 4. Make the test step in the demo ipynb non-interactive; 5. Copy the INSTALL step to the English doc.

---
 README.md                       |  32 +--
 README_EN.md                    |  22 +-
 requirements.txt                |  10 +-
 reward_modeling.py              |   4 +-
 run_training_dpo_pipeline.ipynb | 338 +++++++++++------------
 run_training_ppo_pipeline.ipynb | 458 ++++++++++++++++----------------
 6 files changed, 439 insertions(+), 425 deletions(-)

diff --git a/README.md b/README.md
index f18f293..89d3426 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)
+[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)
@@ -19,7 +19,7 @@ ## 📖 Introduction -**MedicalGPT** training medical GPT model with ChatGPT training pipeline, implemantation of Pretraining, +**MedicalGPT** training medical GPT model with ChatGPT training pipeline, implemantation of Pretraining, Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(Direct Preference Optimization). **MedicalGPT** 训练医疗大模型,实现了包括增量预训练、有监督微调、RLHF(奖励建模、强化学习训练)和DPO(直接偏好优化)。 @@ -60,7 +60,7 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO( - 第一阶段:PT(Continue PreTraining)增量预训练,在海量领域文档数据上二次预训练GPT模型,以适应领域数据分布(可选) - 第二阶段:SFT(Supervised Fine-tuning)有监督微调,构造指令微调数据集,在预训练模型基础上做指令精调,以对齐指令意图,并注入领域知识 -- 第三阶段 +- 第三阶段 - RLHF(Reinforcement Learning from Human Feedback)基于人类反馈对语言模型进行强化学习,分为两步: - RM(Reward Model)奖励模型建模,构造人类偏好排序数据集,训练奖励模型,用来建模人类偏好,主要是"HHH"原则,具体是"helpful, honest, harmless" - RL(Reinforcement Learning)强化学习,用奖励模型来训练SFT模型,生成模型使用奖励或惩罚来更新其策略,以便生成更高质量、更符合人类偏好的文本 @@ -71,7 +71,7 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO( ### Release Models -| Model | Base Model | Introduction | +| Model | Base Model | Introduction | |:------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | 在240万条中英文医疗数据集[shibing624/medical](https://huggingface.co/datasets/shibing624/medical)上SFT微调了一版Ziya-LLaMA-13B模型,医疗问答效果有提升,发布微调后的LoRA权重(单轮对话) | | [shibing624/ziya-llama-13b-medical-merged](https://huggingface.co/shibing624/ziya-llama-13b-medical-merged) | 
[IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | 在240万条中英文医疗数据集[shibing624/medical](https://huggingface.co/datasets/shibing624/medical)上SFT微调了一版Ziya-LLaMA-13B模型,医疗问答效果有提升,发布微调后的完整模型权重(单轮对话) | @@ -105,7 +105,7 @@ CUDA_VISIBLE_DEVICES=0 python gradio_demo.py --model_type base_model_type --base ## 💾 Install #### Updating the requirements -From time to time, the `requirements.txt` changes. To update, use this command: +`requirements.txt`会不时更新. 使用以下命令更新依赖: ```markdown git clone https://github.com/shibing624/MedicalGPT @@ -113,7 +113,7 @@ cd MedicalGPT pip install -r requirements.txt --upgrade ``` -#### Hardware Requirement(显存/VRAM) +#### Hardware Requirement (显存/VRAM) | 训练方法 | 精度 | 7B | 13B | 30B | 65B | 8x7B | @@ -127,14 +127,14 @@ pip install -r requirements.txt --upgrade Training Stage: -| Stage | Introduction | Python script | Shell script | +| Stage | Introduction | Python script | Shell script | |:-------------------------------|:-------------|:--------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------| -| Continue Pretraining | 增量预训练 | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) | -| Supervised Fine-tuning | 有监督微调 | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) | -| Direct Preference Optimization | 直接偏好优化 | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py) | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh) | -| Reward Modeling | 奖励模型建模 | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | 
[run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) | -| Reinforcement Learning | 强化学习 | [ppo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/ppo_training.py) | [run_ppo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_ppo.sh) | -| ORPO | 概率偏好优化 | [orpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/orpo_training.py) | [run_orpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_orpo.sh) | +| Continue Pretraining | 增量预训练 | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) | +| Supervised Fine-tuning | 有监督微调 | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) | +| Direct Preference Optimization | 直接偏好优化 | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py) | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh) | +| Reward Modeling | 奖励模型建模 | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) | +| Reinforcement Learning | 强化学习 | [ppo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/ppo_training.py) | [run_ppo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_ppo.sh) | +| ORPO | 概率偏好优化 | [orpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/orpo_training.py) | [run_orpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_orpo.sh) | - 提供完整PT+SFT+DPO全阶段串起来训练的pipeline:[run_training_dpo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb) ,其对应的colab: [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb),运行完大概需要15分钟,我运行成功后的副本colab:[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kMIe3pTec2snQvLBA00Br8ND1_zwy3Gr?usp=sharing) - 提供完整PT+SFT+RLHF全阶段串起来训练的pipeline:[run_training_ppo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb) ,其对应的colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb) ,运行完大概需要20分钟,我运行成功后的副本colab:[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RGkbev8D85gR33HJYxqNdnEThODvGUsS?usp=sharing) @@ -209,7 +209,7 @@ yi: - [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat) - [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B) -## 💻 Inference +## 💻 Inference 训练完成后,现在我们加载训练好的模型,验证模型生成文本的效果。 ```shell @@ -267,7 +267,7 @@ CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 inference_multigpu_demo.py -## 📚 Dataset +## 📚 Dataset ### 医疗数据集 - 240万条中文医疗数据集(包括预训练、指令微调和奖励数据集):[shibing624/medical](https://huggingface.co/datasets/shibing624/medical) @@ -342,7 +342,7 @@ MedicalGPT项目代码的授权协议为 [The Apache License 2.0](/LICENSE), 之后即可提交PR。 -## 💕 Acknowledgements +## 💕 Acknowledgements - [Direct Preference Optimization:Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf) - [tloen/alpaca-lora](https://github.com/tloen/alpaca-lora/blob/main/finetune.py) diff --git a/README_EN.md b/README_EN.md index 2bf778e..6a47883 100644 --- a/README_EN.md +++ b/README_EN.md @@ -1,4 +1,4 @@ -[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | 
[**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624) +[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)
@@ -19,7 +19,7 @@ ## 📖 Introduction -**MedicalGPT** training medical GPT model with ChatGPT training pipeline, implemantation of Pretraining, +**MedicalGPT** training medical GPT model with ChatGPT training pipeline, implemantation of Pretraining, Supervised Finetuning, Reward Modeling and Reinforcement Learning. @@ -117,7 +117,17 @@ sh run_ppo.sh [Training Detail wiki](https://github.com/shibing624/MedicalGPT/wiki/Training-Details) -### Hardware Requirement(VRAM) +## 💾 Install +#### Updating the requirements +From time to time, the `requirements.txt` changes. To update, use this command: + +```markdown +git clone https://github.com/shibing624/MedicalGPT +cd MedicalGPT +pip install -r requirements.txt --upgrade +``` + +### Hardware Requirement (VRAM) | Method | Bits | 7B | 13B | 30B | 65B | 8x7B | | ------ | ---- | ----- | ----- | ----- | ------ | ------ | @@ -126,7 +136,7 @@ sh run_ppo.sh | QLoRA | 8 | 10GB | 16GB | 40GB | 80GB | 80GB | | QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 32GB | -## 🔥 Inference +## 🔥 Inference After the training is complete, now we load the trained model to verify the effect of the model generating text. ```shell @@ -160,7 +170,7 @@ Parameter Description:
-## 📚 Dataset +## 📚 Dataset - 2.4 million Chinese medical datasets (including pre-training, instruction fine-tuning and reward datasets): [shibing624/medical](https://huggingface.co/datasets/shibing624/medical) @@ -208,7 +218,7 @@ The project code is still very rough. If you have improved the code, you are wel Then you can submit a PR. -## 💕 Acknowledgements +## 💕 Acknowledgements - [tloen/alpaca-lora](https://github.com/tloen/alpaca-lora/blob/main/finetune.py) - [ymcui/Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) diff --git a/requirements.txt b/requirements.txt index 07c41bc..fbabfd2 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,10 +1,10 @@ +accelerate~=0.27.2 +datasets>=2.14.6 loguru -transformers>=4.39.3 +peft~=0.10.0 sentencepiece -datasets>=2.14.6 -tqdm +scikit-learn tensorboard tqdm>=4.47.0 -peft~=0.10.0 -accelerate~=0.27.2 +transformers>=4.39.3 trl~=0.8.3 diff --git a/reward_modeling.py b/reward_modeling.py index 7549a92..893f43a 100644 --- a/reward_modeling.py +++ b/reward_modeling.py @@ -13,7 +13,7 @@ import torch from datasets import load_dataset from loguru import logger -from peft import LoraConfig, TaskType, get_peft_model, PeftModel, prepare_model_for_int8_training +from peft import LoraConfig, TaskType, get_peft_model, PeftModel, prepare_model_for_kbit_training from sklearn.metrics import mean_squared_error, mean_absolute_error from torch.utils.data import Dataset from transformers import ( @@ -425,7 +425,7 @@ def main(): else: logger.info("Init new peft model") if model_args.load_in_8bit: - model = prepare_model_for_int8_training(model) + model = prepare_model_for_kbit_training(model) target_modules = script_args.target_modules.split(',') if script_args.target_modules else None if target_modules and 'all' in target_modules: target_modules = find_all_linear_names(model, int4=False, int8=model_args.load_in_8bit) diff --git a/run_training_dpo_pipeline.ipynb b/run_training_dpo_pipeline.ipynb index 7c9093d..8f720ff 
100644 --- a/run_training_dpo_pipeline.ipynb +++ b/run_training_dpo_pipeline.ipynb @@ -2,13 +2,13 @@ "cells": [ { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Training Pipeline\n", "[run_training_dpo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb) | [Open In Colab](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -40,10 +40,10 @@ }, { "cell_type": "markdown", - "source": [], "metadata": { "collapsed": false - } + }, + "source": [] }, { "cell_type": "markdown", @@ -119,6 +119,9 @@ { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python pretraining.py \\\n", @@ -163,10 +166,7 @@ " --report_to tensorboard \\\n", " --ddp_find_unused_parameters False \\\n", " --gradient_checkpointing True" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -188,46 +188,46 @@ }, { "cell_type": "markdown", - "source": [ - "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model bigscience/bloomz-560m --lora_model outputs-pt-v1 --output_dir merged-pt/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh merged-pt/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%cat merged-pt/config.json" - ], - "metadata": 
{ - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -241,8 +241,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "start_time": "2023-06-15T13:56:17.032821Z", - "end_time": "2023-06-15T13:56:17.081153Z" + "end_time": "2023-06-15T13:56:17.081153Z", + "start_time": "2023-06-15T13:56:17.032821Z" } }, "outputs": [], @@ -250,32 +250,35 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Stage 2: Supervised FineTuning\n", "\n", "第二阶段:SFT(Supervised Fine-tuning)有监督微调,构造指令微调数据集,在预训练模型基础上做指令精调,以对齐指令意图,并注入领域知识\n", "\n", "| Stage 2: Supervised Fine-tuning | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", "\n", "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m` 或者 Stage1得到的预训练模型\n", "2. 数据集:SFT阶段使用的是使用的是Belle的1千条抽样数据,位于`data/finetune`文件夹" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Stage2 咱们开始吧\n", "\n", @@ -291,29 +294,29 @@ "4. 加载模型和tokenizer\n", "5. 开始训练并评估\n", "6. 
查看训练结果" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2023-06-15T13:58:38.966506Z", + "start_time": "2023-06-15T13:58:38.778132Z" + }, + "collapsed": false + }, "outputs": [], "source": [ "%ls ./data/finetune" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "start_time": "2023-06-15T13:58:38.778132Z", - "end_time": "2023-06-15T13:58:38.966506Z" - } - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python supervised_finetuning.py \\\n", @@ -355,126 +358,126 @@ " --report_to tensorboard \\\n", " --ddp_find_unused_parameters False \\\n", " --gradient_checkpointing True" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh outputs-sft-v1" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/runs`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/runs --host 0.0.0.0 --port 8009`" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model merged-pt --lora_model outputs-sft-v1 --output_dir ./merged-sft" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", 
"execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh merged-sft/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%cat merged-sft/config.json" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "Stage2 SFT训练完成。" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "Stage2 SFT训练完成。" + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [], "metadata": { - "collapsed": false, "ExecuteTime": { - "start_time": "2023-06-15T14:07:40.731186Z", - "end_time": "2023-06-15T14:07:40.752635Z" - } - } + "end_time": "2023-06-15T14:07:40.752635Z", + "start_time": "2023-06-15T14:07:40.731186Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Stage 3: DPO(Direct Preference Optimization)\n", "\n", "第三阶段:DPO(Direct Preference Optimization)直接偏好优化,DPO通过直接优化语言模型来实现对其行为的精确控制,而无需使用复杂的强化学习,也可以有效学习到人类偏好,DPO相较于RLHF更容易实现且易于训练,效果更好\n", "\n", "| Stage 3: Direct Preference Optimization | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py) | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh) |" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", "\n", "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m` 或者 Stage2得到的SFT模型\n", "2. 数据集:DPO阶段使用的是医疗reward数据,抽样了500条,位于`data/reward`文件夹" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Stage3 咱们开始吧\n", "\n", @@ -490,25 +493,25 @@ "4. 加载模型和tokenizer\n", "5. 开始训练并评估\n", "6. 
查看训练结果" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls ./data/reward/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python dpo_training.py \\\n", @@ -540,158 +543,157 @@ " --remove_unused_columns False \\\n", " --gradient_checkpointing True \\\n", " --cache_dir ./cache" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh outputs-dpo-v1" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/runs`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/runs --host 0.0.0.0 --port 8009`" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model merged-sft --lora_model outputs-dpo-v1 --output_dir merged-dpo/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh merged-dpo/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, 
"outputs": [], "source": [ "%cat merged-dpo/config.json" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "Stage3 偏好建模第一次训练完成。" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "Stage3 偏好建模第一次训练完成。" + ] }, { "cell_type": "markdown", - "source": [ - "**至此一个完整的训练流程演示完成。**" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "**至此一个完整的训练流程演示完成。**" + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [], "metadata": { - "collapsed": false, "ExecuteTime": { - "start_time": "2023-06-26T12:34:29.620609Z", - "end_time": "2023-06-26T12:34:29.658428Z" - } - } + "end_time": "2023-06-26T12:34:29.658428Z", + "start_time": "2023-06-26T12:34:29.620609Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [] }, { "cell_type": "markdown", - "source": [ - "# Test" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "# Test" + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [ - "!python inference.py --model_type bloom --base_model merged-dpo --interactive" - ], "metadata": { - "collapsed": false, "ExecuteTime": { - "start_time": "2023-06-26T12:34:47.802087Z", - "end_time": "2023-06-26T12:35:00.864463Z" - } - } + "end_time": "2023-06-26T12:35:00.864463Z", + "start_time": "2023-06-26T12:34:47.802087Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [ + "!python inference.py --model_type bloom --base_model merged-dpo\n", + "# 或在shell中运行\n", + "# python inference.py --model_type bloom --base_model merged-dpo --interactive" + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "Input:介绍下南京\n", "Response: 南京市位于江苏省西南部,是全国首批历史文化名城、国家中心城市和自由贸易试验区。\n", "\n", "完。\n" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [], "metadata": { "collapsed": false - } + }, + "outputs": [], + "source": [] } ], "metadata": { 
"kernelspec": { - "name": "python3", + "display_name": "Python 3", "language": "python", - "display_name": "Python 3" + "name": "python3" }, "language_info": { "codemirror_mode": { diff --git a/run_training_ppo_pipeline.ipynb b/run_training_ppo_pipeline.ipynb index 23d4973..264da83 100644 --- a/run_training_ppo_pipeline.ipynb +++ b/run_training_ppo_pipeline.ipynb @@ -2,13 +2,13 @@ "cells": [ { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Training Pipeline\n", "[run_training_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_pipeline.ipynb) | [Open In Colab](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_pipeline.ipynb)" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -112,6 +112,9 @@ { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python pretraining.py \\\n", @@ -156,10 +159,7 @@ " --report_to tensorboard \\\n", " --ddp_find_unused_parameters False \\\n", " --gradient_checkpointing True" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", @@ -181,46 +181,46 @@ }, { "cell_type": "markdown", - "source": [ - "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model bigscience/bloomz-560m --lora_model outputs-pt-v1 --output_dir merged-pt/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh merged-pt/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, 
+ "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%cat merged-pt/config.json" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", @@ -234,8 +234,8 @@ "execution_count": null, "metadata": { "ExecuteTime": { - "start_time": "2023-06-15T13:56:17.032821Z", - "end_time": "2023-06-15T13:56:17.081153Z" + "end_time": "2023-06-15T13:56:17.081153Z", + "start_time": "2023-06-15T13:56:17.032821Z" } }, "outputs": [], @@ -243,32 +243,35 @@ }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Stage 2: Supervised FineTuning\n", "\n", "第二阶段:SFT(Supervised Fine-tuning)有监督微调,构造指令微调数据集,在预训练模型基础上做指令精调,以对齐指令意图,并注入领域知识\n", "\n", "| Stage 2: Supervised Fine-tuning | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", "\n", "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m` 或者 Stage1得到的预训练模型\n", "2. 数据集:SFT阶段使用的是使用的是Belle的1千条抽样数据,位于`data/finetune`文件夹" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Stage2 咱们开始吧\n", "\n", @@ -284,29 +287,29 @@ "4. 加载模型和tokenizer\n", "5. 开始训练并评估\n", "6. 
查看训练结果" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2023-06-15T13:58:38.966506Z", + "start_time": "2023-06-15T13:58:38.778132Z" + }, + "collapsed": false + }, "outputs": [], "source": [ "%ls ./data/finetune" - ], - "metadata": { - "collapsed": false, - "ExecuteTime": { - "start_time": "2023-06-15T13:58:38.778132Z", - "end_time": "2023-06-15T13:58:38.966506Z" - } - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python supervised_finetuning.py \\\n", @@ -348,126 +351,126 @@ " --report_to tensorboard \\\n", " --ddp_find_unused_parameters False \\\n", " --gradient_checkpointing True" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh outputs-sft-v1" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/runs`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/runs --host 0.0.0.0 --port 8009`" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model merged-pt --lora_model outputs-sft-v1 --output_dir merged-sft/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", 
"execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh merged-sft/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%cat merged-sft/config.json" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "Stage2 SFT训练完成。" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "Stage2 SFT训练完成。" + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [], "metadata": { - "collapsed": false, "ExecuteTime": { - "start_time": "2023-06-15T14:07:40.731186Z", - "end_time": "2023-06-15T14:07:40.752635Z" - } - } + "end_time": "2023-06-15T14:07:40.752635Z", + "start_time": "2023-06-15T14:07:40.731186Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Stage 3: Reward Modeling\n", "\n", "第三阶段:RM(Reward Model)奖励模型建模,构造人类偏好排序数据集,训练奖励模型,用来对齐人类偏好,主要是\"HHH\"原则,具体是\"helpful, honest, harmless\"\n", "\n", "| Stage 3: Reward Modeling | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) |" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", "\n", "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m` 或者 Stage2得到的SFT模型\n", "2. 数据集:RM阶段使用的是医疗reward数据,抽样了500条,位于`data/reward`文件夹" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Stage3 咱们开始吧\n", "\n", @@ -483,25 +486,25 @@ "4. 加载模型和tokenizer\n", "5. 开始训练并评估\n", "6. 
查看训练结果" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls ./data/reward/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python reward_modeling.py \\\n", @@ -543,113 +546,113 @@ " --ddp_find_unused_parameters False \\\n", " --remove_unused_columns False \\\n", " --gradient_checkpointing True" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh outputs-rm-v1" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/runs`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/runs --host 0.0.0.0 --port 8009`" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model merged-sft --lora_model outputs-rm-v1 --output_dir merged-rm/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh merged-rm/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": 
false + }, "outputs": [], "source": [ "%cat merged-rm/config.json" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "Stage3 奖励建模第一次训练完成。" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "Stage3 奖励建模第一次训练完成。" + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [], "metadata": { - "collapsed": false, "ExecuteTime": { - "start_time": "2023-06-15T14:12:09.464881Z", - "end_time": "2023-06-15T14:12:09.472414Z" - } - } + "end_time": "2023-06-15T14:12:09.472414Z", + "start_time": "2023-06-15T14:12:09.464881Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "# Stage 4: Reinforcement Learning Training\n", "\n", "第四阶段:RL(Reinforcement Learning)基于人类反馈的强化学习(RLHF),用奖励模型来训练SFT模型,生成模型使用奖励或惩罚来更新其策略,以便生成更高质量、更符合人类偏好的文本\n", "\n", "| Stage 4: Reinforcement Learning | [rl_training.py](https://github.com/shibing624/MedicalGPT/blob/main/rl_training.py) | [run_rl.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rl.sh) |\n" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型、奖励模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", @@ -657,13 +660,13 @@ "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m` 或者 Stage2得到的SFT模型\n", "2. 奖励模型:使用的是`OpenAssistant/reward-model-deberta-v3-large-v2` 或者 Stage3得到的BERT类或者GPT类奖励模型\n", "3. 数据集:RL阶段的数据可以复用SFT的数据集,使用的是Belle的1千条抽样数据,位于`data/finetune`文件夹" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "## Stage4 咱们开始吧\n", "\n", @@ -681,28 +684,28 @@ "6. 
查看训练结果\n", "\n", "以下参数可以根据你的GPU实际情况修改,当前参数是根据Colab的T4单卡GPU(16GB显存)配置的。" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls ./data/finetune/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ - "!python ppo_training.py \\\n", + "! CUDA_VISIBLE_DEVICES=0 python ppo_training.py \\\n", " --model_type bloom \\\n", " --model_name_or_path ./merged-sft \\\n", " --reward_model_name_or_path ./merged-rm \\\n", @@ -726,178 +729,177 @@ " --early_stopping True \\\n", " --target_kl 0.1 \\\n", " --reward_baseline 0.0" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [ - "%ls -lh outputs-ppo-v1" - ], "metadata": { "collapsed": false - } + }, + "outputs": [], + "source": [ + "%ls -lh outputs-rl-v1" + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/trl`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/trl --host 0.0.0.0 --port 8009`" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [ - "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", - " --base_model merged-sft --lora_model outputs-ppo-v1 --output_dir merged-ppo/" - ], - "metadata": { - "collapsed": false - } + " --base_model merged-sft --lora_model 
outputs-rl-v1 --output_dir merged-ppo/" + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%ls -lh merged-ppo/" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, + "metadata": { + "collapsed": false + }, "outputs": [], "source": [ "%cat merged-ppo/config.json" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "Stage4 RL第一次训练完成。\n", "\n", "**至此一个完整的4阶段训练流程演示完成。**" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "实际操作中Stage3和Stage4可以反复多次,直到RL得到的最后模型满足评估要求。\n", "\n", "RLHF过程可以把SFT模型当成一个初始化模型,RM模型当做指导老师,使用RL(PPO)调教SFT模型生成指导老师最满意的结果,如果小学老师满意了,我们就再训练一个中学老师,继续指导,中学老师满意了,就训练一个大学老师,这样不断迭代,使得生成模型的质量达到甚至超过人工撰写的天花板。\n", "\n", "RLHF训练不易,此项目提供给大家一种实现的方法和参考,希望抛砖引玉,共同促进中文开源LLM发展。" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "markdown", - "source": [], "metadata": { "collapsed": false - } + }, + "source": [] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [], "metadata": { - "collapsed": false, "ExecuteTime": { - "start_time": "2023-06-26T12:34:29.620609Z", - "end_time": "2023-06-26T12:34:29.658428Z" - } - } + "end_time": "2023-06-26T12:34:29.658428Z", + "start_time": "2023-06-26T12:34:29.620609Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [] }, { "cell_type": "markdown", - "source": [ - "# Test" - ], "metadata": { "collapsed": false - } + }, + "source": [ + "# Test" + ] }, { "cell_type": "markdown", - "source": [], "metadata": { "collapsed": false - } + }, + "source": [] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [ - "!python inference.py --model_type bloom --base_model merged-ppo --interactive" - ], "metadata": { - "collapsed": false, "ExecuteTime": { - "start_time": 
"2023-06-26T12:34:47.802087Z", - "end_time": "2023-06-26T12:35:00.864463Z" - } - } + "end_time": "2023-06-26T12:35:00.864463Z", + "start_time": "2023-06-26T12:34:47.802087Z" + }, + "collapsed": false + }, + "outputs": [], + "source": [ + "!python inference.py --model_type bloom --base_model merged-ppo\n", + "# 或在shell中运行\n", + "# !python inference.py --model_type bloom --base_model merged-ppo --interactive" + ] }, { "cell_type": "markdown", + "metadata": { + "collapsed": false + }, "source": [ "Input:介绍下南京\n", "Response: 南京市位于江苏省西南部,是全国首批历史文化名城、国家中心城市和自由贸易试验区。\n", "\n", "完。\n" - ], - "metadata": { - "collapsed": false - } + ] }, { "cell_type": "code", "execution_count": null, - "outputs": [], - "source": [], "metadata": { "collapsed": false - } + }, + "outputs": [], + "source": [] } ], "metadata": { "kernelspec": { - "name": "python3", + "display_name": "Python 3", "language": "python", - "display_name": "Python 3" + "name": "python3" }, "language_info": { "codemirror_mode": {