
Merge pull request #360 from ker2xu/main
Updates for the README and demo ipynb, and a small update for a deprecated function
shibing624 authored Apr 18, 2024
2 parents a99e3ee + 0307794 commit 3b604e5
Showing 6 changed files with 439 additions and 425 deletions.
32 changes: 16 additions & 16 deletions README.md
@@ -1,4 +1,4 @@
[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)

<div align="center">
<a href="https://github.com/shibing624/MedicalGPT">
@@ -19,7 +19,7 @@

## 📖 Introduction

**MedicalGPT** trains a medical GPT model with the ChatGPT training pipeline, implementing Pretraining,
Supervised Finetuning, RLHF (Reward Modeling and Reinforcement Learning), and DPO (Direct Preference Optimization).

**MedicalGPT** trains medical large language models, implementing continued pretraining, supervised fine-tuning, RLHF (reward modeling and reinforcement learning training), and DPO (direct preference optimization).
@@ -60,7 +60,7 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(

- Stage 1: PT (Continue PreTraining), continued pretraining of the GPT model on large volumes of domain documents so that it adapts to the domain data distribution (optional)
- Stage 2: SFT (Supervised Fine-tuning), building an instruction-tuning dataset and fine-tuning the pretrained model on it to align with instruction intent and inject domain knowledge
- Stage 3:
  - RLHF (Reinforcement Learning from Human Feedback), reinforcement learning of the language model from human feedback, in two steps:
    - RM (Reward Model): build a human preference-ranking dataset and train a reward model that captures human preferences, chiefly the "HHH" principle ("helpful, honest, harmless")
    - RL (Reinforcement Learning): use the reward model to train the SFT model; the policy is updated with rewards or penalties so that it generates higher-quality text that better matches human preferences (see the run sketch after this list)
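To make the stage order concrete, here is a minimal run sketch that chains the stages with the repository's shell scripts (the same scripts listed in the Training Stage table further down); the real arguments live inside each `run_*.sh` file, and the optional or alternative stages can be skipped:

```shell
# Minimal end-to-end sketch; the actual flags are defined inside each run_*.sh script.
sh run_pt.sh    # Stage 1: continued pretraining on domain data (optional)
sh run_sft.sh   # Stage 2: supervised fine-tuning on instruction data
sh run_rm.sh    # Stage 3, RLHF step 1: train the reward model
sh run_ppo.sh   # Stage 3, RLHF step 2: PPO reinforcement learning with the reward model
sh run_dpo.sh   # Alternative to RLHF: direct preference optimization
```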
@@ -71,7 +71,7 @@ Supervised Finetuning, RLHF(Reward Modeling and Reinforcement Learning) and DPO(
### Release Models


| Model | Base Model | Introduction |
|:------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | Ziya-LLaMA-13B fine-tuned with SFT on the 2.4-million-sample Chinese/English medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical), improving medical Q&A; the fine-tuned LoRA weights are released (single-turn dialogue) |
| [shibing624/ziya-llama-13b-medical-merged](https://huggingface.co/shibing624/ziya-llama-13b-medical-merged) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | Ziya-LLaMA-13B fine-tuned with SFT on the 2.4-million-sample Chinese/English medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical), improving medical Q&A; the full fine-tuned model weights are released (single-turn dialogue) |
@@ -105,15 +105,15 @@ CUDA_VISIBLE_DEVICES=0 python gradio_demo.py --model_type base_model_type --base

## 💾 Install
#### Updating the requirements
From time to time, `requirements.txt` changes. To update the dependencies, use this command:

```shell
git clone https://github.com/shibing624/MedicalGPT
cd MedicalGPT
pip install -r requirements.txt --upgrade
```

#### Hardware Requirement (VRAM)


| Training method | Precision | 7B | 13B | 30B | 65B | 8x7B |
@@ -127,14 +127,14 @@

Training Stage:

| Stage | Introduction | Python script | Shell script |
|:-------------------------------|:-------------|:--------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------|
| Continue Pretraining            | Continued pretraining              | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py)                       | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh)     |
| Supervised Fine-tuning          | Supervised fine-tuning             | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py)   | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh)   |
| Direct Preference Optimization  | Direct preference optimization     | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py)                     | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh)   |
| Reward Modeling                 | Reward model training              | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py)               | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh)     |
| Reinforcement Learning          | Reinforcement learning             | [ppo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/ppo_training.py)                     | [run_ppo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_ppo.sh)   |
| ORPO                            | Odds-ratio preference optimization | [orpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/orpo_training.py)                   | [run_orpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_orpo.sh) |

- A complete pipeline chaining the PT+SFT+DPO stages end to end is provided: [run_training_dpo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb), with its Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb); a full run takes about 15 minutes, and a copy of my successful Colab run is here: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kMIe3pTec2snQvLBA00Br8ND1_zwy3Gr?usp=sharing)
- A complete pipeline chaining the PT+SFT+RLHF stages end to end is provided: [run_training_ppo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb), with its Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_ppo_pipeline.ipynb); a full run takes about 20 minutes, and a copy of my successful Colab run is here: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RGkbev8D85gR33HJYxqNdnEThODvGUsS?usp=sharing)
@@ -209,7 +209,7 @@ yi:
- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
- [01-ai/Yi-34B](https://huggingface.co/01-ai/Yi-34B)

## 💻 Inference
After training is complete, load the trained model and check the quality of the text it generates.

```shell
@@ -267,7 +267,7 @@ CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 inference_multigpu_demo.py
</details>
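As a reference point for the truncated command above, here is a hedged sketch of a multi-GPU inference run; the `torchrun` prefix and script name come from the diff context, while the `--model_type`/`--base_model` flags and their values are assumptions modeled on the `gradio_demo.py` invocation earlier in this diff, so check `inference_multigpu_demo.py` for the actual argument names:

```shell
# Hypothetical invocation; verify flag names against inference_multigpu_demo.py.
CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node 2 inference_multigpu_demo.py \
    --model_type llama \
    --base_model path/to/trained_model   # placeholder path, not a repo default
```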


## 📚 Dataset
### Medical datasets

- A 2.4-million-sample Chinese medical dataset (covering pretraining, instruction fine-tuning, and reward data): [shibing624/medical](https://huggingface.co/datasets/shibing624/medical)
@@ -342,7 +342,7 @@ The MedicalGPT project code is licensed under [The Apache License 2.0](/LICENSE),

After that, you can submit a PR.

## 💕 Acknowledgements

- [Direct Preference Optimization:Your Language Model is Secretly a Reward Model](https://arxiv.org/pdf/2305.18290.pdf)
- [tloen/alpaca-lora](https://github.com/tloen/alpaca-lora/blob/main/finetune.py)
22 changes: 16 additions & 6 deletions README_EN.md
@@ -1,4 +1,4 @@
[**🇨🇳中文**](https://github.com/shibing624/MedicalGPT/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/MedicalGPT/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/MedicalGPT/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624)

<div align="center">
<a href="https://github.com/shibing624/MedicalGPT">
@@ -19,7 +19,7 @@

## 📖 Introduction

**MedicalGPT** trains a medical GPT model with the ChatGPT training pipeline, implementing Pretraining,
Supervised Finetuning, Reward Modeling and Reinforcement Learning.


@@ -117,7 +117,17 @@ sh run_ppo.sh
[Training Detail wiki](https://github.com/shibing624/MedicalGPT/wiki/Training-Details)


## 💾 Install
#### Updating the requirements
From time to time, the `requirements.txt` changes. To update, use this command:

```shell
git clone https://github.com/shibing624/MedicalGPT
cd MedicalGPT
pip install -r requirements.txt --upgrade
```

### Hardware Requirement (VRAM)

| Method | Bits | 7B | 13B | 30B | 65B | 8x7B |
| ------ | ---- | ----- | ----- | ----- | ------ | ------ |
@@ -126,7 +136,7 @@
| QLoRA | 8 | 10GB | 16GB | 40GB | 80GB | 80GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 32GB |

## 🔥 Inference
After training is complete, load the trained model and verify the quality of the text it generates.

```shell
@@ -160,7 +170,7 @@ Parameter Description:
<br/>


## 📚 Dataset

- A 2.4-million-sample Chinese medical dataset (including pre-training, instruction fine-tuning, and reward data): [shibing624/medical](https://huggingface.co/datasets/shibing624/medical)

@@ -208,7 +218,7 @@ The project code is still very rough. If you have improved the code, you are wel

Then you can submit a PR.

## 💕 Acknowledgements

- [tloen/alpaca-lora](https://github.com/tloen/alpaca-lora/blob/main/finetune.py)
- [ymcui/Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca)
10 changes: 5 additions & 5 deletions requirements.txt
@@ -1,10 +1,10 @@
-accelerate~=0.27.2
+datasets>=2.14.6
loguru
-transformers>=4.39.3
-peft~=0.10.0
sentencepiece
-datasets>=2.14.6
-tqdm
scikit-learn
tensorboard
+tqdm>=4.47.0
+peft~=0.10.0
+accelerate~=0.27.2
+transformers>=4.39.3
trl~=0.8.3
4 changes: 2 additions & 2 deletions reward_modeling.py
@@ -13,7 +13,7 @@
import torch
from datasets import load_dataset
from loguru import logger
-from peft import LoraConfig, TaskType, get_peft_model, PeftModel, prepare_model_for_int8_training
+from peft import LoraConfig, TaskType, get_peft_model, PeftModel, prepare_model_for_kbit_training
from sklearn.metrics import mean_squared_error, mean_absolute_error
from torch.utils.data import Dataset
from transformers import (
@@ -425,7 +425,7 @@ def main():
else:
logger.info("Init new peft model")
if model_args.load_in_8bit:
-model = prepare_model_for_int8_training(model)
+model = prepare_model_for_kbit_training(model)
target_modules = script_args.target_modules.split(',') if script_args.target_modules else None
if target_modules and 'all' in target_modules:
target_modules = find_all_linear_names(model, int4=False, int8=model_args.load_in_8bit)
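For context, `prepare_model_for_int8_training` was deprecated in recent peft releases in favor of `prepare_model_for_kbit_training`, which is what this hunk switches to, and the call only runs when the model is loaded in 8-bit. Below is a hedged sketch of an invocation that would exercise this branch: the flag names `--load_in_8bit` and `--target_modules` are inferred from `model_args.load_in_8bit` and `script_args.target_modules` in the snippet above and are not confirmed; `run_rm.sh` holds the project's canonical arguments.

```shell
# Hypothetical 8-bit reward-modeling run; flag names are inferred from the code above,
# not confirmed. See run_rm.sh for the canonical invocation.
CUDA_VISIBLE_DEVICES=0 python reward_modeling.py \
    --load_in_8bit True \
    --target_modules all
```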

