Skip to content

Commit

Permalink
update notebook.
Browse files Browse the repository at this point in the history
  • Loading branch information
shibing624 committed Jun 15, 2023
1 parent f1581f3 commit c03a853
Showing 1 changed file with 12 additions and 10 deletions.
22 changes: 12 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
@@ -28,6 +28,12 @@ Supervised Finetuning, Reward Modeling and Reinforcement Learning.

分四阶段训练GPT模型,来自Andrej Karpathy的演讲PDF [State of GPT](https://karpathy.ai/stateofgpt.pdf),视频 [Video](https://build.microsoft.com/en-US/sessions/db3f4859-cd30-4445-a0cd-553c3304f8e2)

## 🔥 News
[2023/06/15] v1.0版本: 发布中文医疗LoRA模型[shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora),基于Ziya-LLaMA-13B-v1模型,SFT微调了一版医疗模型,医疗问答效果有提升,发布微调后的LoRA权重,详见[Release-v1.0](https://github.com/shibing624/MedicalGPT/releases/tag/1.0.0)

[2023/06/05] v0.2版本: 以医疗为例,训练领域大模型,实现了四阶段训练:包括二次预训练、有监督微调、奖励建模、强化学习训练。详见[Release-v0.2](https://github.com/shibing624/MedicalGPT/releases/tag/0.2.0)


## 😊 Feature
基于ChatGPT Training Pipeline,本项目实现了领域模型--医疗模型的四阶段训练:

@@ -36,19 +42,15 @@ Supervised Finetuning, Reward Modeling and Reinforcement Learning.
- 第三阶段:RM(Reward Model)奖励模型建模,构造人类偏好排序数据集,训练奖励模型,用来对齐人类偏好,主要是"HHH"原则,具体是"helpful, honest, harmless"
- 第四阶段:RL(Reinforcement Learning)基于人类反馈的强化学习(RLHF),用奖励模型来训练SFT模型,生成模型使用奖励或惩罚来更新其策略,以便生成更高质量、更符合人类偏好的文本

### News
[2023/06/15] v1.0版本: 发布中文医疗LoRA模型[shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora),基于Ziya-LLaMA-13B-v1模型,SFT微调了一版医疗模型,医疗问答效果有提升,发布微调后的LoRA权重,详见[Release-v1.0](https://github.com/shibing624/MedicalGPT/releases/tag/1.0.0)

[2023/06/05] v0.2版本: 以医疗为例,训练领域大模型,实现了四阶段训练:包括二次预训练、有监督微调、奖励建模、强化学习训练。详见[Release-v0.2](https://github.com/shibing624/MedicalGPT/releases/tag/0.2.0)


## 🔥 Release Models
### Release Models


| Model | Base Model | Introduction |
|:--------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------|
| [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | 在240万条中英文医疗数据集[shibing624/medical](https://huggingface.co/datasets/shibing624/medical)上SFT微调了一版Ziya-LLaMA-13B模型,医疗问答效果有提升,发布微调后的LoRA权重 |


## ▶️ Demo

- Hugging Face Demo: doing
@@ -76,10 +78,10 @@ Training Stage:

| Stage | Introduction | Open In Colab | Python script | Shell script |
|:--------------------------------|:-------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| Stage 1: Continue Pretraining | 增量预训练 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_pretraining.ipynb) | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_pt.sh) |
| Stage 2: Supervised Fine-tuning | 有监督微调 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_supervised_finetuning.ipynb) | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_sft.sh) |
| Stage 3: Reward Modeling | 奖励模型建模 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_reward_modeling.ipynb) | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_rm.sh) |
| Stage 4: Reinforcement Learning | 强化学习 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_rl_training.ipynb) | [rl_training.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/rl_training.py) | [run_rl.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_rl.sh) |
| Stage 1: Continue Pretraining | 增量预训练 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_pretraining.ipynb) | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) |
| Stage 2: Supervised Fine-tuning | 有监督微调 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_supervised_finetuning.ipynb) | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |
| Stage 3: Reward Modeling | 奖励模型建模 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_reward_modeling.ipynb) | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py) | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh) |
| Stage 4: Reinforcement Learning | 强化学习 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_rl_training.ipynb) | [rl_training.py](https://github.com/shibing624/MedicalGPT/blob/main/rl_training.py) | [run_rl.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rl.sh) |


[训练参数说明wiki](https://github.com/shibing624/MedicalGPT/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82%E8%AF%B4%E6%98%8E)

0 comments on commit c03a853

Please sign in to comment.