update notebook.

shibing624 · Jun 15, 2023 · c03a853
1 parent f1581f3
commit c03a853
Showing 1 changed file with 12 additions and 10 deletions.
diff --git a/README.md b/README.md
@@ -28,6 +28,12 @@ Supervised Finetuning, Reward Modeling and Reinforcement Learning.
 
 Training a GPT model in four stages, following Andrej Karpathy's talk: slides (PDF) [State of GPT](https://karpathy.ai/stateofgpt.pdf), video [Video](https://build.microsoft.com/en-US/sessions/db3f4859-cd30-4445-a0cd-553c3304f8e2)
 
+## 🔥 News
+[2023/06/15] v1.0: Released the Chinese medical LoRA model [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora), a medical model obtained by SFT fine-tuning of Ziya-LLaMA-13B-v1. Medical question answering improves, and the fine-tuned LoRA weights are published. See [Release-v1.0](https://github.com/shibing624/MedicalGPT/releases/tag/1.0.0) for details.
+
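+[2023/06/05] v0.2: Using the medical domain as an example, trains a domain-specific large model with a four-stage pipeline: continued pretraining, supervised fine-tuning, reward modeling, and reinforcement learning. See [Release-v0.2](https://github.com/shibing624/MedicalGPT/releases/tag/0.2.0) for details.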
+
+
 ## 😊 Feature
 Based on the ChatGPT Training Pipeline, this project implements four-stage training of a domain model, here a medical model:
 
@@ -36,19 +42,15 @@ Supervised Finetuning, Reward Modeling and Reinforcement Learning.
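 - Stage 3: RM (Reward Model) reward modeling. Build a ranked human-preference dataset and train a reward model on it to align with human preferences, mainly the "HHH" principles: "helpful, honest, harmless"
 - Stage 4: RL (Reinforcement Learning), i.e. reinforcement learning from human feedback (RLHF). The reward model is used to train the SFT model: the generative model updates its policy from rewards or penalties so that it produces higher-quality text that better matches human preferences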
 
-### News
-[2023/06/15] v1.0: Released the Chinese medical LoRA model [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora), a medical model obtained by SFT fine-tuning of Ziya-LLaMA-13B-v1. Medical question answering improves, and the fine-tuned LoRA weights are published. See [Release-v1.0](https://github.com/shibing624/MedicalGPT/releases/tag/1.0.0) for details.
-
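-[2023/06/05] v0.2: Using the medical domain as an example, trains a domain-specific large model with a four-stage pipeline: continued pretraining, supervised fine-tuning, reward modeling, and reinforcement learning. See [Release-v0.2](https://github.com/shibing624/MedicalGPT/releases/tag/0.2.0) for details.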
 
-
-## 🔥 Release Models
+### Release Models
 
 
 | Model                                                                                                   | Base Model                                                                        | Introduction                                                                                                                           | 
 |:--------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------|
 | [shibing624/ziya-llama-13b-medical-lora](https://huggingface.co/shibing624/ziya-llama-13b-medical-lora) | [IDEA-CCNL/Ziya-LLaMA-13B-v1](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) | Ziya-LLaMA-13B SFT fine-tuned on the 2.4-million-sample Chinese and English medical dataset [shibing624/medical](https://huggingface.co/datasets/shibing624/medical); medical question answering improves, and the fine-tuned LoRA weights are released |
 
+
 ## ▶️ Demo
 
 - Hugging Face Demo: in progress
@@ -76,10 +78,10 @@ Training Stage:
 
 | Stage                           | Introduction | Open In Colab                                                                                                                                                                                   | Python script                                                                                                           | Shell script                                                                        |                      
 |:--------------------------------|:-------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
-| Stage 1: Continue Pretraining   | Continued pretraining | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_pretraining.ipynb)           | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/pretraining.py)                     | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_pt.sh)   | 
-| Stage 2: Supervised Fine-tuning | Supervised fine-tuning | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_supervised_finetuning.ipynb) | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_sft.sh) | 
-| Stage 3: Reward Modeling        | Reward modeling | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_reward_modeling.ipynb)       | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/reward_modeling.py)             | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_rm.sh)   | 
-| Stage 4: Reinforcement Learning | Reinforcement learning |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/notebook/run_rl_training.ipynb)          | [rl_training.py](https://github.com/shibing624/MedicalGPT/blob/main/scripts/rl_training.py)                     | [run_rl.sh](https://github.com/shibing624/MedicalGPT/blob/main/scripts/run_rl.sh)   | 
+| Stage 1: Continue Pretraining   | Continued pretraining | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_pretraining.ipynb)           | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py)                     | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh)   | 
+| Stage 2: Supervised Fine-tuning | Supervised fine-tuning | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_supervised_finetuning.ipynb) | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) | 
+| Stage 3: Reward Modeling        | Reward modeling | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_reward_modeling.ipynb)       | [reward_modeling.py](https://github.com/shibing624/MedicalGPT/blob/main/reward_modeling.py)             | [run_rm.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rm.sh)   | 
+| Stage 4: Reinforcement Learning | Reinforcement learning |  [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_rl_training.ipynb)          | [rl_training.py](https://github.com/shibing624/MedicalGPT/blob/main/rl_training.py)                     | [run_rl.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_rl.sh)   | 
 
 
 [Training parameter guide (wiki)](https://github.com/shibing624/MedicalGPT/wiki/%E8%AE%AD%E7%BB%83%E7%BB%86%E8%8A%82%E8%AF%B4%E6%98%8E)
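
Note on Stage 3 in the feature list above: the README says the reward model is trained on a ranked human-preference dataset. A minimal sketch of one common objective for that kind of data, a pairwise (Bradley-Terry style) ranking loss, is given here in plain Python. It is an illustration under assumptions only: the diff does not show what `reward_modeling.py` implements, and the function name `pairwise_ranking_loss` and its list-of-floats interface are hypothetical.

```python
import math


def pairwise_ranking_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    chosen_rewards[i] is the reward-model score of the answer humans
    preferred for prompt i; rejected_rewards[i] is the score of the answer
    they ranked lower. (Hypothetical interface, for illustration only.)
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("each chosen reward needs a matching rejected reward")
    if not chosen_rewards:
        raise ValueError("at least one preference pair is required")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids
        # overflow when the margin is a large negative number.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # The preferred answer scores higher in the first pair, lower in the second.
    print(pairwise_ranking_loss([2.0, 0.5], [0.0, 1.5]))
```

When both answers get the same score the loss for that pair is log 2, and it shrinks toward zero as the preferred answer's score rises above the rejected one's, which is what pushes the reward model to reproduce the human ranking.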