Data engine #37

Open · wants to merge 34 commits into main
Commits (34)
8dcd127 [upgrade] stage 1 code clean (zxrys, Nov 22, 2024)
627684d [upgrade] stage 1 passed (zxrys, Nov 23, 2024)
644aecd [upgrade] stage 2 passed (zxrys, Nov 24, 2024)
b510d4b [upgrade] stage 3 passed (zxrys, Nov 25, 2024)
3e9c239 [upgrade] able to train (zxrys, Nov 25, 2024)
9e5e1ae [upgrade] fix and readme (zxrys, Nov 26, 2024)
cbd51f8 [upgrade] (zxrys, Nov 28, 2024)
a2be592 [upgrade] (zxrys, Dec 1, 2024)
36e436f Merge remote-tracking branch 'upstream/main' (zxrys, Dec 1, 2024)
1271f4e [upgrade] (zxrys, Dec 1, 2024)
d221e20 [upgrade] (zxrys, Dec 2, 2024)
f604e0a Merge remote-tracking branch 'upstream/main' (zxrys, Dec 2, 2024)
21fcd8b [upgrade] README add some explanation (zxrys, Dec 2, 2024)
0dbb20f [upgrade] some simple change (zxrys, Dec 3, 2024)
be73e25 [upgrade] refactor code (zxrys, Dec 3, 2024)
65755eb [upgrade] (zxrys, Dec 4, 2024)
0863ec2 [upgrade] (zxrys, Dec 7, 2024)
8cdab04 [upgrade] (zxrys, Dec 8, 2024)
1419ee1 Merge branch 'RLHF-V:main' into main (zxrys, Dec 8, 2024)
0871cd7 [upgrade] refine README (zxrys, Dec 9, 2024)
5e8ac54 Merge remote-tracking branch 'origin/main' (zxrys, Dec 9, 2024)
8c46e3a [upgrade] refine README (zxrys, Dec 9, 2024)
3046b7d [upgrade] minicpm inference (zxrys, Dec 11, 2024)
ebd2e11 [upgrade] minicpm logps (zxrys, Dec 11, 2024)
da511f4 [fix] (zxrys, Dec 11, 2024)
b1d6136 [fix] (zxrys, Dec 12, 2024)
e57a062 [upgrade] support MiniCPM-V (zxrys, Dec 12, 2024)
301a9eb [upgrade] code robustness increase (zxrys, Dec 24, 2024)
5d95832 [upgrade] llava update (zxrys, Dec 25, 2024)
d9bfbb4 [fix] (zxrys, Dec 25, 2024)
9593bc1 [upgrade] llava critic gen answer (zxrys, Dec 26, 2024)
29caf18 [fix] (zxrys, Dec 27, 2024)
8bd24b9 [upgrade] (zxrys, Dec 28, 2024)
801a3f9 [fix] (zxrys, Dec 28, 2024)
data_engine/README.md (5 changes: 2 additions & 3 deletions)

@@ -24,7 +24,7 @@ Each has distinct requirements, explained below.

 #### Process Method

-Detailed in the corresponding research paper.
+Use RLAIF-V divide-and-conquer strategy to collect AI feedback.

 #### Required Models

@@ -75,8 +75,7 @@ Dataset should be in `.jsonl` format with the following fields:

 #### Process Method

-Generates rewards using the DPO framework to rank answers. Higher-ranked answers are marked as "chosen," while
-lower-ranked ones are "rejected."
+Use RLAIF-V self-feedback guidance with DPO-trained models.

 #### Custom Implementation
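The replacement line points at RLAIF-V's self-feedback guidance: each candidate answer is scored with the implicit reward of a DPO-trained model, r(x, y) = β · log(π_policy(y | x) / π_ref(y | x)), and the highest- and lowest-scored answers become the "chosen"/"rejected" pair (the β · log Z(x) term is constant per prompt, so it does not affect the ranking). Below is a minimal sketch of that ranking step, assuming generic Hugging Face causal-LM checkpoints; the model paths and helper names are illustrative, not the data engine's actual API:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_logp(model, tokenizer, prompt: str, answer: str) -> float:
    """Sum of log-probabilities the model assigns to `answer`, given `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # token_logps[:, j] is the log-prob of full_ids[:, j + 1] given its prefix.
    token_logps = F.log_softmax(logits[:, :-1], dim=-1).gather(
        2, full_ids[:, 1:].unsqueeze(-1)
    ).squeeze(-1)
    # Keep only the positions that predict answer tokens.
    return token_logps[:, prompt_len - 1:].sum().item()

def dpo_reward(policy, ref, tokenizer, prompt, answer, beta=0.1) -> float:
    """Implicit DPO reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (
        sequence_logp(policy, tokenizer, prompt, answer)
        - sequence_logp(ref, tokenizer, prompt, answer)
    )

# Hypothetical checkpoints: a DPO-trained policy and its frozen reference.
policy = AutoModelForCausalLM.from_pretrained("path/to/dpo-trained-model").eval()
ref = AutoModelForCausalLM.from_pretrained("path/to/reference-model").eval()
tokenizer = AutoTokenizer.from_pretrained("path/to/dpo-trained-model")

prompt = "Describe the image in detail.\n"
candidates = ["answer A ...", "answer B ...", "answer C ..."]
ranked = sorted(
    candidates,
    key=lambda a: dpo_reward(policy, ref, tokenizer, prompt, a),
    reverse=True,
)
chosen, rejected = ranked[0], ranked[-1]
```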
data_engine/README_zh.md (4 changes: 2 additions & 2 deletions)

@@ -22,7 +22,7 @@

 #### 处理方法

-具体流程详见论文
+使用 RLAIF-V 分而治之策略收集 AI 反馈

 #### 所需模型

@@ -72,7 +72,7 @@

 #### 处理方法

-使用 DPO 框架生成奖励以对答案排序。高分答案标记为Chosen,低分答案标记为Rejected
+将 RLAIF-V 自反馈指导与 DPO 训练模型结合使用
Review comment from a Collaborator on the added line, suggesting alternative wording:

使用 RLAIF-V 基于 DPO-aligned 模型构造的自反馈信号。
(Use the self-feedback signal that RLAIF-V constructs from the DPO-aligned model.)


 #### 自定义实现
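The divide-and-conquer wording in both READMEs refers to RLAIF-V's feedback-collection recipe: divide a candidate response into atomic claims, verify each claim by putting a yes/no question to a labeler model, then combine the per-claim verdicts into a response-level score (e.g., the count of claims judged incorrect). A rough sketch of that pipeline shape follows; the `generate_claims` and `yes_probability` interfaces are hypothetical placeholders, not the repository's real classes:

```python
from typing import List

def split_into_claims(response: str, splitter_model) -> List[str]:
    """Divide: break a long response into atomic factual claims."""
    return splitter_model.generate_claims(response)

def claim_confidence(image, claim: str, labeler_model) -> float:
    """Conquer: ask the labeler whether one claim holds for the image;
    return its probability of answering 'yes'."""
    question = f"Based on the image, is this statement correct? {claim}"
    return labeler_model.yes_probability(image, question)

def response_score(image, response: str, splitter_model, labeler_model) -> float:
    """Combine: score a response by the number of claims judged wrong,
    negated so that higher scores mean fewer hallucinations."""
    claims = split_into_claims(response, splitter_model)
    n_rejected = sum(
        1 for c in claims if claim_confidence(image, c, labeler_model) < 0.5
    )
    return -float(n_rejected)
```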