Data engine #37

Open · wants to merge 34 commits into main
Commits (34)
8dcd127 [upgrade] stage 1 code clean (zxrys, Nov 22, 2024)
627684d [upgrade] stage 1 passed (zxrys, Nov 23, 2024)
644aecd [upgrade] stage 2 passed (zxrys, Nov 24, 2024)
b510d4b [upgrade] stage 3 passed (zxrys, Nov 25, 2024)
3e9c239 [upgrade] able to train (zxrys, Nov 25, 2024)
9e5e1ae [upgrade] fix and readme (zxrys, Nov 26, 2024)
cbd51f8 [upgrade] (zxrys, Nov 28, 2024)
a2be592 [upgrade] (zxrys, Dec 1, 2024)
36e436f Merge remote-tracking branch 'upstream/main' (zxrys, Dec 1, 2024)
1271f4e [upgrade] (zxrys, Dec 1, 2024)
d221e20 [upgrade] (zxrys, Dec 2, 2024)
f604e0a Merge remote-tracking branch 'upstream/main' (zxrys, Dec 2, 2024)
21fcd8b [upgrade] README add some explanation (zxrys, Dec 2, 2024)
0dbb20f [upgrade] some simple change (zxrys, Dec 3, 2024)
be73e25 [upgrade] refactor code (zxrys, Dec 3, 2024)
65755eb [upgrade] (zxrys, Dec 4, 2024)
0863ec2 [upgrade] (zxrys, Dec 7, 2024)
8cdab04 [upgrade] (zxrys, Dec 8, 2024)
1419ee1 Merge branch 'RLHF-V:main' into main (zxrys, Dec 8, 2024)
0871cd7 [upgrade] refine README (zxrys, Dec 9, 2024)
5e8ac54 Merge remote-tracking branch 'origin/main' (zxrys, Dec 9, 2024)
8c46e3a [upgrade] refine README (zxrys, Dec 9, 2024)
3046b7d [upgrade] minicpm inference (zxrys, Dec 11, 2024)
ebd2e11 [upgrade] minicpm logps (zxrys, Dec 11, 2024)
da511f4 [fix] (zxrys, Dec 11, 2024)
b1d6136 [fix] (zxrys, Dec 12, 2024)
e57a062 [upgrade] support MiniCPM-V (zxrys, Dec 12, 2024)
301a9eb [upgrade] code robustness increase (zxrys, Dec 24, 2024)
5d95832 [upgrade] llava update (zxrys, Dec 25, 2024)
d9bfbb4 [fix] (zxrys, Dec 25, 2024)
9593bc1 [upgrade] llava critic gen answer (zxrys, Dec 26, 2024)
29caf18 [fix] (zxrys, Dec 27, 2024)
8bd24b9 [upgrade] (zxrys, Dec 28, 2024)
801a3f9 [fix] (zxrys, Dec 28, 2024)
data_engine/README.md (5 changes: 2 additions & 3 deletions)

@@ -24,7 +24,7 @@ Each has distinct requirements, explained below.

 #### Process Method

-Detailed in the corresponding research paper.
+Use RLAIF-V divide-and-conquer strategy to collect AI feedback.

 #### Required Models

@@ -75,8 +75,7 @@ Dataset should be in `.jsonl` format with the following fields:

 #### Process Method

-Generates rewards using the DPO framework to rank answers. Higher-ranked answers are marked as "chosen," while
-lower-ranked ones are "rejected."
+Use RLAIF-V self-feedback guidance with DPO-trained models.

 #### Custom Implementation
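The replacement line points at RLAIF-V's self-feedback guidance: each candidate answer is scored with the implicit reward of a DPO-trained model, r(x, y) = β · log(π_policy(y | x) / π_ref(y | x)), and the highest- and lowest-scored answers become the "chosen"/"rejected" pair (the β · log Z(x) term is constant per prompt, so it does not affect the ranking). Below is a minimal sketch of that ranking step, assuming generic Hugging Face causal-LM checkpoints; the model paths and helper names are illustrative, not the data engine's actual API:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_logp(model, tokenizer, prompt: str, answer: str) -> float:
    """Sum of log-probabilities the model assigns to `answer`, given `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # token_logps[:, j] is the log-prob of full_ids[:, j + 1] given its prefix.
    token_logps = F.log_softmax(logits[:, :-1], dim=-1).gather(
        2, full_ids[:, 1:].unsqueeze(-1)
    ).squeeze(-1)
    # Keep only the positions that predict answer tokens.
    return token_logps[:, prompt_len - 1:].sum().item()

def dpo_reward(policy, ref, tokenizer, prompt, answer, beta=0.1) -> float:
    """Implicit DPO reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (
        sequence_logp(policy, tokenizer, prompt, answer)
        - sequence_logp(ref, tokenizer, prompt, answer)
    )

# Hypothetical checkpoints: a DPO-trained policy and its frozen reference.
policy = AutoModelForCausalLM.from_pretrained("path/to/dpo-trained-model").eval()
ref = AutoModelForCausalLM.from_pretrained("path/to/reference-model").eval()
tokenizer = AutoTokenizer.from_pretrained("path/to/dpo-trained-model")

prompt = "Describe the image in detail.\n"
candidates = ["answer A ...", "answer B ...", "answer C ..."]
ranked = sorted(
    candidates,
    key=lambda a: dpo_reward(policy, ref, tokenizer, prompt, a),
    reverse=True,
)
chosen, rejected = ranked[0], ranked[-1]
```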
data_engine/README_zh.md (4 changes: 2 additions & 2 deletions)

@@ -22,7 +22,7 @@

 #### 处理方法

-具体流程详见论文
+使用 RLAIF-V 分而治之策略收集 AI 反馈

 #### 所需模型

@@ -72,7 +72,7 @@

 #### 处理方法

-使用 DPO 框架生成奖励以对答案排序。高分答案标记为Chosen,低分答案标记为Rejected
+将 RLAIF-V 自反馈指导与 DPO 训练模型结合使用
Review comment from a Collaborator on the added line, suggesting alternative wording:

使用 RLAIF-V 基于 DPO-aligned 模型构造的自反馈信号。
(Use the self-feedback signal that RLAIF-V constructs from the DPO-aligned model.)


 #### 自定义实现
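The divide-and-conquer wording in both READMEs refers to RLAIF-V's feedback-collection recipe: divide a candidate response into atomic claims, verify each claim by putting a yes/no question to a labeler model, then combine the per-claim verdicts into a response-level score (e.g., the count of claims judged incorrect). A rough sketch of that pipeline shape follows; the `generate_claims` and `yes_probability` interfaces are hypothetical placeholders, not the repository's real classes:

```python
from typing import List

def split_into_claims(response: str, splitter_model) -> List[str]:
    """Divide: break a long response into atomic factual claims."""
    return splitter_model.generate_claims(response)

def claim_confidence(image, claim: str, labeler_model) -> float:
    """Conquer: ask the labeler whether one claim holds for the image;
    return its probability of answering 'yes'."""
    question = f"Based on the image, is this statement correct? {claim}"
    return labeler_model.yes_probability(image, question)

def response_score(image, response: str, splitter_model, labeler_model) -> float:
    """Combine: score a response by the number of claims judged wrong,
    negated so that higher scores mean fewer hallucinations."""
    claims = split_into_claims(response, splitter_model)
    n_rejected = sum(
        1 for c in claims if claim_confidence(image, c, labeler_model) < 0.5
    )
    return -float(n_rejected)
```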