Data engine #37
base: main
Conversation
# Conflicts:
#   chat.py
#   muffin/eval/muffin_inference_logp.py

# Conflicts:
#   pyproject.toml
Great work! This PR adds support for automatically and efficiently generating high-quality preference learning datasets with RLAIF-V models or other reward models and instruction models.
Still, some of the modifications should be revised further before the PR can be merged.
data_engine/README.md
Outdated
Please refer to the `run_engine.sh` script.

You will need to provide the path and name for both the reward model and the instruction model. Currently, we support the following models: llava-1.5-7b, RLAIF-V-7B, OmniLMM-12B, and RLAIF-V-12B. We are considering adding more models in the future.
If the model you wish to use is not listed, you may need to implement the corresponding code yourself: for model loading, add code to `RLAIF-V/builder`; for answer sampling, refer to `RLAIF-V/llava/llava15_sample_data.py` to see how data is formatted (don't forget to pass `raw_images`) and call it in `RLAIF-V/data_engine/answer_sampler.py`; for log probability calculation, change the data formatting part in `RLAIF-V/data_engine/logps_calculator.py` and the `get_multimodal_sample_logps` function in `RLAIF-V/muffin/eval/muffin_inference_logp.py`.
Split this into several different subsections:
Generate Rollouts
Reward collection
Customize your reward model
Customize your instruction model
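To make the extension steps in the quoted README text concrete, here is a minimal sketch of what a custom sampler added to `RLAIF-V/data_engine/answer_sampler.py` could look like. The function name `sample_answers_for_new_model`, its argument list, and the `images=` keyword of `generate` are illustrative assumptions modeled on LLaVA-style models, not the repository's actual API.

```python
# Hypothetical sketch only: the real answer_sampler.py API may differ.
# It illustrates the steps described above: format each sample the way
# llava/llava15_sample_data.py does (keeping raw_images) and sample answers.
import torch


def sample_answers_for_new_model(model, tokenizer, image_processor, dataset,
                                 num_samples=10, temperature=0.7):
    """Generate num_samples candidate answers per question (illustrative only)."""
    results = []
    for item in dataset:
        prompt = item["question"]
        raw_image = item["image"]
        # Preprocess the image with the model's image processor, but keep the
        # raw image around so it can be passed through as `raw_images`.
        image_tensor = image_processor(raw_image, return_tensors="pt")["pixel_values"]
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids

        answers = []
        for _ in range(num_samples):
            with torch.inference_mode():
                output_ids = model.generate(
                    input_ids,
                    images=image_tensor,   # LLaVA-style generate kwarg (assumed)
                    do_sample=True,
                    temperature=temperature,
                    max_new_tokens=512,
                )
            answers.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))

        results.append({"question": prompt, "raw_images": raw_image, "answers": answers})
    return results
```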
data_engine/README.md
Outdated
You can specify a `--work_dir` to store intermediate files and the final output under this directory (the final output will actually be placed in a subdirectory within it).

If you encounter errors during generation, you can resume from the stage after the last completed one using the `--continue_from_stage` parameter (0, 1, or 2). A value of 0 starts from scratch. For example, if you've completed stages 0 and 1 but hit an error during stage 2, you can fix the issue and set `--continue_from_stage 2` to continue from that point. See `data_engine.py` for details on what each stage does.
Let's just split this into three separate scripts; try not to leave any ambiguous information.
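Whether the pipeline stays as one script with `--continue_from_stage` or is split into three scripts as suggested, the resumption behaviour described above amounts to a simple stage dispatcher. The sketch below only illustrates that behaviour; the stage function names are assumptions and this is not the actual contents of `data_engine.py`.

```python
# Illustrative sketch of the three-stage pipeline and the --continue_from_stage
# behaviour described above; not the actual data_engine.py implementation.
import argparse


def sample_answers(work_dir):      # stage 0: roll out candidate answers
    ...


def collect_rewards(work_dir):     # stage 1: score answers with the reward model
    ...


def construct_pairs(work_dir):     # stage 2: build chosen/rejected DPO pairs
    ...


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--work_dir", required=True)
    parser.add_argument("--continue_from_stage", type=int, default=0, choices=[0, 1, 2])
    args = parser.parse_args()

    stages = [sample_answers, collect_rewards, construct_pairs]
    # Skip stages that already finished; e.g. --continue_from_stage 2 reruns
    # only the final pair-construction step after stages 0 and 1 succeeded.
    for idx, stage in enumerate(stages):
        if idx >= args.continue_from_stage:
            stage(args.work_dir)


if __name__ == "__main__":
    main()
```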
pyproject.toml
Outdated
Do all of these versions really need to change? Could that affect the reproducibility of other results?
As I recall, training failed with an error when the transformers version was 4.35 (something about Bfloat not being supported), and upgrading to 4.37 fixed it. The other packages were bumped along with transformers mainly to avoid dependency conflicts. It could indeed affect reproducibility, so we may need to discuss it.
Last step: refine the README to improve readability.
data_engine/README.md
Outdated
Thank you for choosing RLAIF-V. Best wishes for your project!

Generates rewards using the DPO framework to rank answers. Higher-ranked answers are marked as "chosen," while lower-ranked answers are marked as "rejected."
Use RLAIF-V self-feedback guidance with DPO-trained models.
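For context on the "rewards using the DPO framework" wording, a common formulation scores each candidate answer by the scaled log-probability ratio between the DPO-trained policy and its reference model. The snippet below is a generic illustration of that idea under those assumptions, not RLAIF-V's exact scoring code.

```python
# Generic illustration of a DPO-style implicit reward
# (policy vs. reference log-probability ratio); not RLAIF-V's exact code.
import torch
import torch.nn.functional as F


def sequence_logp(model, input_ids, labels):
    """Sum of per-token log-probabilities of `labels` under `model` (teacher forcing)."""
    with torch.inference_mode():
        logits = model(input_ids=input_ids).logits            # [batch, seq, vocab]
    logps = F.log_softmax(logits[:, :-1], dim=-1)              # predict token t+1 from t
    token_logps = torch.gather(logps, 2, labels[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_logps.sum(-1)                                 # [batch]


def dpo_reward(policy, reference, input_ids, labels, beta=0.1):
    """Implicit DPO reward: beta * (log p_policy - log p_reference)."""
    return beta * (sequence_logp(policy, input_ids, labels)
                   - sequence_logp(reference, input_ids, labels))
```

Ranking candidate answers by this implicit reward and pairing a top-scoring answer ("chosen") with a low-scoring one ("rejected") yields the DPO training pairs described above.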
data_engine/README.md
Outdated
#### Process Method

Detailed in the corresponding research paper.
Use RLAIF-V divide-and-conquer strategy to collect AI feedback.
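As a rough illustration of the divide-and-conquer idea (break a response into atomic claims, score each claim, then aggregate), here is a hedged sketch; `split_into_claims` and `score_claim` are hypothetical helpers, and the actual procedure is the one described in the RLAIF-V paper.

```python
# Hedged sketch of divide-and-conquer feedback collection: split a long
# response into atomic claims, score each claim with a labeler/reward model,
# then aggregate the per-claim scores into one response-level reward.
# split_into_claims and score_claim are hypothetical helpers.


def collect_feedback(question, response, split_into_claims, score_claim):
    claims = split_into_claims(response)      # e.g. one factual statement per claim
    if not claims:
        return 0.0
    scores = [score_claim(question, claim) for claim in claims]
    # Aggregate per-claim scores (here a simple mean, purely for illustration).
    return sum(scores) / len(scores)
```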
data_engine/README_zh.md
Outdated
#### Processing Method

The detailed procedure is described in the paper.
Update the Chinese README following the English changes.
data_engine/README_zh.md
Outdated
@@ -72,7 +72,7 @@

#### Processing Method

- Generates rewards using the DPO framework to rank answers. High-scoring answers are marked as Chosen, low-scoring answers as Rejected.
+ Use RLAIF-V self-feedback guidance together with DPO-trained models.
Use the self-feedback signal that RLAIF-V constructs based on the DPO-aligned model.
Add data engine to enable users to build their own DPO dataset.