What is the correct way to do DPO training on CHAT preference datasets? #1399

abhinand5 · 2024-03-13T02:19:42Z

abhinand5
Mar 13, 2024

!! The fields are dict instead of strings. !!

So I have written a script to tokenize them and then save the tokenized prompts in a dataset, which I am trying to then load and do the training with custom dataset type.

rl: dpo
datasets:
  - path: dpo-mix-formatted.jsonl
    ds_type: json
    split: train
    type:
        field_prompt: prompt
        field_chosen: chosen
        field_rejected: rejected

The training is running currently, but I am not sure if this is the correct way to do this. Can anyone confirm?

NanoCode012 · 2024-04-03T15:19:40Z

NanoCode012
Apr 3, 2024
Collaborator

I haven't tried rlhf datasets yet. You may be able to run preprocess with --debug to see the processed outputs?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What is the correct way to do DPO training on CHAT preference datasets? #1399

{{title}}

Replies: 1 comment

{{title}}

Select a reply

What is the correct way to do DPO training on CHAT preference datasets? #1399

abhinand5 Mar 13, 2024

Replies: 1 comment

NanoCode012 Apr 3, 2024 Collaborator

abhinand5
Mar 13, 2024

NanoCode012
Apr 3, 2024
Collaborator