Cross Contamination in SFT Trainer #204

elichen3051 · 2024-11-18T17:52:22Z

Dear HuggingFace

I've noticed that run_cpt.py and run_sft.py set packing=True, but we don't pass DataCollatorForCompletionOnlyLM to SFTTrainer. Would this introduce cross contamination between packed examples during training?

Reference article: Improving Hugging Face Training Efficiency Through Packing with Flash Attention
TRL issue on GitHub: huggingface/trl#805
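
To make the question concrete, here is a minimal sketch of what "cross contamination" usually means for packed sequences. It assumes that packing simply concatenates tokenized examples and that attention uses a plain causal mask. All helper names and token ids are hypothetical, and nothing here calls TRL or transformers; whether a real training run behaves this way depends on the library version and attention implementation in use.

```python
# Illustrative only: pure Python, no TRL or transformers calls.

def pack(examples):
    """Concatenate tokenized examples and record which example each position came from."""
    tokens, doc_ids = [], []
    for doc_id, example in enumerate(examples):
        tokens.extend(example)
        doc_ids.extend([doc_id] * len(example))
    return tokens, doc_ids


def causal_mask(n):
    """Plain causal mask: position i may attend to every position j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_causal_mask(doc_ids):
    """Causal mask that also requires query and key to belong to the same example."""
    n = len(doc_ids)
    return [[j <= i and doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]


def cross_example_pairs(mask, doc_ids):
    """Count allowed (query, key) pairs whose positions come from different examples."""
    n = len(doc_ids)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if mask[i][j] and doc_ids[i] != doc_ids[j]
    )


# Hypothetical token ids for two short examples packed into one sequence.
examples = [[11, 12, 13], [21, 22]]
tokens, doc_ids = pack(examples)

plain = cross_example_pairs(causal_mask(len(tokens)), doc_ids)
blocked = cross_example_pairs(block_causal_mask(doc_ids), doc_ids)
print("cross-example pairs with plain causal mask:", plain)
print("cross-example pairs with block causal mask:", blocked)
```

With the plain causal mask, tokens of the second example can attend to tokens of the first; the block mask removes those pairs. Note that the mask is about attention, while a completion-only collator is about which labels contribute to the loss, so the two are separate concerns.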


lewtun · 2024-11-19T09:55:06Z

Hello @elichen3051, the task is the same whether one uses packing or not (i.e. next-token prediction). DataCollatorForCompletionOnlyLM is for the special case where you want to mask the inputs / prompts, which in some cases gives a small performance boost.
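
As a rough illustration of what "masking the inputs / prompts" means, the sketch below sets the labels of every token up to and including a response template to -100, the default `ignore_index` of PyTorch's cross-entropy loss, so only completion tokens contribute to the loss. This is a sketch of the idea under stated assumptions, not TRL's implementation: the function names, the token ids, and the choice to ignore the whole example when the template is missing are all hypothetical.

```python
# Illustrative only: pure Python, not the TRL collator itself.

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_sublist(haystack, needle):
    """Return the start index of the first occurrence of needle, or -1 if absent."""
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids to labels, ignoring everything through the response template."""
    labels = list(input_ids)
    start = find_sublist(input_ids, response_template_ids)
    if start == -1:
        # Assumption for this sketch: with no template found, ignore the whole example.
        return [IGNORE_INDEX] * len(labels)
    end = start + len(response_template_ids)
    labels[:end] = [IGNORE_INDEX] * end
    return labels


# Hypothetical ids: prompt [5, 6, 7], response template [90, 91], completion [8, 9].
input_ids = [5, 6, 7, 90, 91, 8, 9]
response_template_ids = [90, 91]

labels = mask_prompt_labels(input_ids, response_template_ids)
print(labels)
```

Without this masking, every token in the sequence is a prediction target, which is the plain next-token objective described above.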
