fix bug in run_clm_sft_with_peft.py #674

STHSF · 2023-06-26T14:25:26Z

Adjust the pad token before counting the number of tokens

Add the pad token before counting the number of tokens

airaria · 2023-06-26T15:24:04Z

We recommend using the Alpaca tokenizer when running run_clm_sft_with_peft.py.
The if statement checks whether the tokenizer is the Alpaca tokenizer (whose vocab size is 49954).

In #666, you used the merged tokenizer (whose vocab size is 49953) instead of the Alpaca tokenizer from chinese-alpaca-lora.
If you switch to the Alpaca tokenizer, the script should work correctly.
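
To illustrate the size check discussed above and the change this PR proposes, here is a minimal sketch. It is not the code from run_clm_sft_with_peft.py. `ToyTokenizer`, `vocab_size_with_pad`, `is_alpaca_sized`, and the `"[PAD]"` string are hypothetical names introduced for illustration; the stand-in class only mimics the `pad_token`, `add_special_tokens`, and `len()` behaviour of a Hugging Face tokenizer. It also assumes the one-token difference between the two vocab sizes is the pad token, which the thread suggests but does not state outright.

```python
ALPACA_VOCAB_SIZE = 49954  # Alpaca tokenizer size quoted in the comment above
MERGED_VOCAB_SIZE = 49953  # merged tokenizer size quoted in the comment above


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer (assumption, not the real class)."""

    def __init__(self, vocab_size):
        self.tokens = [f"tok{i}" for i in range(vocab_size)]
        self.pad_token = None

    def add_special_tokens(self, special_tokens):
        """Register a pad token; return how many new tokens were added."""
        added = 0
        pad = special_tokens.get("pad_token")
        if pad is not None:
            if pad not in self.tokens:
                self.tokens.append(pad)
                added += 1
            self.pad_token = pad
        return added

    def __len__(self):
        return len(self.tokens)


def vocab_size_with_pad(tokenizer, pad_token="[PAD]"):
    """Add a pad token if none is set, then count the tokens."""
    if tokenizer.pad_token is None:
        tokenizer.add_special_tokens({"pad_token": pad_token})
    return len(tokenizer)


def is_alpaca_sized(tokenizer):
    """Size check in the spirit of the if statement described above."""
    return len(tokenizer) == ALPACA_VOCAB_SIZE


merged = ToyTokenizer(MERGED_VOCAB_SIZE)
matched_before = is_alpaca_sized(merged)   # counted before the pad token exists
size_after = vocab_size_with_pad(merged)   # pad token added first, then counted
print(matched_before, size_after, is_alpaca_sized(merged))
```

Under these assumptions, counting before the pad token is added makes the merged tokenizer fail the size check, while adding the pad token first brings it to the expected size. Using the Alpaca tokenizer directly, as recommended above, avoids the mismatch without changing the script.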

Update run_clm_sft_with_peft.py

05adbe3

Add the pad token before counting the number of tokens

ymcui marked this pull request as draft July 7, 2023 06:42
