fix: Add pad_token to special_tokens_dict when pad_token == eos_token #436
+2 −0
Description of the change

Add `pad_token` to `special_tokens_dict` when `pad_token == eos_token`, to increase the F1 micro score for the Granite-3.0-8B model.

Related issue number
Issue: #343
Issue #343 modifies the tokenizer's `pad_token` directly, whereas this PR adds the token to `special_tokens_dict` so that the model's embeddings are resized to match (a sketch of the change follows below).
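Since the diff itself is not reproduced here, the following is a minimal sketch of what the change amounts to, assuming common `transformers` tuning code; the `<pad>` literal, the checkpoint name, and the resize call are illustrative assumptions, not the exact lines of this PR:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint name is illustrative; the PR targets Granite-3.0-8B.
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.0-8b-instruct")
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.0-8b-instruct")

special_tokens_dict = {}
# The gist of the fix: if PAD is missing or collides with EOS, register a
# dedicated PAD token instead of reusing EOS, so the two get distinct ids.
if tokenizer.pad_token is None or tokenizer.pad_token == tokenizer.eos_token:
    special_tokens_dict["pad_token"] = "<pad>"  # assumed literal

if special_tokens_dict:
    tokenizer.add_special_tokens(special_tokens_dict)
    # Resizing is why the token goes through special_tokens_dict at all:
    # the embedding matrix must grow to cover the new PAD id.
    model.resize_token_embeddings(len(tokenizer))
```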
Why the `pad_token == eos_token` case matters: when the tokenizer's PAD token equals its EOS token, `DataCollatorForCompletionOnlyLM` masks the EOS token out of the labels of each dataset batch, so EOS is never used in the loss calculation. The model therefore does not learn to emit EOS, which leads to repetitive `predicted_target` output at inference time (a small repro of the masking is sketched below).
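As a hedged repro of the failure mode (not code from this PR): in trl, `DataCollatorForCompletionOnlyLM` builds on `transformers`' `DataCollatorForLanguageModeling`, whose label pass replaces every position equal to `pad_token_id` with -100. When PAD aliases EOS, the trailing EOS is masked too. GPT-2 is used below purely because it ships without a PAD token:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # the problematic aliasing

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
batch = collator([tokenizer("hello world" + tokenizer.eos_token)])

# The final EOS position is -100 in the labels: it is treated as padding,
# dropped from the loss, and the model never learns to stop generating.
print(batch["labels"])
```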
How to verify the PR

Was the PR tested