
Commit

Fix whole word masking labels (#326)
lewtun authored Oct 4, 2022
1 parent 2a61a64 commit 4cd2c0e
Showing 5 changed files with 10 additions and 0 deletions.
chapters/en/chapter7/3.mdx (2 additions, 0 deletions)
@@ -523,6 +523,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return default_data_collator(features)
 ```
@@ -563,6 +564,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return tf_default_data_collator(features)
 ```
chapters/fr/chapter7/3.mdx (2 additions, 0 deletions)
@@ -524,6 +524,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return default_data_collator(features)
 ```
@@ -564,6 +565,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return tf_default_data_collator(features)
 ```
chapters/ja/chapter7/3.mdx (2 additions, 0 deletions)
@@ -535,6 +535,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return default_data_collator(features)
 ```
@@ -575,6 +576,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return tf_default_data_collator(features)
 ```
chapters/vi/chapter7/3.mdx (2 additions, 0 deletions)
@@ -523,6 +523,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return default_data_collator(features)
 ```
@@ -563,6 +564,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return tf_default_data_collator(features)
 ```
chapters/zh-CN/chapter7/3.mdx (2 additions, 0 deletions)
@@ -524,6 +524,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return default_data_collator(features)
 ```
@@ -564,6 +565,7 @@ def whole_word_masking_data_collator(features):
             for idx in mapping[word_id]:
                 new_labels[idx] = labels[idx]
                 input_ids[idx] = tokenizer.mask_token_id
+        feature["labels"] = new_labels
 
     return tf_default_data_collator(features)
 ```
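To illustrate why the one-line fix matters, here is a minimal, framework-free sketch of the per-feature masking step touched by these hunks. It is not the chapter's code: it assumes each feature's `labels` start out as a copy of its `input_ids`, uses a hypothetical `MASK_ID` constant in place of `tokenizer.mask_token_id`, and replaces the random choice of words with a hypothetical `words_to_mask` argument so the result is deterministic. A hypothetical `write_back` flag toggles the line this commit adds.

```python
import collections

MASK_ID = 103  # hypothetical mask token id, standing in for tokenizer.mask_token_id


def whole_word_mask(feature, words_to_mask, write_back=True):
    """Mask every token of the selected words in one feature (modified in place).

    `words_to_mask` is a hypothetical stand-in for a random choice of words.
    """
    word_ids = feature.pop("word_ids")

    # Map each word index to the positions of the tokens that form that word
    mapping = collections.defaultdict(list)
    current_word_index = -1
    current_word = None
    for idx, word_id in enumerate(word_ids):
        if word_id is not None:  # None marks special tokens
            if word_id != current_word:
                current_word = word_id
                current_word_index += 1
            mapping[current_word_index].append(idx)

    input_ids = feature["input_ids"]
    labels = feature["labels"]
    new_labels = [-100] * len(labels)  # -100 marks positions the loss ignores
    for word_index in words_to_mask:
        for idx in mapping[word_index]:
            new_labels[idx] = labels[idx]  # keep the original token as the target
            input_ids[idx] = MASK_ID  # hide it in the model input
    if write_back:
        feature["labels"] = new_labels  # the line added by this commit
    return feature


def make_feature():
    # Arbitrary token ids: tokens 1-2 form word 0, token 3 forms word 1
    ids = [101, 7592, 2088, 999, 102]
    return {"input_ids": list(ids), "labels": list(ids), "word_ids": [None, 0, 0, 1, None]}


fixed = whole_word_mask(make_feature(), words_to_mask=[0])
buggy = whole_word_mask(make_feature(), words_to_mask=[0], write_back=False)
print(fixed["labels"])  # [-100, 7592, 2088, -100, -100]
print(buggy["labels"])  # [101, 7592, 2088, 999, 102]
```

In this sketch, skipping the write-back masks `input_ids` but leaves `labels` as the full, unmasked sequence, so a loss that ignores `-100` would still be computed over every token rather than only over the masked word.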
