Add tokenizer, normalizer parameters in calculating scores like rouge
Motivation
I would like to compute scores such as ROUGE for languages other than English.
Because _normalize_and_tokenize_text lowercases the text and replaces every character outside a-z and 0-9 with a space, it effectively supports only English.
If the tokenizer and normalizer were exposed as parameters, other languages could be supported easily.
Pitch
rouge_score(preds, target, tokenizer, normalizer)
import re
from typing import Any, Callable, Optional, Sequence

def _normalize_and_tokenize_text(
    text: str,
    tokenizer: Optional[Callable[[str], Sequence[str]]] = None,
    normalizer: Optional[Callable[[str], str]] = None,
    stemmer: Optional[Any] = None,
) -> Sequence[str]:
    """Rouge score should be calculated only over lowercased words and digits.

    Optionally, Porter stemmer can be used to strip word suffixes to improve matching. The text
    normalization follows the implementation from `Rouge score_Text Normalizition`_

    Args:
        text: An input sentence.
        tokenizer: Optional callable that splits the normalized text into tokens.
        normalizer: Optional callable that normalizes the raw text.
        stemmer: Porter stemmer instance to strip word suffixes to improve matching.
    """
    if normalizer is None:
        # Replace any non-alpha-numeric characters with spaces.
        text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    else:
        text = normalizer(text)

    if tokenizer is None:
        tokens = re.split(r"\s+", text)
    else:
        tokens = tokenizer(text)

    if stemmer:
        # Only stem words more than 3 characters long.
        tokens = [stemmer.stem(x) if len(x) > 3 else x for x in tokens]

    if tokenizer is None:
        # One final check to drop any empty or invalid tokens.
        tokens = [x for x in tokens if (isinstance(x, str) and re.match(r"^[a-z0-9]+$", x))]
    return tokens
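To illustrate why the default path is English-only and how the proposed parameters would help, here is a minimal, self-contained sketch. The names normalize_and_tokenize, unicode_normalizer and whitespace_tokenizer are hypothetical and exist only for this example; this is not the library's actual function, and the stemmer is left out for brevity.

```python
import re
from typing import Callable, List, Optional


# Hypothetical helper mirroring the pitch; NOT the library's actual function.
def normalize_and_tokenize(
    text: str,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
    normalizer: Optional[Callable[[str], str]] = None,
) -> List[str]:
    # Default normalization: keep only lowercase ASCII letters and digits.
    if normalizer is None:
        text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    else:
        text = normalizer(text)
    # Default tokenization: split on runs of whitespace.
    if tokenizer is None:
        tokens = re.split(r"\s+", text)
    else:
        tokens = tokenizer(text)
    # Drop empty strings left over from leading/trailing whitespace.
    return [t for t in tokens if t]


def unicode_normalizer(text: str) -> str:
    # Assumed normalizer: keep any Unicode word character, not just a-z0-9.
    return re.sub(r"[^\w]+", " ", text.lower()).strip()


def whitespace_tokenizer(text: str) -> List[str]:
    # Assumed tokenizer: plain whitespace split.
    return text.split()


sentence = "안녕하세요, 세계!"
default_tokens = normalize_and_tokenize(sentence)
custom_tokens = normalize_and_tokenize(sentence, whitespace_tokenizer, unicode_normalizer)
print(default_tokens)  # []
print(custom_tokens)   # ['안녕하세요', '세계']
```

With the defaults, every Korean character is replaced by a space, so no tokens survive and any overlap-based score would be zero. Passing a Unicode-aware normalizer and a tokenizer keeps the words intact. A real use case would likely plug in a language-specific tokenizer instead of a whitespace split.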
Hi! Thanks for your contribution! Great first issue!
hookSSi changed the title from "Can add additional parameters in calculating scores like bleu, rouge?" to "Can add additional parameters in calculating scores like rouge?" on Feb 3, 2022
Hi @hookSSi, I think it's a good idea. The sacrebleu implementation of BLEUScore also allows the use of different tokenization techniques depending on the target language, so feel free to open a PR. Looking forward to it :]