I would like to calculate at which position the correct word lies among the top 5 words predicted by the GPT-2 model.
For this purpose, I am using the following code snippet:
subseq = "The car moves very" #sample sequence
orignal_word="fast"
sequence = tokenizer.encode(subseq, return_tensors="pt")
next_word_id = tokenizer.encode(orignal_word, return_tensors="pt")
next_word = tokenizer.decode(next_word_id[0])
next_word_logits = model(sequence)[0][0, -1].detach()
probabilities, word_ids = next_word_logits.topk(5) #Getting top 5 next word options
rank=1.0
for word_id in word_ids:
word = tokenizer.decode([word_id])
if word == next_word:
break;
rank=rank+1.0
print("Rank of Correct option is "+ str(rank))
I am not sure whether this is correct, since the GPT-2 model uses a BPE tokenizer. Am I doing this the right way? Kindly share your thoughts and correct me if I am doing something wrong.
It won't be that easy, since some words will be split into multiple tokens, so you have to make two forward passes.
If you limit your original_word to single-token words (you can check that simply with len(tokenizer.encode(original_word)) == 1), then your idea here should work.
If not, it's going to be trickier. Also, this issue might be helpful: #2311
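For illustration, here is a minimal sketch of that single-token check combined with the top-5 ranking from the snippet above. It assumes the base gpt2 checkpoint, and the helper name rank_if_single_token is invented for this example; comparing token ids directly also sidesteps the leading-space issue in GPT-2's BPE.

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def rank_if_single_token(context, original_word, k=5):
    # Encode with a leading space so the word is tokenized the way it would
    # appear after the context (GPT-2's BPE is whitespace-sensitive).
    word_ids = tokenizer.encode(" " + original_word)
    if len(word_ids) != 1:
        return None  # multi-token word: this simple approach does not apply
    sequence = tokenizer.encode(context, return_tensors="pt")
    with torch.no_grad():
        logits = model(sequence)[0][0, -1]
    _, top_ids = logits.topk(k)
    for rank, token_id in enumerate(top_ids.tolist(), start=1):
        if token_id == word_ids[0]:
            return rank
    return None  # correct word is not among the top k

print(rank_if_single_token("The car moves very", "fast"))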
Thanks @patrickvonplaten for your response.
Yes, the code works for len(tokenizer.encode(original_word)) == 1, but not for an original_word that consists of more than one token.
I looked at the shared issue, but I am confused: which selected word id should I pass to the model again, since next_word_logits.topk(5) gives me 5 token ids?
Can you please share a code snippet that will work for the second part?
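For reference, one possible reading of the "two forward passes" suggestion above (a hedged sketch, not an answer from this thread): rank each sub-token of the correct word separately, appending the word's own sub-token ids back onto the context between passes. The helper name subtoken_ranks and the per-sub-token ranking semantics are assumptions for illustration only; how to combine the per-sub-token ranks into a single word rank is a modelling choice.

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def subtoken_ranks(context, original_word, k=5):
    # One forward pass per sub-token of the word; between passes the word's
    # own correct sub-token is appended to the context (teacher forcing).
    word_ids = tokenizer.encode(" " + original_word)
    input_ids = tokenizer.encode(context, return_tensors="pt")
    ranks = []
    for word_id in word_ids:
        with torch.no_grad():
            logits = model(input_ids)[0][0, -1]
        _, top_ids = logits.topk(k)
        top_ids = top_ids.tolist()
        ranks.append(top_ids.index(word_id) + 1 if word_id in top_ids else None)
        # Append the correct sub-token before predicting the next one.
        input_ids = torch.cat([input_ids, torch.tensor([[word_id]])], dim=-1)
    return ranks

print(subtoken_ranks("The car moves very", "fastly"))  # hypothetical multi-token word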
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.