Can I use BERT / gpt-2 for text generation #2311

orenpapers · 2019-12-25T13:31:38Z

❓ Questions & Help

I want to get a list of possible completions and their probabilities.
For example,
For the sentence "I put the glass of the _"
I want to get a vector with word and probabilities from a pre-trained model, such as :
desk = 0.1
table = 0.2
car = 0.05
shirt = 0.001
Is that possible?

patrickvonplaten · 2019-12-25T20:29:44Z

You could do something like this when using gpt2

from transformers import GPT2LMHeadModel, GPT2Tokenizer
from torch.nn import functional as F
import torch

model = GPT2LMHeadModel.from_pretrained('gpt2-medium')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')

# encode input context
input_ids = torch.tensor(tokenizer.encode('I put the glass of the')).unsqueeze(0)
# get logits of last predicted token
next_word_logits = model(input_ids)[0][0, -1].detach()
next_word_probs = F.softmax(next_word_logits, dim=0)

next_words = ['desk', 'table', 'car', 'shirt']
next_words_probs = []
# encode tokens for which prob is to be estimated
next_word_ids = [tokenizer.encode(next_word) for next_word in next_words]

for next_word_id in next_word_ids:
    next_word_input_ids = input_ids.clone()
    next_word_prob = next_word_probs[next_word_id[0]].item()
    # We need a while loop here because a single word can be composed of multiple tokens
    # 'desk' is encoded to 2 tokens so that we have to call the model another time
    while(len(next_word_id) > 1):
        next_word_input_ids = torch.cat((next_word_input_ids, torch.tensor([next_word_id[0]]).unsqueeze(0)), dim=1)
        # get logits of last predicted token
        next_word_logits = model(next_word_input_ids)[0][0, -1].detach()
        # multiply prob of next token to prob of previous tokens
        next_word_prob *= F.softmax(next_word_logits, dim=0)[next_word_id[1]].item()
        # remove first token since already used
        next_word_id = next_word_id[1:]
    next_words_probs.append(next_word_prob)

# print result
for next_word, next_word_prob in zip(next_words, next_words_probs):
    print('{} = {}'.format(next_word, next_word_prob))

shashankMadan-designEsthetics · 2019-12-26T07:15:41Z

Yes it is possible u need to take the topk of lm_logits (it will be output[0] in case of gpt)which essentially gives to 50257 probabilities (highest to lowest) which is the vocab size then you need to take top k which gives indices and values, values are nothing but urs scores(0.8, 0.1) and the indices which correspond to the 50257 vocabulary words which u can decode using tokenize decode.

orenpapers · 2019-12-29T08:47:20Z

@patrickvonplaten Amazing thanks!
And if I want the rank of these words from all the word in the vocab?
e.g. desk is the most probable word , table in #12 , etc. ?

patrickvonplaten · 2020-01-01T13:45:00Z

Since GPT-2's output is based on byte-pair-encoding tokens and not on words you would have to define your own vocabulary. Having defined your vocabulary, I would simply calculate the probability for each word using the above procedure and then sort the tensor.
To better understand how byte-pair-encoding works this might help.
To sort the tensor this might help.

orenpapers · 2020-01-01T13:51:24Z

@patrickvonplaten Thanks, you think it will be possible to do it for all (or at least most) of the words in English in my personal MAC?

patrickvonplaten · 2020-01-01T14:14:57Z

Yeah, I think that should definitely be feasible.
Many words will consists of two tokens or less and will therefore need at most one additional forward pass (because the first forward pass is the same for all words and need to be calculated only once).

So if you have a vocabulary of say 300.000 words, I'd estimate that you would have to compute around 200.000 forward passes. You can calculate how much time a forward pass would take by averaging the computation time for 100 times calculating the probability for the word 'desk'.

Concerning memory, there should not be a problem.

patrickvonplaten · 2020-01-01T14:43:24Z

And the final vector giving the probabilities over your defined vocabulary should be normalized to make a prob distribution.

orenpapers · 2020-01-02T08:39:36Z

@patrickvonplaten You mean using softmax?

stale · 2020-03-02T11:55:52Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

patrickvonplaten · 2020-03-02T12:33:58Z

I was thinking to just normalize like this:
https://stackoverflow.com/questions/26785354/normalizing-a-list-of-numbers-in-python

but you could also use softmax again - depends on what you want and what works better for you!

orenpapers · 2020-05-12T13:00:31Z

@patrickvonplaten is it possible with BERT pre-trained model?
Thanks!

patrickvonplaten · 2020-05-12T13:20:49Z

You might take a look at masked language modeling :-) https://huggingface.co/transformers/usage.html#masked-language-modeling

orenkobo · 2020-05-12T15:13:06Z

@patrickvonplaten Nice! Thanks for the pointer!
And let's say I want to check a specific word in a masked location (What is the probability of the word "package " in the sequence "HuggingFace is creating a { } that the community uses to"? Is this possible?

stale bot added the wontfix label Mar 2, 2020

patrickvonplaten closed this as completed Mar 2, 2020

frankniujc mentioned this issue Mar 19, 2020

Does this project have this function ? #3162

Closed

patrickvonplaten mentioned this issue Apr 21, 2020

How to find a correct place of original word from the list of predicted words from GPT-2 model? #3886

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can I use BERT / gpt-2 for text generation #2311

Can I use BERT / gpt-2 for text generation #2311

orenpapers commented Dec 25, 2019

patrickvonplaten commented Dec 25, 2019

shashankMadan-designEsthetics commented Dec 26, 2019

orenpapers commented Dec 29, 2019

patrickvonplaten commented Jan 1, 2020

orenpapers commented Jan 1, 2020

patrickvonplaten commented Jan 1, 2020

patrickvonplaten commented Jan 1, 2020

orenpapers commented Jan 2, 2020

stale bot commented Mar 2, 2020

patrickvonplaten commented Mar 2, 2020

orenpapers commented May 12, 2020

patrickvonplaten commented May 12, 2020

orenkobo commented May 12, 2020

Can I use BERT / gpt-2 for text generation #2311

Can I use BERT / gpt-2 for text generation #2311

Comments

orenpapers commented Dec 25, 2019

❓ Questions & Help

patrickvonplaten commented Dec 25, 2019

shashankMadan-designEsthetics commented Dec 26, 2019

orenpapers commented Dec 29, 2019

patrickvonplaten commented Jan 1, 2020

orenpapers commented Jan 1, 2020

patrickvonplaten commented Jan 1, 2020

patrickvonplaten commented Jan 1, 2020

orenpapers commented Jan 2, 2020

stale bot commented Mar 2, 2020

patrickvonplaten commented Mar 2, 2020

orenpapers commented May 12, 2020

patrickvonplaten commented May 12, 2020

orenkobo commented May 12, 2020