Does this project have this function ? #3162

Closed
SeekPoint opened this issue Mar 6, 2020 · 5 comments

@SeekPoint

🚀 Feature request

Can we use this project to calculate the probability that an input text is a real/reasonable sentence, based on the corpus the model was trained on?

@frankniujc

#2311

@SeekPoint
Author

@frankniujc That is helpful,
but maybe a better way is to score all the tokens as a whole, rather than predicting the next token one at a time.

@frankniujc

The probability of a sentence factorizes by the chain rule. With s0 as the beginning-of-sequence token (so P(s0) = 1): P(s0 s1 s2 ... sn) = P(s1|s0) * P(s2|s0 s1) * P(s3|s0 s1 s2) * ... * P(sn|s0 s1 ... sn-1)
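
To make the factorization concrete, here is a tiny worked sketch. The conditional probabilities are made-up numbers, not output from any model; the point is only that multiplying the conditionals and summing their logarithms give the same result, and that the log form is the one that stays stable on long sentences.

```python
import math

# Hypothetical values of P(s1|s0), P(s2|s0 s1), P(s3|s0 s1 s2) for a
# three-token sentence. These numbers are invented for illustration.
conditionals = [0.2, 0.5, 0.1]

# Chain rule: the sentence probability is the product of the conditionals.
prob = math.prod(conditionals)

# Same quantity in log space: sum the log-probabilities. On long sentences
# the raw product underflows towards zero, while the sum stays usable.
log_prob = sum(math.log(p) for p in conditionals)

print(prob)                # sentence probability
print(math.exp(log_prob))  # same value, recovered from the log-probability
```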

So you can do something like this (the function returns the sum of log-probabilities, i.e. log P(sentence)):

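import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Assumption: a GPT-2 checkpoint, since '<|endoftext|>' is GPT-2's BOS/EOS token.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2').to(device).eval()

def sentence_probability(sent):
    # Prepend BOS so the first real token is conditioned on something.
    bos = tokenizer.encode('<|endoftext|>')
    tokens = bos + tokenizer.encode(sent)
    input_ids = torch.tensor(tokens).unsqueeze(0).to(device)

    sent_probs = []
    with torch.no_grad():
        for i, next_word in enumerate(tokens[1:]):
            # Logits at the last position of the prefix predict token i+1.
            next_word_logits = model(input_ids[:, :i+1])[0][0, -1]
            next_word_prob = F.log_softmax(next_word_logits, dim=0)[next_word].item()
            sent_probs.append(next_word_prob)

    # Sum of log-probabilities, i.e. log P(sentence), not the raw probability.
    return sum(sent_probs)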
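
The loop above runs the model once per prefix. A causal language model such as GPT-2 returns logits for every position in a single forward pass, so the same score can in principle be computed from one call by pairing the logits at position i with the token at position i+1. The sketch below shows only that shift-and-gather step on plain Python lists standing in for the model output; `log_softmax` and `sequence_log_prob` are illustrative names, not part of transformers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_log_prob(logits, token_ids):
    """Sum log P(token_ids[i+1] | token_ids[:i+1]) over the sequence.

    Assumption: logits[i] is the score vector over the vocabulary after the
    model has read token_ids[:i+1], and token_ids[0] is the BOS token.
    """
    total = 0.0
    # Row i predicts the token at position i+1, so targets are shifted by one.
    # The last row predicts a token past the end and is not used.
    for row, target in zip(logits[:-1], token_ids[1:]):
        total += log_softmax(row)[target]
    return total

# Toy check: vocabulary of 3 tokens, uniform logits at every position,
# so each of the two scored tokens contributes log(1/3).
token_ids = [0, 2, 1]
logits = [[0.0, 0.0, 0.0] for _ in token_ids]
print(sequence_log_prob(logits, token_ids))
```

With real model output, the same pairing applies to the per-position logits returned for the full `input_ids`; the result should match the loop version up to floating-point error.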

@simonepri

@lovejasmine Have a look at lm-scorer.

It is a tiny wrapper around transformers that I wrote; it lets you get sentence probabilities from models that support it (only GPT2 models are implemented at the time of writing).

@stale

stale bot commented Jun 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 6, 2020
@stale stale bot closed this as completed Jun 13, 2020