
Question on the computation of P(True) baseline #6

Open
seonwoo-min opened this issue Jun 19, 2023 · 0 comments
seonwoo-min commented Jun 19, 2023

Hi @lorenzkuhn,

I wanted to bring to your attention a potential error in the computation of the P(True) baseline, unless I have misunderstood something here.

Currently, in the code snippet below, only the first `len(tokenized_base_prompt)` targets are set to `-100`:

```python
# This computation of the negative log likelihoods follows this tutorial: https://huggingface.co/docs/transformers/perplexity
tokenized_base_prompt = generation_tokenizer(base_prompt)['input_ids']
tokenized_prompt_true = torch.tensor(generation_tokenizer(prompt_true)['input_ids'], device=device)
target_ids_true = tokenized_prompt_true.clone()
target_ids_true[:len(tokenized_base_prompt)] = -100
```
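For reference, labels set to `-100` are simply ignored by the cross-entropy loss, so every position left unmasked here contributes to the NLL. A toy illustration of that convention (not the repo's code):

```python
import torch
import torch.nn.functional as F

# Positions labelled -100 are dropped from the cross-entropy loss,
# so only the unmasked targets contribute to the NLL.
logits = torch.randn(1, 5, 10)                  # (batch, seq_len, vocab_size)
labels = torch.tensor([[-100, -100, 3, 7, 1]])  # first two positions masked
loss = F.cross_entropy(
    logits.view(-1, logits.size(-1)),  # flatten to (batch * seq_len, vocab_size)
    labels.view(-1),
    ignore_index=-100,                 # the same convention HuggingFace models use
)
print(loss)  # averaged over the three unmasked positions only
```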

However, it seems that this approach does not ignore all of the context tokens when calculating the NLL loss, because `prompt_true` also includes the `few_shot_prompt` prior to the `base_prompt`:

```python
base_prompt = prompt_template.format(question, generated_texts, most_likely_answer)
prompt_true = few_shot_promopt + prompt_template.format(question, generated_texts, most_likely_answer) + ' True'
```

It seems that failing to ignore the entire context in the NLL loss computation could lead to inaccurate results.
Could you please provide some insights or clarification on this matter?
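For what it's worth, here is a sketch of the fix I would have expected, reusing the variable names from the snippets above (untested):

```python
# Sketch of a possible fix (untested): mask the few-shot prompt as well as
# the base prompt, so that only the trailing ' True' tokens contribute
# to the NLL.
tokenized_context = generation_tokenizer(few_shot_promopt + base_prompt)['input_ids']
target_ids_true = tokenized_prompt_true.clone()
target_ids_true[:len(tokenized_context)] = -100
```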

In addition, the current code only uses `n_samples_to_use = 2000` samples for the P(True) baseline.
Are the experimental settings for P(True) different from those of the other baselines? I don't recall reading any explanation of this in the paper.
