Hi @lorenzkuhn,
I wanted to bring to your attention a potential error in the computation of the P(True) baseline, unless I have misunderstood something here.
Currently, in the code snippet below, only the first `len(tokenized_base_prompt)` targets are set to -100:

semantic_uncertainty/code/get_prompting_based_uncertainty.py, lines 108 to 113 at commit 27adbf0
However, it seems that this approach does not ignore all of the context tokens when calculating the NLL loss, since `prompt_true` also includes the `few_shot_prompt` before the `base_prompt`:

semantic_uncertainty/code/get_prompting_based_uncertainty.py, lines 105 to 106 at commit 27adbf0
Not ignoring all of the context tokens in the NLL computation might lead to inaccurate results.
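For concreteness, here is a minimal sketch of the masking I would expect; the function and variable names are my own and this is not the repository's exact code, only an illustration of the point above:

```python
import torch


def build_p_true_targets(tokenizer, few_shot_prompt, base_prompt):
    """Sketch of how I would expect the P(True) targets to be masked."""
    prompt_true = few_shot_prompt + base_prompt + ' True'
    input_ids = torch.tensor(tokenizer(prompt_true)['input_ids'])

    target_ids = input_ids.clone()

    # Current behaviour, as I read lines 108-113: only the first
    # len(tokenized_base_prompt) positions are masked, so the few-shot
    # context tokens still contribute to the NLL.
    # target_ids[:len(tokenizer(base_prompt)['input_ids'])] = -100

    # What I would expect instead: mask the entire context
    # (few_shot_prompt + base_prompt) with -100, the ignore_index used by
    # PyTorch's cross-entropy loss, so that only the final ' True'
    # token(s) are scored.
    n_context = len(tokenizer(few_shot_prompt + base_prompt)['input_ids'])
    target_ids[:n_context] = -100

    return input_ids, target_ids
```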
Could you please provide some insights or clarification on this matter?
In addition, the current code only uses `n_samples_to_use = 2000` samples for the P(True) baseline. Are the experiment settings for P(True) different from those of the other baselines? I don't recall reading any explanation of this in the paper.