In MultiOutcomeSampler.py, 'aprx_imm_reg' is computed for every action and put into the buffer without being summed up, so I have no idea why it is rescaled like this:
'aprx_imm_reg *= legal_action_mask / n_actions_to_smpl'
I think this is because I could not understand the formula in the docstring (v~(I) = * p(a) * |A(I)|), and I failed to find the corresponding part in your paper:
"""
Last state values are the average, not the sum of all samples of that state since we add
v~(I) = * p(a) * |A(I)|. Since we sample multiple actions on each traverser node, we have to average over
their returns like: v~(I) * Sum_a=0_N (v~(I|a) * p(a) * ||A(I)|| / N).
"""
Is there any reference for it?
Thanks a lot.
Hi! This is to make sure that the estimate is not scaled up just because you sample more actions. The regrets get more accurate the more actions you sample, but the expectation of the value should stay the same and not go up linearly with the number of sampled actions. Does this make sense? You are right that it's not in the paper. Thank you for checking before opening the issue, appreciated! This is an implementation detail: the paper itself doesn't use MOS sampling. It uses External sampling, where this division doesn't really matter.
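To make the point about the expectation concrete, here is a small toy check. It is only a sketch and not the repository's code: it assumes the N sampled actions are drawn uniformly without replacement, and the names `estimate`, `expected_estimate`, `values`, and `strategy` are made up for illustration. It enumerates every possible set of sampled actions, so the expectation is exact rather than simulated.

```python
from itertools import combinations


def estimate(values, strategy, sampled, divide_by_n=True):
    """Sampled value estimate: sum over sampled a of v(a) * p(a) * |A| (/ N)."""
    n_actions = len(values)
    scale = n_actions / len(sampled) if divide_by_n else n_actions
    return sum(values[a] * strategy[a] * scale for a in sampled)


def expected_estimate(values, strategy, n_samples, divide_by_n=True):
    """Exact expectation over all equally likely subsets of n_samples actions.

    Assumption: actions are sampled uniformly without replacement.
    """
    subsets = list(combinations(range(len(values)), n_samples))
    total = sum(estimate(values, strategy, s, divide_by_n) for s in subsets)
    return total / len(subsets)


# Hypothetical toy node with 4 legal actions.
values = [1.0, -2.0, 0.5, 3.0]    # v(I|a) for each action
strategy = [0.1, 0.2, 0.3, 0.4]   # p(a) for each action
true_value = sum(v * p for v, p in zip(values, strategy))

for n in range(1, len(values) + 1):
    with_div = expected_estimate(values, strategy, n, divide_by_n=True)
    without_div = expected_estimate(values, strategy, n, divide_by_n=False)
    print(n, round(with_div, 6), round(without_div, 6))
```

Under these assumptions, the column computed with the division by N stays at the true value for every N, while the column without it grows linearly in N, which is the scaling the division is there to remove.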
https://github.com/EricSteinberger/Deep-CFR/blob/master/DeepCFR/workers/la/sampling_algorithms/MultiOutcomeSampler.py