Why mean over all actions sampled in multi outcome sampling #7

Open
annw0922 opened this issue Jun 24, 2020 · 1 comment

@annw0922

https://github.com/EricSteinberger/Deep-CFR/blob/master/DeepCFR/workers/la/sampling_algorithms/MultiOutcomeSampler.py

Since 'aprx_imm_reg' here is computed for every action and put into the buffer without being summed up, I don't understand the purpose of this line:
'aprx_imm_reg *= legal_action_mask / n_actions_to_smpl'

I think this is because I could not understand the formula in the docstring (v~(I) = * p(a) * |A(I)|), and I failed to find the corresponding part in your paper:
"""
Last state values are the average, not the sum of all samples of that state since we add
v~(I) = * p(a) * |A(I)|. Since we sample multiple actions on each traverser node, we have to average over
their returns like: v~(I) * Sum_a=0_N (v~(I|a) * p(a) * ||A(I)|| / N).
"""

Is there any reference for it?

Thanks a lot.

@EricSteinberger
Owner

Hi! This is to make sure that the estimate is not scaled up just because you sample more actions. The regrets get more accurate the more actions you sample, but the expectation of the value should stay the same and not grow linearly with the number of sampled actions. Does this make sense? It's not in the paper, you are right. Thank you for checking before opening the issue, appreciated! This is an implementation detail, and the paper itself doesn't use multi-outcome sampling (MOS); it uses external sampling, where this division doesn't really matter.
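
To make the scaling argument concrete, here is a small sketch that is not taken from the repository. It assumes the N actions are drawn uniformly without replacement from the legal actions, and every name in it (`node_value_estimate`, `exact_expectation`, the toy `values` and `strategy`) is hypothetical. It enumerates every possible sample, so the expectations are exact rather than Monte Carlo estimates.

```python
from itertools import combinations


def node_value_estimate(sampled, values, strategy, n_legal, average=True):
    """Estimate the node value from a set of sampled action indices.

    Each sampled action contributes v(I|a) * p(a) * |A(I)|, the
    importance-weighted term from the docstring quoted in the question.
    With average=True the sum is divided by the number of sampled actions.
    """
    total = sum(values[a] * strategy[a] * n_legal for a in sampled)
    return total / len(sampled) if average else total


def exact_expectation(values, strategy, n_samples, average=True):
    """Average the estimate over every equally likely sample of size n_samples.

    Assumption: actions are sampled uniformly without replacement.
    """
    n_legal = len(values)
    subsets = list(combinations(range(n_legal), n_samples))
    estimates = [
        node_value_estimate(s, values, strategy, n_legal, average)
        for s in subsets
    ]
    return sum(estimates) / len(subsets)


# Toy node with four legal actions (made-up numbers).
values = [1.0, -2.0, 0.5, 3.0]    # v(I|a) for each action
strategy = [0.1, 0.2, 0.3, 0.4]   # p(a) for each action
true_value = sum(v * p for v, p in zip(values, strategy))

for n in range(1, len(values) + 1):
    with_division = exact_expectation(values, strategy, n, average=True)
    without_division = exact_expectation(values, strategy, n, average=False)
    print(n, with_division, without_division, true_value)
```

Under these assumptions, the averaged estimate has expectation equal to the true node value for every N, while the version without the division has expectation N times the true value, which is the linear growth the division is meant to prevent.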
