
Some question about the distribution of acc1-acc2 #4

Open
JianSun411 opened this issue Mar 12, 2021 · 2 comments

Comments

@JianSun411

Hey there, CCIT contributors,
From line 389 in the CCIT.py file, I think you assume that acc1 - acc2 follows the normal distribution N(0, 2\sigma(acc2)^2), where \sigma(acc2) is the standard deviation of acc2. I agree with this. But based on this assumption, there are two inconsistencies in other parts of the code:

  1. In line 373, only "s2 = np.std(cleaned, axis = 0, ddof = 1)[4]" is the sample standard deviation, i.e., the square root of the unbiased estimator of the variance of acc2. "np.std(cleaned, axis = 0)[4]" is the population standard deviation, which is not based on the unbiased variance estimator.
  2. In line 391, when bootstrap == False, why is the standard deviation np.sqrt(2) * 1/np.sqrt(ntot) (the np.sqrt(2) factor is multiplied in the "pvalue" function, line 325)? I think it should be np.sqrt(2) * np.sqrt(acc2 * (1 - acc2)/ntot), since acc2 is approximately N(acc2, acc2*(1 - acc2)/ntot): it is a binomial count of y_pred == y_test divided by ntot. (A small sketch illustrating both points follows this list.)
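
To make both points concrete, here is a small self-contained sketch. It is not taken from CCIT.py; `cleaned`, `acc2`, and `ntot` are random stand-ins that mirror the names in the lines quoted above.

```python
import numpy as np

# Toy data for illustration only (not the actual bootstrap output of CCIT.py).
rng = np.random.default_rng(0)
cleaned = rng.random((50, 5))     # stand-in for the bootstrap results array
acc2 = cleaned[:, 4].mean()       # column 4 holds acc2 in the snippet quoted above
ntot = 1000                       # stand-in for the test-set size

# Point 1: ddof=1 gives the sample standard deviation (square root of the
# unbiased variance estimator); the default ddof=0 divides by n instead of n-1.
s2_population = np.std(cleaned, axis=0)[4]           # divides by n
s2_sample     = np.std(cleaned, axis=0, ddof=1)[4]   # divides by n - 1

# Point 2: if acc2 is the mean of ntot Bernoulli(acc2) indicators
# (y_pred == y_test), its standard deviation is sqrt(acc2 * (1 - acc2) / ntot),
# not the 1 / sqrt(ntot) used when bootstrap == False.
std_used_in_code = 1.0 / np.sqrt(ntot)
std_binomial     = np.sqrt(acc2 * (1.0 - acc2) / ntot)

print(s2_population, s2_sample)
print(std_used_in_code, std_binomial)
```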

BTW, I appreciate your paper "Model-Powered Conditional Independence Test". It is great!

@rajatsen91 (Owner)

Thanks for the comment. You are right on both counts; I will change it in a future revision.

I suspect that the performance gap due to (1) will be pretty small.

@JianSun411 (Author)

Thanks for your reply. Besides, I am puzzled by the explanation of the CCIT function, where it says "If pval is low CI is rejected if it's high we fail to reject CI." However, the paper says "... when H_0 is true, the bias will be close to 0" (in the paragraph named "Algorithm with Bias Correction"). If so, CI should be rejected when the pval (0.5 * erfc(x/np.sqrt(2))) is far away from 0.5. These two statements are consistent with each other only if the bias is always positive. Although the paper does state that the bias > 0, I find that, in practice, the pval can be higher than 0.5, i.e., the bias is negative (see the sketch below).
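
For concreteness, here is a minimal sketch of the quoted p-value formula. It assumes x is the standardized statistic (roughly (acc1 - acc2) divided by its estimated standard deviation); the function below is an illustration, not a copy of the code in CCIT.py.

```python
import numpy as np
from scipy.special import erfc

def pvalue(x):
    # One-sided p-value as quoted above: small when x is large and positive.
    return 0.5 * erfc(x / np.sqrt(2))

# Positive standardized bias -> pval < 0.5, and a small pval rejects CI.
print(pvalue(2.0))    # ~0.023
# Negative standardized bias (observed in practice) -> pval > 0.5.
print(pvalue(-0.5))   # ~0.69
```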

Do you have any idea why the bias can be negative? And when should we reject CI?
