For likelihood ratio tests, exact sampling distributions are unknown for
most probability density functions. Instead, p value calculations rely
on the asymptotic $\chi^2$ approximation.
Definitions:
- Asymptotic p value - p value based on the asymptotic $\chi^2$ approximation.
- Empirical p value - proportion of p values as extreme or more extreme than the asymptotic p value.
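
In symbols, the two definitions can be written roughly as follows. This is a sketch under assumptions: a two-sided test with $k$ degrees of freedom, a likelihood ratio $\Lambda$ with statistic $W$, and $B$ simulated data sets indexed by $i$ and $j$. None of these symbols appear in the text above, and whether an iteration is counted against itself is a convention the text does not settle.

$$
p_{\text{asym}} = P\left(\chi^2_k \ge W\right), \qquad W = -2 \log \Lambda
$$

$$
p_{\text{emp},\, i} = \frac{1}{B} \sum_{j=1}^{B} \mathbf{1}\left[\, p_{\text{asym},\, j} \le p_{\text{asym},\, i} \,\right]
$$

A smaller asymptotic p value counts as "more extreme", so the empirical p value is the empirical distribution function of the simulated asymptotic p values evaluated at the observed one.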
If the
If the
For each test:
- Generate data from distribution.
- Call hypothesis testing function and get asymptotic p value.
- Compare each iteration’s asymptotic p value to all other asymptotic p values to calculate an empirical p value.
Sample size is increased and the process is repeated until calibration is good between the two p values.
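
The procedure above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the hypothesis testing functions the text refers to: the exponential rate test, the function names (`asymptotic_p_value`, `empirical_p_values`, `simulate`, `max_gap_below`), the sample sizes, and the iteration count are all hypothetical choices. Data are generated with the null hypothesis true, and each iteration's p value is counted in its own comparison.

```python
import bisect
import math
import random


def asymptotic_p_value(x, rate0):
    """Two-sided likelihood ratio test of H0: rate == rate0 (exponential data, 1 df)."""
    n, total = len(x), sum(x)
    rate_hat = n / total  # maximum likelihood estimate of the rate
    # W = 2 * (log likelihood at the MLE - log likelihood under H0).
    # Clamp at zero to guard against tiny negative rounding error.
    w = max(0.0, 2.0 * (n * math.log(rate_hat / rate0) - n + rate0 * total))
    # Chi-square survival function with 1 df: P(X >= w) = erfc(sqrt(w / 2)).
    return math.erfc(math.sqrt(w / 2.0))


def empirical_p_values(p_asym):
    """Proportion of simulated p values at or below each p value (itself included)."""
    ordered = sorted(p_asym)
    b = len(ordered)
    return [bisect.bisect_right(ordered, p) / b for p in p_asym]


def simulate(n, rate0=1.0, iterations=2000, seed=1):
    """Generate data with H0 true, test each data set, then compute empirical p values."""
    rng = random.Random(seed)
    p_asym = [
        asymptotic_p_value([rng.expovariate(rate0) for _ in range(n)], rate0)
        for _ in range(iterations)
    ]
    return p_asym, empirical_p_values(p_asym)


def max_gap_below(p_asym, p_emp, cutoff=0.20):
    """Largest |asymptotic - empirical| among asymptotic p values at or below cutoff."""
    gaps = [abs(a - e) for a, e in zip(p_asym, p_emp) if a <= cutoff]
    return max(gaps) if gaps else 0.0


if __name__ == "__main__":
    # Increase the sample size and watch the gap between the two p values.
    for n in (5, 50, 500):
        p_asym, p_emp = simulate(n)
        print(n, round(max_gap_below(p_asym, p_emp), 3))
```

Plotting `p_emp` against `p_asym` would give the kind of calibration plot where points close to the identity line indicate good agreement. The gap summary is restricted to asymptotic p values at or below .20 because that is the range that matters most in practice.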
Ideally, calibration is good across the entire range of asymptotic p
values. What is critical is calibration at .20 and less. Almost no one
sets a significance level above .20.
For all three alternative hypotheses, dots are near the red line for asymptotic p values below .20. Most tests are well calibrated over the entire range of asymptotic p values.
For one way tests, calibration is great for most tests. The empirical quantile test has the worst calibration.