
Permutation test for significance of between treatment parameter differences #41

Open
zaneveld opened this issue Jul 16, 2018 · 0 comments


zaneveld commented Jul 16, 2018

Typically, we should expect that every treatment in a time-series study will show some differences in theta, lambda, and sigma. Users will want to know which differences are attributable to chance versus which should be investigated further. I suggest we provide this capability with an individual-wise permutation test.

Here's the idea. After reading in user data and separating out the time series for each individual, we infer the model once using the real associations between individuals and treatments, and record the difference in parameter values between each pair of treatments (i.e. the observed effect size of the treatment). Then, for some user-specified number of permutations (e.g. --n_permutations=100), we shuffle which individuals are in each treatment and repeat the inference process. This gives us a null distribution of how large a parameter difference we should expect between treatments due to chance alone. The permutational p-value is the number of individual-wise permutations in which a difference in parameter values equal to or larger than the real difference was observed, divided by the total number of permutations.
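A minimal sketch of the procedure. In the full pipeline each permutation would re-run model inference on the shuffled individual-to-treatment assignment; here, precomputed per-individual parameter values stand in for that inference step, and all names (`permutation_p_value`, the treatment labels) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def permutation_p_value(individuals, treatments, param_values, n_permutations=100):
    """Individual-wise permutation test for one parameter between two treatments.

    individuals  -- list of individual ids
    treatments   -- dict: individual id -> treatment label ('A' or 'B')
    param_values -- dict: individual id -> fitted parameter value
                    (stand-in for re-running model inference per permutation)
    """
    def effect_size(assignment):
        a = [param_values[i] for i in individuals if assignment[i] == 'A']
        b = [param_values[i] for i in individuals if assignment[i] == 'B']
        return abs(np.mean(a) - np.mean(b))

    observed = effect_size(treatments)
    labels = [treatments[i] for i in individuals]
    null_effects = []
    for _ in range(n_permutations):
        # Shuffle which individuals are in each treatment, keeping each
        # individual's time series (and fitted parameters) intact.
        rng.shuffle(labels)
        null_effects.append(effect_size(dict(zip(individuals, labels))))

    # p = fraction of permutations with an effect at least as large as observed.
    p = sum(e >= observed for e in null_effects) / n_permutations
    return observed, null_effects, p
```
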

We then report a table with columns: treatment1, treatment2, parameter, average parameter value (treatment 1), average parameter value (treatment 2), n_individuals (treatment 1), n_individuals (treatment 2), effect size (real data), average effect size (permutational null model), permutational p-value, and the Bonferroni-corrected p-value (the p-value multiplied by the number of treatment-treatment comparisons).
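The multiple-testing correction is straightforward; a sketch, assuming one test per (treatment pair, parameter) combination and using hypothetical treatment labels:

```python
from itertools import combinations

treatments = ['control', 'heat', 'antibiotic']  # hypothetical labels
parameters = ['theta', 'lambda', 'sigma']

# One test per unordered treatment pair, per parameter.
n_comparisons = len(list(combinations(treatments, 2))) * len(parameters)

def bonferroni(p, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(p * n_tests, 1.0)
```

So with three treatments and three parameters, 3 pairs x 3 parameters = 9 comparisons, and a raw p of 0.01 becomes 0.09 after correction.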

Users could therefore run the script and reach conclusions like: lambda was significantly different between IBD and non-IBD patients (p = 0.01).

Critically, I think the permutations must be done at the level of assignment of individuals to treatments, not at the level of timepoints within individuals or by simply scrambling all samples. If there is inter-individual variation, or there are non-treatment temporal effects, then permuting sample labels won't make sense and will, I think, give incorrect p-values. For example, in coral data, if there is a heat wave at timepoint 10 in the real data and we permuted all samples, real differences between treatments might not show up as significant, because the heat-wave samples would be scattered across timepoints, inflating the differences between treatments in the null model.
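To make the distinction concrete, a toy sketch (with made-up coral trajectories) of an individual-wise shuffle: only the individual-to-treatment map changes, and each individual's time series, including the shared spike at the final timepoint, stays intact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical time series; all individuals share a spike at the last
# timepoint (e.g. a heat wave affecting every coral at once).
series = {
    'coral_1': [0.1, 0.2, 0.9],
    'coral_2': [0.2, 0.1, 0.8],
    'coral_3': [0.1, 0.3, 0.7],
}
labels = ['heat', 'heat', 'control']

# Individual-wise permutation: shuffle treatment labels across individuals,
# never the timepoints within an individual.
rng.shuffle(labels)
assignment = dict(zip(series, labels))
```

Because whole trajectories travel with their individuals, the heat-wave timepoint appears at the same position in every permuted dataset, so shared temporal effects cancel out of the null distribution instead of inflating it.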
