
Permutation test for significance of between treatment parameter differences #41

Open
zaneveld opened this issue Jul 16, 2018 · 0 comments


zaneveld commented Jul 16, 2018

Typically, we should expect that every treatment in a time-series study will show some differences in theta, lambda, and sigma. Users will want to know which differences are attributable to chance versus which should be investigated further. I suggest we provide this capability with an individual-wise permutation test.

Here's the idea. After reading in user data and separating out the time series for each individual, we infer the model once using the real associations between individuals and treatments, and record the difference in parameter values between each pair of treatments (i.e. the observed effect size of the treatment). Then, for some user-specified number of permutations (e.g. --n_permutations=100), we shuffle which individuals are in each treatment and repeat the inference process. This gives us a null distribution of how large a parameter difference we should expect between treatments due to chance alone. The permutational p-value is the number of individual-wise permutations in which a difference in parameter values equal to or larger than the real difference was observed, divided by the total number of permutations.
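A minimal sketch of the procedure. In the full pipeline each permutation would re-run model inference on the shuffled individual-to-treatment assignment; here, precomputed per-individual parameter values stand in for that inference step, and all names (`permutation_p_value`, the treatment labels) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def permutation_p_value(individuals, treatments, param_values, n_permutations=100):
    """Individual-wise permutation test for one parameter between two treatments.

    individuals  -- list of individual ids
    treatments   -- dict: individual id -> treatment label ('A' or 'B')
    param_values -- dict: individual id -> fitted parameter value
                    (stand-in for re-running model inference per permutation)
    """
    def effect_size(assignment):
        a = [param_values[i] for i in individuals if assignment[i] == 'A']
        b = [param_values[i] for i in individuals if assignment[i] == 'B']
        return abs(np.mean(a) - np.mean(b))

    observed = effect_size(treatments)
    labels = [treatments[i] for i in individuals]
    null_effects = []
    for _ in range(n_permutations):
        # Shuffle which individuals are in each treatment, keeping each
        # individual's time series (and fitted parameters) intact.
        rng.shuffle(labels)
        null_effects.append(effect_size(dict(zip(individuals, labels))))

    # p = fraction of permutations with an effect at least as large as observed.
    p = sum(e >= observed for e in null_effects) / n_permutations
    return observed, null_effects, p
```
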

We then report a table with columns: treatment1, treatment2, parameter, average parameter value (treatment 1), average parameter value (treatment 2), n_individuals (treatment 1), n_individuals (treatment 2), effect size (real data), average effect size (permutational null model), permutational p-value, and the Bonferroni-corrected p-value (the p-value multiplied by the number of treatment-treatment comparisons).
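The multiple-testing correction is straightforward; a sketch, assuming one test per (treatment pair, parameter) combination and using hypothetical treatment labels:

```python
from itertools import combinations

treatments = ['control', 'heat', 'antibiotic']  # hypothetical labels
parameters = ['theta', 'lambda', 'sigma']

# One test per unordered treatment pair, per parameter.
n_comparisons = len(list(combinations(treatments, 2))) * len(parameters)

def bonferroni(p, n_tests):
    """Bonferroni correction: multiply by the number of tests, cap at 1."""
    return min(p * n_tests, 1.0)
```

So with three treatments and three parameters, 3 pairs x 3 parameters = 9 comparisons, and a raw p of 0.01 becomes 0.09 after correction.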

Users could therefore run the script and reach conclusions like: lambda was significantly different between IBD and non-IBD patients (p = 0.01).

Critically, I think the permutations must be done at the level of assignment of individuals to treatments, not at the level of timepoints within individuals or by simply scrambling all samples. If there is inter-individual variation, or there are non-treatment temporal effects, then permuting sample labels won't make sense and will, I think, give incorrect p-values. For example, in coral data, if there is a heat wave at timepoint 10 in the real data and we permuted all samples, real differences between treatments might not show up as significant, because the heat-wave samples would be scattered across timepoints, inflating the differences between treatments in the null model.
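To make the distinction concrete, a toy sketch (with made-up coral trajectories) of an individual-wise shuffle: only the individual-to-treatment map changes, and each individual's time series, including the shared spike at the final timepoint, stays intact:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical time series; all individuals share a spike at the last
# timepoint (e.g. a heat wave affecting every coral at once).
series = {
    'coral_1': [0.1, 0.2, 0.9],
    'coral_2': [0.2, 0.1, 0.8],
    'coral_3': [0.1, 0.3, 0.7],
}
labels = ['heat', 'heat', 'control']

# Individual-wise permutation: shuffle treatment labels across individuals,
# never the timepoints within an individual.
rng.shuffle(labels)
assignment = dict(zip(series, labels))
```

Because whole trajectories travel with their individuals, the heat-wave timepoint appears at the same position in every permuted dataset, so shared temporal effects cancel out of the null distribution instead of inflating it.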
