Add wilcoxon statistical calculation and plotting to 'yPlot()' and 'freqPlot()' #14

Open
wants to merge 24 commits into
base: main

dtm2451 (Owner) commented May 17, 2024

Goal: Add functionality to yPlot() and freqPlot() to perform statistical testing between frequencies of pairwise group.by-groups (and/or color.by-groups).
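To make "frequencies" concrete, here is a minimal base R sketch of the per-sample values that such tests would compare. The data and column names are made up for illustration, and this is not how freqPlot() computes its values internally.

```r
# Made-up per-cell annotations; column names are illustrative only
cells <- data.frame(
  sample = rep(c("s1", "s2", "s3", "s4"), times = c(4, 4, 5, 5)),
  group = rep(c("A", "A", "B", "B"), times = c(4, 4, 5, 5)),
  cluster = c("c1", "c1", "c2", "c2",
              "c1", "c2", "c2", "c2",
              "c1", "c1", "c1", "c1", "c2",
              "c1", "c1", "c1", "c2", "c2")
)

# Fraction of each sample's cells that fall in each cluster (rows sum to 1)
freqs <- prop.table(table(cells$sample, cells$cluster), margin = 1)
freqs
```

Each row is one sample, so comparing group A with group B for a given cluster means comparing the s1/s2 values against the s3/s4 values in that cluster's column.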

This is a BIG addition, both in expected value to users and in required code complexity. Currently a work in progress.

An internal function is used to calculate p-values for pairwise comparisons between groupings. The methodology and control points of this function are still evolving, but the plan is to document and export this function for visibility purposes before merge of this PR. The p-values are plotted with ggpubr::stat_pvalue_manual().
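As a rough illustration of the idea (not the PR's actual internal function), pairwise p-values between groups could be computed with base R along these lines. The helper name `pairwise_wilcox_df` and the toy data are hypothetical.

```r
# Hypothetical helper, NOT dittoViz's internal function:
# runs a Wilcoxon rank-sum test for every pair of groups.
pairwise_wilcox_df <- function(values, groups) {
  groups <- factor(groups)
  pairs <- combn(levels(groups), 2, simplify = FALSE)
  do.call(rbind, lapply(pairs, function(pr) {
    test <- wilcox.test(values[groups == pr[1]], values[groups == pr[2]])
    data.frame(group1 = pr[1], group2 = pr[2], p = test$p.value,
               stringsAsFactors = FALSE)
  }))
}

# Toy data (made up): per-sample frequencies in three groups
toy <- data.frame(
  freq = c(0.10, 0.12, 0.15, 0.30, 0.33, 0.36, 0.11, 0.14, 0.31),
  group = rep(c("A", "B", "C"), each = 3)
)
pvals <- pairwise_wilcox_df(toy$freq, toy$group)
pvals
```

The group1/group2 layout is chosen because ggpubr::stat_pvalue_manual() expects those columns, along with a label column and a y.position column for bracket height, which this sketch leaves out.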

ToDo before merge:

  • Add statistical test methodology flexibility (pvalues.test.method & pvalues.test.adjust)
  • Add optional summarization by samples prior to running comparisons (pvalues.sample.by & pvalues.sample.summary)
  • Not yet, but potential future feature: Decide if and how to implement a description of comparisons run
  • improve bracket height offset system
    • use range (per panel when split.adjust = scale = "free_y") instead of just the max and 0-to-max
  • allow comparing between color.by-groups when group.by != color.by.
    • initial (calculation swap, bracket positioning)
    • improve positioning
    • improve interpretation (stick with group.by when color.by constitutes super-groups?)
  • unit tests
    • basic
    • complicated features
  • ensure plays nicely with do.hover
  • documentation via vignette because examples already take too long
  • decide how/if to implement option for plotting */** instead of direct pvalues.
    • Initially thought to be supported via manual provision of the involved inputs to pvalues.plot.adjust, but it cannot actually be achieved this way with stat_pvalue_manual...
    • Added a pvalues.plot.symbols input (logical or function, default = FALSE) to make this customizable yet easy. Symbol definitions are given in the input documentation.
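
For readers unfamiliar with the adjustment and symbol items above, a minimal base R sketch follows. The helper name `p_to_symbol` is hypothetical, and the cutoffs are a common convention that may differ from the symbol definitions documented for pvalues.plot.symbols.

```r
# Hypothetical helper; cutoffs are a common convention (an assumption),
# not necessarily the ones documented for pvalues.plot.symbols.
p_to_symbol <- function(p) {
  as.character(cut(p,
    breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
    labels = c("***", "**", "*", "ns")))
}

raw_p <- c(0.0004, 0.02, 0.04, 0.30)     # made-up p-values
adj_p <- p.adjust(raw_p, method = "BH")  # multiple-testing adjustment
data.frame(raw_p, adj_p, symbol = p_to_symbol(adj_p))
```

Accepting a function as well as a logical, as the pvalues.plot.symbols item describes, would let users swap in their own mapping in place of something like `p_to_symbol`.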

NOTE:

  • due to a clash with ggpubr internals, this PR will require changing the name of the "label" column of the compositional data.frames generated internally by freqPlot() & barPlot(), accessible via data.out = TRUE. The PR uses "Y" as the new name currently, but I may settle on something different before merge.

j-andrews7 (Contributor) commented Jun 14, 2024

> ensure plays nicely with color.by, if can. Likely will need to block use of this feature when group.by != color.by

I recognize this increases complexity dramatically, but being able to use both group.by and color.by to get stats between colors within each group would be excellent. I've frequently found it the most compact way to display the data.

edit: geom_pwc actually makes this not bad to do after the fact, so maybe not necessary.

library(ggpubr)  # provides geom_pwc()
# myRNA is the commenter's own example object
pp <- dittoPlot(object = myRNA, var = "gene1", group.by = "conditions",
                color.by = "timepoint")
pp + geom_pwc(aes(group = timepoint))


dtm2451 (Owner, Author) commented Jun 17, 2024

Agreed that subgroup comparisons would also be very nice, and...

> edit: geom_pwc actually makes this not bad to do after the fact, so maybe not necessary.

This is also what I discovered!

It's just a bit unfortunate to build only half the solution internally and then need to rely on internal descriptions of external tooling, or to maintain pointers to external documentation if I/we find a good geom_pwc tutorial. External docs could change at any time and leave dittoViz's statistical calculation recommendations out of date... but that's workable.

The fact that I want to add a pvalues.test.use control to allow statistical modeling flexibility is where things break down a bit more. I'd like to support chi^2 in particular, which I think can be better for cell frequency differences. But the method input of ggpubr::stat_pwc is limited to certain methods (from the rstatix R package, per its docs), and unfortunately that list does not include chisq.
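
For context, a chi-squared comparison of cell frequencies between two groups could look roughly like this in base R. The counts are made up, and how the PR would actually wire such a test in is not settled here.

```r
# Made-up cell counts: rows are groups, columns are "in cluster" vs "other"
counts <- matrix(c(120,  80,
                    60, 140),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("control", "treated"),
                                 cluster = c("cluster1", "other")))

# Pearson's chi-squared test on the 2x2 table
# (base R applies Yates' continuity correction to 2x2 tables by default)
chisq_res <- chisq.test(counts)
chisq_res$p.value
```

Note that this treats every cell as an independent observation, which is an assumption of the sketch and differs from comparing per-sample frequencies.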

dtm2451 changed the title from "Add wilcoxon statistical calculation and plotting to 'freqPlot()'" to "Add wilcoxon statistical calculation and plotting to 'yPlot()' and 'freqPlot()'" on Jun 17, 2024

dtm2451 (Owner, Author) commented Jun 18, 2024

There's a working implementation for using both group.by and color.by now! Of course, I've just noticed it breaks down when subgroups are not fully shared across groups, as in your example above, so there's more work to do to finish it up. But this case now works:

library(dittoViz)
example("dittoExampleData", echo = FALSE)
freqPlot(example_df, "clustering", "sample", "category", "subcategory", add.pvalues = "all", pvalues.adjust = FALSE)


I spent a good bit of time today troubleshooting my way through the docs that ggpubr points to for adding p-values between subgroups. Unfortunately, I couldn't seem to make use of the same functions with my p-values generated outside of the rstatix package. BUT I was able to understand the moving pieces enough to re-create them!
