Explore Variant_Classification's Impact on TMB: Part 1: Filter vs no Filter #739

cansavvy · 2020-08-20T12:42:14Z

Purpose/implementation Section

What scientific question is your analysis addressing?

Part 1 of #729

A notebook that runs the calculate_tmb.R script with and without --nonsynfilter and creates some plots to visualize how the participant data change with the filter.

How different are the synonymous vs nonsynonymous mutation counts?

Short answer: not much different at all.

What was your approach?

Does the filter change a participant's TMB? I ran a correlation between the unfiltered and filtered TMB values and made a scatterplot.
Are some histologies affected more than others? I facet-wrapped the scatterplot by short_histology.
Does the filter change the TCGA-PBTA comparison? I plotted TMB with and without the filter for TCGA and PBTA side by side.
Plot the TMB plot with no-filter data: I made "the" TMB plot with the no-filter version of the TMB and imported the filtered version so the two can be viewed side by side.
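
For readers without the rendered notebook, the first two approach items can be sketched roughly as below. This is a minimal illustration and not the notebook's actual code: the data frame, its values, the column names `tmb_no_filter` and `tmb_filter`, and the histology labels are all made up, and Pearson correlation is an assumption since the description does not say which method was used.

```r
# Toy data for illustration only; values and column names are hypothetical.
tmb <- data.frame(
  short_histology = rep(c("histology_A", "histology_B", "histology_C"), each = 3),
  tmb_no_filter = c(1.2, 0.8, 1.0, 4.5, 6.1, 5.0, 2.0, 1.5, 2.4),
  tmb_filter = c(0.9, 0.6, 0.8, 3.4, 4.7, 3.7, 1.5, 1.1, 1.9)
)

# Overall agreement between unfiltered and filtered TMB across all participants.
# Pearson is assumed here; the notebook may use a different method.
overall_cor <- cor(tmb$tmb_no_filter, tmb$tmb_filter, method = "pearson")

# The same correlation computed within each histology, which is the numeric
# counterpart of faceting the scatterplot by short_histology.
cor_by_histology <- sapply(
  split(tmb, tmb$short_histology),
  function(df) cor(df$tmb_no_filter, df$tmb_filter)
)

print(overall_cor)
print(cor_by_histology)
```

A high overall correlation together with similar per-histology correlations would be consistent with the "not much different" summary, though the real conclusion rests on the notebook's plots rather than on this sketch.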

What GitHub issue does your pull request address?

Part 1 of #729

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

How do we feel about the setup in general?
Does this adequately address part 1 of the question in Exploratory Side Analysis: Variant_Classification breakdown #729? "How does the nonsynonymous filter affect TMB"?
Any plots we'd like to see that aren't included?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes. Here's the rendered notebook:
explore_nonsynfilter.nb.html.zip

Results

What types of results are included (e.g., table, figure)?

Plots can be seen here explore_nonsynfilter.nb.html.zip

What is your summary of the results?

Overall, TMB comparisons do not seem to be affected much by filtering to nonsynonymous mutations.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

I haven't added a README since this is a side analysis. The main information is in the Rmd. It's not a standalone module, so I also didn't add it to the table. If we would like these items added, I can do that.

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

cansavvy · 2020-08-20T12:51:47Z

Before you get into a fine-detail review, @jashapiro, can you give this a "big picture" review and see if it fits what you pictured for part 1 of #729 and the comment of yours that the issue is based on: #728 (review)

jashapiro

This looks good. I would add something about the fractions of mutations that are coding vs. noncoding, especially in the context of comparisons among tumor types. If there were not a strong correlation between coding and (coding + noncoding) as you have shown, I would be very surprised, but if the proportions of coding and noncoding changes were different across different tumor types I would be somewhat less shocked (but still surprised).

Adding some plots of the proportion coding (filter/nofilter) in each sample, split by morphology, might be a useful addition. This is perhaps simpler to plot than the slopes.

jashapiro · 2020-09-01T17:37:19Z

analyses/snv-callers/explore_variant_classifications/explore_nonsynfilter.Rmd

+  "no_filter",
+  "pbta-snv-mutation-tmb-coding.tsv"
+)) %>%
+  # This variable is weird when binding but we don't need it for the plot so we'll just remove it.


tell me more?

Hmm... It was weird when I used this kind of import-and-bind for data.frames in another notebook, but now it's fine, so I deleted this step.

cansavvy · 2020-09-01T18:36:21Z

If there were not a strong correlation between coding and (coding + noncoding) as you have shown, I would be very surprised, but if the proportions of coding and noncoding changes were different across different tumor types I would be somewhat less shocked (but still surprised).

I'm all for doing this comparison, but it will require different datasets and a different setup than what I have going on in this notebook. So can I propose that, if what is here is okay, I start a new notebook/new PR that looks into non-coding/coding proportions?

jashapiro · 2020-09-01T18:47:24Z

My language is imprecise here, I don't really mean coding/noncoding. I just mean a violin plot (or similar) of tmb_filter/tmb_no_filter separated by disease. I think you have the data you need for that in this notebook.

cansavvy · 2020-09-01T18:48:08Z

Whew, okay that is much easier. Will do.
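
The requested plot amounts to a per-sample ratio of filtered to unfiltered TMB, grouped by disease. A rough sketch follows, under stated assumptions: the toy data frame, its values, and the column names are hypothetical, and this is not the code that was later added in the "Add ratio violin plot" commit. The ggplot2 part only runs if that package is installed.

```r
# Toy data for illustration only; values and column names are hypothetical.
tmb <- data.frame(
  short_histology = rep(c("histology_A", "histology_B", "histology_C"), each = 3),
  tmb_no_filter = c(1.2, 0.8, 1.0, 4.5, 6.1, 5.0, 2.0, 1.5, 2.4),
  tmb_filter = c(0.9, 0.6, 0.8, 3.4, 4.7, 3.7, 1.5, 1.1, 1.9)
)

# Per-sample ratio of filtered to unfiltered TMB.
tmb$filter_ratio <- tmb$tmb_filter / tmb$tmb_no_filter

# Median ratio per histology: a numeric summary of what the violins would show.
ratio_summary <- aggregate(filter_ratio ~ short_histology, data = tmb, FUN = median)
print(ratio_summary)

# Optional violin plot, built only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  ratio_plot <- ggplot2::ggplot(
    tmb,
    ggplot2::aes(x = short_histology, y = filter_ratio)
  ) +
    ggplot2::geom_violin() +
    ggplot2::geom_jitter(width = 0.1, height = 0) +
    ggplot2::labs(x = "short_histology", y = "tmb_filter / tmb_no_filter")
  # Call print(ratio_plot) to render it in a script or notebook chunk.
}
```

If the ratio distributions sit at similar levels across histologies, the filter is removing a comparable share of mutations in each disease, which is the pattern the reviewer expected.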

jashapiro

This looks good to me. Nothing too surprising!

cansavvy added 7 commits August 13, 2020 13:22

Add script and Rmd basics

c261bb2

Add no nonsyn filter to steps

464d4d2

Got some basic analyses here.

d9630f9

Edits to script. More polishing.

afaceb5

Add some wording

6a085cc

Add the tmb plot comparison; rename

970bee5

Merge branch 'master' into cansavvy/var_class_investigation

721f1bb

cansavvy marked this pull request as ready for review August 20, 2020 12:48

cansavvy requested a review from jashapiro August 20, 2020 12:51

cansavvy added 3 commits August 20, 2020 08:59

Fix render

08e7224

Merge remote-tracking branch 'cansavvy/cansavvy/var_class_investigati…

9702928

…on' into cansavvy/var_class_investigation

Fix file path

da398de

cansavvy mentioned this pull request Aug 20, 2020

Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up? #740

Merged

5 tasks

cansavvy added 2 commits August 21, 2020 10:51

Add some more notes, re-run

f9940d1

Merge branch 'master' into cansavvy/var_class_investigation

2a29bd3

jashapiro reviewed Sep 1, 2020

Get rid of the unnecessary region_size removal step and re-run

00f9b26

Add ratio violin plot

0f2225a

cansavvy requested a review from jashapiro September 1, 2020 19:21

jashapiro approved these changes Sep 1, 2020

jaclyn-taroni merged commit 0e64389 into AlexsLemonade:master Sep 2, 2020

cansavvy mentioned this pull request Sep 3, 2020

Updated analysis: filter to only non-synonymous mutations for TMB #726

Closed

cansavvy deleted the cansavvy/var_class_investigation branch September 8, 2020 18:24
