This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Explore Variant_Classification's Impact on TMB: Part 1: Filter vs no Filter #739

Merged

Collaborator

@cansavvy cansavvy commented Aug 20, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

Part 1 of #729

A notebook that runs the calculate_tmb.R script with and without --nonsynfilter and creates some plots to visualize how the participant data changes with the filter.

How different are the synonymous vs nonsynonymous mutation counts?

Short answer: not much different at all.

What was your approach?

  • Does the filter change a participant's TMB? I calculated the correlation between the unfiltered and filtered TMB and made a scatterplot.
  • Are some histologies affected more than others? I facet-wrapped the scatterplot by short_histology.
  • Does the filter change the TCGA-PBTA comparison? I plotted TMB with and without the filter for TCGA and PBTA side by side.
  • Plot the TMB plot with the no-filter data: I made "the" TMB plot with the unfiltered TMB and imported the filtered version for viewing side by side.
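
As a rough illustration of the first two checks in the list above, the per-histology correlation between unfiltered and filtered TMB could be sketched as follows. This is not the notebook's R code: the records, the histology labels, and the function names `pearson` and `tmb_correlation_by_histology` are all hypothetical, chosen only to show the shape of the comparison.

```python
import math
from collections import defaultdict

# Hypothetical per-sample records (not real PBTA data):
# (short_histology, tmb_no_filter, tmb_filter)
records = [
    ("LGAT", 2.0, 1.6),
    ("LGAT", 4.0, 3.2),
    ("LGAT", 6.0, 4.8),
    ("Medulloblastoma", 1.0, 0.9),
    ("Medulloblastoma", 3.0, 2.0),
    ("Medulloblastoma", 5.0, 4.5),
]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def tmb_correlation_by_histology(rows):
    """Group samples by histology and correlate unfiltered vs filtered TMB."""
    grouped = defaultdict(lambda: ([], []))
    for histology, tmb_no_filter, tmb_filter in rows:
        grouped[histology][0].append(tmb_no_filter)
        grouped[histology][1].append(tmb_filter)
    # Histologies with fewer than two samples are skipped: no correlation is defined.
    return {h: pearson(x, y) for h, (x, y) in grouped.items() if len(x) >= 2}


correlations = tmb_correlation_by_histology(records)
for histology, r in sorted(correlations.items()):
    print(f"{histology}: r = {r:.3f}")
```

A correlation near 1 within a histology would mean the filter rescales TMB without reordering participants, which is the pattern the scatterplots are meant to reveal.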

What GitHub issue does your pull request address?

Part 1 of #729

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes. Here's the rendered notebook:
explore_nonsynfilter.nb.html.zip

Results

What types of results are included (e.g., table, figure)?

Plots can be seen here explore_nonsynfilter.nb.html.zip

What is your summary of the results?

Overall, TMB comparisons do not seem to be affected much by the nonsynonymous mutation filter.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

I haven't added a README since this is a side analysis; the main information is in the Rmd. It's not a standalone module, so I also didn't add it to the table. If we would like these items added, I can add them.

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@cansavvy cansavvy marked this pull request as ready for review August 20, 2020 12:48
@cansavvy
Collaborator Author

Before you get into a fine detail review, @jashapiro can you give this a "big picture" review and see if it fits what you pictured for part 1 of #729 and your comment that that issue is based on: #728 (review)

@cansavvy cansavvy requested a review from jashapiro August 20, 2020 12:51
Member

@jashapiro jashapiro left a comment

This looks good. I would add something about the fractions of mutations that are coding vs. noncoding, especially in the context of comparisons among tumor types. If there were not a strong correlation between coding and (coding + noncoding) as you have shown, I would be very surprised, but if the proportions of coding and noncoding changes were different across different tumor types I would be somewhat less shocked (but still surprised).

Adding some plots of the proportion coding (filter/nofilter) in each sample, split by morphology, might be a useful addition. This is perhaps simpler to plot than the slopes.

"no_filter",
"pbta-snv-mutation-tmb-coding.tsv"
)) %>%
# This variable is weird when binding but we don't need it for the plot so we'll just remove it.
Member

tell me more?

Collaborator Author

@cansavvy cansavvy Sep 1, 2020

Hmm... It was weird when I used this kind of import and bind of data.frames in another notebook, but now it's fine, so I deleted this step.

@cansavvy
Collaborator Author

cansavvy commented Sep 1, 2020

> If there were not a strong correlation between coding and (coding + noncoding) as you have shown, I would be very surprised, but if the proportions of coding and noncoding changes were different across different tumor types I would be somewhat less shocked (but still surprised).

I'm all for doing this comparison, but it will require different datasets and different set up than what I have going on in this notebook. So can I propose that if what is here is okay then I can start a new notebook/ new PR that looks into non-coding/coding proportions?

@jashapiro
Member

> > If there were not a strong correlation between coding and (coding + noncoding) as you have shown, I would be very surprised, but if the proportions of coding and noncoding changes were different across different tumor types I would be somewhat less shocked (but still surprised).
>
> I'm all for doing this comparison, but it will require different datasets and different set up than what I have going on in this notebook. So can I propose that if what is here is okay then I can start a new notebook/ new PR that looks into non-coding/coding proportions?

My language is imprecise here, I don't really mean coding/noncoding. I just mean a violin plot (or similar) of tmb_filter/tmb_no_filter separated by disease. I think you have the data you need for that in this notebook.
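
To make the suggestion above concrete, here is a minimal sketch of the per-sample tmb_filter/tmb_no_filter ratio grouped by disease, which is the quantity a violin plot would display. It is not taken from the notebook: the records, disease labels, and the function name `filter_ratio_by_disease` are hypothetical, and skipping samples with a zero unfiltered TMB is an assumption made here only to avoid division by zero.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-sample records (not real PBTA data):
# (disease, tmb_no_filter, tmb_filter)
records = [
    ("Ependymoma", 2.0, 1.0),
    ("Ependymoma", 4.0, 3.0),
    ("Ependymoma", 0.0, 0.0),  # no mutations at all; ratio undefined, so skipped
    ("ATRT", 1.0, 0.5),
]


def filter_ratio_by_disease(rows):
    """Return {disease: [tmb_filter / tmb_no_filter, ...]} for each sample."""
    ratios = defaultdict(list)
    for disease, tmb_no_filter, tmb_filter in rows:
        if tmb_no_filter > 0:  # assumption: drop samples with zero unfiltered TMB
            ratios[disease].append(tmb_filter / tmb_no_filter)
    return dict(ratios)


ratios = filter_ratio_by_disease(records)
# A one-number summary per disease; a violin plot would show the full distribution.
median_ratios = {disease: median(values) for disease, values in ratios.items()}
for disease, value in sorted(median_ratios.items()):
    print(f"{disease}: median ratio = {value:.3f}")
```

If the ratio distributions look similar across diseases, the filter affects all tumor types roughly equally, which is what the strong overall correlation would lead one to expect.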

@cansavvy
Collaborator Author

cansavvy commented Sep 1, 2020

> > > If there were not a strong correlation between coding and (coding + noncoding) as you have shown, I would be very surprised, but if the proportions of coding and noncoding changes were different across different tumor types I would be somewhat less shocked (but still surprised).
>
> > I'm all for doing this comparison, but it will require different datasets and different set up than what I have going on in this notebook. So can I propose that if what is here is okay then I can start a new notebook/ new PR that looks into non-coding/coding proportions?
>
> My language is imprecise here, I don't really mean coding/noncoding. I just mean a violin plot (or similar) of tmb_filter/tmb_no_filter separated by disease. I think you have the data you need for that in this notebook.

Whew, okay that is much easier. Will do.

@cansavvy cansavvy requested a review from jashapiro September 1, 2020 19:21
Member

@jashapiro jashapiro left a comment

This looks good to me. Nothing too surprising!

@jaclyn-taroni jaclyn-taroni merged commit 0e64389 into AlexsLemonade:master Sep 2, 2020
@cansavvy cansavvy deleted the cansavvy/var_class_investigation branch September 8, 2020 18:24