Derived datasets for each sample: (1) fraction of genome altered by copy number change, (2) number of mutations #1

Open
jingchunzhu opened this issue May 5, 2017 · 4 comments


@jingchunzhu
Collaborator

jingchunzhu commented May 5, 2017

Build derived datasets:
for each sample: fraction of genome altered by copy number change
for each sample: number of mutations

Hi,

Would you be able to create tracks for (1) the fraction of the genome altered by copy number change and (2) the number of mutations, for each TCGA cohort or for the pan-cancer data? These used to be available in cBioPortal. The number of mutations per sample is still available there, but the fraction of the genome altered by copy number no longer is; someone at MSKCC is working on getting it live again. Alternatively, is there a way to download the data from Xena and calculate this myself?

Thanks,

@jingchunzhu
Collaborator Author

  1. Total mutation count (mutation burden): It is only important to know how many mutations are present. The specific mutations are not important.

  2. Fraction of genome altered by copy number (0-1): cBioPortal has calculated it as follows: The fraction of copy number altered genome = length of segments with log2 CNA value larger than 0.2 divided by the length of all segments measured. This is basically a measurement of genomic instability.
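For anyone who wants to compute both tracks from downloaded data, here is a minimal Python sketch. It assumes mutation calls have already been flattened to (sample, gene) pairs and that the segment table has start, end, and log2 columns with 1-based inclusive coordinates; the sample IDs, record layout, and the `use_absolute` switch are hypothetical and are not Xena's or cBioPortal's actual format or code. Because the definition above does not say whether "larger than 0.2" applies to the signed value or to its absolute value, the sketch lets you choose.

```python
from collections import Counter, defaultdict

# Hypothetical mutation calls: one (sample_id, gene) pair per somatic mutation.
mutations = [("S1", "TP53"), ("S1", "PTEN"), ("S1", "EGFR"), ("S2", "IDH1")]

# Hypothetical segments: (sample_id, chrom, start, end, log2_ratio),
# assuming 1-based inclusive coordinates.
segments = [
    ("S1", "1", 1, 1_000_000, 0.05),
    ("S1", "1", 1_000_001, 1_500_000, 0.45),
    ("S1", "2", 1, 500_000, -0.60),
    ("S2", "1", 1, 2_000_000, 0.10),
]


def mutation_counts(records):
    """Mutations per sample; only the count matters, not which genes are hit."""
    return dict(Counter(sample for sample, _gene in records))


def fraction_genome_altered(segs, threshold=0.2, use_absolute=True):
    """Per-sample altered segment length divided by total segmented length.

    use_absolute=True counts gains (> threshold) and losses (< -threshold);
    use_absolute=False counts gains only, the literal reading of "larger than 0.2".
    """
    altered = defaultdict(int)
    total = defaultdict(int)
    for sample, _chrom, start, end, value in segs:
        length = end - start + 1  # inclusive coordinates (assumption)
        total[sample] += length
        score = abs(value) if use_absolute else value
        if score > threshold:
            altered[sample] += length
    return {s: altered[s] / total[s] for s in total}


print(mutation_counts(mutations))                              # {'S1': 3, 'S2': 1}
print(fraction_genome_altered(segments))                       # {'S1': 0.5, 'S2': 0.0}
print(fraction_genome_altered(segments, use_absolute=False))   # {'S1': 0.25, 'S2': 0.0}
```

Samples with no mutation calls do not appear in `mutation_counts`, so they would need to be filled in as 0 from the cohort's sample list.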

Question: is there any background for the 0.2 cutoff?

@jingchunzhu
Collaborator Author

In GBM, classifying PTEN with a 0.2 cutoff gives 84% of samples with a PTEN deletion. Is this about right?

http://dev.xenabrowser.net/heatmap/?bookmark=a05f9847421717d27d5e6fa60a67e79b

http://dev.xenabrowser.net/heatmap/?bookmark=723e4cd313b2380869a255f5dde62171
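Calling a gene deleted with the same cutoff amounts to counting samples whose gene-level log2 value falls below -0.2. The sketch below uses made-up values for five hypothetical samples; the 84% figure above comes from the real GBM data, not from this example, and the function name is illustrative.

```python
# Hypothetical gene-level PTEN log2 copy-number values, one per sample.
pten_log2 = {"S1": -0.85, "S2": -0.31, "S3": 0.02, "S4": -0.15, "S5": -1.20}


def deletion_frequency(values, threshold=-0.2):
    """Count samples whose log2 value is below threshold (called deleted)."""
    deleted = sum(1 for v in values.values() if v < threshold)
    return deleted, len(values), deleted / len(values)


print(deletion_frequency(pten_log2))  # (3, 5, 0.6)
```

A sample like S4 (-0.15) sits just inside the cutoff, so the reported frequency can shift noticeably with the threshold choice.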

@jingchunzhu
Collaborator Author

“In a diploid genome, a single-copy gain in a perfectly pure, homogeneous sample has a copy ratio of 3/2. In log2 scale, this is log2(3/2) = 0.585, and a single-copy loss is log2(1/2) = -1.0.” However, most tumors are heterogeneous (they contain multiple clonal populations) and are mixed with normal stroma, which pulls the observed log2 ratios toward 0. To avoid missing alterations, the sample's purity and heterogeneity need to be taken into account, which means using a lower threshold. I have also seen many cancer-focused publications use 0.2 as a threshold. I am guessing 0.2 is used for these reasons.

The frequency of a PTEN deletion (one or both alleles) in GBM is 89% (514/577).
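The purity argument above can be made concrete with a simple mixing model: if a fraction `purity` of cells carries a clonal single-copy change and the rest are diploid normal cells, the expected copy ratio is (purity × copies + (1 − purity) × 2) / 2. This is a sketch under that assumption only; it ignores subclonality and tumor ploidy changes, and the function name is hypothetical.

```python
import math


def expected_log2_ratio(copies, purity, ploidy=2):
    """Expected log2 copy ratio for a clonal change diluted by normal cells.

    Assumes normal cells and the reference both have `ploidy` copies.
    """
    mixed = purity * copies + (1 - purity) * ploidy
    return math.log2(mixed / ploidy)


for purity in (1.0, 0.6, 0.3):
    gain = expected_log2_ratio(3, purity)  # single-copy gain
    loss = expected_log2_ratio(1, purity)  # single-copy loss
    print(f"purity={purity:.1f}  gain={gain:+.3f}  loss={loss:+.3f}")
```

Under this model, at 30% purity a single-copy gain is expected near +0.20 and a single-copy loss near -0.23, which is roughly where a 0.2 cutoff sits.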

@duxiuju

duxiuju commented Sep 15, 2018

Dear jingchunzhu,
I would like to ask whether 'log2 CNA value larger than 0.2' means a value larger than +0.2 or an absolute value larger than 0.2. If it only means a value larger than +0.2, copy number losses are neglected, right?
