-
Notifications
You must be signed in to change notification settings - Fork 562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate SQL files used to create CH Tables #10867
Consolidate SQL files used to create CH Tables #10867
Conversation
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, and just in time (I will need to create these views soon on the new version of the genie database, so I can run this).
I point out a couple of concerns, but don't think they need to hold up this work.
concat(cs.cancer_study_identifier, '_', sample.stable_id) AS sample_unique_id, | ||
genetic_alteration_type AS alteration_type, | ||
-- If a mutation is found in a gene that is not in a gene panel we assume Whole Exome Sequencing WES | ||
ifnull(gene_panel.stable_id, 'WES') AS gene_panel_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently I looked into this during the fixing of the 4 non-panel but profiled genie samples. According to the docs, the value 'NA' is specified to be used "When the sample is not profiled on a gene panel, or if the sample is not profiled at all"
So I'm not sure whether NA is supposed to be used for whole exome sequencing or not ... but it sounds like maybe. But then I guess the 'NA' value would pass through in this logic. I do wonder whether having a mixture of 'WES' and 'NA' would be confusing in the downstream logic. Perhaps our docs should be changed to reserve 'NA' only for cases where a sample was not profiled and to reserve 'WES' or something similar for whole exome sequencing.
https://docs.cbioportal.org/file-formats/#values
This comment applies to several tables below as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this isn't finalized and would be interested jumping on a call and discussing this in the future. To figure out what the best approach is
'WES' AS gene_panel_id, | ||
gene.hugo_gene_symbol AS gene | ||
FROM gene | ||
WHERE gene.entrez_gene_id > 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some of our databases use negative entrez_gene_id values (mainly for microRNAmature/precursor combinations related to expression profiling). I'm not sure if that matters here. Maybe a note should be added about the possible future need to support negative values? Or maybe those are intentionally being excluded here.
No description provided.