Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate SQL files used to create CH Tables #10867

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

haynescd
Copy link
Collaborator

No description provided.

Copy link

Copy link
Contributor

@sheridancbio sheridancbio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and just in time (I will need to create these views soon on the new version of the genie database, so I can run this).

I point out a couple of concerns, but don't think they need to hold up this work.

concat(cs.cancer_study_identifier, '_', sample.stable_id) AS sample_unique_id,
genetic_alteration_type AS alteration_type,
-- If a mutation is found in a gene that is not in a gene panel we assume Whole Exome Sequencing WES
ifnull(gene_panel.stable_id, 'WES') AS gene_panel_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently I looked into this during the fixing of the 4 non-panel but profiled genie samples. According to the docs, the value 'NA' is specified to be used "When the sample is not profiled on a gene panel, or if the sample is not profiled at all"
So I'm not sure whether NA is supposed to be used for whole exome sequencing or not ... but it sounds like maybe. But then I guess the 'NA' value would pass through in this logic. I do wonder whether having a mixture of 'WES' and 'NA' would be confusing in the downstream logic. Perhaps our docs should be changed to reserve 'NA' only for cases where a sample was not profiled and to reserve 'WES' or something similar for whole exome sequencing.
https://docs.cbioportal.org/file-formats/#values

This comment applies to several tables below as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this isn't finalized and would be interested jumping on a call and discussing this in the future. To figure out what the best approach is

'WES' AS gene_panel_id,
gene.hugo_gene_symbol AS gene
FROM gene
WHERE gene.entrez_gene_id > 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of our databases use negative entrez_gene_id values (mainly for microRNAmature/precursor combinations related to expression profiling). I'm not sure if that matters here. Maybe a note should be added about the possible future need to support negative values? Or maybe those are intentionally being excluded here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants