Mutation prediction outstanding issues/updates #53

Open
3 of 7 tasks
jjc2718 opened this issue Sep 8, 2022 · 0 comments

jjc2718 commented Sep 8, 2022

  • use_pancancer should be false for all_other_cancers in process_data_for_gene() (i.e. there shouldn't be unused dummy/one-hot variables)
  • Label filtering should happen after we take the intersection of samples between the gene expression and mutation data (this will make the proportions in 08_cell_line_prediction/download_data.ipynb match what we actually see when the scripts run)
  • tcga_utilities should probably be renamed to something more general, or split into separate modules
  • Add CNV data for cell lines in _generate_labels() (ccle_data_model)
  • remove unknown/non-cancerous samples in load_sample_info()
  • maybe try scikit-learn's LogisticRegression with an elastic net penalty instead of SGDClassifier
  • save label proportions so the AUPR baseline can be plotted (see comment on #56, "CCLE drug response classification, stratified by cancer type")
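On the use_pancancer item: here is a minimal sketch of how pan-cancer one-hot encoding can leave unused columns. It assumes cancer types are encoded with pandas.get_dummies over a categorical that lists every cancer type. The helper name encode_cancer_types and the example cancer types are hypothetical and are not taken from process_data_for_gene(). Any category with no samples becomes an all-zero column. Dropping constant columns is one way to make sure no unused dummy variables reach the model.

```python
from typing import Sequence

import pandas as pd


def encode_cancer_types(cancer_types: pd.Series,
                        all_cancer_types: Sequence[str]) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode cancer types, dropping columns
    that never vary across the given samples."""
    # With the full category list, get_dummies emits one column per
    # category, including categories that have no samples here.
    as_categorical = pd.Series(
        pd.Categorical(cancer_types, categories=list(all_cancer_types)),
        index=cancer_types.index,
    )
    dummies = pd.get_dummies(as_categorical, prefix="cancer_type", dtype=int)
    # Constant columns (all zeros for absent types, all ones when only one
    # type is present) carry no information for the classifier.
    return dummies.loc[:, dummies.nunique() > 1]


samples = pd.Series(["BRCA", "LUAD", "BRCA"], index=["s1", "s2", "s3"])
print(encode_cancer_types(samples, ["BRCA", "LUAD", "SKCM"]).columns.tolist())
# ['cancer_type_BRCA', 'cancer_type_LUAD']
```

With a single cancer type, every column is constant and the result has no dummy columns at all.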
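For the label filtering item, here is a sketch of the proposed ordering. It assumes expression and label data are pandas DataFrames indexed by sample ID, and that the labels frame has hypothetical status (0/1) and cancer_type columns. The min_count rule is a placeholder, not the repository's actual filter. Counting positives and negatives only after the intersection means a cancer type cannot pass the filter because of samples that have no expression data.

```python
from typing import Tuple

import pandas as pd


def align_then_filter(expression: pd.DataFrame,
                      labels: pd.DataFrame,
                      min_count: int = 5) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: intersect samples first, then filter cancer types."""
    # 1. Keep only samples present in both the expression and label data.
    shared = expression.index.intersection(labels.index)
    expression = expression.loc[shared]
    labels = labels.loc[shared]

    # 2. Count positives/negatives per cancer type over the shared samples only.
    counts = labels.groupby("cancer_type")["status"].agg(["sum", "count"])
    positives = counts["sum"]
    negatives = counts["count"] - counts["sum"]
    keep_types = counts.index[(positives >= min_count) & (negatives >= min_count)]

    # 3. Drop samples from cancer types that fail the (placeholder) threshold.
    mask = labels["cancer_type"].isin(keep_types).to_numpy()
    return expression.loc[mask], labels.loc[mask]
```

In a toy case where one cancer type's only positive sample lacks expression data, that type passes a filter applied before the intersection but is correctly dropped by this ordering.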
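For the CNV item, here is one possible labeling rule. It is an assumption, not taken from ccle_data_model: a cell line counts as positive if the gene is mutated, or if it has a copy gain (for an oncogene) or a copy loss (for a tumor suppressor). The function name, the binary gain/loss matrices (cell lines by genes), and the "Oncogene"/"TSG" strings are all hypothetical.

```python
import pandas as pd


def mutation_or_cnv_labels(mutations: pd.DataFrame,
                           copy_gain: pd.DataFrame,
                           copy_loss: pd.DataFrame,
                           gene: str,
                           classification: str) -> pd.Series:
    """Hypothetical rule: 1 if the gene is mutated or carries the copy
    number change matching its assumed role, else 0."""
    status = mutations[gene].astype(bool)
    # Align CNV calls to the mutation samples; missing CNV data counts as 0.
    if classification == "Oncogene":
        cnv = copy_gain[gene].reindex(status.index, fill_value=0)
    elif classification == "TSG":
        cnv = copy_loss[gene].reindex(status.index, fill_value=0)
    else:
        # Unknown role: fall back to mutation status only.
        cnv = pd.Series(0, index=status.index)
    return (status | cnv.astype(bool)).astype(int)


cells = ["c1", "c2", "c3"]
mut = pd.DataFrame({"TP53": [1, 0, 0], "KRAS": [0, 1, 0]}, index=cells)
gain = pd.DataFrame({"TP53": [0, 0, 0], "KRAS": [0, 0, 1]}, index=cells)
loss = pd.DataFrame({"TP53": [0, 1, 0], "KRAS": [0, 0, 0]}, index=cells)
print(mutation_or_cnv_labels(mut, gain, loss, "TP53", "TSG").tolist())  # [1, 1, 0]
```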
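For the load_sample_info() item, here is a sketch of the filtering step. It assumes the sample info is a DataFrame with a cancer type column. The column name and the "Unknown"/"Non-Cancerous" values are hypothetical placeholders, and the real values should be checked in the data before filtering.

```python
import pandas as pd

# Hypothetical placeholder labels; the real sample info may encode
# unknown or non-cancerous samples differently.
EXCLUDED_TYPES = {"Unknown", "Non-Cancerous"}


def drop_noncancer_samples(sample_info: pd.DataFrame,
                           column: str = "cancer_type") -> pd.DataFrame:
    """Drop samples whose cancer type is missing, unknown, or non-cancerous."""
    values = sample_info[column]
    keep = values.notna() & ~values.isin(EXCLUDED_TYPES)
    return sample_info.loc[keep]


info = pd.DataFrame(
    {"cancer_type": ["BRCA", "Unknown", None, "Non-Cancerous", "LUAD"]},
    index=["a", "b", "c", "d", "e"])
print(drop_noncancer_samples(info).index.tolist())  # ['a', 'e']
```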
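For the label proportions item: the expected AUPR of an uninformative classifier equals the fraction of positive labels, so saving per-cancer-type proportions is enough to draw the baseline later. The sketch below assumes labels and cancer types are aligned pandas Series. The helper name label_proportions is hypothetical. The result could be written out with to_csv next to the model results.

```python
import pandas as pd
from sklearn.metrics import average_precision_score


def label_proportions(labels: pd.Series, groups: pd.Series) -> pd.Series:
    """Hypothetical helper: fraction of positive (1) labels per group."""
    return labels.groupby(groups).mean()


y = pd.Series([1, 0, 0, 0, 1, 1, 0, 0])
cancer_type = pd.Series(["A", "A", "A", "A", "B", "B", "B", "B"])
props = label_proportions(y, cancer_type)
print(props.to_dict())  # {'A': 0.25, 'B': 0.5}

# A constant (uninformative) score gives an AUPR equal to the positive
# proportion, which is the baseline an AUPR plot would compare against.
print(average_precision_score(y, [0.5] * len(y)))  # 0.375
```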