Marginal gain of gene expression data over covariates #67
Conversation
AWESOME! Just two things before I review. Can you remove the space in Then run and track the exported …
… comparison to multiple mutations.
I added a notebook (and script) that extends the comparison of covariates-only to covariates+expression models to multiple mutations. If you look at the bar plot at the bottom, you'll see that the training auroc is always better for the covariates+expression model (positive values) but that the testing auroc drops off dramatically and is lower in every case other than TP53. This is probably a case of overtraining. Thoughts?
Cool, I think there are some weird things going on that make these results hard to interpret. Hopefully, addressing my comments will clear things up!
# In[46]:

%%time
# Train model a: covariates only.
warnings.filterwarnings("ignore")  # ignore deprecation warning for grid_scores_
rows = list()
for m in list(mutations):
    series = pd.Series()
    series['mutation'] = m
    series['symbol'] = mutations[m]
    rows.append(get_aurocs(X['a'], Y[m], cv_pipeline['a'], series))
auroc_df['a'] = pd.DataFrame(rows)
auroc_df['a'].sort_values(['symbol', 'testing_auroc'], ascending=[True, False], inplace=True)
How about more descriptive keys for auroc_df than a, b, and c? Also maybe rename auroc_df to auroc_dfs to make clear that it itself is not a dataframe.
Yes, a and b are not terribly descriptive, but they actually stand for models a and b, which I have used consistently. I did change c though, because that is totally not descriptive -- it's now diff_ba (the difference between models b and a).
Do what you think is best. My opinion is that it's almost always better to be explicit and save the reader from having to remember a "labeling dictionary" of sorts.
Will change to model a and model b
# In[68]:

plot_df = pd.melt(auroc_df['c'], id_vars='symbol',
                  value_vars=['mean_cv_auroc', 'training_auroc', 'testing_auroc'],
                  var_name='kind', value_name='auroc')
grid = sns.factorplot(y='symbol', x='auroc', hue='kind', data=plot_df, kind="bar")
Can you change the x-axis label from auroc to Δ AUROC, so it's clear that it's the change in AUROC?
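One way to do that (a sketch, assuming grid is the FacetGrid returned by factorplot in the snippet above):

```python
# relabel the x-axis of the seaborn FacetGrid
grid.set_xlabels('Δ AUROC')
```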
sure.
cv_pipeline = {}
for k in ['a', 'b']:
    if k == 'a': param_grid['select__k'] = ['all']
    elif k == 'b': param_grid['select__k'] = [2000]
I think you're likely selecting out most of the covariates because they have a low median absolute deviation. This is probably why the "covariate + expression" models are worse than the "covariate only" models.
FeatureUnion may be the correct way to deal with this.
Ah, good catch.
I implemented it two ways: 1) using FeatureUnion -- this seems to work but is a bit clunky; 2) using DataFrameMapper -- cleaner syntax and I think it works, but I'm getting a warning. Let me know which approach you prefer.
Cool... What's the warning for DataFrameMapper -- I didn't see a warning in your notebook. I agree DataFrameMapper is cleaner. Do you think the added cleanliness is worth the extra dependency? Can you modify 2.Marginal-Gain-Multiple-Mutations.ipynb to use your preferred method?
I will go with FeatureUnion. The error from DataFrameMapper (you did not see the warning because I ran the other method) was "Estimator %s modifies parameters in init." from the sklearn base library (http://contrib.scikit-learn.org/imbalanced-learn/_modules/sklearn/base.html). I could never figure that one out, and the DataFrameMapper library was also finicky (e.g. the mapper needed to be at the top of the pipeline). Overall, I think it is not worth the extra dependency, as you suggested. I modified the FeatureUnion method to use a lambda in FunctionTransformer, which makes it a bit more concise.
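A minimal sketch of that FeatureUnion + FunctionTransformer arrangement. Everything here is illustrative rather than the notebook's actual code: the column layout (covariates first, expression after), n_covariates, and this version of the MAD scorer are all assumptions.

```python
import numpy as np
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_selection import SelectKBest

def fs_mad(X, y=None):
    """Score each feature by its median absolute deviation (illustrative version)."""
    X = np.asarray(X)
    return np.median(np.abs(X - np.median(X, axis=0)), axis=0)

n_covariates = 10  # hypothetical number of covariate columns at the front of X (a numpy array)

features = FeatureUnion([
    # covariates pass through untouched, so MAD-based selection can't drop them
    ('covariates', FunctionTransformer(lambda X: X[:, :n_covariates], validate=False)),
    # expression genes are sliced out and reduced to the 2000 with the highest MAD
    ('expression', Pipeline([
        ('slice', FunctionTransformer(lambda X: X[:, n_covariates:], validate=False)),
        ('select', SelectKBest(fs_mad, k=2000)),
    ])),
])
```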
    'classify__loss': ['log'],
    'classify__penalty': ['elasticnet'],
    'classify__alpha': [10 ** x for x in range(-3, 1)],
    'classify__l1_ratio': [0, 0.2, 0.8, 1],
Let's stick to a single l1_ratio but try more alpha values (see #56):

'classify__alpha': 10.0 ** np.linspace(-3, 1, 10),
'classify__l1_ratio': [0.15],

The above values are from logistic_regression.py#L19-L20.
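For reference, here's what the grid would look like with that change folded in -- a sketch only, assuming the pipeline step is still named classify:

```python
import numpy as np

param_grid = {
    'classify__loss': ['log'],
    'classify__penalty': ['elasticnet'],
    'classify__alpha': 10.0 ** np.linspace(-3, 1, 10),  # 10 alphas from 1e-3 to 10
    'classify__l1_ratio': [0.15],
}
```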
Sounds good.
# In[263]:

# What are the top weighted features for model a and model b?
display(coef_df['b'].head(5))
Can we add n_positives and n_negatives to these dataframes? Would help me diagnose the variability in performance between CV, training, & testing.
For which dataframe? coef_df? I think I'm misunderstanding b/c seems like that wouldn't help--the weights already tell you that. Maybe you mean for auroc_df or something?
My mistake... yeah, I'm referring to auroc_df in the second notebook.
I have modified the code but still need to run it. Once I have done so, I will repost with changes.
Awesome work, especially dealing with the difficulties of parallel feature processing -- it looks like the sklearn design didn't fully anticipate this crucial use case.
This PR is close to ready... now time to apply your solutions to the second notebook.
param_grid = {
    'classify__loss': ['log'],
    'classify__penalty': ['elasticnet'],
    'classify__alpha': [10 ** x for x in range(-3, 1)],
Looks like your notebook only has 1 value here. Reminder to reset when you're ready for a final analysis. I think it's also fine to consolidate to a single notebook later (probably the current 2.).
Will do. Just for testing.
Is it possible to only apply the Imputer to the covariate features? The gene expression doesn't have missing data, so it may speed things up.
It should be possible to apply Imputer to only the covariates. But for some reason this is not working with FeatureUnion. It's as if FunctionTransformer were working for SelectKBest but not Imputer. I will run the code through and then maybe you can have a look at restructuring the code to include this change? Could be I'm missing something.
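One possible restructuring (a sketch, not the notebook's code): give the covariates their own sub-pipeline inside the FeatureUnion so the Imputer only ever sees those columns. n_covariates and fs_mad are the same illustrative assumptions as in the earlier sketch.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer, Imputer
from sklearn.feature_selection import SelectKBest

features = FeatureUnion([
    ('covariates', Pipeline([
        ('slice', FunctionTransformer(lambda X: X[:, :n_covariates], validate=False)),
        ('impute', Imputer()),  # imputation touches only the covariate columns
    ])),
    ('expression', Pipeline([
        ('slice', FunctionTransformer(lambda X: X[:, n_covariates:], validate=False)),
        ('select', SelectKBest(fs_mad, k=2000)),  # expression has no missing data
    ])),
])
```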
…k and updating pipeline for feature combination.
I ran the code on a restricted dataset (1000 samples) for all the mutations. The results are in the 1.Marginal-Gain-Multiple-Mutations notebook. As you can see, the testing AUROCs are still lower for the combined model for a number of mutations. To diagnose, I added n_positives and n_negatives as you asked. I also added some statistics on the ranks of the covariate weights with respect to the ranks of all features. Cumulative, median and mean ranks are presented--cumulative rank means the total of the index positions of covariate features in the sorted coef_df dataframe. I can run this on the whole dataset, but I think I will hit the same issue. (Also, note that to make this run, I have set the 1st value of y_test and y_train to 1 just to ensure at least one positive mutation example. Not sure--could be affecting the results a bit.)
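To make "cumulative rank" concrete, a sketch of how those statistics could be computed -- the is_covariate flag and the assumption that coef_df is already sorted by weight are hypothetical, since the actual column names aren't shown here:

```python
import numpy as np

sorted_df = coef_df.reset_index(drop=True)                 # row 0 = highest-weighted feature
ranks = np.flatnonzero(sorted_df['is_covariate'].values)   # index positions of covariate rows
cumulative_rank = int(ranks.sum())                         # total index positions of covariates
median_rank = float(np.median(ranks))
mean_rank = float(ranks.mean())
```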
After writing that, I realized that I did not change the alpha parameter back to test all values. Let me do that and get back to you. Probably a regularization issue.
OK, I re-ran it with all hyperparameters and data. Still, the testing is only substantially better for TP53. Strangely, the training auroc is lower for some mutations--I don't even know how that's possible. Hopefully the updates will make it easier to diagnose.
I made a few additional comments. Additionally, it looks like you may have run a git add --all that added some files we don't want to track. Specifically, some .DS_Store files crept in which should be removed. Also, 1.Marginal-Gain-Multiple-Mutations.py appears to have been uploaded in two directories.
# In[14]:

auroc_dfs['model a'].to_pickle('./covariates_only.pkl')
Any reason for the switch to pickles over TSV here -- TSV is preferable so it's readable outside of Python. Consider combining both tables into a single dataframe for storage by adding a model column and then concatenating.
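Something along these lines (a sketch -- the model labels and file name are just for illustration):

```python
import pandas as pd

# tag each table with its model, stack them, and write a TSV instead of a pickle
combined_df = pd.concat([
    auroc_dfs['model a'].assign(model='covariates only'),
    auroc_dfs['model b'].assign(model='covariates + expression'),
])
combined_df.to_csv('aurocs.tsv', sep='\t', index=False)
```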
Done.
series['mean_cv_auroc'] = cv_score_df.score.max()
series['training_auroc'] = roc_auc_score(y_train, y_pred_train)
series['testing_auroc'] = roc_auc_score(y_test, y_pred_test)
series['n_pos_coeffs'] = n_pos
The new columns n_pos_coeffs and n_neg_coeffs are really useful. They're not actually what I meant by n_positives and n_negatives -- but let's definitely keep them. By n_positives I meant the number of samples with the mutation (sum(y == 1)).
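For example (a sketch, added next to the existing series assignments shown above):

```python
series['n_positives'] = int((y == 1).sum())  # samples carrying the mutation
series['n_negatives'] = int((y == 0).sum())  # samples without it
```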
Gotcha. Updated. As you might expect, the combined model shows the clearest advantage where we have a lot of positive examples to work from (TP53).
    Fit the classifier for the given mutation (y) and output predictions for it
    """
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
    y_train[0] = 1
> (Also, note that to make this run, I have set the 1st value of the y_test and y_train to 1 just to ensure at least one positive mutation example. Not sure--could be affecting the results a bit.)

@joshlevy89, I'm concerned about this workaround. I think specifying stratify=y in train_test_split should resolve this problem. If not, I don't think we want to make a workaround -- the failure shouldn't be masked.
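A sketch of the suggested call, mirroring the line in the context above:

```python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)
```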
Don't worry, that was just a workaround during debugging--I took it out when I ran the script (I think you are looking at an old notebook here). In any case, I like your method better.
""" | ||
Fit the classifier for the given mutation (y) and output predictions for it | ||
""" | ||
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0) |
Let's increase testing to 25% (test_size=0.25). Our testing AUROCs are potentially super unstable due to the small number of positives.
Yes.
covariates_pipeline = Pipeline(steps=[
    ('imputer', Imputer()),
    ('standardize', StandardScaler()),
    ('select', SelectKBest(fs_mad, 'all')),
Let's just delete this entire step if it's not actually doing any selection.
I assume you just mean the ('select', SelectKBest(fs_mad, 'all')) step. I will take it out (and modify how the coef_df dataframe is built to handle the lack of a 'select' step).
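Roughly this, then (a sketch of the trimmed-down branch, with step names as in the snippet above):

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer, StandardScaler

covariates_pipeline = Pipeline(steps=[
    ('imputer', Imputer()),             # covariates can contain missing values
    ('standardize', StandardScaler()),  # no SelectKBest: all covariates are kept anyway
])
```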
@dhimmel I have updated this pull request with the requested changes, and added some other stuff as well. We are no longer experiencing fitting failures/turbulence when working with the expression data, probably due to adding the feature selection/pca. 😃 A few notes on what I have done here:
I think that this PR is now ready to be merged.
Hey @joshlevy89 -- thanks for this huge PR! Wanted to let you know it's on my radar, but I may not get to it for a little bit. The next step will be understanding why expression does not improve upon the covariate models -- whether it's fundamentally not useful or because we need to change our modeling.
@dhimmel No problem. I agree that is the next step. Indeed, it seems like the current approaches do not improve upon covariates.
Made some small suggestions to help us narrow in on the situation. Apply to all 3 analyses.
Also fine to proceed ahead with only 2 of 3 notebooks and delete the third -- the PCA one is most important I think. Up to you.
    '2353': 'FOS',  # Transcription factors
}

mutations = {**genes_LungCancer, **genes_TumorSuppressors, **genes_Oncogenes}
Really cool way to combine dicts, didn't realize you could do this!
# In[43]:

%%time
path = os.path.join('..', '..', 'download', 'covariates.tsv')
covariates = pd.read_table(path, index_col=0)
Let's restrict to only the following covariates:

- n_mutations_log1p
- anything starting with acronym_

We've seen these are the most important. This way we can remove imputation, which won't currently work for categorical variables that have been dummied anyway.
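A sketch of that restriction applied to the covariates dataframe read above (the selection code itself is just illustrative):

```python
# keep only the mutation-burden covariate and the dummied disease-acronym columns
keep = [col for col in covariates.columns
        if col == 'n_mutations_log1p' or col.startswith('acronym_')]
covariates = covariates[keep]
```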
We can remove %%time statements from every cell besides .fit, given how annoying it makes the diffing.
Can you rerun the download notebook, so we're working with v6 of the dataset?
@dhimmel applied changes and re-ran. I deleted the marginal-gain-Ret-mad.ipynb example notebook, as it was just to validate that everything was working correctly. I only modified marginal-gain-multiple-genes-pca.ipynb as more changes may be on the way, and we may want to delete marginal-gain-multiple-genes-mad.ipynb later anyway.
@joshlevy89 great work. I think you've shown that the prediction of many mutations is not benefiting from expression data, once covariates are accounted for. However, some mutations still are, such as AKT1, TP53, RB1, NF2, and HRAS. This is really good to know -- our approach may only apply to certain mutations.

I also think your last plots are really informative -- gene expression seems most likely to contribute when the number of positives is higher. So this leads me to believe that our model lacks the necessary power to detect gene expression signatures when the sample size is low, even when using PCA. I think it makes sense to refocus our ML team to look more into fitting models with both covariates and expression.

Thanks for completing this gigantic analysis with lots of back and forth! Up to you whether you want to delete …
One thing that I noticed - many of the genes where the expression seems to add information beyond the covariates seem to be cell cycle genes. The various Ras genes + NF1, where @gwaygenomics has worked a fair amount, seem to be relatively high performing. P53 too, but this is not surprising as it is known to induce a big effect on the transcriptome. RB1, PTEN, AKT are sort of in the cell cycle hit list. This list would be enriched for cell cycle genes anyway as a cancer gene list, but these are the things that stuck out at me.
i agree - this PR is monumental! nice work @joshlevy89

It is a bit jarring to see how well covariates perform alone in many circumstances. Can you remind me what the covariates are? Dummy variable disease type and log(mutation count)? It may be good to show a …

Also, it could be good to eventually show what the coefficients are for some of the models - at the very least to double check that all the signal the gene expression adds is not washed out by giant covariate weights. It would also be interesting to see the weights of the disease type for each model. I suspect that for some gene mutations, especially those that are heavily skewed across cancer types (e.g. NF1 - 11% in GBM (17/149) but 0.6% in THCA (3/485)), the covariate model alone will just pick out disease types rather than individual samples. Another way to test this would be to build models using only a single disease type and then make the comparisons suggested in this PR.

Perhaps these suggestions belong in a separate PR? I do not want to stall merging!
Let's do a separate PR. I think @joshlevy89 was in the process of moving. However, @gwaygenomics raises some great questions. @joshlevy89 you are free to continue tackling them after we merge this PR. But let's save that for a future PR. |
@dhimmel I applied the changes to the marginal-gain-multiple-genes-mad.ipynb notebook and re-ran. It will be good to keep this as a reference, especially to Branka's PR #52. Should be good to merge now. Thanks for the help on getting this out! I think it raises some interesting issues, as pointed out by @cgreene and @gwaygenomics, and it will be interesting to pursue this further.
Great work @joshlevy89! Will merge now. |
Compares a covariates-only model to a covariates-with-gene-expression model in predicting TP53 mutations.