Simplify API docs for a minimalist design #24

dhimmel · 2016-09-21T19:49:40Z

Motivation

Modify the API to be more simplistic and to align with the current state of cancer-data.

Implementation Notes

This is a work in progress (WIP). Please comment of the diffs for more information on why I'm proposing certain changes.

awm33 · 2016-09-22T03:35:58Z

doc/api.md

-| entrezid | integer | Primary Key. |
-| systematic_name | string |  |
-| standard_name | string |  |
+| entrez_gene_id | integer | Primary Key. |


This is actually coming from https://pypi.python.org/pypi/django-genes/0.2 do we want to roll our own instead?

Ah didn't realize it. Let's stick with django-genes. Just talked @rzelayafavila -- a programmer in the Greene Lab who helped make django-genes. He said we will also need to install django-organisms for the organism table.

Opened an issue to check for Python 3 compatibility of django-genes.

awm33 · 2016-09-22T03:38:26Z

@dhimmel Why remove the algorithm classifier field and model? I know we're starting with one algorithm, but are we now only going to have one going forward?

dhimmel · 2016-09-22T14:19:08Z

Why remove the algorithm classifier field and model?

Because I'm not sure exactly how we want to store classifier information. Once the machine learning module matures, we'll have a better idea of how to encode different classifier options. I did think your setup of algorithm (string) and algorithm_parameters (JSON) was versatile. However, the machine learning module in cognoma/machine-learning#51 doesn't include a framework for accepting algorithm_parameters.

cognoma/machine-learning#51 does however output some classifier information such as:

{  
   "class": "SGDClassifier",
   "module": "sklearn.linear_model.stochastic_gradient",
   "parameters": {  
      "alpha": 0.1,
      "average": false,
      "class_weight": "balanced",
      "epsilon": 0.1,
      "eta0": 0.0,
      "fit_intercept": true,
      "l1_ratio": 0.0,
      "learning_rate": "optimal",
      "loss": "log",
      "n_iter": 5,
      "n_jobs": 1,
      "penalty": "elasticnet",
      "power_t": 0.5,
      "random_state": 0,
      "shuffle": true,
      "verbose": 0,
      "warm_start": false
   }
}

@awm33 given these considerations, do you think it makes sense to hold out on making algorithm fields until we know exactly what fields we need and have a way for the machine learning module to accept the input?

awm33 · 2016-09-24T18:44:36Z

doc/api.md

-| mutation_type | string |  |
-| gene | object | Gene associated with this mutation. |
+| sample_id | string | Foreign Key referencing samples |
+| entrez_gene_id | object | Foreign Key referencing a mutated gene for the sample |


Do we still want to keep the gene as an optional join? We could also wait on the frontend for this. What that would mean is that you wouldn't have to pull all the joins and lookup the gene id to do things like display the name of the gene

Do we still want to keep the gene as an optional join?

Are you asking whether we should add additional columns here for gene_symbol and gene_description?

awm33 · 2016-09-24T18:46:02Z

doc/api.md


 | Field        | Type           | Description |
 | ------------- |:-------------:| ----------:|
 | taxonomy_id | integer | Primary Key. Taxonomy ID assigned by NCBI. |
 | common_name | string | Organism common name, e.g. 'Human' |
 | scientific_name | string | Organism scientific/binomial name, e.g. 'Homo sapiens' |

-### Sample Mutation Summary (embedded in Gene)
+### Disease Types (/diseases)


Is this disease or disease type? Disease is used in the sample model, but I think we should pick one "disease" or "disease type" and stick with it

We should use disease rather than disease type. It's shorter and is what's used in samples.tsv

With this setup "disease" is in the long format (e.g. Glioblastoma Multiforme) - alternatively, we could also use "acronym" (e.g. GBM).

I created an issue to discuss acronyms: cognoma/cancer-data#26 (comment). My worry is that using acronyms as primary identifiers is dangerous. For example, if a Xena disease name changes or a new disease gets added, things will break.

awm33 · 2016-09-24T18:50:20Z

@dhimmel I can see removing the parameters, but don't understand why removing the parameters unless we are only using one algorithm going forward.

Would the example JSON above be the classifier output for the JSON results object? Or potentially a subobject along with other result objects?

See https://git.io/vipuP

dhimmel · 2016-09-26T14:38:30Z

I can see removing the parameters, but don't understand why removing the parameters unless we are only using one algorithm going forward.

Basically we don't yet know how to specify an algorithm via parameters yet. So should we keep a field that we're not going to use initially or just add it later. Up to you.

Would the example JSON above be the classifier output for the JSON results object?

The above JSON is a subobject of classifiers.results. See here for an example of the entire classifiers.results object.

dhimmel · 2016-09-29T17:42:46Z

Pinging @awm33. Would love to get this PR merged.

awm33 · 2016-09-30T03:11:11Z

@dhimmel I think I worded my response to the algorithm fields weirdly. I meant removing algorithm_parameters but keeping algorithm in order to denote what algorithm was used. I suppose we could backfill it once we support more than one algorithm. I assume they would all be "SGDClassifier" until that happens.

If you think the backfill approoch works, than that's fine to remove. Let's just make sure we don't add an algorithm and not note which one was used.

I'm not assigned as the review, but approved 👍

dhimmel · 2016-09-30T14:28:20Z

Okay I think the backfill plan makes sense, especially since I'm not sure SGDClassifier is the best name.

Will merge.

Simplify API docs for a minimalist design

855b565

awm33 reviewed Sep 22, 2016

View reviewed changes

Revert genes and organisms tables

234a595

dhimmel force-pushed the mvp-api branch from f2002b5 to 234a595 Compare September 22, 2016 14:45

awm33 reviewed Sep 24, 2016

View reviewed changes

Use "disease" rather than "disease type"

471e928

See https://git.io/vipuP

dhimmel merged commit c320aac into cognoma:master Sep 30, 2016

dhimmel deleted the mvp-api branch September 30, 2016 14:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify API docs for a minimalist design #24

Simplify API docs for a minimalist design #24

dhimmel commented Sep 21, 2016

awm33 Sep 22, 2016

dhimmel Sep 22, 2016

dhimmel Sep 22, 2016

awm33 commented Sep 22, 2016

dhimmel commented Sep 22, 2016

awm33 Sep 24, 2016

dhimmel Sep 26, 2016

awm33 Sep 24, 2016

dhimmel Sep 26, 2016

gwaybio Sep 26, 2016

dhimmel Sep 26, 2016 •

edited

Loading

awm33 commented Sep 24, 2016

dhimmel commented Sep 26, 2016

dhimmel commented Sep 29, 2016

awm33 commented Sep 30, 2016

dhimmel commented Sep 30, 2016

Simplify API docs for a minimalist design #24

Simplify API docs for a minimalist design #24

Conversation

dhimmel commented Sep 21, 2016

Motivation

Implementation Notes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

awm33 commented Sep 22, 2016

dhimmel commented Sep 22, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhimmel Sep 26, 2016 • edited Loading

Choose a reason for hiding this comment

awm33 commented Sep 24, 2016

dhimmel commented Sep 26, 2016

dhimmel commented Sep 29, 2016

awm33 commented Sep 30, 2016

dhimmel commented Sep 30, 2016

dhimmel Sep 26, 2016 •

edited

Loading