Simplify API docs for a minimalist design #24
| entrezid | integer | Primary Key. |
| systematic_name | string | |
| standard_name | string | |
| entrez_gene_id | integer | Primary Key. |
This is actually coming from https://pypi.python.org/pypi/django-genes/0.2. Do we want to roll our own instead?
Ah, didn't realize it. Let's stick with django-genes. Just talked to @rzelayafavila, a programmer in the Greene Lab who helped make django-genes. He said we will also need to install django-organisms for the organism table.
Opened an issue to check for Python 3 compatibility of django-genes.
@dhimmel Why remove the algorithm classifier field and model? I know we're starting with one algorithm, but are we now only going to have one going forward?
Because I'm not sure exactly how we want to store classifier information. Once the machine learning module matures, we'll have a better idea of how to encode different classifier options. I did think your setup of cognoma/machine-learning#51 does however output some classifier information such as:
{
"class": "SGDClassifier",
"module": "sklearn.linear_model.stochastic_gradient",
"parameters": {
"alpha": 0.1,
"average": false,
"class_weight": "balanced",
"epsilon": 0.1,
"eta0": 0.0,
"fit_intercept": true,
"l1_ratio": 0.0,
"learning_rate": "optimal",
"loss": "log",
"n_iter": 5,
"n_jobs": 1,
"penalty": "elasticnet",
"power_t": 0.5,
"random_state": 0,
"shuffle": true,
"verbose": 0,
"warm_start": false
}
}
@awm33 given these considerations, do you think it makes sense to hold off on making algorithm fields until we know exactly what fields we need and have a way for the machine learning module to accept the input?
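For reference, a minimal sketch of how a consumer might read classifier output shaped like the example above, assuming it arrives as a JSON string with the top-level keys `class`, `module`, and `parameters`. The helper name `summarize_classifier` is hypothetical and is not part of cognoma/machine-learning.

```python
import json


def summarize_classifier(raw):
    """Return (qualified class name, parameters dict) from classifier JSON.

    Hypothetical helper: assumes the "class", "module", and "parameters"
    keys seen in the example output, and nothing else about the format.
    """
    info = json.loads(raw)
    qualified_name = "{}.{}".format(info["module"], info["class"])
    # Copy the parameters so callers cannot mutate the parsed document.
    return qualified_name, dict(info.get("parameters", {}))


# A trimmed-down copy of the example output, for illustration only.
example = """
{
  "class": "SGDClassifier",
  "module": "sklearn.linear_model.stochastic_gradient",
  "parameters": {"alpha": 0.1, "loss": "log", "penalty": "elasticnet"}
}
"""

name, params = summarize_classifier(example)
```

Storing the whole object as an opaque JSON blob would sidestep the question of which individual algorithm fields the API needs, at the cost of not being able to filter on them.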
| mutation_type | string | |
| gene | object | Gene associated with this mutation. |
| sample_id | string | Foreign Key referencing samples |
| entrez_gene_id | object | Foreign Key referencing a mutated gene for the sample |
Do we still want to keep the gene as an optional join? We could also wait on the frontend for this. What that would mean is that you wouldn't have to pull all the joins and look up the gene ID to do things like display the name of the gene.
> Do we still want to keep the gene as an optional join?

Are you asking whether we should add additional columns here for gene_symbol and gene_description?
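To make the trade-off concrete, here is a rough sketch of the two response shapes under discussion: a mutation that carries only `entrez_gene_id`, and one with the gene record embedded. The `serialize_mutation` helper, its `expand` flag, and the sample values are made up for illustration and are not the actual API.

```python
def serialize_mutation(mutation, genes_by_id, expand=False):
    """Serialize a mutation, optionally embedding its gene record.

    Hypothetical helper: `expand` stands in for the optional join.
    """
    record = dict(mutation)  # shallow copy; the input is left untouched
    if expand:
        # Embed the gene so the client does not need a second lookup.
        record["gene"] = genes_by_id[mutation["entrez_gene_id"]]
    return record


# Illustrative data only.
genes_by_id = {7157: {"entrez_gene_id": 7157, "standard_name": "TP53"}}
mutation = {"sample_id": "SAMPLE-1", "entrez_gene_id": 7157}

flat = serialize_mutation(mutation, genes_by_id)
joined = serialize_mutation(mutation, genes_by_id, expand=True)
```

With the flat shape the frontend has to resolve the gene name itself; with the embedded shape it can display the name directly.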
| Field | Type | Description |
| ------------- |:-------------:| ----------:|
| taxonomy_id | integer | Primary Key. Taxonomy ID assigned by NCBI. |
| common_name | string | Organism common name, e.g. 'Human' |
| scientific_name | string | Organism scientific/binomial name, e.g. 'Homo sapiens' |
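
A minimal sketch of one row of this table as a plain Python record, assuming nothing about how django-organisms actually models it. The `Organism` class name is hypothetical; the field names and example values follow the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organism:
    taxonomy_id: int       # Primary key; taxonomy ID assigned by NCBI
    common_name: str       # e.g. 'Human'
    scientific_name: str   # e.g. 'Homo sapiens'


# 9606 is the NCBI taxonomy ID for Homo sapiens.
human = Organism(
    taxonomy_id=9606,
    common_name="Human",
    scientific_name="Homo sapiens",
)
```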
### Sample Mutation Summary (embedded in Gene)
### Disease Types (/diseases)
Is this disease or disease type? Disease is used in the sample model, but I think we should pick one, "disease" or "disease type", and stick with it.
We should use disease rather than disease type. It's shorter and is what's used in samples.tsv.
With this setup "disease" is in the long format (e.g. Glioblastoma Multiforme) - alternatively, we could also use "acronym" (e.g. GBM).
I created an issue to discuss acronyms: cognoma/cancer-data#26 (comment). My worry is that using acronyms as primary identifiers is dangerous. For example, if a Xena disease name changes or a new disease gets added, things will break.
@dhimmel I can see removing the parameters, but don't understand why removing the parameters unless we are only using one algorithm going forward. Would the example JSON above be the classifier output for the JSON results object? Or potentially a subobject along with other result objects?
Basically, we don't yet know how to specify an algorithm via parameters. So should we keep a field that we're not going to use initially, or just add it later? Up to you.
The above JSON is a subobject of
Pinging @awm33. Would love to get this PR merged.
@dhimmel I think I worded my response to the algorithm fields weirdly. I meant removing If you think the backfill approach works, then that's fine to remove. Let's just make sure we don't add an algorithm and not note which one was used. I'm not assigned as the reviewer, but approved 👍
Okay, I think the backfill plan makes sense, especially since I'm not sure Will merge.
Motivation
Modify the API to be simpler and to align with the current state of cancer-data.
Implementation Notes
This is a work in progress (WIP). Please comment on the diffs for more information on why I'm proposing certain changes.