-
-
Notifications
You must be signed in to change notification settings - Fork 546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
protein-translation: Add canonical data. #997
protein-translation: Add canonical data. #997
Conversation
Problem description. closes #577 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sjwarner-bp Regarding the length, this is small compared to some (clock
, for example).
"description": "Translate RNA strand into correct protein list", | ||
"property": "translate", | ||
"strand": "AUGUUUUGG", | ||
"expected": ["Methionine","Phenylalanine","Tryptophan"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type returned by translate
should be consistent; until this line, the output of translate
has been a string, but now a list is expected. The python test cases use two different properties of_codon
and of_rna
for these; something similar would be appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah of course - my mistake! I'll correct that 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a lot like what has been discussed recently at #983. "translate_codon"
seems to be an intermediate test of what the description is really asking for. The "translate_codon"
tests may not belong. @petertseng, @Insti thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rpottsoh I believe @ErikSchierboom beat you to it. See his comments.
Oh dear - I've made changes but the Travis build has failed. Do I need to add the new property names ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already looking good! I have some points of discussion, which I've added as comments. Thanks for working on this!
{ | ||
"description": "Translate input RNA sequences into proteins", | ||
"comments": [ | ||
" Returns the name of the protein if given RNA is valid, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe lose the extra space in the beginning of the string?
"description": "Translate input RNA sequences into proteins", | ||
"comments": [ | ||
" Returns the name of the protein if given RNA is valid, " | ||
, " else throws an error. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe lose the extra space in the beginning of the string?
"cases": [ | ||
{ | ||
"description": "Methionine RNA sequence is identified", | ||
"property": "translate_codon", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lately, we have tried to have move away from testing intermediate results in the canonical data, and only test end results. For this canonical data, I would suggest to do the same and only have a single translate
property to test. The existing translate_codon
property tests (like this one) would then use the translate
property and have as its expected value an array with a single value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think of this, but it makes much more sense. I'll work on that.
}, | ||
{ | ||
"description": "Translate RNA strand into correct protein list", | ||
"property": "translate_rna", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In keeping in line with my comment above on testing only one property, I would suggest renaming this to translate
.
@sjwarner-bp Regarding the failed Travis build, this from the canonical-schema: "property":
{ "description": "A letters-only, lowerCamelCase property name"
, "type" : "string"
, "pattern" : "^[a-z]+([A-Z][a-z]+)*[A-Z]?$"
}, |
That makes sense - I'll keep this in mind for future. |
"expected": ["Methionine","Phenylalanine"] | ||
}, | ||
{ | ||
"description": "Translation stops if codon present 2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Translation stops if codon" -> "Translation stops if STOP codon present 2"
Maybe describe how this case differs from the first one?
"expected": ["Phenylalanine"] | ||
}, | ||
{ | ||
"description": "Phenylalanine RNA sequence 2 is identified", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if there is another way to discern between the different Phenylalanine types? They are currently named "Phenylalanine RNA sequence 1" and "Phenylalanine RNA sequence 2", but maybe there is a more descriptive name? If there isn't, just ignore this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the problem description, I couldn't see a particularly good way of differentiating the two from each other - maybe if someone knows biology they could chime in, otherwise this might be as good as we can get it.
], | ||
"cases": [ | ||
{ | ||
"description": "Methionine RNA sequence is identified", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the " is identified" postfix can be removed, as it doesn't add much extra information I think. But maybe someone else disagrees?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me - concise is good!
@@ -118,13 +118,13 @@ | |||
"expected": ["Methionine","Phenylalanine","Tryptophan"] | |||
}, | |||
{ | |||
"description": "Translation stops if STOP codon present 1", | |||
"description": "Translation stops if STOP codon present in short string", | |||
"property": "translate", | |||
"strand": "AUGUUUUAA", | |||
"expected": ["Methionine","Phenylalanine"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STOP
is included in the above test cases where it is the only codon; either the above cases should expected empty lists as a result or STOP
should be included in the results here.
@@ -118,13 +118,13 @@ | |||
"expected": ["Methionine","Phenylalanine","Tryptophan"] | |||
}, | |||
{ | |||
"description": "Translation stops if STOP codon present 1", | |||
"description": "Translation stops if STOP codon present in short string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps Translation stops if STOP codon at end of sequence
vs below Translation stops if STOP codon in middle of sequence
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or
Translation stops when STOP codon encountered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to distinguish between this case and the next... perhaps Translation stops when STOP codon encountered at end of sequence
vs Translation stops when STOP codon encountered in middle of sequence
?
"property": "translate", | ||
"strand": "AUGUUUUAA", | ||
"expected": ["Methionine","Phenylalanine"] | ||
}, | ||
{ | ||
"description": "Translation stops if codon present 2", | ||
"description": "Translation stops if codon present in longer string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my above comment, perhaps a change in wording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 looks good! Thanks @sjwarner-bp!
Love how fast this one's going! @sjwarner-bp please see my latest comments, I'm afraid they might've been covered up by the flurry of activity. |
I think I got them - I've added 'STOP' to the end of each list where a 'STOP' codon is involved. I also amended the description for the test of early termination - hope this was right! 🙂 |
@@ -118,7 +118,7 @@ | |||
"expected": ["Methionine","Phenylalanine","Tryptophan"] | |||
}, | |||
{ | |||
"description": "Translation stops if STOP codon present in short string", | |||
"description": "Translation stops if STOP codon at end of sequence", | |||
"property": "translate", | |||
"strand": "AUGUUUUAA", | |||
"expected": ["Methionine","Phenylalanine", "STOP"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "STOP" be included in "expected"? The stop example in the README didn't include stop in the resulting protein.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added because of @cmccandless point (above) regarding consistency between tests. If "STOP" is expected in the tests for "STOP" codons, it might make sense to include them in the list of expected proteins?
Equally, it might not make too much sense - the Python track does not include them in protein lists, and only identifies them if they are input as a single codon.
What do we think? I'd be interested in hearing which way people think makes most sense and still remains consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps then:
"strand": "UAA",
"expected": ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or update the README to list the STOP protein in the stop example....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to expect an empty list in the case of "expected": ["STOP"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, empty list coming up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You all seem to be doing a good job on this 👍 keep it up.
"expected": ["Tryptophan","Cysteine","Tyrosine"] | ||
}, | ||
{ | ||
"description": "Test invalid codons", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't test invalid codons.
See also: #902
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Maybe remove this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done!
"cases": [ | ||
{ | ||
"description": "Methionine RNA sequence", | ||
"property": "translate", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about proteins
as the property?
"expected": [] | ||
}, | ||
{ | ||
"description": "STOP codon RNA sequence 3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see tests for more of the simple STOP codon sequences.
UAGUGG
UGGUAG
UGGUAGUGG
{ | ||
"description": "Translation stops if codon in middle of sequence", | ||
"property": "translate", | ||
"strand": "UGGUGUUAUUAAUGGUUU", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put some thought put into which sequences are easiest for humans to read and distinguish.
…existing stop codon tests, remove testing for invalid data.
Merged. Thanks @sjwarner-bp! 🎉 |
Add protein translation canonical data as per #577.
After going to implement this exercise in Java, and realising that there is not a
canonical-data.json
for the problem, I decided to do this. I have helped a friend complete this exercise on the Python track, and believe the Python tests to be very comprehensive, so I have used them as my main source of data.Please let me know if this is too long for a
canonical-data.json
, as there are about 20 tests...