Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protein-translation: Add canonical data. #997

Merged

Conversation

sjwarner-bp
Copy link
Contributor

Add protein translation canonical data as per #577.

After going to implement this exercise in Java, and realising that there is not a canonical-data.json for the problem, I decided to do this. I have helped a friend complete this exercise on the Python track, and believe the Python tests to be very comprehensive, so I have used them as my main source of data.

Please let me know if this is too long for a canonical-data.json, as there are about 20 tests...

@rpottsoh rpottsoh changed the title Add protein translation canonical data as per #577. protein-translation: Add canonical data. Nov 8, 2017
@rpottsoh
Copy link
Member

rpottsoh commented Nov 8, 2017

Problem description.

closes #577

Copy link
Contributor

@cmccandless cmccandless left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sjwarner-bp Regarding the length, this is small compared to some (clock, for example).

"description": "Translate RNA strand into correct protein list",
"property": "translate",
"strand": "AUGUUUUGG",
"expected": ["Methionine","Phenylalanine","Tryptophan"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type returned by translate should be consistent; until this line, the output of translate has been a string, but now a list is expected. The python test cases use two different properties of_codon and of_rna for these; something similar would be appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course - my mistake! I'll correct that 🙂

Copy link
Member

@rpottsoh rpottsoh Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a lot like what has been discussed recently at #983. "translate_codon" seems to be an intermediate test of what the description is really asking for. The "translate_codon" tests may not belong. @petertseng, @Insti thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpottsoh I believe @ErikSchierboom beat you to it. See his comments.

@sjwarner-bp
Copy link
Contributor Author

Oh dear - I've made changes but the Travis build has failed. Do I need to add the new property names (translate_codon,translate_rna) somewhere?

Copy link
Member

@ErikSchierboom ErikSchierboom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already looking good! I have some points of discussion, which I've added as comments. Thanks for working on this!

{
"description": "Translate input RNA sequences into proteins",
"comments": [
" Returns the name of the protein if given RNA is valid, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe lose the extra space in the beginning of the string?

"description": "Translate input RNA sequences into proteins",
"comments": [
" Returns the name of the protein if given RNA is valid, "
, " else throws an error. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe lose the extra space in the beginning of the string?

"cases": [
{
"description": "Methionine RNA sequence is identified",
"property": "translate_codon",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lately, we have tried to have move away from testing intermediate results in the canonical data, and only test end results. For this canonical data, I would suggest to do the same and only have a single translate property to test. The existing translate_codon property tests (like this one) would then use the translate property and have as its expected value an array with a single value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think of this, but it makes much more sense. I'll work on that.

},
{
"description": "Translate RNA strand into correct protein list",
"property": "translate_rna",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In keeping in line with my comment above on testing only one property, I would suggest renaming this to translate.

@cmccandless
Copy link
Contributor

@sjwarner-bp Regarding the failed Travis build, this from the canonical-schema:

"property":
        { "description": "A letters-only, lowerCamelCase property name"
        , "type"       : "string"
        , "pattern"    : "^[a-z]+([A-Z][a-z]+)*[A-Z]?$"
        },

@sjwarner-bp
Copy link
Contributor Author

That makes sense - I'll keep this in mind for future.

"expected": ["Methionine","Phenylalanine"]
},
{
"description": "Translation stops if codon present 2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Translation stops if codon" -> "Translation stops if STOP codon present 2"

Maybe describe how this case differs from the first one?

"expected": ["Phenylalanine"]
},
{
"description": "Phenylalanine RNA sequence 2 is identified",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if there is another way to discern between the different Phenylalanine types? They are currently named "Phenylalanine RNA sequence 1" and "Phenylalanine RNA sequence 2", but maybe there is a more descriptive name? If there isn't, just ignore this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the problem description, I couldn't see a particularly good way of differentiating the two from each other - maybe if someone knows biology they could chime in, otherwise this might be as good as we can get it.

],
"cases": [
{
"description": "Methionine RNA sequence is identified",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the " is identified" postfix can be removed, as it doesn't add much extra information I think. But maybe someone else disagrees?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me - concise is good!

@@ -118,13 +118,13 @@
"expected": ["Methionine","Phenylalanine","Tryptophan"]
},
{
"description": "Translation stops if STOP codon present 1",
"description": "Translation stops if STOP codon present in short string",
"property": "translate",
"strand": "AUGUUUUAA",
"expected": ["Methionine","Phenylalanine"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STOP is included in the above test cases where it is the only codon; either the above cases should expected empty lists as a result or STOP should be included in the results here.

@@ -118,13 +118,13 @@
"expected": ["Methionine","Phenylalanine","Tryptophan"]
},
{
"description": "Translation stops if STOP codon present 1",
"description": "Translation stops if STOP codon present in short string",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps Translation stops if STOP codon at end of sequence vs below Translation stops if STOP codon in middle of sequence?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or
Translation stops when STOP codon encountered

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to distinguish between this case and the next... perhaps Translation stops when STOP codon encountered at end of sequence vs Translation stops when STOP codon encountered in middle of sequence?

"property": "translate",
"strand": "AUGUUUUAA",
"expected": ["Methionine","Phenylalanine"]
},
{
"description": "Translation stops if codon present 2",
"description": "Translation stops if codon present in longer string",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my above comment, perhaps a change in wording?

Copy link
Member

@rpottsoh rpottsoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good! Thanks @sjwarner-bp!

@cmccandless
Copy link
Contributor

Love how fast this one's going!

@sjwarner-bp please see my latest comments, I'm afraid they might've been covered up by the flurry of activity.

@sjwarner-bp
Copy link
Contributor Author

I think I got them - I've added 'STOP' to the end of each list where a 'STOP' codon is involved.

I also amended the description for the test of early termination - hope this was right! 🙂

@@ -118,7 +118,7 @@
"expected": ["Methionine","Phenylalanine","Tryptophan"]
},
{
"description": "Translation stops if STOP codon present in short string",
"description": "Translation stops if STOP codon at end of sequence",
"property": "translate",
"strand": "AUGUUUUAA",
"expected": ["Methionine","Phenylalanine", "STOP"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "STOP" be included in "expected"? The stop example in the README didn't include stop in the resulting protein.

Copy link
Contributor Author

@sjwarner-bp sjwarner-bp Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added because of @cmccandless point (above) regarding consistency between tests. If "STOP" is expected in the tests for "STOP" codons, it might make sense to include them in the list of expected proteins?

Equally, it might not make too much sense - the Python track does not include them in protein lists, and only identifies them if they are input as a single codon.

What do we think? I'd be interested in hearing which way people think makes most sense and still remains consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps then:

"strand":  "UAA",
"expected": ""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or update the README to list the STOP protein in the stop example....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to expect an empty list in the case of "expected": ["STOP"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, empty list coming up!

Copy link
Member

@rpottsoh rpottsoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@Insti Insti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You all seem to be doing a good job on this 👍 keep it up.

"expected": ["Tryptophan","Cysteine","Tyrosine"]
},
{
"description": "Test invalid codons",
Copy link
Contributor

@Insti Insti Nov 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't test invalid codons.

See also: #902

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Maybe remove this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All done!

"cases": [
{
"description": "Methionine RNA sequence",
"property": "translate",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about proteins as the property?

"expected": []
},
{
"description": "STOP codon RNA sequence 3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see tests for more of the simple STOP codon sequences.
UAGUGG
UGGUAG
UGGUAGUGG

{
"description": "Translation stops if codon in middle of sequence",
"property": "translate",
"strand": "UGGUGUUAUUAAUGGUUU",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put some thought put into which sequences are easiest for humans to read and distinguish.

…existing stop codon tests, remove testing for invalid data.
@ErikSchierboom
Copy link
Member

Merged. Thanks @sjwarner-bp! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants