protein-translation: Add canonical data. #997

Merged
143 changes: 143 additions & 0 deletions exercises/protein-translation/canonical-data.json
@@ -0,0 +1,143 @@
{
"exercise": "protein-translation",
"version": "1.0.0",
"cases": [
{
"description": "Translate input RNA sequences into proteins",
"comments": [
" Returns the name of the protein if given RNA is valid, "
Member

Maybe lose the extra space in the beginning of the string?

, " else throws an error. "
Member

Maybe lose the extra space in the beginning of the string?

],
"cases": [
{
"description": "Methionine RNA sequence is identified",
Member

I think the " is identified" suffix can be removed, as it doesn't add much extra information. But maybe someone else disagrees?

Contributor Author

That makes sense to me - concise is good!

"property": "translate_codon",
Member

Lately, we have tried to move away from testing intermediate results in the canonical data and to test only end results. For this canonical data, I would suggest doing the same and having a single translate property to test. The existing translate_codon tests (like this one) would then use the translate property, with an array containing a single value as the expected value.

Contributor

I like this approach better.

Contributor Author

I didn't think of this, but it makes much more sense. I'll work on that.
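
As a rough illustration of the suggestion in this thread, the sketch below rewrites an old-style `translate_codon` case into the single-property form. The helper name `to_single_property` is hypothetical and is not part of this PR or of any Exercism tooling; it only shows the shape change, where a string `expected` becomes a one-element array. How STOP-only cases should look is a separate question and is not decided here.

```python
def to_single_property(case):
    """Return a copy of a case that uses the single `translate` property.

    Hypothetical helper for illustration only: a string `expected`
    (the old `translate_codon` shape) is wrapped in a one-element list.
    """
    converted = dict(case)  # shallow copy so the input case is left untouched
    converted["property"] = "translate"
    if isinstance(case["expected"], str):
        converted["expected"] = [case["expected"]]
    return converted


# Old-style case, taken from the diff under review.
old_case = {
    "description": "Methionine RNA sequence",
    "property": "translate_codon",
    "strand": "AUG",
    "expected": "Methionine",
}

new_case = to_single_property(old_case)
print(new_case["property"], new_case["expected"])
```

Cases whose `expected` is already a list, such as the `translate_rna` ones, only have their property renamed.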

"strand": "AUG",
"expected": "Methionine"
},
{
"description": "Phenylalanine RNA sequence 1 is identified",
"property": "translate_codon",
"strand": "UUU",
"expected": "Phenylalanine"
},
{
"description": "Phenylalanine RNA sequence 2 is identified",
Member

I am wondering if there is another way to discern between the different Phenylalanine types? They are currently named "Phenylalanine RNA sequence 1" and "Phenylalanine RNA sequence 2", but maybe there is a more descriptive name? If there isn't, just ignore this :)

Contributor Author

From the problem description, I couldn't see a particularly good way of differentiating the two from each other - maybe if someone knows biology they could chime in, otherwise this might be as good as we can get it.

"property": "translate_codon",
"strand": "UUC",
"expected": "Phenylalanine"
},
{
"description": "Leucine RNA sequence 1 is identified",
"property": "translate_codon",
"strand": "UUA",
"expected": "Leucine"
},
{
"description": "Leucine RNA sequence 2 is identified",
"property": "translate_codon",
"strand": "UUG",
"expected": "Leucine"
},
{
"description": "Serine RNA sequence 1 is identified",
"property": "translate_codon",
"strand": "UCU",
"expected": "Serine"
},
{
"description": "Serine RNA sequence 2 is identified",
"property": "translate_codon",
"strand": "UCC",
"expected": "Serine"
},
{
"description": "Serine RNA sequence 3 is identified",
"property": "translate_codon",
"strand": "UCA",
"expected": "Serine"
},
{
"description": "Serine RNA sequence 4 is identified",
"property": "translate_codon",
"strand": "UCG",
"expected": "Serine"
},
{
"description": "Tyrosine RNA sequence 1 is identified",
"property": "translate_codon",
"strand": "UAU",
"expected": "Tyrosine"
},
{
"description": "Tyrosine RNA sequence 2 is identified",
"property": "translate_codon",
"strand": "UAC",
"expected": "Tyrosine"
},
{
"description": "Cysteine RNA sequence 1 is identified",
"property": "translate_codon",
"strand": "UGU",
"expected": "Cysteine"
},
{
"description": "Cysteine RNA sequence 2 is identified",
"property": "translate_codon",
"strand": "UGC",
"expected": "Cysteine"
},
{
"description": "Tryptophan RNA sequence is identified",
"property": "translate_codon",
"strand": "UGG",
"expected": "Tryptophan"
},
{
"description": "STOP codon RNA sequence is identified",
"property": "translate_codon",
"strand": "UAA",
"expected": "STOP"
},
{
"description": "STOP codon RNA sequence is identified",
"property": "translate_codon",
"strand": "UAG",
"expected": "STOP"
},
{
"description": "STOP codon RNA sequence is identified",
"property": "translate_codon",
"strand": "UGA",
"expected": "STOP"
},
{
"description": "Translate RNA strand into correct protein list",
"property": "translate_rna",
Member

In keeping with my comment above on testing only one property, I would suggest renaming this to translate.

"strand": "AUGUUUUGG",
"expected": ["Methionine","Phenylalanine","Tryptophan"]
Contributor

The type returned by translate should be consistent; until this line, the output of translate has been a string, but now a list is expected. The python test cases use two different properties of_codon and of_rna for these; something similar would be appropriate.

Contributor Author

Ah of course - my mistake! I'll correct that 🙂

Member @rpottsoh, Nov 8, 2017

This feels a lot like what has been discussed recently at #983. "translate_codon" seems to be an intermediate test of what the description is really asking for. The "translate_codon" tests may not belong. @petertseng, @Insti thoughts?

Contributor

@rpottsoh I believe @ErikSchierboom beat you to it. See his comments.

},
{
"description": "Translation stops if STOP codon present 1",
"property": "translate_rna",
"strand": "AUGUUUUAA",
"expected": ["Methionine","Phenylalanine"]
Contributor

STOP is included in the above test cases where it is the only codon; either the above cases should expect empty lists as a result, or STOP should be included in the results here.
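
To make the two readings concrete, here is a minimal Python sketch of the first option: a STOP codon ends translation and is never included in the result, so a STOP-only strand yields an empty list. The function name `translate` and the table names are assumptions for illustration; the codon mapping is taken from the cases in this file. This is one consistent choice, not necessarily the behaviour the PR settles on.

```python
# Codon table assembled from the test cases in this canonical-data file.
CODON_TO_PROTEIN = {
    "AUG": "Methionine",
    "UUU": "Phenylalanine", "UUC": "Phenylalanine",
    "UUA": "Leucine", "UUG": "Leucine",
    "UCU": "Serine", "UCC": "Serine", "UCA": "Serine", "UCG": "Serine",
    "UAU": "Tyrosine", "UAC": "Tyrosine",
    "UGU": "Cysteine", "UGC": "Cysteine",
    "UGG": "Tryptophan",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(strand):
    """Translate an RNA strand into a list of protein names.

    Assumption for this sketch: STOP terminates translation and is
    not reported. Invalid codons are not handled and raise KeyError.
    """
    proteins = []
    for start in range(0, len(strand), 3):
        codon = strand[start:start + 3]
        if codon in STOP_CODONS:
            break  # everything after the first STOP codon is ignored
        proteins.append(CODON_TO_PROTEIN[codon])
    return proteins


print(translate("AUGUUUUAA"))  # STOP at the end of the strand
print(translate("UAA"))        # STOP as the only codon
```

Under the second option, the `break` branch would append `"STOP"` before leaving the loop, and the expected values of the multi-codon cases would need to include it.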

},
{
"description": "Translation stops if codon present 2",
Member

"Translation stops if codon" -> "Translation stops if STOP codon present 2"

Maybe describe how this case differs from the first one?

"property": "translate_rna",
"strand": "UGGUGUUAUUAAUGGUUU",
Contributor

Put some thought into which sequences are easiest for humans to read and distinguish.

"expected": ["Tryptophan","Cysteine","Tyrosine"]
},
{
"description": "Test invalid codons",
Contributor @Insti, Nov 8, 2017

Don't test invalid codons.

See also: #902

Member

Good point. Maybe remove this test?

Contributor Author

All done!

"property": "translate_codon",
"strand": "CARROT",
"expected": {
"error": "Invalid nucleotide in strand"
}
}
]
}
]
}