In theory, circular sequences should not contain partial genes, since a gene spanning the breakpoint should continue seamlessly from the end of the sequence into its beginning. However, Prodigal/pyrodigal currently (1) do not assign the same gene ID to the partial sequences found at both ends of the sequence, and (2), more critically, treat the two sequence edges as independent. As a result, a partial gene can sometimes be identified at only one end.
To address this issue, I've been using a script I wrote that iteratively shifts the breakpoint to minimize gene truncation. However, this approach is clearly suboptimal and can occasionally fail (i.e. find no breakpoint that eliminates truncations), because the predicted genes may change every time the breakpoint moves.
Although addressing this limitation would require significant effort, it would make pyrodigal stand out among gene prediction tools.
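For reference, the breakpoint-shifting workaround can be sketched as below. This is not the script mentioned above, only a minimal illustration of the general idea: `rotate`, `count_truncated`, `best_breakpoint` and the `(begin, end, partial_begin, partial_end)` gene tuple are hypothetical, and `predict` stands for any gene caller that flags partial genes.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical gene record: (begin, end, partial_begin, partial_end).
Gene = Tuple[int, int, bool, bool]


def rotate(seq: str, offset: int) -> str:
    """Move the breakpoint of a circular sequence to position `offset` (0-based)."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def count_truncated(genes: List[Gene]) -> int:
    """Count genes flagged as partial at either edge."""
    return sum(1 for _, _, partial_begin, partial_end in genes if partial_begin or partial_end)


def best_breakpoint(
    seq: str,
    predict: Callable[[str], List[Gene]],
    candidates: Iterable[int],
) -> Tuple[int, int]:
    """Return (offset, n_truncated) for the candidate breakpoint with the fewest truncated genes."""
    best: Optional[Tuple[int, int]] = None
    for offset in candidates:
        # Re-run the predictor on the rotated sequence: the gene set may change each time.
        n_truncated = count_truncated(predict(rotate(seq, offset)))
        if best is None or n_truncated < best[1]:
            best = (offset, n_truncated)
        if n_truncated == 0:
            break  # nothing truncated at this breakpoint, stop searching
    if best is None:
        raise ValueError("at least one candidate breakpoint is required")
    return best
```

Because `predict` is rerun on every rotation, this inherits the weakness described above: the returned best offset may still have `n_truncated > 0`. With pyrodigal, `predict` could be a thin wrapper around `GeneFinder.find_genes` that reads each gene's `begin`, `end`, `partial_begin` and `partial_end` attributes (worth checking against the installed version; pyrodigal coordinates are 1-based).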
After looking into the issue a bit, I think this is possible while keeping the core algorithm, simply by allowing some nodes to "wrap" around both ends of the sequence. The rest of the scoring procedures (RBS detection, GC%, etc.) may need further updates, but I think most of them wouldn't change.
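At the coordinate level, one way to picture a wrapping node is to keep the start position in `[0, n)` and let the end position run past `n`, as if reading into a second copy of the sequence. The sketch below is an assumption about what wrapping could look like, not how Prodigal stores nodes internally; `wrapped_slice` and `spans_origin` are hypothetical helpers, and only the forward strand is handled.

```python
def wrapped_slice(seq: str, begin: int, end: int) -> str:
    """Read seq[begin:end] on a circular sequence (0-based, half-open).

    `begin` lies in [0, n); `end` may exceed n (up to begin + n), meaning the
    gene continues past the breakpoint into the start of the sequence.
    """
    n = len(seq)
    if not 0 <= begin < n or not begin < end <= begin + n:
        raise ValueError(f"invalid circular coordinates ({begin}, {end}) for length {n}")
    if end <= n:
        return seq[begin:end]  # ordinary gene, no wrapping
    return seq[begin:] + seq[: end - n]  # tail of the sequence, then its head


def spans_origin(end: int, n: int) -> bool:
    """True if a gene stored with unrolled coordinates crosses the breakpoint."""
    return end > n
```

For instance, on the 12 bp toy sequence `AAATAACCCATG`, the gene `ATGAAATAA` starts at position 9 and would be stored once as `(9, 18)`; `wrapped_slice(seq, 9, 18)` returns `"ATGAAATAA"`, so there is no need to report two partial genes.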
The biggest problem is that we'd first need some refactoring, as there are areas of the code where I can't really guarantee what happens when they receive negative indices or indices greater than the sequence length. In clean code this should never happen, but I found in some issues here that Prodigal already does some out-of-buffer indexing at times... I'll start with a refactor so that it's easier to isolate the code that needs to be changed.
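A minimal sketch of the kind of refactor described above, assuming every position lookup can be routed through a single helper (`resolve_index` is hypothetical, not part of Prodigal or pyrodigal). Centralizing the lookup makes the out-of-range case explicit: circular sequences wrap modulo the length, while linear sequences fail loudly instead of reading outside the buffer.

```python
def resolve_index(i: int, n: int, circular: bool) -> int:
    """Map a possibly out-of-range position onto [0, n).

    Circular sequences wrap modulo n; for linear sequences an out-of-range
    position raises instead of silently reading outside the buffer.
    """
    if n <= 0:
        raise ValueError("sequence length must be positive")
    if circular:
        return i % n  # Python's % always yields a value in [0, n) for n > 0
    if not 0 <= i < n:
        raise IndexError(f"position {i} outside linear sequence of length {n}")
    return i
```

Note that plain Python indexing is not a safe substitute here: `seq[-1]` silently returns the last base even for a linear sequence, which is exactly the kind of implicit behavior a single accessor makes visible.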