When subsetting to individual exons, many may be entirely identical at the sequence/region level if they are shared between different full-length transcripts. This is wasted output in the GTF (inflating its size) and also triggers a warning when generating the salmon index:
[2022-03-19 15:44:58.606] [puff::index::jointLog] [warning] Removed 10618 transcripts that were sequence duplicates of indexed transcripts.
[2022-03-19 15:44:58.606] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
It would be good to check for sequence duplicates before outputting the GTF. A 'combined tx ID' could be assigned in these cases (e.g. the transcript IDs joined with a string separator).
As salmon index removes duplicates, this shouldn't cause any downstream problems, save for tx IDs potentially disappearing between the quant GTF and the salmon quantification output.
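A rough sketch of what the combined-ID idea could look like, not a proposal for the actual implementation. It assumes each exon-level entry is available as an `(tx_id, key)` pair, where `key` is either the extracted sequence or a `(chrom, start, end, strand)` tuple; the function name `collapse_duplicates` and the `__` separator are made up for illustration.

```python
from collections import defaultdict


def collapse_duplicates(records, sep="__"):
    """Group entries that share the same key and give each group one combined ID.

    records: iterable of (tx_id, key) pairs. `key` is whatever defines a
             duplicate, e.g. the sequence string or a (chrom, start, end, strand)
             tuple (hypothetical input format, not taken from the tool).
    sep:     string used to join the IDs of duplicates (arbitrary choice).

    Returns (combined, id_map):
      combined: dict mapping combined ID -> key, one entry per unique key,
                in first-seen order
      id_map:   dict mapping every original ID -> the combined ID it ended up in
    """
    groups = defaultdict(list)
    for tx_id, key in records:
        groups[key].append(tx_id)

    combined = {}
    id_map = {}
    for key, ids in groups.items():
        combined_id = sep.join(ids)
        combined[combined_id] = key
        for tx_id in ids:
            id_map[tx_id] = combined_id
    return combined, id_map


if __name__ == "__main__":
    # Toy example: two exons from different transcripts with identical sequence.
    example = [
        ("tx1_exon1", "ACGT"),
        ("tx2_exon1", "ACGT"),
        ("tx3_exon2", "GGCC"),
    ]
    combined, id_map = collapse_duplicates(example)
    for combined_id, key in combined.items():
        print(combined_id, key)
```

Keeping `id_map` around would make it possible to trace an original tx ID to the entry it was merged into, so no ID silently disappears between the GTF and the quantification output. The separator should be something that cannot occur in the IDs themselves, and `;` is best avoided since it delimits attributes in GTF.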