Fix rendering of UCSC repeatmasker BigBed and BED files #4638
UCSC RepeatMasker BED files have extended fields that are somewhat complicated.
This PR fixes the rendering of UCSC RepeatMasker BED, BEDTabix, and BigBed files that include the full set of fields.
Example here https://hgdownload.soe.ucsc.edu/gbdb/hs1/t2tRepeatMasker/
See the "Full mode visualization" section, which shows, at least from the UCSC UI perspective, how complex these data are: https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=2372603109_AcR4Yt0XayFS0KE6hJ1sksSrAkTv&db=hub_3671779_hs1&c=chr1&g=hub_3671779_t2tRepeatMasker
The data format reuses UCSC-specific BED fields. Particularly troublesome is its use of negative-valued blockSizes to indicate unaligned regions. This caused JBrowse to create negative-length SimpleFeatures, which crashes the track (we don't allow negative-size features, and it is probably best not to change that).
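To illustrate the problem, here is a minimal sketch of how a generic BED12-style block expansion ends up with a negative-length block. The `Block` interface, the `expandBlocks` helper, and the sample values are hypothetical and made up for illustration; they are not JBrowse's actual parser or real UCSC data.

```typescript
// Hypothetical block shape, not JBrowse's actual SimpleFeature type
interface Block {
  start: number
  end: number
}

// Expand comma-separated blockSizes/blockStarts into absolute coordinates,
// the way a generic BED12 parser typically would
function expandBlocks(
  chromStart: number,
  blockSizes: string,
  blockStarts: string,
): Block[] {
  const sizes = blockSizes
    .split(',')
    .filter(s => s !== '')
    .map(Number)
  const starts = blockStarts
    .split(',')
    .filter(s => s !== '')
    .map(Number)
  return sizes.map((size, i) => {
    const start = chromStart + (starts[i] ?? 0)
    return { start, end: start + size }
  })
}

// Made-up values: the middle block has a negative size
const blocks = expandBlocks(1000, '50,-1,30', '0,50,70')

// Blocks whose end precedes their start, i.e. negative-length features
const negativeBlocks = blocks.filter(b => b.end - b.start < 0)
console.log(negativeBlocks)
```

A parser that turns every block into a subfeature without checking the sign would pass the middle block on as a feature whose end precedes its start.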
This PR fixes that by filtering out the negative-size features. It also adds a heuristic to check whether the track is a RepeatMasker track, so that it is not inferred to be a "ProcessedTranscript". It is hard to tell exactly, since RepeatMasker uses thickStart, thickEnd, blockSizes, etc. The new heuristic is description.split(' ').length == 15, which is hacky, but it is otherwise hard to tell.
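The two parts of the fix can be sketched roughly as follows. This is only an illustration under stated assumptions: `SubBlock`, `dropNegativeSizeBlocks`, and `looksLikeRepeatMasker` are hypothetical names, not the functions in this PR, and the only detail taken from the description above is the `split(' ').length == 15` check.

```typescript
// Hypothetical block shape, not JBrowse's actual SimpleFeature type
interface SubBlock {
  start: number
  end: number
}

// Keep only blocks that do not have a negative length
function dropNegativeSizeBlocks(blocks: SubBlock[]): SubBlock[] {
  return blocks.filter(b => b.end - b.start >= 0)
}

// Heuristic from the PR description: a RepeatMasker description field
// splits into exactly 15 space-separated tokens
function looksLikeRepeatMasker(description: string | undefined): boolean {
  return description !== undefined && description.split(' ').length === 15
}

// Usage with made-up values
const kept = dropNegativeSizeBlocks([
  { start: 1000, end: 1050 },
  { start: 1050, end: 1049 }, // negative length, dropped
  { start: 1070, end: 1100 },
])
console.log(kept.length)

// A placeholder description with 15 tokens, not a real UCSC record
const fakeDescription = Array.from({ length: 15 }, (_, i) => `f${i}`).join(' ')
console.log(looksLikeRepeatMasker(fakeDescription))
```

A token-count check like this is cheap but can misfire on any other track whose description happens to have 15 space-separated tokens.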