Fix rendering of UCSC repeatmasker BigBed and BED files #4638

Merged: 7 commits, Nov 6, 2024

@cmdcolin cmdcolin commented Nov 6, 2024

UCSC RepeatMasker BED files have extended fields that are somewhat complicated.

This PR fixes the rendering of RepeatMasker BED, BEDTabix, and BigBed files from UCSC that include the full set of fields.

Example files: https://hgdownload.soe.ucsc.edu/gbdb/hs1/t2tRepeatMasker/

See the "Full mode visualization" section, which shows, at least from the UCSC UI perspective, how complex these data are: https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=2372603109_AcR4Yt0XayFS0KE6hJ1sksSrAkTv&db=hub_3671779_hs1&c=chr1&g=hub_3671779_t2tRepeatMasker

The data format re-uses UCSC-specific BED fields. The troublesome part is that it uses negative values in `blockSizes` to indicate unaligned regions. This caused JBrowse to create negative-length `SimpleFeature` objects, which crashed the track (we don't allow negative-size features, and it is probably best not to change that).
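To illustrate the failure mode, here is a minimal sketch of turning BED-style `blockSizes` and `blockStarts` lists into sub-blocks while skipping negative sizes. The names `Block`, `parseIntList`, and `blocksFromBed` are hypothetical and are not taken from the JBrowse code; the sketch also assumes block starts are relative to `chromStart`, as in standard BED12.

```typescript
// Hypothetical helper types and names, for illustration only.
interface Block {
  start: number
  end: number
}

// Parse a comma-separated list such as "100,-1,50," (a trailing comma is common in BED).
function parseIntList(s: string): number[] {
  return s
    .split(',')
    .filter(tok => tok.trim() !== '')
    .map(tok => Number.parseInt(tok, 10))
}

// Build absolute-coordinate blocks, dropping any entry whose size is negative
// (or unparseable) so that no negative-length block is ever produced.
function blocksFromBed(
  chromStart: number,
  blockSizes: string,
  blockStarts: string,
): Block[] {
  const sizes = parseIntList(blockSizes)
  const starts = parseIntList(blockStarts)
  const blocks: Block[] = []
  sizes.forEach((size, i) => {
    const rel = starts[i]
    if (rel === undefined || Number.isNaN(rel) || Number.isNaN(size) || size < 0) {
      return
    }
    const start = chromStart + rel
    blocks.push({ start, end: start + size })
  })
  return blocks
}
```

For example, `blocksFromBed(1000, '100,-1,50,', '0,100,200,')` keeps the first and third blocks and drops the one with size `-1`.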

This PR fixes the crash by filtering out the negative-size features. It also adds a heuristic to check whether the track is a RepeatMasker track, so that it is not inferred to be a "ProcessedTranscript". This is hard to tell exactly, since RepeatMasker uses `thickStart`, `thickEnd`, `blockSizes`, etc. The new heuristic is `description.split(' ').length == 15`. It is hacky, but it is otherwise hard to tell.
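The heuristic can be sketched as below. Only the `description.split(' ').length == 15` check comes from this PR; the function names, the `BedLikeRecord` shape, and the fallback logic for choosing a feature kind are assumptions made for illustration and may differ from the actual implementation.

```typescript
// Hypothetical record shape, for illustration only.
interface BedLikeRecord {
  description?: string
  thickStart?: number
  blockCount?: number
}

// The check described in this PR: a description with exactly 15
// space-separated tokens is treated as a RepeatMasker record.
function looksLikeRepeatMasker(description?: string): boolean {
  return description !== undefined && description.split(' ').length === 15
}

// Assumed usage: test for RepeatMasker first, so that records carrying
// thickStart/blockCount are not mistaken for processed transcripts.
function inferFeatureKind(
  rec: BedLikeRecord,
): 'repeat' | 'transcript' | 'generic' {
  if (looksLikeRepeatMasker(rec.description)) {
    return 'repeat'
  }
  if (rec.thickStart !== undefined && rec.blockCount !== undefined) {
    return 'transcript'
  }
  return 'generic'
}
```

The ordering is the point of the sketch: the RepeatMasker check runs before any transcript inference.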

@cmdcolin cmdcolin force-pushed the repeatmasker branch 2 times, most recently from 7411e6f to 1deb10d November 6, 2024 16:24
@cmdcolin cmdcolin merged commit 32ad1a6 into main Nov 6, 2024
@cmdcolin cmdcolin deleted the repeatmasker branch November 6, 2024 20:29
@cmdcolin cmdcolin added the "bug" label (Something isn't working) Nov 8, 2024
@cmdcolin cmdcolin changed the title from "Support UCSC repeatmasker BigBed and BED files" to "Fix rendering of UCSC repeatmasker BigBed and BED files" Nov 8, 2024