GFF3Tabix issues #780

Closed
cmdcolin opened this issue Jul 6, 2016 · 9 comments

@cmdcolin
Contributor

cmdcolin commented Jul 6, 2016

There are some potential issues with the fairly new GFF3Tabix parser that, depending on your use case, may make it unusable:

  1. The exons/CDS of a given feature can be missing from its "View details" popup because Tabix data is downloaded block by block (so if an exon is not in the requested block, it is missed).
  2. The features may not render correctly with NeatCanvasFeatures for similar reasons to 1, because NeatCanvasFeatures needs complete information about exons to calculate intron hats.
  3. The way that features are sorted in a tabix GFF3 file can make the GFF3Tabix parser miss the first exon of a gene. When creating the GFF3 tabix file, you would generally sort the GFF3 by coordinate, but this can place subfeatures before the parent feature in the sort order (i.e. if the exon and gene share a start coordinate, sorting programs can arbitrarily place the exon line before the gene line). This creates a valid tabix file, but the GFF3Tabix parser fails to find that exon. For reference, the standard GFF3 parser in JBrowse does not allow subfeatures to occur before the parent feature line either.
  4. The features are not indexed by generate-names.pl. VCFTabix tracks are already indexed by generate-names.pl, so it might not be too big of a stretch to index GFF3Tabix as well.
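To make issue 3 concrete, here is a minimal sketch of a checker that flags feature lines appearing before the line that defines their `Parent`. The function names and the example data are made up for illustration; this is not code from JBrowse, and it assumes plain 9-column GFF3 with `ID` and `Parent` attributes.

```python
def parse_attributes(column9):
    """Turn 'ID=e1;Parent=m1' into {'ID': 'e1', 'Parent': 'm1'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def children_before_parents(gff3_lines):
    """Return (line_number, feature_type, parent_id) for every feature line
    whose Parent ID has not been defined by an earlier line."""
    seen_ids = set()
    problems = []
    for line_number, line in enumerate(gff3_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        attrs = parse_attributes(columns[8])
        for parent_id in filter(None, attrs.get("Parent", "").split(",")):
            if parent_id not in seen_ids:
                problems.append((line_number, columns[2], parent_id))
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
    return problems


# Hypothetical coordinate-sorted file where the first exon landed above its gene.
EXAMPLE = [
    "##gff-version 3",
    "chr1\t.\texon\t100\t200\t.\t+\t.\tID=e1;Parent=m1",
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\t.\texon\t300\t900\t.\t+\t.\tID=e2;Parent=m1",
]

print(children_before_parents(EXAMPLE))
```

An empty result means every subfeature follows its parent in file order; it says nothing about whether the file is otherwise valid GFF3.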

The workarounds IMO would be:
(1) make a custom "View details" box for this case
(2) not use NeatCanvasFeatures; however, NeatCanvasFeatures is enabled on the sample browser, and there's no way to disable it on a specific track
(3) sort the GFF3 file carefully so that subfeatures don't occur before the parent feature -or- make all information about a feature occur on a single line, which requires more preprocessing of the original GFF3 file (see #785)
(4) add support to generate-names.pl for GFF3 tabix
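As a rough illustration of workaround (3), the sketch below sorts feature lines by sequence, then start coordinate, then depth in the `Parent` hierarchy, so that a gene sorts ahead of an exon sharing its start. It is only a sketch under assumptions: the names are hypothetical, it assumes each subfeature lies within its parent's span on the same sequence, it uses only the first listed parent, and it ignores `###` and `##FASTA` sections. It is not the technique from any existing tool or branch.

```python
def parse_attributes(column9):
    """Turn 'ID=e1;Parent=m1' into {'ID': 'e1', 'Parent': 'm1'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def sort_gff3_parents_first(gff3_lines):
    """Return header lines followed by feature lines sorted by
    (seqid, start, hierarchy depth), so parents precede their subfeatures."""
    headers, records = [], []
    for line in gff3_lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue
        if text.startswith("#"):
            headers.append(text)
            continue
        columns = text.split("\t")
        if len(columns) == 9:
            records.append(columns)

    # Map each ID to its first parent so depth can be computed by walking up.
    parent_of = {}
    for columns in records:
        attrs = parse_attributes(columns[8])
        if "ID" in attrs and attrs.get("Parent"):
            parent_of[attrs["ID"]] = attrs["Parent"].split(",")[0]

    def depth(columns):
        parent = parse_attributes(columns[8]).get("Parent", "").split(",")[0]
        level, visited = 0, set()
        while parent and parent not in visited:  # visited guards against cycles
            visited.add(parent)
            level += 1
            parent = parent_of.get(parent, "")
        return level

    records.sort(key=lambda c: (c[0], int(c[3]), depth(c)))
    return headers + ["\t".join(c) for c in records]


# Hypothetical input with the first exon placed above its gene.
EXAMPLE = [
    "##gff-version 3",
    "chr1\t.\texon\t100\t200\t.\t+\t.\tID=e1;Parent=m1",
    "chr1\t.\tgene\t100\t900\t.\t+\t.\tID=g1",
    "chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=m1;Parent=g1",
    "chr1\t.\texon\t300\t900\t.\t+\t.\tID=e2;Parent=m1",
]

for sorted_line in sort_gff3_parents_first(EXAMPLE):
    print(sorted_line)
```

The output is still ordered by sequence and start coordinate, which is what tabix indexing needs; only the order of lines sharing a start coordinate changes.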

Given the drawbacks we could remove gff3 tabix support entirely or address these issues over time.

Feedback welcome. Note: a BEDTabix or GFF3Tabix track built from a file with only single-level features wouldn't suffer from any of these problems.

@cmdcolin
Contributor Author

cmdcolin commented Jul 11, 2016

The third issue can possibly be fixed using the technique in this branch:

https://github.com/GMOD/jbrowse/tree/update_tabix_sort

@cmdcolin
Contributor Author

The first issue could potentially be solved by addressing #559 (i.e. if we don't load all subfeatures until they are needed, the missing exons are not a problem).

@billzt
Contributor

billzt commented Jul 13, 2016

Well, I still want GFF3 tabix support, even if it requires a lot of workarounds. The traditional flatfile-to-json.pl script generates a huge number of small files, which makes backing up JBrowse data extremely difficult.

@billzt
Contributor

billzt commented May 9, 2017

Currently the third issue can be resolved by a Perl script: https://github.com/billzt/gff3sort

@cmdcolin
Contributor Author

@billzt awesome, I'll check that out! I had used genometools with the linesort option to prepare GFF3Tabix files before, but it was not perfect, so I will try your script.

@rbuels rbuels added bug this is a problem that needs to be fixed high priority related to a high-level project goal labels Jan 30, 2018
@nathandunn nathandunn self-assigned this Feb 1, 2018
@nathandunn nathandunn changed the title Potential GFF3Tabix issues GFF3Tabix issues Feb 1, 2018
@nathandunn nathandunn added this to the 1.12.4 milestone Feb 1, 2018
@nathandunn
Contributor

I have a fix for most of these issues, done on an Apollo projection branch, that I can integrate into a PR. It fixes the missing exons (by precomputing appropriate block ranges) and seems to render the subfeatures correctly. I haven't tested this with NeatFeatures, but it does fix the problem for HTMLFeatures.

https://github.com/nathandunn/Apollo/blob/project-gff3/client/apollo/js/View/Track/DraggableProjectedHTMLFeatures.js#L1031

The other "workaround" is to remove the gene entries from the GFF3 file, but I like seeing the gene in the details section, and I think the fix above will be a tractable solution.
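For readers following along, here is one way the general idea of widening a block query could look. This is a hedged sketch, not the code in the linked Apollo branch: the `Feature` class, the in-memory list standing in for a tabix-indexed file, and all function names are assumptions made up for illustration. It relies on parents spanning their subfeatures, so any block that overlaps a subfeature also overlaps its ancestors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    feature_id: str
    feature_type: str
    start: int
    end: int
    parent: Optional[str] = None


def overlapping(features, start, end):
    """Stand-in for a tabix region query: features overlapping [start, end]."""
    return [f for f in features if f.start <= end and f.end >= start]


def root_of(feature, by_id):
    """Walk Parent links upward until no known parent remains (assumes no cycles)."""
    current = feature
    while current.parent is not None and current.parent in by_id:
        current = by_id[current.parent]
    return current


def query_with_complete_subfeatures(features, start, end):
    """Query a block, then re-query the full span of every top-level feature hit."""
    hits = overlapping(features, start, end)
    if not hits:
        return []
    by_id = {f.feature_id: f for f in hits}
    roots = {root_of(f, by_id).feature_id for f in hits}
    wide_start = min(by_id[r].start for r in roots)
    wide_end = max(by_id[r].end for r in roots)
    widened = overlapping(features, wide_start, wide_end)
    widened_by_id = {f.feature_id: f for f in widened}
    # Keep only features that belong to the top-level features from the block.
    return [f for f in widened if root_of(f, widened_by_id).feature_id in roots]


# Hypothetical gene whose second exon lies outside the requested block.
FEATURES = [
    Feature("g1", "gene", 100, 900),
    Feature("m1", "mRNA", 100, 900, parent="g1"),
    Feature("e1", "exon", 100, 200, parent="m1"),
    Feature("e2", "exon", 800, 900, parent="m1"),
]

naive = [f.feature_id for f in overlapping(FEATURES, 150, 250)]
complete = [f.feature_id for f in query_with_complete_subfeatures(FEATURES, 150, 250)]
print(naive, complete)
```

The trade-off is a second query per block, and long genes can pull in a much wider region than the one being displayed.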

@nathandunn
Contributor

If anyone else wants to take a shot, that's also fine. I just have some ideas about what the fixes are.

@rbuels rbuels added the in progress currently being worked on label Feb 1, 2018
@rbuels rbuels modified the milestones: 1.12.4, 1.13.0 Feb 2, 2018
@rbuels rbuels modified the milestones: 1.13.0, 1.14.0 Mar 14, 2018
@rbuels
Collaborator

rbuels commented Apr 7, 2018

I think all of this is taken care of in the dev branch now, with the GFF3Tabix overhaul I just did, plus the new topLevelFeatures config. Could you guys have a look to confirm, and reopen this if there are still problems?

@rbuels rbuels closed this as completed Apr 7, 2018
@cmdcolin
Contributor Author

cmdcolin commented Apr 8, 2018

These all look great! I am experiencing a little bit of slowness when scrolling around, but overall the data is correct and the bugfixes are working.
