Simplified file sets for data generated by flatfile-to-json.pl #785

Open
keiranmraine opened this issue Jul 13, 2016 · 2 comments
Labels
feature req this adds new functionality to JBrowse 1

Comments

@keiranmraine
Contributor

@billzt, @cmdcolin, relates to #780.

Currently, when a GFF3 file is converted to a gene/transcript track with flatfile-to-json.pl, a folder and at least 2 data files are generated per chromosome. For a human GRCh37 gene/transcript track including decoy and scaffold sequences, that comes to 984 lf-*.jsonz and 99 hist-*.jsonz files.

Have you thought about using tabix in a more novel way?

We use tabix to make pre-generated data structures easily accessible, specifically for gene data (everything after the first 3 columns is custom; in the example below, column 7 holds a Perl data structure for the transcript):

1       29553   31097   ENST00000473358 MIR1302-10      712     $VAR1 = bless( {'_genomicminpos' => 29554,'_accversion' => 1,'_ccds' => undef,'_dbvers
1       30266   31109   ENST00000469289 MIR1302-10      535     $VAR1 = bless( {'_genomicminpos' => 30267,'_accversion' => 1,'_ccds' => undef,'_dbvers

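You could build a standard JSON structure for each gene but write it to a file with one line per gene, as tab-separated columns (start and end 1-based):

chr 1-start 1-end JSON

Sort the lines by chromosome and start position (tabix requires sorted input), then compress with bgzip and index with tabix: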

bgzip lf.json
tabix -s 1 -b 2 -e 3 lf.json.gz
bgzip hist.json
tabix -s 1 -b 2 -e 3 hist.json.gz
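A minimal Python sketch of the writing step, assuming hypothetical feature dicts with `seq_id`, `start` and `end` keys in 0-based, half-open coordinates (as in BED). It emits the sorted `chr start end JSON` lines that the bgzip/tabix commands above consume; `to_tabix_lines` is an illustrative name, not something flatfile-to-json.pl provides.

```python
import json


def to_tabix_lines(features):
    """Serialise features as 'chr<TAB>start<TAB>end<TAB>JSON' lines.

    Assumes (hypothetically) that each feature dict has 'seq_id', 'start'
    and 'end' keys in 0-based, half-open coordinates. Columns 2 and 3 are
    written as 1-based, closed coordinates, which is what
    `tabix -s 1 -b 2 -e 3` expects by default.
    """
    # tabix needs rows grouped by chromosome and sorted by start within each
    # chromosome; sorting on (seq_id, start) satisfies both.
    ordered = sorted(features, key=lambda f: (f["seq_id"], f["start"]))
    lines = []
    for f in ordered:
        # json.dumps escapes tabs/newlines inside strings, so the payload
        # always stays a single, tab-free 4th column.
        payload = json.dumps(f, separators=(",", ":"), sort_keys=True)
        lines.append(f"{f['seq_id']}\t{f['start'] + 1}\t{f['end']}\t{payload}")
    return lines


if __name__ == "__main__":
    # Toy records for illustration only.
    genes = [
        {"seq_id": "1", "start": 200, "end": 300, "name": "geneB"},
        {"seq_id": "1", "start": 100, "end": 400, "name": "geneA"},
        {"seq_id": "2", "start": 0, "end": 50, "name": "geneC"},
    ]
    for line in to_tabix_lines(genes):
        print(line)
```

Redirecting the output to lf.json gives the input for `bgzip lf.json`. Compact separators keep each line short, and because the JSON encoder escapes tabs and newlines inside strings, no payload can break the 4-column layout tabix indexes.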

This would replace the 1000+ files with 4 for the whole genome: lf.json.gz, lf.json.gz.tbi, hist.json.gz and hist.json.gz.tbi.

Even if one file set is maintained per chromosome, this would still reduce the count to 184 (46 chromosomes × 4 files).

@cmdcolin
Contributor

Technically, what you propose is like a BED file with a bunch of info encoded in the 4th column. Since BED and BEDTabix are now mainline, nothing blocks using a format like this; it just needs some code to convert data into this format and to interpret it.
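On the interpreting side, a sketch of what reading these rows back could look like. It is a pure-stdlib stand-in for a region query such as `tabix lf.json.gz 1:350-500`: it scans every row instead of using the .tbi index, so it only illustrates the overlap test and the JSON decoding of the 4th column. `query_region` is a hypothetical name; in the browser the decoding would sit behind a real tabix lookup.

```python
import json
from typing import Iterable, Iterator


def query_region(lines: Iterable[str], chrom: str, beg: int, end: int) -> Iterator[dict]:
    """Yield decoded JSON payloads of rows overlapping chrom:beg-end.

    Rows are 'chr<TAB>start<TAB>end<TAB>JSON' with 1-based, closed
    coordinates. Every row is scanned; a real tabix query would use the
    index to jump straight to the relevant blocks.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        seq, start, stop, payload = line.split("\t", 3)
        # Closed intervals [start, stop] and [beg, end] overlap iff neither
        # lies entirely to one side of the other.
        if seq == chrom and int(start) <= end and int(stop) >= beg:
            yield json.loads(payload)


if __name__ == "__main__":
    rows = [
        '1\t101\t400\t{"name":"a"}',
        '1\t201\t300\t{"name":"b"}',
        '2\t1\t50\t{"name":"c"}',
    ]
    print([f["name"] for f in query_region(rows, "1", 350, 500)])  # ['a']
```

Because bgzip output is valid multi-member gzip, `gzip.open("lf.json.gz", "rt")` should also work as the `lines` argument for quick checks.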

@rbuels rbuels added the feature req this adds new functionality to JBrowse 1 label Jan 28, 2018
@rbuels
Collaborator

rbuels commented Jan 28, 2018

Any implementation of this would need to make sure that the "old" format produced by flatfile-to-json.pl still works in the browser.


3 participants