Simplified file sets for data generated by flatfile-to-json.pl #785

keiranmraine · 2016-07-13T04:24:11Z

Currently, when a GFF3 file is converted to a gene/transcript track with flatfile-to-json.pl, a folder and a minimum of 2 data files are generated per chromosome. For a human GRCh37 gene/transcript track with decoy and scaffolds, that comes to 984 lf-*.jsonz and 99 hist-*.jsonz files.

Have you thought about using tabix in a more novel way?

We use tabix to make pre-generated data structures easily accessible, specifically for gene data (everything after the first 3 columns is custom, but column 5 contains a Perl data structure for the transcript):

1       29553   31097   ENST00000473358 MIR1302-10      712     $VAR1 = bless( {'_genomicminpos' => 29554,'_accversion' => 1,'_ccds' => undef,'_dbvers
1       30266   31109   ENST00000469289 MIR1302-10      535     $VAR1 = bless( {'_genomicminpos' => 30267,'_accversion' => 1,'_ccds' => undef,'_dbvers

You could build a standard JSON structure for each gene but write it to file as

chr 1-start 1-end JSON

1 line per gene, and then bgzip and index with tabix:

bgzip lf.json
tabix -s 1 -b 2 -e 3 lf.json.gz
bgzip hist.json
tabix -s 1 -b 2 -e 3 hist.json.gz

This would replace the 1000+ files with 4 for the whole genome: lf.json.gz, lf.json.gz.tbi, hist.json.gz and hist.json.gz.tbi.

Even if one file is maintained per chromosome this would still reduce down to 184 (46chr*4)
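
To make the proposed layout concrete, here is a minimal sketch in Python of writing one JSON record per line in the "chr start end JSON" form, ready for the bgzip and tabix commands above. The function names and the feature fields (`chr`, `start`, `end`, `id`, `gene`) are assumptions for illustration only; nothing here reflects what flatfile-to-json.pl actually emits.

```python
import json


def feature_to_line(chrom, start, end, feature):
    """Serialize one feature as chr<TAB>start<TAB>end<TAB>JSON on a single line."""
    # Compact json.dumps output contains no raw tabs or newlines: control
    # characters inside strings are escaped, so the payload stays in one column.
    payload = json.dumps(feature, separators=(",", ":"), sort_keys=True)
    return f"{chrom}\t{start}\t{end}\t{payload}"


def write_sorted(features, path):
    """Write features grouped by sequence and ordered by start, as tabix expects."""
    rows = sorted(features, key=lambda f: (f["chr"], f["start"], f["end"]))
    with open(path, "w") as out:
        for f in rows:
            out.write(feature_to_line(f["chr"], f["start"], f["end"], f) + "\n")


if __name__ == "__main__":
    # Coordinates and IDs taken from the two example rows above;
    # the field names are hypothetical.
    demo = [
        {"chr": "1", "start": 30266, "end": 31109,
         "id": "ENST00000469289", "gene": "MIR1302-10"},
        {"chr": "1", "start": 29553, "end": 31097,
         "id": "ENST00000473358", "gene": "MIR1302-10"},
    ]
    write_sorted(demo, "lf.json")
```

The resulting lf.json would then go through `bgzip lf.json` and `tabix -s 1 -b 2 -e 3 lf.json.gz` exactly as shown earlier.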

cmdcolin · 2016-07-26T17:34:41Z

Technically, what you propose is like a BED file with a bunch of info encoded in the 4th column. Since BED and BEDTabix are now mainline, there's nothing blocking the use of a format like this; it just needs some code to convert data into this format and to interpret it.
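
As a rough illustration of the "interpret it" half, the sketch below (Python, purely hypothetical, not JBrowse code) splits such a line back into coordinates plus the decoded JSON payload. The `overlapping` helper is a plain linear scan standing in for the region lookup that a tabix index would provide; the function names are assumptions.

```python
import json


def parse_line(line):
    """Split chr<TAB>start<TAB>end<TAB>JSON into typed fields."""
    # maxsplit=3 keeps the JSON payload intact as the 4th column.
    chrom, start, end, payload = line.rstrip("\n").split("\t", 3)
    return chrom, int(start), int(end), json.loads(payload)


def overlapping(lines, chrom, qstart, qend):
    """Yield decoded features overlapping [qstart, qend] on chrom.

    Linear scan for illustration only; a real reader would ask the
    tabix index for the region instead of reading every line.
    """
    for line in lines:
        c, s, e, feature = parse_line(line)
        if c == chrom and s <= qend and e >= qstart:
            yield feature


if __name__ == "__main__":
    rows = [
        '1\t29553\t31097\t{"id":"ENST00000473358"}\n',
        '1\t30266\t31109\t{"id":"ENST00000469289"}\n',
    ]
    for feat in overlapping(rows, "1", 31100, 31200):
        print(feat["id"])
```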

rbuels · 2018-01-28T18:07:02Z

Any implementation of this would need to be careful that the "old" format made by flatfile-to-json.pl still worked in the browser.

cmdcolin mentioned this issue Jan 11, 2017

GFF3Tabix issues #780

Closed

rbuels added the feature req this adds new functionality to JBrowse 1 label Jan 28, 2018
