Technically, what you propose is like a BED file with a bunch of info encoded in the 4th column. Since BED and BEDTabix are now mainline, nothing blocks using a format like this; it just needs some code to convert data into this format and to interpret it.
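As a rough illustration of the "interpret it" half, here is a minimal Python sketch. It assumes a hypothetical BED-like line with four tab-separated columns (sequence, start, end, JSON payload). The `Feature` type, the `parse_line` helper and the sample record are made up for illustration and are not part of JBrowse.

```python
import json
from typing import NamedTuple


class Feature(NamedTuple):
    """One parsed line: BED-style coordinates plus the decoded 4th column."""
    seq: str
    start: int
    end: int
    data: dict


def parse_line(line):
    """Parse 'seq<TAB>start<TAB>end<TAB>json' into a Feature."""
    # maxsplit=3 means everything after the third tab is treated as the payload.
    seq, start, end, payload = line.rstrip("\n").split("\t", 3)
    return Feature(seq, int(start), int(end), json.loads(payload))


# Hypothetical line, as a tabix region query might return it.
example = 'chr1\t1000\t4000\t{"name":"geneA","transcripts":["tx1"]}\n'
feature = parse_line(example)
```

A client would run each line returned by the tabix query through something like `parse_line` and hand `feature.data` to the track renderer.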
@billzt, @cmdcolin, relates to #780.
Currently, when a GFF3 file is converted to a gene/transcript track with `flatfile-to-json.pl`, a folder and a minimum of 2 data files are generated per chromosome. For a human GRCh37 gene/transcript track with decoy and scaffolds, that comes to 984 `lf-*.jsonz` and 99 `hist-*.jsonz` files.
Have you thought about using tabix in a more novel way?
We use tabix to make pre-generated data structures easily accessible, specifically for gene data (everything after the first 3 columns is custom, but column 5 contains a Perl data structure for the transcript):
You could build a standard JSON structure for each gene but write it to file as
1 line per gene, and then bgzip and index with tabix:
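A minimal Python sketch of that layout follows. It assumes a hypothetical four-column line (sequence, start, end, JSON) and invented gene records; none of the field names come from JBrowse. Lines are sorted by sequence and start because tabix expects position-sorted input.

```python
import json


def gene_to_line(gene):
    """Serialize one gene as a tab-separated line: seq, start, end, JSON."""
    # json.dumps escapes tabs and newlines inside strings, so the payload
    # always stays within one column on one line.
    payload = json.dumps(gene, separators=(",", ":"), sort_keys=True)
    return "\t".join([gene["seq"], str(gene["start"]), str(gene["end"]), payload])


def genes_to_lines(genes):
    """Return one line per gene, sorted by (seq, start)."""
    ordered = sorted(genes, key=lambda g: (g["seq"], g["start"]))
    return [gene_to_line(g) for g in ordered]


# Hypothetical records; coordinates and names are made up.
genes = [
    {"seq": "chr1", "start": 5000, "end": 9000, "name": "geneB", "transcripts": []},
    {"seq": "chr1", "start": 1000, "end": 4000, "name": "geneA",
     "transcripts": [{"id": "tx1", "exons": [[1000, 1500], [3000, 4000]]}]},
]
lines = genes_to_lines(genes)
```

Written out to a file (say `genes.tsv`, a placeholder name), the lines could then be compressed with `bgzip genes.tsv` and indexed with something like `tabix -s 1 -b 2 -e 3 genes.tsv.gz`, adding `-0` if the start coordinates are zero-based.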
This would replace the 1000+ files with 4 for the whole genome: two bgzipped data files plus their tabix indexes, `lf.json.gz.tbi` and `hist.json.gz.tbi`.
Even if one file is maintained per chromosome, this would still reduce the count to 184 (46 chr × 4).