Release 3.1.1
This release addresses inconsistencies and gaps in the rRNA annotation specific to the CHM13 version of the CHESS annotation.
Changelog
- rRNA annotation transferred from the official RefSeq CHM13 annotation, including all annotated copies
- New gene and transcript IDs assigned following the CHS nomenclature
- rRNA1 and rRNA2 genes on chrM assigned matching CHS IDs in GRCh38 and CHM13
Statement
CHESS Release 3.1.1
Files
Filenames | Genome | Content | Description |
---|---|---|---|
chess3.1.1.GRCh38.gff.gz, chess3.1.1.GRCh38.gtf.gz, chess3.1.1.GRCh38.bb.gz | GRCh38 | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
chess3.1.1.CHM13.gff.gz, chess3.1.1.CHM13.gtf.gz, chess3.1.1.CHM13.bb.gz | CHM13 | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper, mapped onto the CHM13 human reference genome. |
chess3.1.1.GRCh38.primary.gff.gz, chess3.1.1.GRCh38.primary.gtf.gz | GRCh38 | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
chess3.1.1.GRCh38.protein.fa.gz | GRCh38 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome. |
chess3.1.1.CHM13.protein.fa.gz | CHM13 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome. |
chess3.1.1.mapfile.tsv | - | Cross-Reference | This tab-separated file lists CHESS 3.1.1 transcript identifiers along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
assembled.gtf.gz | GRCh38 | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
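The protein files are gzip-compressed FASTA. As a minimal sketch (not something shipped with the release), the Python below streams either protein file without unpacking it first. The header layout of the CHESS protein records is not described here, so the reader simply keeps the first whitespace-delimited token of each header as the record name; the helper names `open_text`, `iter_fasta`, and `protein_lengths` are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Optional, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    The name is the first whitespace-delimited token after '>' (an
    assumption about the header layout); wrapped sequence lines are joined.
    """
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def protein_lengths(path: str) -> Dict[str, int]:
    """Map each protein record name to its sequence length (hypothetical helper)."""
    with open_text(path) as handle:
        return {name: len(seq) for name, seq in iter_fasta(handle)}
```

For example, `protein_lengths("chess3.1.1.CHM13.protein.fa.gz")` would return a dictionary from record names to protein lengths for the CHM13 set. Reading line by line keeps memory use proportional to one record rather than the whole file.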
Summary
Category | Genes | Transcripts |
---|---|---|
protein_coding | 19838 | 99201 |
lncRNA | 17624 | 34709 |
pseudogene | 16774 | 17263 |
other | 4269 | 7190 |
alt_scaffolds | 5250 | 10088 |
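A rough way to compare a downloaded GTF against these counts is to tally transcripts and distinct gene IDs per category. The Python sketch below assumes the category is stored in a transcript attribute named `gene_type`; that key is an assumption, so check the attributes actually present in the file and pass the right one via `type_key`. Rows such as `other` and `alt_scaffolds` appear to be aggregates (the latter presumably based on sequence names rather than an attribute), so the output may not line up with the table row for row. The function names are hypothetical.

```python
import gzip
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Parse GTF column 9, e.g. 'gene_id "G1"; transcript_id "T1";'.

    Simplification: semicolons inside quoted values are not handled.
    """
    attrs: Dict[str, str] = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def summarize_gtf(lines: Iterable[str], type_key: str = "gene_type") -> Dict[str, Tuple[int, int]]:
    """Return {category: (distinct gene IDs, transcripts)} from GTF lines.

    Only 'transcript' records are read; the category comes from the
    attribute named by type_key (assumed, not confirmed, to be 'gene_type').
    A gene whose transcripts carry different categories is counted in each.
    """
    genes: Dict[str, Set[str]] = defaultdict(set)
    transcripts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        attrs = parse_gtf_attributes(cols[8])
        category = attrs.get(type_key, "NA")  # missing key -> "NA"
        genes[category].add(attrs.get("gene_id", ""))
        transcripts[category] += 1
    return {cat: (len(genes[cat]), n) for cat, n in transcripts.items()}


def summarize_gtf_file(path: str, type_key: str = "gene_type") -> Dict[str, Tuple[int, int]]:
    """Stream a plain or gzip-compressed GTF from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_gtf(handle, type_key)
```

Counting distinct `gene_id` values from transcript records, rather than relying on gene-level lines, is a deliberate choice here: GTF files do not always include explicit `gene` records.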