Releases: chess-genome/chess
Release 3.1.3
This release introduces versions of the annotation oriented toward transcriptome assembly and quantification, which remove alternative scaffolds as well as duplicated transcripts. The CHM13 version of the annotation now has tRNAs copied over from RefSeq.
Changelog
- Duplicated transcripts removed from .assembly.* files. For each set of duplicates, the representative transcript was chosen by the first applicable criterion: 1. CDS matches MANE; 2. CDS is closest to MANE; 3. CDS maximizes Tukey's Median of the ILPIs of CDSs at each locus; 4. random choice
- tRNAs copied over to CHM13 version of CHESS from RefSeq
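The selection order in the first changelog item can be read as a cascade of tie-breakers. The sketch below is one possible reading, not the code used to build the release: the `Candidate` record, its field names, and the precomputed `mane_distance` and `locus_score` values are all hypothetical stand-ins for whatever the CHESS pipeline actually computes.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One copy within a group of duplicated transcripts (hypothetical record)."""
    transcript_id: str
    cds_matches_mane: bool           # rule 1: CDS is identical to the MANE CDS
    mane_distance: Optional[float]   # rule 2: smaller is closer; None if no MANE CDS to compare
    locus_score: float               # rule 3: stand-in for the ILPI-based score


def pick_representative(candidates: Sequence[Candidate], rng: random.Random) -> Candidate:
    """Apply the four criteria in order, narrowing the pool at each step."""
    pool = list(candidates)

    # Rule 1: prefer copies whose CDS matches MANE exactly.
    exact = [c for c in pool if c.cds_matches_mane]
    if exact:
        pool = exact
    else:
        # Rule 2: otherwise prefer the copies whose CDS is closest to MANE.
        comparable = [c for c in pool if c.mane_distance is not None]
        if comparable:
            best = min(c.mane_distance for c in comparable)
            pool = [c for c in comparable if c.mane_distance == best]

    # Rule 3: among the remaining copies, keep those with the highest locus score.
    top = max(c.locus_score for c in pool)
    pool = [c for c in pool if c.locus_score == top]

    # Rule 4: any remaining tie is broken at random.
    return rng.choice(pool)
```

Passing an explicit `random.Random` instance keeps the final random step reproducible when a seed is fixed.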
Statement
Chess Release 3.1.3
Files
| Filenames | Genome | Content | Description |
| --- | --- | --- | --- |
| chess3.1.3.GRCh38.gff.gz, chess3.1.3.GRCh38.gtf.gz, chess3.1.3.GRCh38.bb.gz | GRCh38 | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess3.1.3.CHM13.gff.gz, chess3.1.3.CHM13.gtf.gz, chess3.1.3.CHM13.bb.gz | CHM13 | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome. |
| chess3.1.3.GRCh38.primary.gff.gz, chess3.1.3.GRCh38.primary.gtf.gz, chess3.1.3.GRCh38.primary.bb.gz | GRCh38 | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.1.3.GRCh38.assembly.gff.gz, chess3.1.3.GRCh38.assembly.gtf.gz, chess3.1.3.GRCh38.assembly.bb.gz | GRCh38 | CHESS gene annotation excluding alternative scaffolds and duplicate transcripts | This file contains the assembly gene set described in the CHESS paper but excludes annotations of any alternative scaffolds and retains a single copy of each transcript duplicate. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.1.3.CHM13.assembly.gff.gz, chess3.1.3.CHM13.assembly.gtf.gz, chess3.1.3.CHM13.assembly.bb.gz | CHM13 | CHESS gene annotation excluding alternative scaffolds and duplicate transcripts | This file contains the assembly gene set described in the CHESS paper but only retains a single copy of each transcript duplicate. All genes and transcripts are mapped onto the CHM13 human reference genome. |
| chess3.1.3.GRCh38.protein.fa.gz | GRCh38 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome. |
| chess3.1.3.CHM13.protein.fa.gz | CHM13 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome. |
| chess3.1.3.mapfile.tsv | - | Cross-Reference | This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
| assembled.gtf.gz | GRCh38 | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 19838 | 99201 |
| lncRNA | 17624 | 34709 |
| pseudogene | 16774 | 17263 |
| other | 4269 | 7190 |
| alt_scaffolds | 5250 | 10088 |
Release 3.1.2
This release addresses invalid new gene_id assignment for gene duplications identified by LiftOff on the CHM13 reference genome.
Changelog
- Release 3.1.0 assigned gene_ids to gene duplications reported by LiftOff. An edge case resulted in multiple transcripts of the same duplication being assigned different gene_ids. In this release, those records are merged under a single gene_id.
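As a rough illustration of the fix described above, the sketch below regroups transcripts so that every transcript of the same duplicated gene copy shares one gene_id. The tuple layout and the `copy_key` field (some identifier of the duplicated copy) are hypothetical; the release notes do not say which gene_id was kept, so the choice of the first one encountered is an assumption.

```python
from typing import Dict, List, Tuple

# (transcript_id, gene_id, copy_key); copy_key is a hypothetical identifier
# of the duplicated gene copy that the transcript belongs to.
Record = Tuple[str, str, str]


def merge_duplicate_gene_ids(records: List[Record]) -> List[Record]:
    """Give all transcripts of one duplicated copy the same gene_id."""
    canonical: Dict[str, str] = {}
    merged: List[Record] = []
    for transcript_id, gene_id, copy_key in records:
        # The first gene_id seen for a copy becomes the shared one (assumption).
        shared_id = canonical.setdefault(copy_key, gene_id)
        merged.append((transcript_id, shared_id, copy_key))
    return merged
```

Input order is preserved, so the output can be compared against the input record by record.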
Statement
Chess Release 3.1.2
Files
| Filenames | Genome | Content | Description |
| --- | --- | --- | --- |
| chess3.1.2.GRCh38.gff.gz, chess3.1.2.GRCh38.gtf.gz, chess3.1.2.GRCh38.bb.gz | GRCh38 | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess3.1.2.CHM13.gff.gz, chess3.1.2.CHM13.gtf.gz, chess3.1.2.CHM13.bb.gz | CHM13 | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome. |
| chess3.1.2.GRCh38.primary.gff.gz, chess3.1.2.GRCh38.primary.gtf.gz | GRCh38 | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.1.2.GRCh38.protein.fa.gz | GRCh38 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome. |
| chess3.1.2.CHM13.protein.fa.gz | CHM13 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome. |
| chess3.1.2.mapfile.tsv | - | Cross-Reference | This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
| assembled.gtf.gz | GRCh38 | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 19838 | 99201 |
| lncRNA | 17624 | 34709 |
| pseudogene | 16774 | 17263 |
| other | 4269 | 7190 |
| alt_scaffolds | 5250 | 10088 |
Release 3.1.1
This release addresses inconsistencies and incompleteness in the rRNA annotation specific to the CHM13 version of the CHESS annotation.
Changelog
- rRNA annotation moved over from the official RefSeq CHM13 annotation, including all annotated copies
- New gene and transcript IDs were assigned consistent with the CHS nomenclature
- rRNA1 and rRNA2 genes on chrM assigned matching CHSID for GRCh38 and CHM13
Statement
Chess Release 3.1.1
Files
| Filenames | Genome | Content | Description |
| --- | --- | --- | --- |
| chess3.1.1.GRCh38.gff.gz, chess3.1.1.GRCh38.gtf.gz, chess3.1.1.GRCh38.bb.gz | GRCh38 | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess3.1.1.CHM13.gff.gz, chess3.1.1.CHM13.gtf.gz, chess3.1.1.CHM13.bb.gz | CHM13 | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome. |
| chess3.1.1.GRCh38.primary.gff.gz, chess3.1.1.GRCh38.primary.gtf.gz | GRCh38 | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.1.1.GRCh38.protein.fa.gz | GRCh38 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome. |
| chess3.1.1.CHM13.protein.fa.gz | CHM13 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome. |
| chess3.1.1.mapfile.tsv | - | Cross-Reference | This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
| assembled.gtf.gz | GRCh38 | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 19838 | 99201 |
| lncRNA | 17624 | 34709 |
| pseudogene | 16774 | 17263 |
| other | 4269 | 7190 |
| alt_scaffolds | 5250 | 10088 |
Release 3.1.0
This release addresses several major and minor inconsistencies in the formatting of the CHESS annotation.
Changelog
- Introduced gene features in the GFF3 files, complete with RefSeq descriptors
- All valid and complete ORFs now include the stop codon in the CDS coordinates. Some transcripts have been extended up to 3 positions to include the missing stop codon
- Fixed duplicated gene IDs on the CHM13 version of the annotation. Gene copies identified by LiftOff are now assigned their own CHESS ID and the LiftOff metadata is stored in the auxiliary tags
- Protein sequences based on the CHM13 genome sequence are now also included
- Minor improvements to the comment lines
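To make the stop-codon change concrete, here is a minimal sketch of how a CDS that ends just before its stop codon could be extended by three bases, growing the terminal exon only when it is too short. It is an illustration under simplifying assumptions, not the procedure used for the release: the function name is hypothetical, coordinates are 1-based and inclusive as in GFF/GTF, and the stop codon is assumed to lie within (or immediately past) the terminal exon rather than spanning a splice junction.

```python
from typing import Tuple


def include_stop_codon(
    cds_start: int, cds_end: int, exon_start: int, exon_end: int, strand: str
) -> Tuple[int, int, int, int]:
    """Extend the CDS by 3 bases at its 3' end; grow the exon if needed.

    Returns (cds_start, cds_end, exon_start, exon_end).
    """
    if strand == "+":
        # Translation runs left to right, so the stop codon follows cds_end.
        cds_end += 3
        exon_end = max(exon_end, cds_end)
    elif strand == "-":
        # On the minus strand the 3' end of the CDS is its lowest coordinate.
        cds_start -= 3
        exon_start = min(exon_start, cds_start)
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return cds_start, cds_end, exon_start, exon_end
```

Because the CDS grows by exactly three bases, the exon (and therefore the transcript) is extended by at most three positions, which matches the limit stated in the changelog.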
Statement
Chess Release 3.1.0
Files
| Filenames | Genome | Content | Description |
| --- | --- | --- | --- |
| chess3.1.0.GRCh38.gff.gz, chess3.1.0.GRCh38.gtf.gz, chess3.1.0.GRCh38.bb.gz | GRCh38 | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess3.1.0.CHM13.gff.gz, chess3.1.0.CHM13.gtf.gz, chess3.1.0.CHM13.bb.gz | CHM13 | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome. |
| chess3.1.0.GRCh38.primary.gff.gz, chess3.1.0.GRCh38.primary.gtf.gz | GRCh38 | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.1.0.GRCh38.protein.fa.gz | GRCh38 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome. |
| chess3.1.0.CHM13.protein.fa.gz | CHM13 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome. |
| chess3.1.0.mapfile.tsv | - | Cross-Reference | This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
| assembled.gtf.gz | GRCh38 | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 19838 | 99201 |
| lncRNA | 17624 | 34709 |
| pseudogene | 16774 | 17263 |
| other | 4269 | 7190 |
| alt_scaffolds | 5250 | 10088 |
CHESS 3.0.1
First release of the upcoming new major version of the CHESS annotation.
Changelog
- Updated simplified source values
- Gene records in GTF/GFF
- BED14 files
- TPM and sample count attributes simplified and made consistent for all assembled transcripts
- Proper comment appended
- 24 erroneous pseudogene transcripts removed
Statement
Chess Release 3.0.1
Files
| Filenames | Content | Description |
| --- | --- | --- |
| chess3.0.1.gff.gz, chess3.0.1.gtf.gz, chess3.0.1.bb.gz | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess3.0.1.CHM13.gff.gz, chess3.0.1.CHM13.gtf.gz, chess3.0.1.CHM13.bb.gz | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome. |
| chess3.0.1.primary.gff.gz, chess3.0.1.primary.gtf.gz | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.0.1.protein.fa.gz | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. |
| chess3.0.1.mapfile.tsv | Cross-Reference | This tab-separated file contains a list of transcript identifiers in CHESS 3.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
| assembled.gtf.gz | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 19838 | 99201 |
| lncRNA | 17624 | 34709 |
| pseudogene | 16774 | 17263 |
| other | 4269 | 7190 |
| alt_scaffolds | 5250 | 10088 |
CHESS 3.0
First release of the upcoming new major version of the CHESS annotation.
Changelog
- Re-assembly and improved analysis of the dataset based on new methods described in a new paper.
Statement
Chess Release 3.0
Files
| Filenames | Content | Description |
| --- | --- | --- |
| chess3.0.gff.gz, chess3.0.gtf.gz | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess3.0.CHM13.gff.gz, chess3.0.CHM13.gtf.gz | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome. |
| chess3.0.primary.gff.gz, chess3.0.primary.gtf.gz | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.0.protein.fa.gz | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. |
| chess3.0.mapfile.tsv | Cross-Reference | This tab-separated file contains a list of transcript identifiers in CHESS 3.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
| assembled.gtf.gz | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 19838 | 99201 |
| lncRNA | 17624 | 34709 |
| pseudogene | 16774 | 17263 |
| other | 4281 | 7204 |
| alt_scaffolds | 5258 | 10098 |
CHESS_v2.2
Changelog
- Fixed wrong parent IDs detected for several RefSeq transcripts
- Added CDS annotations for novel isoforms in known protein-coding genes that are start-codon and intron-chain compatible with at least one known open reading frame from GENCODE or RefSeq
- Fixed CDS coordinate overlaps
- Re-added missing RefSeq CDS assignments
- Re-added missing attributes from GENCODE and RefSeq
Statement
Chess Release 2.2
Files
| Filename | Content | Description |
| --- | --- | --- |
| chess2.2.gff.gz | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p8. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess2.2.genes | CHESS gene list | This file is a table showing all 42,611 genes in CHESS release 2.2, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene. |
| chess2.2.protein.fa.gz | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided. |
| chess2.2_assembly.gff.gz | Gene annotation for transcriptome assembly | This is a subset of the gene annotation GFF file (chess2.2.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks. |
| chess2.2_and_refseq.gff.gz | CHESS plus RefSeq gene annotations | This is a superset of chess2.2.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 20352 | 266331 |
| lncRNA | 18887 | 49892 |
| other | 3372 | 7035 |
| total | 42611 | 323258 |
| novel_protein_coding | 224 | 317 |
| novel_lncRNAs | 2671 | 3333 |
CHESS_v2.1
Statement
Chess Release 2.1
Files
| Filename | Content | Description |
| --- | --- | --- |
| chess2.1.gff.gz | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p8. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess2.1.genes | CHESS gene list | This file is a table showing all 42,611 genes in CHESS release 2.1, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene. |
| chess2.1.protein.fa.gz | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided. |
| chess2.1_assembly.gff.gz | Gene annotation for transcriptome assembly | This is a subset of the gene annotation GFF file (chess2.1.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks. |
| chess2.1_and_refseq.gff.gz | CHESS plus RefSeq gene annotations | This is a superset of chess2.1.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 20352 | 266331 |
| lncRNA | 18887 | 49892 |
| other | 3372 | 7035 |
| total | 42611 | 323258 |
| novel_protein_coding | 224 | 317 |
| novel_lncRNAs | 2671 | 3333 |
CHESS_v2.0
Statement
Chess Release 2.0
Files
| Filename | Content | Description |
| --- | --- | --- |
| chess2.0.gff.gz | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p8. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess2.0.genes | CHESS gene list | This file is a table showing all 43,162 genes in CHESS release 2.0, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene. |
| chess2.0.protein.fa.gz | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided. |
| chess2.0_assembly.gff.gz | Gene annotation for transcriptome assembly | This is a subset of the gene annotation GFF file (chess2.0.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks. |
| chess2.0_and_refseq.gff.gz | CHESS plus RefSeq gene annotations | This is a superset of chess2.0.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 21306 | 267478 |
| lncRNA | 18484 | 49314 |
| other | 3372 | 7035 |
| total | 43162 | 323827 |
| novel_protein_coding | 1178 | 1446 |
| novel_lncRNAs | 2268 | 2755 |
CHESS_v1.0
Statement
Initial release of CHESS human genome annotation
Files
| Filename | Content | Description |
| --- | --- | --- |
| chess1.0.gff.gz | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p7. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess1.0.genes | CHESS gene list | This file is a table showing all 39,582 genes in CHESS release 1.0, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene. |
| chess1.0.protein.fa.gz | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided. |
| chess1.0_assembly.gff.gz | Gene annotation for transcriptome assembly | This is a subset of the gene annotation GFF file (chess1.0.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks. |
| chess1.0_and_refseq.gff.gz | CHESS plus RefSeq gene annotations | This is a superset of chess1.0.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS. |
Summary
| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 21635 | 304113 |
| lncRNA | 15985 | 41396 |
| other | 1962 | 6493 |
| total | 39582 | 352002 |
| novel_protein_coding | 1476 | 3543 |
| novel_lncRNAs | 1276 | 2256 |