Releases: chess-genome/chess

Release 3.1.3

31 Jul 16:02

This release introduces versions of the annotation oriented toward transcriptome assembly and quantification, which remove alternative scaffolds as well as duplicated transcripts. The CHM13 version of the annotation now has tRNAs copied over from RefSeq.

Changelog

  • Duplicated transcripts removed from .assembly.* files. For each set of duplicates, a representative transcript was chosen using the following criteria, in order: (1) its CDS matches MANE; (2) its CDS is closest to MANE; (3) its CDS maximizes Tukey's Median of the ILPIs of CDSs at each locus; (4) random choice
  • tRNAs copied over to CHM13 version of CHESS from RefSeq
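
The ordered selection criteria above can be illustrated with a minimal sketch. This is not the code used to build the release: the `Candidate` fields, the `cds_distance_to_mane` measure, and the `depth_score` value (a stand-in for the Tukey's Median based score) are all hypothetical names introduced for illustration, and the sketch assumes each criterion only breaks ties left by the previous one.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One transcript in a group of duplicates (all fields are hypothetical)."""
    transcript_id: str
    cds_matches_mane: bool                  # criterion 1
    cds_distance_to_mane: Optional[float]   # criterion 2, lower is closer; None if unknown
    depth_score: float                      # criterion 3, stand-in for the Tukey's Median score


def pick_representative(group: Sequence[Candidate], rng: random.Random) -> Candidate:
    """Apply the four criteria in order, narrowing the pool at each step."""
    if not group:
        raise ValueError("empty duplicate group")

    # 1. Prefer candidates whose CDS matches MANE.
    matching = [c for c in group if c.cds_matches_mane]
    pool = matching or list(group)

    # 2. Otherwise prefer the CDS closest to MANE, when a distance is known.
    known = [c for c in pool if c.cds_distance_to_mane is not None]
    if known:
        best = min(c.cds_distance_to_mane for c in known)
        pool = [c for c in known if c.cds_distance_to_mane == best]

    # 3. Keep the candidates with the highest score.
    top = max(c.depth_score for c in pool)
    pool = [c for c in pool if c.depth_score == top]

    # 4. Break any remaining tie at random.
    return rng.choice(pool)
```

Passing a seeded `random.Random` keeps the final random choice reproducible between runs.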

Statement

Chess Release 3.1.3

Files

Filenames Genome Content Description
chess3.1.3.GRCh38.gff.gz, chess3.1.3.GRCh38.gtf.gz, chess3.1.3.GRCh38.bb.gz GRCh38 CHESS gene annotation This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess3.1.3.CHM13.gff.gz, chess3.1.3.CHM13.gtf.gz, chess3.1.3.CHM13.bb.gz CHM13 CHESS gene annotation on CHM13 This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome.
chess3.1.3.GRCh38.primary.gff.gz, chess3.1.3.GRCh38.primary.gtf.gz, chess3.1.3.GRCh38.primary.bb.gz GRCh38 CHESS gene annotation excluding alternative scaffolds This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12.
chess3.1.3.GRCh38.assembly.gff.gz, chess3.1.3.GRCh38.assembly.gtf.gz, chess3.1.3.GRCh38.assembly.bb.gz GRCh38 CHESS gene annotation excluding alternative scaffolds and duplicate transcripts This file contains the assembly gene set described in the CHESS paper but excludes annotations of any alternative scaffolds and retains a single copy of each transcript duplicate. All genes and transcripts are mapped onto human genome release GRCh38.p12.
chess3.1.3.CHM13.assembly.gff.gz, chess3.1.3.CHM13.assembly.gtf.gz, chess3.1.3.CHM13.assembly.bb.gz CHM13 CHESS gene annotation excluding alternative scaffolds and duplicate transcripts This file contains the assembly gene set described in the CHESS paper but only retains a single copy of each transcript duplicate. All genes and transcripts are mapped onto the CHM13 human reference genome.
chess3.1.3.GRCh38.protein.fa.gz GRCh38 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome.
chess3.1.3.CHM13.protein.fa.gz CHM13 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome.
chess3.1.3.mapfile.tsv - Cross-Reference This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2).
assembled.gtf.gz GRCh38 Assembled Transcripts Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset.

Summary

genes transcripts
protein_coding 19838 99201
lncRNA 17624 34709
pseudogene 16774 17263
other 4269 7190
alt_scaffolds 5250 10088

Release 3.1.2

13 Jul 03:57

This release addresses invalid new gene_id assignment for gene duplications identified by LiftOff on the CHM13 reference genome.

Changelog

  1. Release 3.1.0 assigned gene_ids to gene duplications reported by LiftOff. An edge case resulted in multiple transcripts of the same duplication each being assigned its own gene_id. In this release, those records are merged under a single gene_id.
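
A minimal sketch of the kind of merge described above, under stated assumptions: each record carries a `duplication_key` that identifies which reported gene copy a transcript belongs to. That key, the record layout, and the rule of keeping the first gene_id seen are hypothetical choices for illustration, not the procedure used for the release.

```python
from typing import Dict, List, Tuple

# (transcript_id, gene_id, duplication_key)
# duplication_key is a hypothetical label for one reported gene copy.
Record = Tuple[str, str, str]


def merge_gene_ids(records: List[Record]) -> Dict[str, str]:
    """Return transcript_id -> gene_id with one gene_id per duplication.

    Every transcript sharing a duplication_key receives the gene_id of the
    first transcript seen for that key.
    """
    chosen: Dict[str, str] = {}   # duplication_key -> gene_id kept for it
    merged: Dict[str, str] = {}   # transcript_id -> merged gene_id
    for transcript_id, gene_id, dup_key in records:
        merged[transcript_id] = chosen.setdefault(dup_key, gene_id)
    return merged
```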

Statement

Chess Release 3.1.2

Files

Filenames Genome Content Description
chess3.1.2.GRCh38.gff.gz, chess3.1.2.GRCh38.gtf.gz, chess3.1.2.GRCh38.bb.gz GRCh38 CHESS gene annotation This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess3.1.2.CHM13.gff.gz, chess3.1.2.CHM13.gtf.gz, chess3.1.2.CHM13.bb.gz CHM13 CHESS gene annotation on CHM13 This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome.
chess3.1.2.GRCh38.primary.gff.gz, chess3.1.2.GRCh38.primary.gtf.gz GRCh38 CHESS gene annotation excluding alternative scaffolds This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12.
chess3.1.2.GRCh38.protein.fa.gz GRCh38 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome.
chess3.1.2.CHM13.protein.fa.gz CHM13 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome.
chess3.1.2.mapfile.tsv - Cross-Reference This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2).
assembled.gtf.gz GRCh38 Assembled Transcripts Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset.

Summary

genes transcripts
protein_coding 19838 99201
lncRNA 17624 34709
pseudogene 16774 17263
other 4269 7190
alt_scaffolds 5250 10088

Release 3.1.1

24 Jun 21:02

This release addresses inconsistencies and incompleteness in the rRNA annotation specific to the CHM13 version of the CHESS annotation.

Changelog

  1. rRNA annotation moved over from the official RefSeq CHM13 annotation, including all annotated copies
  2. New gene and transcript IDs were assigned consistent with the CHS nomenclature
  3. rRNA1 and rRNA2 genes on chrM assigned matching CHSID for GRCh38 and CHM13

Statement

Chess Release 3.1.1

Files

Filenames Genome Content Description
chess3.1.1.GRCh38.gff.gz, chess3.1.1.GRCh38.gtf.gz, chess3.1.1.GRCh38.bb.gz GRCh38 CHESS gene annotation This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess3.1.1.CHM13.gff.gz, chess3.1.1.CHM13.gtf.gz, chess3.1.1.CHM13.bb.gz CHM13 CHESS gene annotation on CHM13 This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome.
chess3.1.1.GRCh38.primary.gff.gz, chess3.1.1.GRCh38.primary.gtf.gz GRCh38 CHESS gene annotation excluding alternative scaffolds This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12.
chess3.1.1.GRCh38.protein.fa.gz GRCh38 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome.
chess3.1.1.CHM13.protein.fa.gz CHM13 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome.
chess3.1.1.mapfile.tsv - Cross-Reference This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2).
assembled.gtf.gz GRCh38 Assembled Transcripts Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset.

Summary

genes transcripts
protein_coding 19838 99201
lncRNA 17624 34709
pseudogene 16774 17263
other 4269 7190
alt_scaffolds 5250 10088

Release 3.1.0

20 Jun 06:11

This release addresses several major and minor inconsistencies in the formatting of the CHESS annotation.

Changelog

  1. Introducing gene features to the GFF3 files, complete with RefSeq descriptors
  2. All valid and complete ORFs now include the stop codon in the CDS coordinates. Some transcripts have been extended by up to 3 positions to include the missing stop codon
  3. Fixed duplicated gene IDs on the CHM13 version of the annotation. Gene copies identified by LiftOff are now assigned their own CHESS ID and the LiftOff metadata is stored in the auxiliary tags
  4. Protein sequences based on the CHM13 genome sequence are now also included
  5. Minor improvements to the comment lines
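
To make the stop codon change in item 2 concrete, here is a minimal sketch of the coordinate arithmetic. It assumes GFF-style 1-based inclusive coordinates, a CDS that currently ends immediately before the stop codon, and a stop codon that is not split by an intron. The `Transcript` type and `include_stop_codon` function are hypothetical and not part of any CHESS tooling.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    """Hypothetical transcript model, 1-based inclusive coordinates."""
    start: int
    end: int
    cds_start: int
    cds_end: int
    strand: str  # "+" or "-"


def include_stop_codon(tx: Transcript) -> Transcript:
    """Grow the CDS by one codon at its 3' end, extending the transcript if needed."""
    if tx.strand == "+":
        # On the forward strand the 3' end is the larger coordinate.
        cds_end = tx.cds_end + 3
        return tx._replace(cds_end=cds_end, end=max(tx.end, cds_end))
    if tx.strand == "-":
        # On the reverse strand the 3' end is the smaller coordinate.
        cds_start = tx.cds_start - 3
        if cds_start < 1:
            raise ValueError("stop codon would fall before the start of the sequence")
        return tx._replace(cds_start=cds_start, start=min(tx.start, cds_start))
    raise ValueError(f"unknown strand: {tx.strand!r}")
```

In this model the transcript boundary moves only when the stop codon would otherwise fall outside it, so the extension is at most 3 positions.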

Statement

Chess Release 3.1.0

Files

Filenames Genome Content Description
chess3.1.0.GRCh38.gff.gz, chess3.1.0.GRCh38.gtf.gz, chess3.1.0.GRCh38.bb.gz GRCh38 CHESS gene annotation This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess3.1.0.CHM13.gff.gz, chess3.1.0.CHM13.gtf.gz, chess3.1.0.CHM13.bb.gz CHM13 CHESS gene annotation on CHM13 This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome.
chess3.1.0.GRCh38.primary.gff.gz, chess3.1.0.GRCh38.primary.gtf.gz GRCh38 CHESS gene annotation excluding alternative scaffolds This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12.
chess3.1.0.GRCh38.protein.fa.gz GRCh38 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome.
chess3.1.0.CHM13.protein.fa.gz CHM13 CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome.
chess3.1.0.mapfile.tsv - Cross-Reference This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2).
assembled.gtf.gz GRCh38 Assembled Transcripts Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset.

Summary

genes transcripts
protein_coding 19838 99201
lncRNA 17624 34709
pseudogene 16774 17263
other 4269 7190
alt_scaffolds 5250 10088

CHESS 3.0.1

26 Apr 22:21

First release of the upcoming new major version of the CHESS annotation.

Changelog

  1. Updated simplified source values
  2. Gene records in GTF/GFF
  3. BED14 files
  4. TPM and sample count attributes simplified and made consistent for all assembled transcripts
  5. Proper comment appended
  6. 24 erroneous pseudogene transcripts removed

Statement

Chess Release 3.0.1

Files

Filenames Content Description
chess3.0.1.gff.gz, chess3.0.1.gtf.gz, chess3.0.1.bb.gz CHESS gene annotation This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess3.0.1.CHM13.gff.gz, chess3.0.1.CHM13.gtf.gz, chess3.0.1.CHM13.bb.gz CHESS gene annotation on CHM13 This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome.
chess3.0.1.primary.gff.gz, chess3.0.1.primary.gtf.gz CHESS gene annotation excluding alternative scaffolds This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12.
chess3.0.1.protein.fa.gz CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes.
chess3.0.1.mapfile.tsv Cross-Reference This tab-separated file contains a list of transcript identifiers in CHESS 3.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2).
assembled.gtf.gz Assembled Transcripts Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset.

Summary

genes transcripts
protein_coding 19838 99201
lncRNA 17624 34709
pseudogene 16774 17263
other 4269 7190
alt_scaffolds 5250 10088

CHESS 3.0

05 Oct 15:34

First release of the upcoming new major version of the CHESS annotation.

Changelog

  1. Re-assembly and improved analysis of the dataset based on new methods described in a new paper.

Statement

Chess Release 3.0

Files

Filenames Content Description
chess3.0.gff.gz, chess3.0.gtf.gz CHESS gene annotation This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess3.0.CHM13.gff.gz, chess3.0.CHM13.gtf.gz CHESS gene annotation on CHM13 This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome.
chess3.0.primary.gff.gz, chess3.0.primary.gtf.gz CHESS gene annotation excluding alternative scaffolds This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12.
chess3.0.protein.fa.gz CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes.
chess3.0.mapfile.tsv Cross-Reference This tab-separated file contains a list of transcript identifiers in CHESS 3.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2).
assembled.gtf.gz Assembled Transcripts Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset.

Summary

genes transcripts
protein_coding 19838 99201
lncRNA 17624 34709
pseudogene 16774 17263
other 4281 7204
alt_scaffolds 5258 10098

CHESS_v2.2

19 Apr 20:17

Changelog

  1. Fixed wrong parent IDs detected for several RefSeq transcripts
  2. Added CDS annotations for novel isoforms of known protein-coding genes when the isoform is start-codon and intron-chain compatible with at least one known open reading frame from GENCODE or RefSeq
  3. Fixed CDS coordinate overlaps
  4. Re-added missing RefSeq CDS assignments
  5. Re-added missing attributes from GENCODE and RefSeq

Statement

Chess Release 2.2

Files

Filename Content Description
chess2.2.gff.gz CHESS gene annotation This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p8. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess2.2.genes CHESS gene list This file is a table showing all 42,611 genes in CHESS release 2.2, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene.
chess2.2.protein.fa.gz CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided.
chess2.2_assembly.gff.gz Gene annotation for transcriptome assembly This is a subset of the gene annotation GFF file (chess2.2.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks.
chess2.2_and_refseq.gff.gz CHESS plus RefSeq gene annotations This is a superset of chess2.2.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS.
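
For readers who want to derive a subset similar to the assembly file themselves, the restriction to reference chromosomes and the mitochondrion can be sketched as a filter on the first GFF column. This is only an illustration: it assumes UCSC-style sequence names (`chr1` to `chr22`, `chrX`, `chrY`, `chrM`), the function names are hypothetical, and it does not add the tRNA and rRNA annotations from RefSeq that the distributed assembly file includes.

```python
import gzip
from typing import Iterable, Iterator, Set

# Assumed sequence names for the reference chromosomes and the mitochondrion.
REFERENCE_SEQIDS: Set[str] = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def keep_reference_records(
    lines: Iterable[str], seqids: Set[str] = REFERENCE_SEQIDS
) -> Iterator[str]:
    """Yield comment lines plus records whose seqid is a reference sequence."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line  # keep header, comment, and blank lines unchanged
            continue
        seqid = line.split("\t", 1)[0]
        if seqid in seqids:
            yield line


def filter_gff(in_path: str, out_path: str) -> None:
    """Stream a gzipped GFF through the filter into a new gzipped file."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(keep_reference_records(src))
```

A call such as `filter_gff("chess2.2.gff.gz", "chess2.2.reference_only.gff.gz")` would write the filtered copy; the output filename here is made up.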

Summary

genes transcripts
protein_coding 20352 266331
lncRNA 18887 49892
other 3372 7035
total 42611 323258
novel_protein_coding 224 317
novel_lncRNAs 2671 3333

CHESS_v2.1

06 Sep 22:47
aa75576

Statement

Chess Release 2.1

Files

Filename Content Description
chess2.1.gff.gz CHESS gene annotation This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p8. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess2.1.genes CHESS gene list This file is a table showing all 42,611 genes in CHESS release 2.1, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene.
chess2.1.protein.fa.gz CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided.
chess2.1_assembly.gff.gz Gene annotation for transcriptome assembly This is a subset of the gene annotation GFF file (chess2.1.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks.
chess2.1_and_refseq.gff.gz CHESS plus RefSeq gene annotations This is a superset of chess2.1.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS.

Summary

genes transcripts
protein_coding 20352 266331
lncRNA 18887 49892
other 3372 7035
total 42611 323258
novel_protein_coding 224 317
novel_lncRNAs 2671 3333

CHESS_v2.0

20 Apr 17:22

Statement

Chess Release 2.0

Files

Filename Content Description
chess2.0.gff.gz CHESS gene annotation This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p8. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess2.0.genes CHESS gene list This file is a table showing all 43,162 genes in CHESS release 2.0, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene.
chess2.0.protein.fa.gz CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided.
chess2.0_assembly.gff.gz Gene annotation for transcriptome assembly This is a subset of the gene annotation GFF file (chess2.0.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks.
chess2.0_and_refseq.gff.gz CHESS plus RefSeq gene annotations This is a superset of chess2.0.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS.

Summary

genes transcripts
protein_coding 21306 267478
lncRNA 18484 49314
other 3372 7035
total 43162 323827
novel_protein_coding 1178 1446
novel_lncRNAs 2268 2755

CHESS_v1.0

12 Apr 00:12

Statement

Initial release of CHESS human genome annotation

Files

Filename Content Description
chess1.0.gff.gz CHESS gene annotation This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p7. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess1.0.genes CHESS gene list This file is a table showing all 39,582 genes in CHESS release 1.0, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene.
chess1.0.protein.fa.gz CHESS proteins This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided.
chess1.0_assembly.gff.gz Gene annotation for transcriptome assembly This is a subset of the gene annotation GFF file (chess1.0.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks.
chess1.0_and_refseq.gff.gz CHESS plus RefSeq gene annotations This is a superset of chess1.0.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS.

Summary

genes transcripts
protein_coding 21635 304113
lncRNA 15985 41396
other 1962 6493
total 39582 352002
novel_protein_coding 1476 3543
novel_lncRNAs 1276 2256