Seqnames in GRCh38 Graph (minigraph-cactus) to match gene annotation #28
Comments
attn: @jeizenga
Are you able to share the GTF that you were using? Even the first few hundred lines would probably be sufficient.
You can download it from this link (obtained from the GENCODE webpage): https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.chr_patch_hapl_scaff.annotation.gtf.gz
Hello, I was wondering if you found a solution to this issue. I'm getting the same error, and I have tried multiple annotations: the GENCODE one mentioned here, as well as annotations from NCBI and UCSC. In all cases the code crashes with the same error mentioned above.
Hello,
Hi, apologies for the delay. My union has been on strike and I'm only just returning to work. TLDR you can prepend The GFA you're pointing to stores the reference genome as a particular "sample" alongside other samples that have identifiers like HG0xxxx. The combination of a sample+haplotype+contig is specified using the PanSN naming specification, which looks something like this:
The first field is the sample identifier (
Hello, I was wondering if the version of annotation matters here?
Different versions necessarily give different results, since they have different transcript sets. The contig naming requirements should be the same though.
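For anyone who needs to rewrite the annotation's seqnames, here is a minimal Python sketch of the prepend step. The function name `prefix_gtf_seqnames` is hypothetical. The exact prefix is not preserved in this thread, so the value `GRCh38#0#` shown in the usage comment is an assumption based on the sample+haplotype+contig pattern described above. Check it against the path names in your own graph before relying on it.

```python
def prefix_gtf_seqnames(lines, prefix):
    """Yield GTF lines with `prefix` prepended to column 1 (the seqname).

    `prefix` is supplied by the caller; no particular value is assumed here.
    """
    for line in lines:
        # Comment/header lines and blank lines pass through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # Split off only the first tab-separated field so the remaining
        # columns (including the attribute column) are left untouched.
        seqname, sep, rest = line.partition("\t")
        yield prefix + seqname + sep + rest


# Example usage (the prefix below is an assumption, not taken from the thread):
# with open("in.gtf") as src, open("out.gtf", "w") as dst:
#     dst.writelines(prefix_gtf_seqnames(src, "GRCh38#0#"))
```

Only the first column is modified, so transcript and gene attributes stay exactly as they were in the original annotation.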
Hello, I am running vg autoindex to splice the minigraph-cactus full pangenome according to GENCODE v44 gene annotations in order to map RNA-seq reads. I have two questions:

vg autoindex \
    --workflow mpmap \
    --prefix data/00_autoindex/splicedpangenome \
    --gfa /gpfs/projects/bsc83/Data/assemblies/pangenome/minigraph_cactus/hprc-v1.1-mc-grch38.full.gfa \
    --tx-gff /gpfs/projects/bsc83/Data/gene_annotations/gencode/v44/modified/gencode.v44.chr_patch_hapl_scaff.annotation_chr2GRCh38#chr.gtf \
    --tmp-dir temporary \
    --threads 112 \
    --verbosity 2
Error:
Saving GBWT and GBWTGraph to temporary/vg-ikdYP8/dir-MgGI5j/d0cc1cf507d88bdebe898d1ba90127a241a83700.gbz
[IndexRegistry]: Adding splice junctions to GBZ-format graph.
ERROR: Chromosome path "chr1" not found in graph or haplotypes index (line 6).
When I first saw this, I thought it was the typical error where chromosome names are formatted differently (chr1 versus 1), so I looked in the minigraph-cactus reference and found
SN:Z:GRCh38#chr1
so I changed the seqnames in the gene annotation from chr1 to GRCh38#chr1, but I still get the same error. Which seqnames is this pangenome reference using? Thanks
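One way to check which names a graph actually carries is to scan the GFA's path records directly. The sketch below is an illustration under stated assumptions, not part of vg: the function name `gfa_path_names` is hypothetical, it reads `P` lines (path name in the second field) and GFA 1.1 `W` lines (sample, haplotype index and sequence id in fields 2 to 4), and joining the `W` fields with `#` is my own assumption for display purposes.

```python
def gfa_path_names(lines):
    """Collect path and walk names from an iterable of GFA text lines."""
    names = set()
    for line in lines:
        if line.startswith("P\t"):
            # P lines: the path name is the second tab-separated field.
            names.add(line.split("\t", 2)[1])
        elif line.startswith("W\t"):
            # W lines (GFA 1.1): W, sample id, haplotype index, sequence id, ...
            # maxsplit keeps the (potentially very long) walk field unsplit.
            _, sample, hap, seq_id = line.split("\t", 4)[:4]
            # Joining with '#' is an assumption made for readability.
            names.add(f"{sample}#{hap}#{seq_id}")
    return names


# Example usage on a (large) GFA file, streaming line by line:
# with open("graph.gfa") as gfa:
#     for name in sorted(gfa_path_names(gfa)):
#         print(name)
```

Comparing this list with the first column of the GTF shows quickly whether the annotation's seqnames have any counterpart in the graph.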