AddGeneCoord.pl fails to populate HLA-HFE coordinates #30

Closed
dmiller15 opened this issue Jun 6, 2024 · 3 comments

Comments

@dmiller15

I've been testing out the software, and I noticed a discrepancy in HLA-HFE between using a BAM and FASTQ input. Where the FASTQ input would report high abundance and quality, the BAM input would report nothing. I looked through all the read assignments from the FASTQ results, and each one maps to the HFE gene on chr6: https://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000010704;r=6:26087226-26098343.

In your manuscript HLA files (https://github.com/mourisl/T1K_manuscript_evaluation/blob/master/hlaidx_3_44_0.tar.gz) as well as ones I made following the documentation directions, HLA-HFE receives no mapping in the coordinate files:

>HLA-HFE*001:01:01 chr19 -1 -1 +
>HLA-HFE*001:01:02 chr19 -1 -1 +
>HLA-HFE*001:01:03 chr19 -1 -1 +

The reason for this lack of mapping is that the GENCODE and Ensembl GTFs refer to this gene simply as HFE. AddGeneCoord.pl looks for exactly HLA-HFE, finds no matches, and leaves the unmapped default.
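To illustrate the failure mode, here is a minimal Python sketch of an exact-name lookup against GTF gene records. It is not the actual logic of AddGeneCoord.pl (which is a Perl script); the function names and the one-line GTF excerpt are hypothetical, and only the HFE coordinates come from the Ensembl link above.

```python
import re

# Hypothetical one-line GTF excerpt. The coordinates are the Ensembl HFE region
# quoted above; the source, strand, and attribute fields are placeholders.
GTF_TEXT = (
    'chr6\tHAVANA\tgene\t26087226\t26098343\t.\t+\t.\t'
    'gene_id "ENSG00000010704"; gene_name "HFE";\n'
)


def load_gene_coords(gtf_text):
    """Build {gene_name: (chrom, start, end, strand)} from GTF 'gene' records."""
    coords = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match:
            coords[match.group(1)] = (
                fields[0], int(fields[3]), int(fields[4]), fields[6]
            )
    return coords


def lookup_exact(coords, allele_header):
    """Look up the gene part of an allele header such as '>HLA-HFE*001:01:01'."""
    gene = allele_header.lstrip(">").split("*")[0]
    return coords.get(gene)  # None when no gene has exactly this name


coords = load_gene_coords(GTF_TEXT)
print(lookup_exact(coords, ">HLA-HFE*001:01:01"))  # no exact match for HLA-HFE
print(coords.get("HFE"))                           # the record exists under HFE
```

With an exact match, the HLA-HFE alleles never pick up the HFE record even though it is present in the GTF.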

When I manually alter the coordinates file to have the HLA-HFE mapping match the Ensembl HFE coordinates, the BAM and FASTQ runs agree on the abundance/quality.

To summarize:

  • When running with FASTQs, reads that were mapped to the HFE gene are assigned to HLA-HFE
  • Neither GENCODE nor Ensembl GTFs contain reference to a gene named HLA-HFE
  • Coordinate files do not map HLA-HFE to the HFE gene region
  • The mapped reads in the HFE gene region are not pulled during BAM extraction
  • BAM processing assigns no alleles/abundance/quality for HLA-HFE

I don't think this is an issue for any other HLA contig. The only others that remain unmapped are HLA-DRB3, HLA-DRB4, and HLA-Y. As far as I can tell none of these has a corresponding mapping in the genome.
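For anyone who wants to check their own coordinate files for the same problem, a small Python sketch like the following lists genes whose entries are still unmapped. It assumes the header layout seen in the excerpt above (name, chromosome, start, end, strand, separated by whitespace) and treats a start and end of -1 as unmapped; the function name and the mapped sample entry are hypothetical.

```python
def unmapped_genes(coord_lines):
    """Return sorted gene names whose header lines have start and end of -1."""
    genes = set()
    for line in coord_lines:
        if not line.startswith(">"):
            continue  # skip anything that is not a header line
        fields = line[1:].split()
        if len(fields) < 5:
            continue  # header does not carry coordinate fields
        allele, _chrom, start, end, _strand = fields[:5]
        if start == "-1" and end == "-1":
            genes.add(allele.split("*")[0])
    return sorted(genes)


sample = [
    ">HLA-HFE*001:01:01 chr19 -1 -1 +",
    ">HLA-HFE*001:01:02 chr19 -1 -1 +",
    ">HLA-EXAMPLE*01:01 chr6 100 200 +",  # hypothetical mapped entry
]
print(unmapped_genes(sample))
```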

@mourisl
Owner

mourisl commented Jun 6, 2024

Thank you very much for finding this problem. I'll check whether there are annotations containing HLA-H/F/E, or create our own list of gene coordinates and use that as the input for AddGeneCoord.pl.

@mourisl
Owner

mourisl commented Jun 10, 2024

Thank you for finding this issue. I just added an option "--gtf-gene-name-mapping" to the AddGeneCoord.pl script. Its default value is "HFE:HLA-HFE", and a comma-separated string can be used to specify other gene name mappings. The script will internally map each gene name in the GTF to the name specified by the user. I hope this helps resolve the issue.
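As a rough illustration of the mapping format described here, the Python sketch below parses a comma-separated string of GTF_NAME:TARGET_NAME pairs and renames GTF gene names accordingly. This is only an assumption about how such a string could be interpreted, not the Perl code in AddGeneCoord.pl; the function names and the second pair in the usage example are hypothetical.

```python
def parse_gene_name_mapping(spec):
    """Parse 'GTF_NAME:TARGET_NAME[,GTF_NAME:TARGET_NAME...]' into a dict."""
    mapping = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        gtf_name, sep, target = pair.partition(":")
        if not sep or not gtf_name or not target:
            raise ValueError(f"expected GTF_NAME:TARGET_NAME, got {pair!r}")
        mapping[gtf_name] = target
    return mapping


def rename(gtf_gene_name, mapping):
    """Return the mapped name, or the GTF name unchanged if it has no mapping."""
    return mapping.get(gtf_gene_name, gtf_gene_name)


default_mapping = parse_gene_name_mapping("HFE:HLA-HFE")
print(rename("HFE", default_mapping))    # renamed to HLA-HFE
print(rename("HLA-A", default_mapping))  # left unchanged
```

A string with several pairs, for example "HFE:HLA-HFE,GENE1:HLA-GENE1" (the second pair is made up), would produce one dictionary entry per pair.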

@dmiller15
Author

Thanks for the quick response. I am now seeing coordinates for HLA-HFE.
