For the same sample, HLA-C was mistyped at 2M reads but correct at 1M and 3M reads #40

Closed
liu930724 opened this issue Nov 13, 2024 · 4 comments

@liu930724

Hi, while analyzing another HLA reference sample (HLA-C*05:01:01), we observed that the HLA-C typing result was incorrect at 2M reads (HLA-C*08:02:01). When we switched to other read counts between 1M and 3M, even just 2.1M, all of the results were correct.
[image]

T1K v1.1.7-r225 was used, and the different read counts were obtained with the --reads_to_process parameter of fastp.

running command:

run-t1k -1 21_1.trimmed.fq -2 21_2.trimmed.fq -t 30 --preset hla -f T1K_ref_dna_seq.fa --cov 30 -o HLA-1101-FA01_21
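
For anyone trying to reproduce the depth comparison, here is a minimal sketch of how the subsampling and typing commands could be generated for each read count. The exact fastp invocation is not shown in this issue, so the fastp input/output arguments, the file names, and the `commands_for_depth` helper are assumptions for illustration; only `--reads_to_process` and the run-t1k options come from the report above. The script prints the commands rather than running them.

```python
import shlex

# Hypothetical file names; substitute the real paths.
READ1 = "sample_1.fq"
READ2 = "sample_2.fq"
REFERENCE = "T1K_ref_dna_seq.fa"


def commands_for_depth(n_reads, threads=30):
    """Build the fastp and run-t1k command lines for one read count.

    The run-t1k options mirror the command quoted in this issue; the fastp
    options other than --reads_to_process are an assumed, typical invocation.
    """
    prefix = f"sub_{n_reads}"
    out1 = f"{prefix}_1.trimmed.fq"
    out2 = f"{prefix}_2.trimmed.fq"
    fastp_cmd = [
        "fastp", "-i", READ1, "-I", READ2,
        "-o", out1, "-O", out2,
        "--reads_to_process", str(n_reads),
    ]
    t1k_cmd = [
        "run-t1k", "-1", out1, "-2", out2,
        "-t", str(threads), "--preset", "hla",
        "-f", REFERENCE, "--cov", "30", "-o", prefix,
    ]
    return [fastp_cmd, t1k_cmd]


if __name__ == "__main__":
    # Read counts discussed in this issue: 1M, 2M, 2.1M, 3M.
    for depth in (1_000_000, 2_000_000, 2_100_000, 3_000_000):
        for cmd in commands_for_depth(depth):
            print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths before launching the runs, for example by piping the output to a shell.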

log of 2M reads:

[Wed Nov 13 13:36:52 2024] run-t1k v1.1.7-r225 begins.
[Wed Nov 13 13:36:52 2024] SYSTEM CALL: /r5/u/tianliu/2.pipeline/T1K-master/fastq-extractor -t 30 -f /mnt/data65/tianliu2/project/1.HLA/test/t1k/hlaidx/T1K_ref_dna_seq.fa -o /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_candidate  -1 /mnt/data65/tianliu2/project/1.HLA/t1k_241104/1.trimmed/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_1.trimmed.fq -2 /mnt/data65/tianliu2/project/1.HLA/t1k_241104/1.trimmed/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_2.trimmed.fq
[Wed Nov 13 13:36:54 2024] Start to extract candidate reads from read files.
[Wed Nov 13 13:38:51 2024] Finish extracting reads.
[Wed Nov 13 13:38:51 2024] SYSTEM CALL: /r5/u/tianliu/2.pipeline/T1K-master/genotyper  --cov 30 -s 0.97 -o /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21 -t 30 -f /mnt/data65/tianliu2/project/1.HLA/test/t1k/hlaidx/T1K_ref_dna_seq.fa -1 /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_candidate_1.fq -2 /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_candidate_2.fq
[Wed Nov 13 13:38:55 2024] Found 858361 read fragments. Start read assignment.
[Wed Nov 13 14:05:46 2024] Finish read end assignments.
[Wed Nov 13 14:06:21 2024] Finish read fragment assignments. 393416 read fragments can be assigned (average 563.99 alleles/read).
[Wed Nov 13 14:07:04 2024] Finish allele quantification in 102 EM iterations.
[Wed Nov 13 14:08:20 2024] Genotyping finishes.
[Wed Nov 13 14:08:23 2024] SYSTEM CALL: /r5/u/tianliu/2.pipeline/T1K-master/analyzer  -s 0.97 -o /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21 -t 30 -f /mnt/data65/tianliu2/project/1.HLA/test/t1k/hlaidx/T1K_ref_dna_seq.fa -a /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_allele.tsv -1 /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_aligned_1.fa -2 /mnt/data65/tianliu2/project/1.HLA/t1k_241104/5.t1k/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21/20241101170317_RA01230401003_5P240928009US292653DX_PE150-HLA-1101-FA01_21_aligned_2.fa
[Wed Nov 13 14:08:23 2024] Found 462282 read fragments. Start read assignment.
[Wed Nov 13 14:08:25 2024] Finish read end assignments.
[Wed Nov 13 14:08:25 2024] Finish read fragment assignments. 415149 read fragments can be assigned (average 1.62 alleles/read).
[Wed Nov 13 14:08:25 2024] Finish allele quantification in 4 EM iterations.
[Wed Nov 13 14:08:30 2024] Post analysis finishes.
[Wed Nov 13 14:08:31 2024] Finish.

This seems different from the last issue about an HLA-C typing error, so I opened a new issue. Thank you in advance for your time and help.

@mourisl
Owner

mourisl commented Nov 13, 2024

It seems the abundances of both the true alleles and the wrong alleles are not very high, so there could be some tricky issues. As before, could you please share the candidate reads so I can look into it? Thank you!

Meanwhile, the current GitHub version (v1.1.7-r225) also fixes an issue in t1k-build regarding exonization in an HLA-C allele. Could you please recreate the T1K reference with t1k-build from the hla.dat file? It may fix this issue.

@liu930724
Author

After recreating the T1K reference, the result for HLA-C with 2M reads is correct now.
The outdated reference may well have been the cause; we will test with more samples in the future. Thank you!

@mourisl
Owner

mourisl commented Nov 14, 2024

Is the abundance estimation comparable between C01:02 and C05:01 in the new run? If they still differ a lot, I think there are still some hidden issues.
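
A minimal sketch of how such a comparison could be scripted is below. It assumes the genotype output is a tab-separated file whose columns are gene name, number of alleles, then allele, abundance, and quality for each of the two called alleles, with `.` marking a missing allele; that layout, the helper names, and the numbers in the example are assumptions for illustration, not values from this issue.

```python
import csv
import io


def allele_abundances(tsv_text, gene):
    """Return {allele: abundance} for one gene from genotype-style TSV text.

    Assumed column layout (not confirmed in this thread):
    gene, n_alleles, allele_1, abundance_1, quality_1,
    allele_2, abundance_2, quality_2
    """
    result = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 8 or row[0] != gene:
            continue
        for allele, abundance in ((row[2], row[3]), (row[5], row[6])):
            if allele != ".":  # "." is assumed to mark a missing allele
                result[allele] = float(abundance)
    return result


def abundance_ratio(abundances):
    """Larger abundance divided by the smaller; None unless two positive values."""
    values = sorted(abundances.values())
    if len(values) != 2 or values[0] <= 0:
        return None
    return values[1] / values[0]


# Invented numbers, for illustration only.
example = "HLA-C\t2\tHLA-C*01:02\t300.0\t60\tHLA-C*05:01\t100.0\t45\n"
abundances = allele_abundances(example, "HLA-C")
print(abundances, abundance_ratio(abundances))
```

Running this on the output of each read-count subset would show whether the ratio between the two HLA-C alleles stays stable across depths.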

@liu930724
Author

[image]
The abundances of C01:02 and C05:01 differ, but the difference follows a consistent trend across the different read counts. The abundance of C05:01 is also much higher than that of C08:02.

We amplified the full length of the HLA genes, which may explain the large difference in abundance between alleles. With this in mind, the difference in abundance is acceptable. Thank you.
