Problem with codon #8

Open
pdoris opened this issue Jun 8, 2023 · 6 comments

Comments

@pdoris

pdoris commented Jun 8, 2023

Much of the pipeline runs, up to the point where Seq translates the presumably detected Ig sequences. It then repeatedly terminates with the "Codon 'GET' is invalid" error:

pdoris$ python run_iterative_igdetective.py /Volumes/ms_imm_doris/Rat_references/BN-HiFi/Final_curated_BN-HiFi_assembly/BN_final.curated_primary.no_mt.unscrubbed.fa /Users/pdoris/IgDetective-1.1.0/BN
/usr/local/bin/minimap2
==== Aligning human IG genes...
Aligning IGLV genes (datafiles/human_reference_genes/IGLV.fa)...
Aligning IGLJ genes (datafiles/human_reference_genes/IGLJ.fa)...
Aligning IGKV genes (datafiles/human_reference_genes/IGKV.fa)...
Aligning IGHJ genes (datafiles/human_reference_genes/IGHJ.fa)...
Aligning IGHV genes (datafiles/human_reference_genes/IGHV.fa)...
Aligning IGKJ genes (datafiles/human_reference_genes/IGKJ.fa)...
Aligning IGHC genes (datafiles/human_reference_genes/IGHC.fa)...
Aligning IGKC genes (datafiles/human_reference_genes/IGKC.fa)...
Aligning IGLC genes (datafiles/human_reference_genes/IGLC.fa)...
==== Identifying IG contigs...
==== Running RSS-based IgDetective for IGH...
Contig: CHR_6, contig range: (137515617, 147156653), approx locus length: 9641036
Running: python py/IGDetective.py -i /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/combined_contigs_IGH.fasta -o /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/predicted_genes_IGH -m 1 -l IGH
==== Running RSS-based IgDetective for IGK...
Contig: CHR_4, contig range: (97247562, 104902484), approx locus length: 7654922
Running: python py/IGDetective.py -i /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/combined_contigs_IGK.fasta -o /Users/pdoris/IgDetective-1.1.0/BN/denovo_search/predicted_genes_IGK -m 1 -l IGK
==== Running RSS-based IgDetective for IGL...
==== Iterative processing IGHV genes...
Running minimap...
Alignment of IG genes datafiles/combined_reference_genes/IGHV.fa to /Volumes/ms_imm_doris/Rat_references/BN-HiFi/Final_curated_BN-HiFi_assembly/BN_final.curated_primary.no_mt.unscrubbed.fa
Processing SAM file...
/Users/pdoris/opt/anaconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py:2804: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
Traceback (most recent call last):
  File "/Users/pdoris/IgDetective-1.1.0/run_iterative_igdetective.py", line 290, in <module>
    main(genome_fasta, output_dir, ig_gene_dir)
  File "/Users/pdoris/IgDetective-1.1.0/run_iterative_igdetective.py", line 259, in main
    AlignGenesIteratively(ref_gene_fasta, igdetective_tsv, genome_fasta, iter_dir, gene)
  File "/Users/pdoris/IgDetective-1.1.0/run_iterative_igdetective.py", line 134, in AlignGenesIteratively
    gene_finding_tools.main(genome_fasta, ref_gene_fasta, iter0_dir)
  File "/Users/pdoris/IgDetective-1.1.0/py/extract_aligned_genes.py", line 147, in main
    aa_seq = str(Seq(alignment.gene_seq).translate())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pdoris/opt/anaconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 1448, in translate
    _translate_str(str(self), table, stop_symbol, to_stop, cds, gap=gap)
  File "/Users/pdoris/opt/anaconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 2836, in _translate_str
    raise CodonTable.TranslationError(
Bio.Data.CodonTable.TranslationError: Codon 'GET' is invalid
(igdetective) IMM-MAC-184391:IgDetective-1.1.0 pdoris$
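
The BiopythonWarning in the log above is separate from the fatal error: it is emitted whenever the sequence length is not a multiple of three, and it asks for the sequence to be trimmed (or padded with N) before translation. A minimal sketch of the trimming option, using only the standard library; `trim_to_full_codons` is a hypothetical helper name and is not part of IgDetective or Biopython:

```python
def trim_to_full_codons(seq: str) -> str:
    """Drop trailing bases that do not form a complete codon."""
    remainder = len(seq) % 3
    return seq[:len(seq) - remainder] if remainder else seq


example = "ATGGCCAA"  # made-up 8-base sequence, 2 bases too long
print(trim_to_full_codons(example))  # ATGGCC
```

Trimming only silences the warning. It would not remove the invalid 'GET' codon, which comes from non-nucleotide letters inside the sequence.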

@yana-safonova
Member

Hi,

Thank you for reporting the issue! Have you been running IgDetective on macOS? I was able to reproduce the problem on my laptop. It looks like the "Alignment" class from the Biopython package has a different structure from what we expect.

This is caused either by differences between Biopython versions or by differences in how it behaves on Linux and macOS.

I will work on unifying the versions. In the meantime, please feel free to send me the genomes you'd like to process.

Best regards,
Yana

@pdoris
Author

pdoris commented Jun 9, 2023

Yes, this was on a Mac. A fix would be wonderful!

@pdoris
Author

pdoris commented Jun 9, 2023

I installed it on Linux and got the same error:

(igdetective) c305-005.ls6(198)$ python run_iterative_igdetective.py /work/06127/pdoris/BN_HiFi_curated.fa /work/06127/pdoris/BN
/work/06127/pdoris/miniconda3/envs/igdetective/bin/minimap2
WARN: output directory /work/06127/pdoris/BN exists and will be overwritten!
==== Aligning human IG genes...
Aligning IGLC genes (datafiles/human_reference_genes/IGLC.fa)...
Aligning IGHC genes (datafiles/human_reference_genes/IGHC.fa)...
Aligning IGHJ genes (datafiles/human_reference_genes/IGHJ.fa)...
Aligning IGKJ genes (datafiles/human_reference_genes/IGKJ.fa)...
Aligning IGLJ genes (datafiles/human_reference_genes/IGLJ.fa)...
Aligning IGKC genes (datafiles/human_reference_genes/IGKC.fa)...
Aligning IGLV genes (datafiles/human_reference_genes/IGLV.fa)...
Aligning IGKV genes (datafiles/human_reference_genes/IGKV.fa)...
Aligning IGHV genes (datafiles/human_reference_genes/IGHV.fa)...
==== Identifying IG contigs...
==== Running RSS-based IgDetective for IGH...
Contig: CHR_6, contig range: (137515617, 147156653), approx locus length: 9641036
Running: python py/IGDetective.py -i /work/06127/pdoris/BN/denovo_search/combined_contigs_IGH.fasta -o /work/06127/pdoris/BN/denovo_search/predicted_genes_IGH -m 1 -l IGH
==== Running RSS-based IgDetective for IGK...
Contig: CHR_4, contig range: (97247562, 104902484), approx locus length: 7654922
Running: python py/IGDetective.py -i /work/06127/pdoris/BN/denovo_search/combined_contigs_IGK.fasta -o /work/06127/pdoris/BN/denovo_search/predicted_genes_IGK -m 1 -l IGK
==== Running RSS-based IgDetective for IGL...
==== Iterative processing IGHV genes...
Running minimap...
Alignment of IG genes datafiles/combined_reference_genes/IGHV.fa to /work/06127/pdoris/BN_HiFi_curated.fa
Processing SAM file...
/work/06127/pdoris/miniconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py:2804: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
Traceback (most recent call last):
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/run_iterative_igdetective.py", line 300, in <module>
    main(genome_fasta, output_dir, ig_gene_dir)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/run_iterative_igdetective.py", line 269, in main
    AlignGenesIteratively(ref_gene_fasta, igdetective_tsv, genome_fasta, iter_dir, gene)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/run_iterative_igdetective.py", line 144, in AlignGenesIteratively
    gene_finding_tools.main(genome_fasta, ref_gene_fasta, iter0_dir)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/IgDetective-main/py/extract_aligned_genes.py", line 149, in main
    aa_seq = str(Seq(alignment.gene_seq).translate())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/06127/pdoris/miniconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 1448, in translate
    _translate_str(str(self), table, stop_symbol, to_stop, cds, gap=gap)
  File "/work/06127/pdoris/miniconda3/envs/igdetective/lib/python3.11/site-packages/Bio/Seq.py", line 2836, in _translate_str
    raise CodonTable.TranslationError(
Bio.Data.CodonTable.TranslationError: Codon 'GET' is invalid

@yana-safonova
Member

Thank you for checking how it works on Linux! I am now confident that this issue can be explained by differences between Biopython versions. It looks like we used an older version than the one you are using. I will add a fix next week.
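
Since the diagnosis hinges on which Biopython version is installed, it may help to include that version in reports like this one. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, and "biopython" is assumed to be the distribution name as installed by pip or conda:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Run this inside the same environment that runs IgDetective.
print("biopython:", installed_version("biopython") or "not installed")
```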

@pdoris
Author

pdoris commented Jun 27, 2023

Hi Yana....

any progress with the fix?

Peter

@StefanLelieveld

Hi Yana and Peter,

Using Biopython v1.81, I encountered the same error that you described here, Peter. As Yana mentioned, the issue seems to be related to the Biopython version and how the data is processed. Using an older version solved it: I created a Python venv in which I installed Biopython 1.77, and that resolved the issue for me.

In more detail: debugging with Biopython v1.81 shows that the Alignment.gene_seq strings start with the word "TARGET", which leads to the error that "GET" is an invalid codon.

[Image: Screenshot 2024-03-22 at 14 57 36]

Stefan
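
The failure mode described in the comment above can be illustrated without Biopython. A minimal sketch, assuming the word "TARGET" ends up prepended to the nucleotide string: the reading frame splits it into "TAR" and "GET". "R" is a valid IUPAC ambiguity code (A or G), but "E" is not a nucleotide letter, so "GET" is the first codon that gets rejected. The helper names and the example sequence are hypothetical, and the letter check only approximates Biopython's own validation:

```python
from typing import List, Optional

# IUPAC nucleotide codes: the four bases plus the standard ambiguity letters.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def split_codons(seq: str) -> List[str]:
    """Split a sequence into 3-letter codons, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_invalid_codon(seq: str) -> Optional[str]:
    """Return the first codon containing a non-IUPAC letter, or None."""
    for codon in split_codons(seq.upper()):
        if not set(codon) <= IUPAC_DNA:
            return codon
    return None


clean = "CAGGTGCAGCTG"           # made-up nucleotide sequence
contaminated = "TARGET" + clean  # label text glued onto the front

print(split_codons(contaminated)[:2])     # ['TAR', 'GET']
print(first_invalid_codon(contaminated))  # GET
print(first_invalid_codon(clean))         # None
```

A check like this, run on the string before it is passed to translate(), would point to the offending prefix directly instead of surfacing as a codon error deep inside Biopython.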
