Describe the bug
gimme scan appears to be incompatible with custom PWMs that have spaces in the motif description. Note that JASPAR itself returns files that are formatted like this, for example: https://jaspar.elixir.no/api/v1/matrix/MA0002.1.jaspar . My motif database has those spaces, too.
To Reproduce
See error logs.
Expected behavior
gimme should support PWM files with spaces in the motif description (or give more helpful error messages).
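Until spaces are handled, one possible workaround is to rewrite the header lines before passing the file to gimme scan. The sketch below assumes that whitespace in the `>` header lines is what triggers the errors, which the logs suggest but do not prove. The function name `sanitize_jaspar_headers` and the underscore replacement are illustrative choices, not part of GimmeMotifs.

```python
import re


def sanitize_jaspar_headers(text: str, replacement: str = "_") -> str:
    """Return JASPAR-formatted text with whitespace removed from header lines.

    Hypothetical helper: only lines starting with '>' are changed;
    matrix rows are passed through untouched.
    """
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Collapse every run of spaces or tabs in the header into one replacement.
            line = ">" + re.sub(r"\s+", replacement, line[1:].strip())
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    example = ">AC0081:NFIA_NFIC:SMAD AC0081:NFIA/NFIC:SMAD\nA [ 1 2 3 ]\n"
    print(sanitize_jaspar_headers(example), end="")
```

To apply it to a file, read the file's contents, pass them through the function, and write the result to a new path so the original database stays unchanged.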
Error logs
I reproduced the behaviour for two input files:
$ gimme scan in/IFIH1.fasta -g ../reference_dbs/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa -p ../5/in2/consensus_pwms_stripped.jaspar
# GimmeMotifs version 0.18.0
# Input: in/IFIH1.fasta
# Motifs: ../5/in2/consensus_pwms_stripped.jaspar
# FPR: 0.01 (../reference_dbs/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa)
# Scoring: logodds score
Scanning: 0%| | 0/1 [00:00<?, ? sequences/s]
Traceback (most recent call last):
File "/home/mabe/.conda/envs/memegimme/bin/gimme", line 12, in <module>
cli(sys.argv[1:])
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/cli.py", line 755, in cli
args.func(args)
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/commands/pfmscan.py", line 20, in pfmscan
scan_to_file(
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/__init__.py", line 402, in scan_to_file
for line in command_scan(
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/__init__.py", line 287, in command_scan
for row in it:
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/__init__.py", line 224, in scan_normal
for i, result in enumerate(result_it):
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/base.py", line 573, in scan
for result in it:
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/base.py", line 637, in _scan_sequences
motifs = [(m, thresholds[m.id]) for m in read_motifs(self.motifs)]
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/base.py", line 637, in <listcomp>
motifs = [(m, thresholds[m.id]) for m in read_motifs(self.motifs)]
KeyError: 'AC0081:NFIA_NFIC:SMAD AC0081:NFIA/NFIC:SMAD'
Scanning: 0%|
$ gimme scan in/IFIH1.fasta -g ../reference_dbs/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa -p ../5/in2/consensus_pwms.jaspar
2024-03-20 15:57:51,945 - WARNING - multiple motifs with same id: AC0001:GATA_PROP:GATA AC0001:GATA/PROP:GATA
<....SNIP....>
2024-03-20 15:57:53,062 - WARNING - multiple motifs with same id: AC0637:AHR:bHLH AC0637:AHR:bHLH
# GimmeMotifs version 0.18.0
# Input: in/IFIH1.fasta
# Motifs: ../5/in2/consensus_pwms.jaspar
# FPR: 0.01 (../reference_dbs/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa)
# Scoring: logodds score
2024-03-20 15:57:53,302 - WARNING - multiple motifs with same id: AC0001:GATA_PROP:GATA AC0001:GATA/PROP:GATA
2024-03-20 15:57:53,304 - WARNING - multiple motifs with same id: AC0002:PROP_ALX:Homeodomain AC0002:PROP/ALX:Homeodomain
2024-03-20 15:57:53,306 - WARNING - multiple motifs with same id: AC0003:HNF1A_HNF1B:Homeodomain AC0003:HNF1A/HNF1B:Homeodomain
2024-03-20 15:57:53,308 - WARNING - multiple motifs with same id: AC0004:ZSCAN:C2H2_ZF AC0004:ZSCAN:C2H2_ZF
2024-03-20 15:57:53,310 - WARNING - multiple motifs with same id: AC0005:POU3F_POU1F:Homeodomain,POU AC0005:POU3F/POU1F:Homeodomain,POU
2024-03-20 15:57:53,311 - WARNING - multiple motifs with same id: AC0006:MEOX:Homeodomain AC0006:MEOX:Homeodomain
2024-03-20 15:57:53,313 - WARNING - multiple motifs with same id: AC0007:BARX_NKX:Homeodomain AC0007:BARX/NKX:Homeodomain
2024-03-20 15:57:53,315 - WARNING - multiple motifs with same id: AC0008:VENTX:Homeodomain AC0008:VENTX:Homeodomain
2024-03-20 15:57:53,316 - WARNING - multiple motifs with same id: AC0009:PAX_VSX:Homeodomain AC0009:PAX/VSX:Homeodomain
<....SNIP....>
2024-03-20 15:57:59,796 - WARNING - multiple motifs with same id: AC0637:AHR:bHLH AC0637:AHR:bHLH
Traceback (most recent call last):
File "/home/mabe/.conda/envs/memegimme/bin/gimme", line 12, in <module>
cli(sys.argv[1:])
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/cli.py", line 755, in cli
args.func(args)
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/commands/pfmscan.py", line 20, in pfmscan
scan_to_file(
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/__init__.py", line 402, in scan_to_file
for line in command_scan(
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/__init__.py", line 287, in command_scan
for row in it:
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/__init__.py", line 224, in scan_normal
for i, result in enumerate(result_it):
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/base.py", line 573, in scan
for result in it:
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/base.py", line 636, in _scan_sequences
thresholds = self.get_gc_thresholds(seqs, zscore=zscore)
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/gimmemotifs/scanner/base.py", line 605, in get_gc_thresholds
maxt = pd.Series([m.max_score for m in motifs], index=_threshold.columns)
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/pandas/core/series.py", line 461, in __init__
com.require_length_match(data, index)
File "/home/mabe/.conda/envs/memegimme/lib/python3.10/site-packages/pandas/core/common.py", line 571, in require_length_match
raise ValueError(
ValueError: Length of values (1274) does not match length of index (1273)
The pwm files are attached: consensus_pwms.zip . Without specifying my own matrices, it works.
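The second traceback reports 1274 values against 1273 index entries, which would be consistent with one motif ID occurring twice in the file, although that is only a guess from the log. A small check like the following lists repeated headers, either comparing the full header line or only the part before the first whitespace. The helper name `duplicate_headers` is hypothetical, and how GimmeMotifs itself derives IDs from headers is not assumed here.

```python
from collections import Counter


def duplicate_headers(text: str, first_token_only: bool = False) -> dict:
    """Count repeated '>' headers in JASPAR-formatted text.

    Hypothetical diagnostic helper. With first_token_only=True, only the
    part of each header before the first whitespace is compared.
    """
    ids = []
    for line in text.splitlines():
        if not line.startswith(">"):
            continue
        header = line[1:].strip()
        if first_token_only:
            parts = header.split()
            header = parts[0] if parts else ""
        ids.append(header)
    # Keep only headers that occur more than once.
    return {h: n for h, n in Counter(ids).items() if n > 1}


if __name__ == "__main__":
    sample = ">M1 first\nA [ 1 ]\n>M1 second\nA [ 2 ]\n>M2 third\nA [ 3 ]\n"
    print(duplicate_headers(sample))
    print(duplicate_headers(sample, first_token_only=True))
```

An empty result in both modes would suggest that the length mismatch comes from somewhere other than repeated headers.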
Installation information
OS: Ubuntu 22.04.4 LTS
Installation: conda
Version: 0.18.0