-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
homopolish modpolish Index error #47
Comments
Hi @giriarteS, We haven't tested modpolish on fungi yet. Have you ever successfully run the original |
I run homopolish polish but I got the following for each contig: [2022/05/25 17:24] INFO: Stage: Select closely-related genomes |
I successfully run homopolish polish: |
We found the error was due to an insufficient number of related genomes (<5), a bug in modpolish but not original homopolish. It looks to us there are six in GenBank but only one in RefSeq. Therefore, as you have done, the local db option |
I polished my genomes with several rounds of racon, then medaka, then homopolish. If I want to use illumina reads to polish with Pilon, should homopolish be run before or after pilon? |
It was trained on Medaka-polished genomes. We would recommend running homopolish right after Medaka and then followed by pilon. |
Can modpolish be used after racon+medaka? I am trying to polish my assembly (flye+6 rounds of racon+medaka) with modpolish but I get the following error:
"python3 /mnt/DATA/git_repos/homopolish/homopolish.py modpolish -a medaka/consensus.fasta -q /mnt/Backup/Genomes_test/raw_dna/minion/F.verticillioides/MRC826E/MRC826E_trim_fastq.gz -s /mnt/DATA/git_repos/homopolish/fungi.msh -o homopolish -t 50
Downloaded GCA_003316995.2_ASM331699v2_genomic.fna.gz
Downloaded GCA_003316975.2_ASM331697v2_genomic.fna.gz
Downloaded GCA_011035275.1_ASM1103527v1_genomic.fna.gz
Downloaded GCA_011036225.1_ASM1103622v1_genomic.fna.gz
Downloaded GCA_000149555.1_ASM14955v1_genomic.fna.gz
Downloaded GCF_000149555.1_ASM14955v1_genomic.fna.gz
Downloaded GCA_003317015.2_ASM331701v2_genomic.fna.gz
Downloaded GCA_011036015.1_ASM1103601v1_genomic.fna.gz
Downloaded GCA_011036905.1_ASM1103690v1_genomic.fna.gz
Reference = [/mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_003316995.2_ASM331699v2_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_003316975.2_ASM331697v2_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_011035275.1_ASM1103527v1_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_011036225.1_ASM1103622v1_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_000149555.1_ASM14955v1_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCF_000149555.1_ASM14955v1_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_003317015.2_ASM331701v2_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_011036015.1_ASM1103601v1_genomic.fna.gz.fna.gz, /mnt/Backup/Genomes_test/assembly/flye/F.verticillioides/MRC826E/homopolish/debug/homologous_sequences/GCA_011036905.1_ASM1103690v1_genomic.fna.gz.fna.gz]
Query = [medaka/consensus.fasta]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = homopolish/debug/ANI.txt
INFO [thread 0], skch::main, Count of threads executing parallel_for : 1
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 30851510
INFO [thread 0], skch::Sketch::index, unique minimizers = 4853959
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 1098191) ... (7088, 1)
INFO [thread 0], skch::Sketch::computeFreqHist, consider all minimizers during lookup.
INFO [thread 0], skch::main, Time spent sketching the reference : 26.7354 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1 : 75.2743 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.0381457 sec
INFO [thread 0], skch::main, ready to exit the loop
INFO, skch::main, parallel_for execution finished
load Homo
Reference = [homopolish/debug/All_homologous_sequences.fna.gz]
Query = [medaka/consensus.fasta]
Kmer size = 16
Fragment length = 3000
Threads = 1
ANI output file = homopolish/debug/truth_ANI.txt
INFO [thread 0], skch::main, Count of threads executing parallel_for : 1
INFO [thread 0], skch::Sketch::build, window size for minimizer sampling = 24
INFO [thread 0], skch::Sketch::build, minimizers picked from reference = 30851510
INFO [thread 0], skch::Sketch::index, unique minimizers = 4853959
INFO [thread 0], skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 1098191) ... (7088, 1)
INFO [thread 0], skch::Sketch::computeFreqHist, consider all minimizers during lookup.
INFO [thread 0], skch::main, Time spent sketching the reference : 26.885 sec
INFO [thread 0], skch::main, Time spent mapping fragments in query #1 : 75.9234 sec
INFO [thread 0], skch::main, Time spent post mapping : 0.0205193 sec
INFO [thread 0], skch::main, ready to exit the loop
INFO, skch::main, parallel_for execution finished
[M::mm_idx_gen::0.8991.00] collected minimizers
[M::mm_idx_gen::1.0612.05] sorted minimizers
[M::main::1.0612.05] loaded/built the index for 83 target sequence(s)
[M::mm_mapopt_update::1.1341.98] mid_occ = 100
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 83
[M::mm_idx_stat::1.1761.94] distinct minimizers: 4248851 (99.24% are singletons); average occurrences: 1.018; average spacing: 9.991
[M::worker_pipeline::19.96514.14] mapped 49912 sequences
[M::main] Version: 2.17-r941
[M::main] CMD: minimap2 -cx asm5 --cs=long -t 50 medaka/consensus.fasta homopolish/debug/All_homologous_sequences.fna.gz
[M::main] Real time: 20.085 sec; CPU: 282.508 sec; Peak RSS: 4.439 GB
127.97561192512512
Traceback (most recent call last):
File "/mnt/DATA/git_repos/homopolish/homopolish.py", line 102, in
main()
File "/mnt/DATA/git_repos/homopolish/homopolish.py", line 77, in main
getPos(fixData,FLAGS.debug)
File "/mnt/DATA/git_repos/homopolish/modules/mod_polish.py", line 80, in getPos
H_misAry,H_AllAry = getPileUpAry(fixData,fixData.output_dir+"debug",fixData.output_dir+"debug/All_homologous_sequences.fna.gz")
File "/mnt/DATA/git_repos/homopolish/modules/mod_polish.py", line 146, in getPileUpAry
misAry,posData_ary = MismatchPileup(fixData.output_dir+"debug/truth.paf",len(fixData.seq))#get SNP position ATCG array
File "/mnt/DATA/git_repos/homopolish/modules/mod_polish.py", line 440, in MismatchPileup
coverage[start_pos] += 1
IndexError: index 2852431 is out of bounds for axis 0 with size 2852431"
Gloria
The text was updated successfully, but these errors were encountered: