Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bedtools getfasta error #89

Open
C-YONG opened this issue Jul 24, 2024 · 17 comments
Open

bedtools getfasta error #89

C-YONG opened this issue Jul 24, 2024 · 17 comments

Comments

@C-YONG
Copy link

C-YONG commented Jul 24, 2024

Hello, software developer. I have three ONT datasets in total. The first two ran normally, but the third one encountered the following error. What could be the reason for this?

[24/07/2024 15:27:11] - NanoVar started
[24/07/2024 15:27:11] - Checking integrity of input files - Pass
[24/07/2024 15:27:12] - Analyzing read alignments and detecting SVs - Done
[24/07/2024 15:52:39] - Clustering SV breakends - Done
[24/07/2024 16:08:29] - Correcting DUP and detecting TE - Done
[24/07/2024 16:13:00] - Generating VCF files and report - Traceback (most recent call last):
File "/home/chiyong/miniconda3/envs/nanovar/bin/nanovar", line 635, in
main()
File "/home/chiyong/miniconda3/envs/nanovar/bin/nanovar", line 486, in main
run.vcf_report() #
^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/nanovar/nv_characterize.py", line 205, in vcf_report
alt_seq = get_alt_seq(self.dir, self.out_nn, self.refpath)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/nanovar/nv_alt_seq.py", line 46, in get_alt_seq
fasta = bed.sequence(fi=ref_path, nameOnly=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/pybedtools/bedtool.py", line 907, in decorated
result = method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/pybedtools/bedtool.py", line 388, in wrapped
stream = call_bedtools(
^^^^^^^^^^^^^^
File "/home/chiyong/miniconda3/envs/nanovar/lib/python3.11/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

    bedtools getfasta -nameOnly -fo /tmp/pybedtools.fvq2ujze.tmp -fi 03_genome/genome_NV/genome1.3.fna -bed /tmp/pybedtools.ml_5c4kf.tmp

Error message was:
Feature (LR584250.1:14620732-14620733) beyond the length of LR584250.1 size (12913240 bp). Skipping.

Writing to /tmp/bcftools.VhKjGh

@LaurieLecomte
Copy link

Hi, I encountered the same issue using version 1.7.0 with mapped PacBio ccs reads. I was able to run a previous version of NanoVar (1.3.8) on the same data without any problem.

I use the following command, which throws the 'feature beyond the length' message at the VCF generation step:
nanovar $BAM $GENOME_NV $CALLS_DIR/nanovar/$SAMPLE -x pacbio-ccs -t $CPU -f $EXCL_BED --debug

The last files outputted are cluster.tsv and parse2.tsv (empty).

Installation was done with conda as described in the README.

Thank you for any guidance or advice.

@cytham
Copy link
Owner

cytham commented Jul 26, 2024

Hi @C-YONG @LaurieLecomte, thanks for reporting this. I'll need some time to check on this and will revert in the next few days. Thank you for your patience.

@cytham
Copy link
Owner

cytham commented Jul 27, 2024

Hi @C-YONG and @LaurieLecomte, please try the v1.7.1-dev branch and let me know if it fixes the issue.

You can clone the branch by:
git clone -b v1.7.1-dev https://github.com/cytham/nanovar.git

@LaurieLecomte
Copy link

Hi @cytham, thank you very much for the quick response and the update.

Unfortunately, I got the exact same error and outputs using the v1.7.1-dev version.

I initially suspected that the bug might come from Inter-Ins(2) and InterTx, but after investigating a bit on my end, it looks like some calls of Inv(2) and Intra-Ins(2) also give the same error.

I made a custom version of the make_bed() function from v1.7.0 to output an intermediary bed file featuring SV type and the original id~chrom:start-end string from parse1.tsv, and below I pasted a few of the lines representing sites that threw the beyond chromosome length error. For Intra-Ins(2) and Inv(2) sites, in the id~chrom:start-end strings, the start position is greater than the end position:

CM055694.1      56110901        56110902        EC435   Inter-Ins(2)    EC435~CM055694.1:56110902~CM055721.1:17888839
CM055703.1      67666256        67666257        B895F   Intra-Ins(2)    B895F~CM055703.1:67666257-49693072
CM055703.1      67666256        67666257        B895F   Intra-Ins(2)    B895F~CM055703.1:67666257-49693072
CM055703.1      67666256        67666257        D6D6D   Intra-Ins(2)    D6D6D~CM055703.1:67666257-49693072
CM055703.1      67666256        67666257        D6D6D   Intra-Ins(2)    D6D6D~CM055703.1:67666257-49693072
CM055704.1      52471483        52471484        E5B58   Inv(2)  E5B58~CM055704.1:52471484-3246722
CM055705.1      46612017        46612018        D22C1   InterTx D22C1~CM055705.1:46612018~CM055722.1:15467696
CM055707.1      43435500        43435501        E9F58   Intra-Ins(2)    E9F58~CM055707.1:43435501-24087228
CM055708.1      43435500        43435501        C66E2   InterTx C66E2~CM055708.1:43435501~CM055707.1:24087228
CM055711.1      62885703        62885704        71010   Inv(2)  71010~CM055711.1:62885704-42442184
CM055711.1      81492618        81492619        F0477   Inv(2)  F0477~CM055711.1:81492619-4879966

These start positions are also larger than the corresponding chromosome length, hence the error encountered.

I got the same result with make_bed() from v1.7.1, which gave the same output without Inter-Ins and InterTx (as expected).

Hope this helps.

cytham added a commit that referenced this issue Jul 30, 2024
@cytham
Copy link
Owner

cytham commented Jul 30, 2024

Hi @LaurieLecomte, thanks for looking into this. Indeed, Intra-Ins and Inv could be causing the problem, my bad for missing that out. I have removed them in the latest v1.7.1-dev branch. Could you please try again?

The fact that start positions are greater than end positions in some cases shouldn't affect the BED file creation. So I am thinking that it could be another bug somewhere for Intra-Ins, Inv, etc. But thanks a lot for looking into this, you have been of great help.

Also, could you let me know the chromosome lengths of those you showed? (e.g. CM055703.1, CM055704.1, CM055705.1, CM055707.1, CM055708.1, CM055711.1)

@LaurieLecomte
Copy link

Hi @cytham, thanks for the update.

I tried the latest branch and everything ran without any issue, yielding complete VCFs and no errors.

Here are the chromosome lengths for the few problematic sites I shared in my previous comment (from the ASM2944872v1 assembly):

CM055694.1      60416927
CM055703.1      52113258
CM055704.1      42625803
CM055705.1      50983633
CM055707.1      43302116
CM055708.1      48152825
CM055711.1      46201713

@cytham
Copy link
Owner

cytham commented Aug 1, 2024

Thanks @LaurieLecomte. I'm trying to figure out the root cause for this. Is it possible to share the debug files with me? Specifically, subdata.tsv, detect.tsv, and parse1.tsv? Thanks

@LaurieLecomte
Copy link

Hi @cytham, thank you for looking into this. I uploaded the files from a run with version 1.7.0 to Dropbox along with corresponding log files. Let me know if you also need input files (mapped reads and/or reference genome).

@Flooooooooooooower
Copy link

Hello, unfortunately, I encountered a "getfasta error" when using both versions 1.7.1 and 1.7.0 of nanovar with HG002 HiFi data. Do you have any solutions for this issue?

nanovar -x pacbio-ccs -t 30 -f ~/01.Reference/Human_GRCh37/nanovar_gap.bed  ./HG002.HIFI.sort.bam   ~/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa Nanovar_wd

@cytham
Copy link
Owner

cytham commented Aug 9, 2024

hi @Flooooooooooooower, can you provide some logs please?

@Flooooooooooooower
Copy link

hi @Flooooooooooooower, can you provide some logs please?

Hello, thank you for your prompt response. Below is my log file.

 nanovar  --debug -l 50  -x pacbio-ccs -t 30 -f ~/01.Reference/Human_GRCh37/nanovar_gap.bed  ./HG002.HIFI.sort.bam   ~/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa Nanovar_wd
[09/08/2024 13:12:21] - NanoVar started
[09/08/2024 13:12:21] - Checking integrity of input files - Pass
[09/08/2024 13:13:03] - Analyzing read alignments and detecting SVs - Done
[09/08/2024 13:48:55] - Clustering SV breakends - Done
[09/08/2024 14:42:37] - Correcting DUP and detecting TE - Done
[09/08/2024 15:26:23] - Generating VCF files and report - Traceback (most recent call last):
  File "/home/xxx/miniconda3/envs/myenv/bin/nanovar", line 635, in <module>
    main()
  File "/home/xxx/miniconda3/envs/myenv/bin/nanovar", line 486, in main
    run.vcf_report() #
    ^^^^^^^^^^^^^^^^
  File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/nanovar/nv_characterize.py", line 205, in vcf_report
    alt_seq = get_alt_seq(self.dir, self.out_nn, self.refpath)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/nanovar/nv_alt_seq.py", line 46, in get_alt_seq
    fasta = bed.sequence(fi=ref_path, nameOnly=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/pybedtools/bedtool.py", line 907, in decorated
    result = method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/pybedtools/bedtool.py", line 388, in wrapped
    stream = call_bedtools(
             ^^^^^^^^^^^^^^
  File "/home/xxx/miniconda3/envs/myenv/lib/python3.11/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
    raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

        bedtools getfasta -nameOnly -fo /tmp/pybedtools.xjnohstx.tmp -fi /home/xxx/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa -bed /tmp/pybedtools.l0vhhi9x.tmp

Error message was:
[NanoVar-090824-1312.log](https://github.com/user-attachments/files/16557678/NanoVar-090824-1312.log)

Feature (19:121485432-121485433) beyond the length of 19 size (59128983 bp).  Skipping.

NanoVar-090824-1312.log

@cytham
Copy link
Owner

cytham commented Aug 10, 2024

@Flooooooooooooower I see your logs are from v1.7.0, can I please confirm that you get the same error for v1.7.1? thanks

@Flooooooooooooower
Copy link

@Flooooooooooooower I see your logs are from v1.7.0, can I please confirm that you get the same error for v1.7.1? thanks

Hi,Below is the log file of 1.7.1.
[11/08/2024 16:27:02] - NanoVar-v1.7.1 started
[11/08/2024 16:27:02] - Checking integrity of input files - Pass
[11/08/2024 16:27:23] - Analyzing read alignments and detecting SVs - Done
[11/08/2024 17:49:41] - Clustering SV breakends - Done
[11/08/2024 18:35:33] - Correcting DUP and detecting TE - Done
[11/08/2024 18:55:58] - Generating VCF files and report - Traceback (most recent call last):
File "/opt2/conda/bin/nanovar", line 638, in
main()
File "/opt2/conda/bin/nanovar", line 489, in main
run.vcf_report() #
File "/opt2/conda/lib/python3.10/site-packages/nanovar/nv_characterize.py", line 205, in vcf_report
alt_seq = get_alt_seq(self.dir, self.out_nn, self.refpath)
File "/opt2/conda/lib/python3.10/site-packages/nanovar/nv_alt_seq.py", line 46, in get_alt_seq
fasta = bed.sequence(fi=ref_path, nameOnly=True)
File "/opt2/conda/lib/python3.10/site-packages/pybedtools/bedtool.py", line 907, in decorated
result = method(self, *args, **kwargs)
File "/opt2/conda/lib/python3.10/site-packages/pybedtools/bedtool.py", line 388, in wrapped
stream = call_bedtools(
File "/opt2/conda/lib/python3.10/site-packages/pybedtools/helpers.py", line 456, in call_bedtools
raise BEDToolsError(subprocess.list2cmdline(cmds), stderr)
pybedtools.helpers.BEDToolsError:
Command was:

    bedtools getfasta -nameOnly -fo /tmp/pybedtools.1886eg8o.tmp -fi /home/xxx/01.Reference/Human_GRCh37/02.ngmlr/GCF_000001405.25_GRCh37.p13_genomic.fa -bed /tmp/pybedtools.74ascn07.tmp

Error message was:
Feature (NT_167214.1:79186948-79186949) beyond the length of NT_167214.1 size (161802 bp). Skipping.
Feature (NT_167214.1:31149863-31149864) beyond the length of NT_167214.1 size (161802 bp). Skipping.
Feature (NT_167214.1:76807186-76807187) beyond the length of NT_167214.1 size (161802 bp). Skipping.
Feature (NT_167214.1:76807695-76807696) beyond the length of NT_167214.1 size (161802 bp). Skipping.
Feature (NT_167214.1:127650692-127650693) beyond the length of NT_167214.1 size (161802 bp). Skipping.

NanoVar-110824-1627.log

@jiadong324
Copy link

I also got the same error while using v1.7.0.
image

@cytham Is this issue resolved?

@cytham
Copy link
Owner

cytham commented Sep 19, 2024

Hi @jiadong324, do you still get the error with the v1.7.1-dev branch?

@jiadong324
Copy link

I just finished the test with v1.7.1-dev. No more errors! When will you release the v1.7.1.

@cytham cytham mentioned this issue Sep 26, 2024
@cytham
Copy link
Owner

cytham commented Sep 29, 2024

NanoVar v1.8.1 is now available from PyPI and Conda, thanks for your understanding

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants