Error in antismash_overview : invalid literal "<" for int in start position of region field #220

Closed
OmkarSaMo opened this issue Jan 23, 2023 · 1 comment · Fixed by #226

Comments

@OmkarSaMo
Contributor

OmkarSaMo commented Jan 23, 2023

The following error appears in quite a few cases while running the rule antismash_overview.

INFO     23/01 12:22:39   Getting antismash regions from record: JAPENF010000002
INFO     23/01 12:22:39   Grabbing information from region 1.1
Traceback (most recent call last):
  File "/datadrive/data3/g1115-paper/workflow/bgcflow/bgcflow/data/get_antismash_overview.py", line 128, in <module>
    get_antismash_overview(sys.argv[1], sys.argv[2], sys.argv[3])
  File "/datadrive/data3/g1115-paper/workflow/bgcflow/bgcflow/data/get_antismash_overview.py", line 110, in get_antismash_overview
    output_cluster["region_length"] = int(output_cluster["end_pos"]) - int(
                                                                       ^^^^
ValueError: invalid literal for int() with base 10: '<0'

Here is the region field from the antiSMASH results GenBank file. The location is <1..32213, which is causing the issue.
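To make the failure concrete, here is a minimal sketch that reproduces the ValueError from the traceback outside the workflow. The variable names start_text and end_text are hypothetical stand-ins for the values the script passes to int(); only the '<0' string and the end coordinate 32213 come from the report above.

```python
# Hypothetical stand-ins for the position strings the script tries to convert.
# '<0' is the value reported in the traceback; 32213 is the region end above.
start_text = "<0"
end_text = "32213"

try:
    # Same shape as the failing expression: int(end) - int(start)
    region_length = int(end_text) - int(start_text)
except ValueError as err:
    # int() only accepts plain digits (plus sign/whitespace), so '<' is rejected
    message = str(err)
    print(message)
```

The "<" marks an inexact start position, so the string is not a plain integer and int() rejects it.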

@SJShaw - is it normal for the region field location in an antiSMASH GenBank file to start with "<" instead of a number in a few cases?

     region          <1..32213
                     /candidate_cluster_numbers="1"
                     /candidate_cluster_numbers="2"
                     /contig_edge="True"
                     /product="other"
                     /product="NRPS"
                     /region_number="1"
                     /rules="(LmbU or Neocarzinostat or fom1 or bcpB or frbD or
                     mitE or vlmB or prnB or CaiA or bacilysin or orf2_PTase)"
                     /rules="cds(Condensation and (AMP-binding or A-OX))"
                     /tool="antismash"

Please find attached the JSON file for recreating the issue:
NBC_00340.zip

@OmkarSaMo OmkarSaMo changed the title from 'Error in antismash_overview : invalid literal for int() with base 10' to 'Error in antismash_overview : invalid literal "<" for int in start position of region field' Jan 23, 2023
@SJShaw

SJShaw commented Jan 23, 2023

It's entirely normal if the first gene in the region has an inexact start position (BeforePosition). The same can be true for the end position and AfterPosition.

Don't attempt to parse a GenBank location as if it's boring old ints; it'll bite you. Use the JSON and fetch the areas key for more machine-readable data:
"areas": [{"start": 0, "end": 32213, "products": ["other"...
So for that JSON you attached:

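import json

# json_file: path to the antiSMASH results JSON
with open(json_file) as handle:
    data = json.load(handle)
# Each area carries plain integer start/end coordinates
for area in data["records"][0]["areas"]:
    print(area["start"], area["end"])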

outputs:

0 32213
32980 106973
...

Use whatever JSON parser you like instead (like jq on the command line).
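For the region_length calculation that triggered the error, a rough sketch along the same lines might look like the following. It assumes only the "records" -> "areas" -> "start"/"end" layout quoted above. The sample_json string is a cut-down stand-in built from the two coordinate pairs printed above, not a real antiSMASH result, and area_lengths is a hypothetical helper name, not part of bgcflow or antiSMASH.

```python
import json

# Stand-in data: only the keys quoted in this thread, with the two coordinate
# pairs printed above. A real antiSMASH JSON contains many more keys.
sample_json = (
    '{"records": [{"areas": ['
    '{"start": 0, "end": 32213}, '
    '{"start": 32980, "end": 106973}]}]}'
)


def area_lengths(data):
    """Return (start, end, length) tuples for every area of every record."""
    rows = []
    for record in data["records"]:
        # Assumption: a record without regions may lack or empty its "areas" list
        for area in record.get("areas", []):
            start, end = area["start"], area["end"]
            rows.append((start, end, end - start))
    return rows


lengths = area_lengths(json.loads(sample_json))
for row in lengths:
    print(*row)
```

Because the coordinates are already integers in the JSON, no string conversion is involved and the "<" / ">" markers never enter the calculation.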

@matinnuhamunada matinnuhamunada linked a pull request Mar 14, 2023 that will close this issue