
UnboundLocalError in rank_ids when no sample comparison distances meet threshold criteria #11

Open
kylacochrane opened this issue Sep 24, 2024 · 0 comments

kylacochrane (Collaborator) commented Sep 24, 2024

Description

I encountered an issue where gas call raises an UnboundLocalError for the variable rank_ids when none of the distances between the query samples and the reference samples fall within the defined thresholds.

Error Message

    Traceback (most recent call last):
      File "/usr/local/bin/gas", line 10, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/usr/local/lib/python3.11/site-packages/genomic_address_service/main.py", line 43, in main
        exec('genomic_address_service.' + task + '.run()')
      File "<string>", line 1, in <module>
      File "/usr/local/lib/python3.11/site-packages/genomic_address_service/call.py", line 109, in run
        obj = assign(dist_file,membership_file,threshold_map,linkage_method)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/genomic_address_service/classes/assign.py", line 86, in __init__
        self.assign()
      File "/usr/local/lib/python3.11/site-packages/genomic_address_service/classes/assign.py", line 254, in assign
        a[i] = self.nomenclature_cluster_tracker[rank_ids[i]]
                                                 ^^^^^^^^
    UnboundLocalError: cannot access local variable 'rank_ids' where it is not associated with a value

Steps to Reproduce

  1. Run the gas call command with a --dists file in which every distance between the query and reference samples is larger than all of the defined thresholds (e.g., distances ranging from 22 to over 3,000, with thresholds of 10, 5, and 0), using the example files below:
  • reference_clusters.txt
    id address level_1 level_2 level_3
    SE01 3.3.5 3 3 5
    SE02 3.3.4 3 3 4
    SE03 2.2.3 2 2 3
    SE04 1.1.1 1 1 1
    SE05 1.1.2 1 1 2

  • results.txt (from profile_dists)
    query_id ref_id dist
    SH01 SH01 0
    SH01 SE04 3346
    SH01 SE05 3346
    SH01 SE03 3350
    SH01 SE02 3359
    SH01 SE01 3360
    SH01 SH02 3369
    SH02 SH02 0
    SH02 SE02 22
    SH02 SE01 23
    SH02 SE03 43
    SH02 SE04 45
    SH02 SE05 45
    SH02 SH01 3369

  2. The error occurs when every query-to-reference distance exceeds the thresholds, so rank_ids is never assigned a value.
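To double-check that no query-to-reference distance in results.txt falls within the thresholds, here is a small sketch. It assumes a pair qualifies when its distance is less than or equal to a threshold (the exact comparison gas uses is an assumption here), treats every ref_id that is not one of the query IDs as a reference, and uses hypothetical helper names (nearest_reference, queries_within). It also compares the larger thresholds mentioned under Additional Context.

```python
# Sketch: does any query sample have a reference within the thresholds?
# Assumptions: a pair qualifies when dist <= threshold; helper names are hypothetical.

# Rows copied from the results.txt example above (query_id, ref_id, dist).
RESULTS_TXT = """\
query_id ref_id dist
SH01 SH01 0
SH01 SE04 3346
SH01 SE05 3346
SH01 SE03 3350
SH01 SE02 3359
SH01 SE01 3360
SH01 SH02 3369
SH02 SH02 0
SH02 SE02 22
SH02 SE01 23
SH02 SE03 43
SH02 SE04 45
SH02 SE05 45
SH02 SH01 3369
"""

QUERY_IDS = {"SH01", "SH02"}


def nearest_reference(results_text, query_ids):
    """Map each query to its smallest distance to any non-query (reference) sample."""
    nearest = {}
    for line in results_text.strip().splitlines()[1:]:  # skip the header row
        query_id, ref_id, dist = line.split()  # works for tab- or space-separated rows
        if ref_id in query_ids:
            continue  # ignore self-hits and query-vs-query pairs
        dist = int(dist)
        nearest[query_id] = min(dist, nearest.get(query_id, dist))
    return nearest


def queries_within(nearest, thresholds):
    """Return the queries whose nearest reference is within the loosest threshold."""
    loosest = max(thresholds)
    return sorted(q for q, d in nearest.items() if d <= loosest)


nearest = nearest_reference(RESULTS_TXT, QUERY_IDS)
print(nearest)                                     # {'SH01': 3346, 'SH02': 22}
print(queries_within(nearest, (10, 5, 0)))         # []
print(queries_within(nearest, (3500, 1000, 500)))  # ['SH01', 'SH02']
```

With thresholds of 10, 5, and 0, neither query has a reference in range (the failing case); with 3500, 1000, and 500, both do, which is consistent with the workaround described under Additional Context.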

Analysis

It appears that when none of the distances meet the threshold criteria, rank_ids is never assigned, so reading it at line 254 of assign.py (inside assign()) raises the error.

  • The issue does not occur when at least one query-reference pair has a distance within the thresholds; in that case, cluster addresses are assigned to all query samples.
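The failure mode can be reproduced in isolation. The function below (pick_rank_ids_unsafe) is a hypothetical reduction, not the actual code in assign.py, and it assumes a reference qualifies when dist <= threshold. Because rank_ids is assigned somewhere in the function, Python treats it as a local variable, so reading it after the loop raises UnboundLocalError whenever the assigning branch never runs.

```python
def pick_rank_ids_unsafe(query_dists, threshold):
    """Hypothetical reduction of the failing pattern; NOT the real assign.py code.

    query_dists maps reference ID -> distance for a single query sample.
    """
    # Visit references from nearest to farthest.
    for ref_id, dist in sorted(query_dists.items(), key=lambda item: item[1]):
        if dist <= threshold:
            rank_ids = [ref_id]  # only bound if some distance passes the threshold
            break
    # If the branch above never ran, rank_ids was never bound. Since the name is
    # assigned in this function it is local, so this read raises UnboundLocalError.
    return rank_ids[0]


# SH02's distances to the references, taken from results.txt above.
sh02 = {"SE02": 22, "SE01": 23, "SE03": 43, "SE04": 45, "SE05": 45}
print(pick_rank_ids_unsafe(sh02, 50))  # SE02
try:
    pick_rank_ids_unsafe(sh02, 10)
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError
```

The exact error message differs by Python version; 3.11 prints the "cannot access local variable" wording seen in the traceback.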

Suggested Fix

It would be helpful to add error handling or checks that prevent the UnboundLocalError, for example by ensuring that rank_ids is always initialized, even when no sample comparisons fall within the thresholds.
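One possible guard, sketched on a hypothetical reduction of the pattern (pick_rank_ids and NoReferenceWithinThresholdError are illustrative names, not part of genomic_address_service, and dist <= threshold is an assumed comparison): bind rank_ids before the loop and check it afterwards, so the no-match case fails with a descriptive message. Whether gas should raise, skip the sample, or handle this case some other way is a design choice for the maintainers; this sketch only shows the raise option.

```python
class NoReferenceWithinThresholdError(ValueError):
    """Hypothetical exception for a query with no reference inside the threshold."""


def pick_rank_ids(query_id, query_dists, threshold):
    """Hypothetical guarded version of the pattern; NOT the real assign.py code."""
    rank_ids = None  # bind the name up front so it always exists
    for ref_id, dist in sorted(query_dists.items(), key=lambda item: item[1]):
        if dist <= threshold:
            rank_ids = [ref_id]
            break
    if rank_ids is None:
        # No reference passed the threshold: fail with context instead of an
        # UnboundLocalError further down.
        closest = min(query_dists.values(), default=None)
        raise NoReferenceWithinThresholdError(
            f"{query_id}: no reference within threshold {threshold} "
            f"(closest distance {closest})"
        )
    return rank_ids[0]


sh02 = {"SE02": 22, "SE01": 23, "SE03": 43, "SE04": 45, "SE05": 45}
print(pick_rank_ids("SH02", sh02, 50))  # SE02
try:
    pick_rank_ids("SH02", sh02, 10)
except NoReferenceWithinThresholdError as err:
    print(err)  # SH02: no reference within threshold 10 (closest distance 22)
```

Subclassing ValueError means any existing `except ValueError` handler in a caller would also catch this case.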

Additional Context

This error was encountered while processing Salmonella enterica samples (sourced from NCBI) through mikrokondo, then running a subset through gasclustering to assign cluster addresses using gas mcluster. Two remaining samples (SH01, SH02) lacked cluster addresses, so they were run through gasnomenclature, which uses the gas call command.

Notably, increasing the thresholds (--gm_thresholds "3500,1000,500") allowed the query samples to be assigned successfully. Alternatively, rerunning the samples through gasclustering and gasnomenclature with adjusted parameters (--pd_distm scaled and --gm_threshold "50,20,0") also resulted in a successful assignment.
