maintenance extension to stub out from NCBI #875

Merged
3 commits merged into popsim-consortium:main on Apr 9, 2021

Conversation

andrewkern
Member

@andrewkern andrewkern commented Apr 9, 2021

This PR adds add-species-ncbi to the maintenance module. It takes the species name in binomial nomenclature as its argument, e.g., "Drosophila simulans". The search term must be wrapped in quotes.

should work in the meantime to help folks stub stuff out from NCBI

@codecov

codecov bot commented Apr 9, 2021

Codecov Report

Merging #875 (15235ff) into main (60c9004) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##             main     #875   +/-   ##
=======================================
  Coverage   99.08%   99.08%           
=======================================
  Files          55       55           
  Lines        2410     2410           
  Branches      284      284           
=======================================
  Hits         2388     2388           
  Misses         14       14           
  Partials        8        8           


@jeromekelleher
Member

Looks great @andrewkern, but I think the mapping of scientific names to UIDs is a bit too weak. I tried it for Mus Musculus and it didn't work. I'll tack on a commit with some updates.

@andrewkern
Member Author

Mus musculus worked fine for me? right now the name has to be in quotes...

$ python -m maintenance add-species-ncbi "Mus musculus"
2021-04-09 06:56:11,789 [28477] INFO     maint: Adding new species MusMus for NCBI ID Mus musculus
2021-04-09 06:56:11,789 [28477] INFO     maint: Writing genome data for MusMus Mus musculus
2021-04-09 06:56:12,156 [28477] INFO     ncbi: Searching NCBI for Mus musculus
2021-04-09 06:56:12,156 [28477] INFO     ncbi: found 39 ids
2021-04-09 06:56:26,513 [28477] INFO     ncbi: most recent id: 7358741
2021-04-09 06:56:28,268 [28477] INFO     maint: MusMus: name=Mus musculus, common_name=Mus musculus
2021-04-09 06:56:28,268 [28477] INFO     maint: Writing species definition stub to stdpopsim/catalog/MusMus/species.py
2021-04-09 06:56:28,319 [28477] INFO     maint: Writing species test stub to tests/test_MusMus.py
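
As an aside on the log above: the species ID MusMus appears to be built from the first three letters of the genus and of the species epithet. A minimal sketch of that convention, assuming a hypothetical helper named species_id_from_name (this is not the function the maintenance module actually uses):

```python
def species_id_from_name(binomial: str) -> str:
    """Hypothetical helper: build a six-letter ID from 'Genus species'."""
    parts = binomial.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'Genus species', got {binomial!r}")
    genus, epithet = parts
    # First three letters of each word, each capitalised.
    return genus[:3].capitalize() + epithet[:3].capitalize()


print(species_id_from_name("Mus musculus"))
print(species_id_from_name("Drosophila simulans"))
```

Under this assumption "Mus musculus" maps to MusMus, matching the log output.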

@jeromekelleher
Member

I've added a commit which switches to using the UID, @andrewkern:

$ python3 -m maintenance add-species-ncbi --force 7358741
2021-04-09 15:06:51,385 [20461] INFO     maint: Adding new species MusMus for NCBI ID 7358741
2021-04-09 15:06:51,385 [20461] INFO     maint: Writing genome data for MusMus 7358741
2021-04-09 15:06:51,385 [20461] INFO     ncbi: Getting genome data for id: 7358741
2021-04-09 15:06:53,925 [20461] INFO     maint: MusMus: name=Mus musculus, common_name=Mus musculus (house mouse)
2021-04-09 15:06:53,925 [20461] INFO     maint: Writing species definition stub to stdpopsim/catalog/MusMus/species.py
2021-04-09 15:06:54,003 [20461] INFO     maint: Writing species test stub to tests/test_MusMus.py

This seems a bit more reliable than searching for the last modified record anyway?

@jeromekelleher
Member

I tried "Mus musculus" and it didn't work for me - it found loads of matches and returned some random accession with no chromosomes.
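
One way to picture this failure mode is to compare a "take the last modified record" heuristic with a lookup pinned to a fixed ID. The sketch below uses made-up records; the field names (uid, updated, n_chromosomes), the values, and both helper functions are assumptions for illustration and do not reflect the real NCBI response format or the code in this PR.

```python
from datetime import date

# Made-up records for illustration only. Field names and values are
# assumptions, not the real NCBI response format.
records = [
    {"uid": "uid-reference", "updated": date(2020, 6, 24), "n_chromosomes": 21},
    {"uid": "uid-strain-draft", "updated": date(2021, 3, 30), "n_chromosomes": 0},
    {"uid": "uid-older-build", "updated": date(2012, 1, 9), "n_chromosomes": 21},
]


def pick_most_recent(candidates):
    """Heuristic: take whichever record was modified last."""
    return max(candidates, key=lambda rec: rec["updated"])


def pick_by_uid(candidates, uid):
    """Pinned lookup: return exactly the record that was asked for."""
    for rec in candidates:
        if rec["uid"] == uid:
            return rec
    raise KeyError(f"no record with uid {uid!r}")


heuristic = pick_most_recent(records)
pinned = pick_by_uid(records, "uid-reference")
print(heuristic["uid"], heuristic["n_chromosomes"])
print(pinned["uid"], pinned["n_chromosomes"])
```

In this toy data the heuristic lands on a newer record with no chromosomes, while the pinned lookup keeps returning the same record no matter what is added later.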

@andrewkern
Member Author

weird! I got this in my genome_data.py. what assembly did it grab for you? i'm sorting assemblies by date currently to pull the most recent one.

data = {
    "assembly_accession": "GCF_000001635.27",
    "assembly_name": "GRCm39",
    "chromosomes": {
        "1": {"length": "195974786", "synonyms": []},
        "2": {"length": "181755017", "synonyms": []},
        "3": {"length": "159745316", "synonyms": []},
        "4": {"length": "156862662", "synonyms": []},
        "5": {"length": "153496487", "synonyms": []},
        "6": {"length": "149588044", "synonyms": []},
        "7": {"length": "145171164", "synonyms": []},
        "8": {"length": "130127694", "synonyms": []},
        "9": {"length": "124359700", "synonyms": []},
        "10": {"length": "130530862", "synonyms": []},
        "11": {"length": "121973369", "synonyms": []},
        "12": {"length": "120092757", "synonyms": []},
        "13": {"length": "120883175", "synonyms": []},
        "14": {"length": "125139656", "synonyms": []},
        "15": {"length": "104073951", "synonyms": []},
        "16": {"length": "98008968", "synonyms": []},
        "17": {"length": "95294699", "synonyms": []},
        "18": {"length": "90720763", "synonyms": []},
        "19": {"length": "61420004", "synonyms": []},
        "X": {"length": "170035695", "synonyms": []},
        "Y": {"length": "92212126", "synonyms": []},
    },
}
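
Note that the chromosome lengths in this stub are stored as strings. A minimal sketch of how they could be converted to integers before use, taking three entries copied from the dictionary above; the helper name chromosome_lengths is hypothetical and not part of the maintenance module:

```python
# Three entries copied from the stub above; lengths are strings there.
chromosomes = {
    "19": {"length": "61420004", "synonyms": []},
    "X": {"length": "170035695", "synonyms": []},
    "Y": {"length": "92212126", "synonyms": []},
}


def chromosome_lengths(chroms):
    """Hypothetical helper: map chromosome name to integer length."""
    return {name: int(info["length"]) for name, info in chroms.items()}


lengths = chromosome_lengths(chromosomes)
total = sum(lengths.values())
print(lengths)
print(total)
```

An empty chromosomes mapping would give a total of 0, which is a cheap check for a record that came back without chromosomes.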

@jeromekelleher
Member

jeromekelleher commented Apr 9, 2021

> weird! I got this in my genome_data.py. what assembly did it grab for you? i'm sorting assemblies by date currently to pull the most recent one.

I'm not sure there's much point in trying to automate this @andrewkern - it'll never work in all cases. It's easy enough to find the UID, once you know where to look. This stuff will just keep breaking every time we update if we don't have a fixed ID.

I've added a commit to this PR - which does this. How about we merge that much and see how it goes?

@andrewkern
Member Author

yes sounds good to me.

@jeromekelleher jeromekelleher merged commit ed48aad into popsim-consortium:main Apr 9, 2021
@jeromekelleher
Member

Done!
