Automate species addition #790

Merged

jeromekelleher (Member) commented Mar 7, 2021

Depended on #789 (so the diff was quite noisy until that was merged); #789 is now merged.

Automate a lot of the drudge work involved in adding new species to the catalog. Use it like:

python3 -m maintenance add-species mus_musculus

ping @andrewkern, @petrelharp, @grahamgower


codecov bot commented Mar 7, 2021

Codecov Report

Merging #790 (5bea24f) into main (4187772) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #790      +/-   ##
==========================================
+ Coverage   99.49%   99.53%   +0.04%     
==========================================
  Files          54       55       +1     
  Lines        2383     2384       +1     
  Branches      279      281       +2     
==========================================
+ Hits         2371     2373       +2     
+ Misses          6        5       -1     
  Partials        6        6              
| Impacted Files | Coverage Δ |
|---|---|
| stdpopsim/catalog/DroMel/genome_data.py | 100.00% <ø> (ø) |
| stdpopsim/catalog/EscCol/species.py | 100.00% <ø> (ø) |
| stdpopsim/catalog/HomSap/species.py | 100.00% <ø> (ø) |
| stdpopsim/__init__.py | 94.73% <100.00%> (-1.27%) ⬇️ |
| stdpopsim/catalog/AraTha/__init__.py | 100.00% <100.00%> (ø) |
| stdpopsim/catalog/BosTau/__init__.py | 100.00% <100.00%> (ø) |
| stdpopsim/catalog/CanFam/__init__.py | 100.00% <100.00%> (ø) |
| stdpopsim/catalog/DroMel/__init__.py | 100.00% <100.00%> (ø) |
| stdpopsim/catalog/EscCol/__init__.py | 100.00% <100.00%> (ø) |
| stdpopsim/catalog/HomSap/__init__.py | 100.00% <100.00%> (ø) |
| ... and 4 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4187772...5bea24f.

grahamgower (Member) left a comment

Very cool!

- Change ID for EscCol to
  escherichia_coli_str_k_12_substr_mg1655_gca_000005845. Ensembl seems
  to have added the gca_ suffix.
- Update the DroMel assembly version from BDGP6.28 to BDGP6.32
@jeromekelleher force-pushed the update-species-addition branch from 7dd3cf6 to 24f56e5 on March 8, 2021 11:15
Also automate adding a species to the catalog.
@jeromekelleher force-pushed the update-species-addition branch from 24f56e5 to 5bea24f on March 8, 2021 17:02
petrelharp (Contributor) commented Mar 9, 2021

Small bit of data: to try something besides mus_musculus I went to one of the sites listed here and tried out the first one, the Abingdon island giant tortoise, Chelonoidis abingdonii. My first guess at the "ensembl ID" was "chelonoidis_abingdonii", but that threw an error; then I noticed that the ensembl ID is shown up here:
[Screenshot from 2021-03-09 09-57-40]
which is ASM359739v1... but that throws the error

ValueError: Cannot extract six character id from ASM359739v1

Looking further down the list, this situation seems common. The ensembl ID is not usually genus_species.

Also, it's totally not clear where to actually find the ensembl ID.
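
To illustrate why an assembly name like ASM359739v1 gets rejected, here is a minimal sketch of how a six-character ID could be derived from an Ensembl ID of the form genus_species. The function name `six_char_id` and the rule of taking three letters from each of the first two tokens are assumptions made for illustration; the maintenance code in this PR may work differently.

```python
def six_char_id(ensembl_id: str) -> str:
    """Hypothetical helper (not the PR's actual code): build a
    six-character ID such as "MusMus" from an Ensembl ID."""
    # Assumption: only the first two underscore-separated tokens
    # (genus and species) contribute to the ID.
    tokens = ensembl_id.split("_")[:2]
    # Take up to three leading characters of each token, capitalised.
    candidate = "".join(token[:3].capitalize() for token in tokens)
    if len(candidate) != 6:
        # A single-token name such as "ASM359739v1" yields only three
        # characters, so no six-character ID can be built from it.
        raise ValueError(f"Cannot extract six character id from {ensembl_id}")
    return candidate


if __name__ == "__main__":
    for name in ("mus_musculus", "ASM359739v1"):
        try:
            print(name, "->", six_char_id(name))
        except ValueError as err:
            print(name, "->", err)
```

Under this assumed rule, any ID without an underscore-separated genus and species fails in the same way, which would match the observation that the problem is common further down the list.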

@andrewkern merged commit 833d982 into popsim-consortium:main on Mar 9, 2021
jeromekelleher (Member, Author) commented

> Small bit of data: to try something besides mus_musculus I went to one of the sites listed here and tried out the first one, the Abingdon island giant tortoise, Chelonoidis abingdonii. My first guess at the "ensembl ID" was "chelonoidis_abingdonii", but that threw an error; then I noticed that the ensembl ID is shown up here:
> [Screenshot from 2021-03-09 09-57-40]
> which is ASM359739v1... but that throws the error
>
> ValueError: Cannot extract six character id from ASM359739v1
>
> Looking further down the list, this situation seems common. The ensembl ID is not usually genus_species.
>
> Also, it's totally not clear where to actually find the ensembl ID.

Can you open an issue for this please? Should be an easy enough fix.

@jeromekelleher deleted the update-species-addition branch on March 10, 2021 09:40