Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finish QC for CanFam #1242

Closed
13 of 14 tasks
petrelharp opened this issue May 11, 2022 · 6 comments
Closed
13 of 14 tasks

finish QC for CanFam #1242

petrelharp opened this issue May 11, 2022 · 6 comments

Comments

@petrelharp
Copy link
Contributor

petrelharp commented May 11, 2022

PR for new species: #375 - this was already QC'ed in #652 but before we had checks in there for recombination and mutation rates (see other species) - so, most of these things below are probably done already!

If you volunteer to QC this species, please use the checklist below.
While this list is intended to be comprehensive, it may not be exhaustive.
Where relevant, the QC reviewer should identify that parameter values match
those given in the linked citation(s).

The QC reviewer should start a pull request that fills out the test stubs
with independently obtained values. The reviewer may look at the python code
for rationale provided in comments, but should ignore the actual code
as much as possible - comments in the code should give enough information
that it's obvious how to get the correct value from the provided references.
(In particular, we shouldn't copy-paste the value from the code into the test!)

  • Recombination rate.
    • This might be genome-wide, or per-chromosome. Both are fine.
    • Check there's a comment describing where it came from, and/or how calculated.
    • From a publication? Check the value(s) match the publication.
    • Calculated somehow? Average over a recombination map? Redo the calculation.
  • Mutation rate.
    • This might be genome-wide, or per-chromosome. Both are fine.
    • Check there's a comment describing where it came from, and/or how calculated.
    • From a publication? Check the value(s) match the publication.
    • Calculated somehow? Redo the calculation.
  • Recombination map (if present).
    • Does it match the assembly? Liftover is fine, if clearly stated.
    • Is the description/long_description a good summary of how the map was created?
  • Population size.
  • Generation time.

For each citation, check:

  • Doi link.
  • Is publication a preprint? Is there a peer-reviewed publication instead?
  • Is the year correct.
  • Is the author correct (spelling, accents/ligatures/etc.)

Citations are required for:

  • Genome reference assembly.
  • Mutation rate.
  • Recombination rate.
  • Recombination map(s) (if relevant).
  • Population size.
  • Generation time.

The final PR should:

  • fill out the test stubs
  • delete the pytest.mark.skip lines that make the tests not run
  • make sure they pass, talking to the original author to figure out discrepancies
@grahamgower
Copy link
Member

See also #540. Is it mainly the tests for recombination/mutation rates that need doing @petrelharp?

@petrelharp
Copy link
Contributor Author

Yeah, I think that's it.

@petrelharp
Copy link
Contributor Author

I'll start doing this one.

@petrelharp
Copy link
Contributor Author

The mutation rate citations are:


_SkoglundEtAl = stdpopsim.Citation(
        # Ancient wolf genome reveals an early divergence of domestic dog
        # ancestors and admixture into high-latitude breeds.
        author="Skoglund et al.",
        year=2015,
        doi="https://doi.org/10.1016/j.cub.2015.04.019")
_FranzEtAl = stdpopsim.Citation(
        # Genomic and archaeological evidence suggest a dual origin of
        # domestic dogs.
        author="Franz et al.",
        year=2016,
        doi="https://doi.org/10.1126/science.aaf3161")

@petrelharp
Copy link
Contributor Author

petrelharp commented May 16, 2022

It all looks good, except there's no citation for generation time, only the comment:

        # Everyone uses 3 years for generation time because everyone else uses it.
        # It's likely higher, at least in wolves:
        # https://academic.oup.com/mbe/article/35/6/1366/4990884
        # Reasoning behind a generation time of 3 years:
        # Consider two use cases for CanFam simulations:
        # (1) for domestic dog simulations, and (2) for wolf+dog simulations
        # (or ancestral dogs).
        # In case (1), maybe 3 year generations are more appropriate because of human
        # intervention in breeding. In case (2), you might want to match what other
        # studies have done (thus using 3 year generations), or you might want to
        # consider what is known about modern wolves.

I'm going to let this slide, although we could find a paper that actually uses 3 years and cite that one - but it's not clear that's an appropriate use of citations.

@petrelharp
Copy link
Contributor Author

Closed in #1258

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants