Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom cohorts in metamist #615

Merged
merged 177 commits into from
Apr 30, 2024
Merged

Custom cohorts in metamist #615

merged 177 commits into from
Apr 30, 2024

Conversation

vivbak
Copy link
Contributor

@vivbak vivbak commented Nov 16, 2023

Quick Links
Confluence Documentation
Original Google Docs Scoping Document
Task breakdown on Jira, Epic titled 'Custom Cohorts'

Overview
This PR aims to make a cohort an explicit entity in metamist. There are several benefits to this approach, including granular sequencing group selection, improved data security, streamlining reruns, handling complex workflows, improved reproducibility, etc.

A cohort refers to a curated, immutable, group of sequencing groups (SGs) that share common characteristics or criteria. These cohorts are explicitly defined and managed, allowing users to tailor their analyses to specific subsets of sequencing data. Users can create cohorts based on various criteria such as project/dataset names, inclusion/exclusion of specific sequencing groups, sample type, sequencing group type, or by referencing a previous cohort ID.

A cohort_template serves as a predefined set of criteria used for creating cohorts. It encapsulates the specific conditions or filters that define a cohort's composition. Templates are stored in the system and can be reused to generate cohorts with consistent criteria.

This PR creates both entities, as well as the endpoints to support their use.

It also creates a cohort builder script create_custom_cohort.py so that analysts can create cohorts via the analysis-runner.

@codecov-commenter
Copy link

codecov-commenter commented Nov 16, 2023

Codecov Report

Attention: Patch coverage is 74.38650% with 167 lines in your changes are missing coverage. Please review.

Project coverage is 76.42%. Comparing base (8ee989f) to head (d6f4ac4).
Report is 1 commits behind head on dev.

❗ Current head d6f4ac4 differs from pull request most recent head dfe3c20. Consider uploading reports for the commit dfe3c20 to get more accurate results

Files Patch % Lines
api/graphql/schema.py 43.20% 46 Missing ⚠️
models/utils/cohort_id_format.py 34.37% 21 Missing ⚠️
db/python/tables/analysis.py 41.17% 20 Missing ⚠️
db/python/tables/cohort.py 69.69% 20 Missing ⚠️
models/models/cohort.py 71.15% 15 Missing ⚠️
api/routes/cohort.py 47.61% 11 Missing ⚠️
models/utils/cohort_template_id_format.py 65.62% 11 Missing ⚠️
db/python/layers/cohort.py 91.56% 7 Missing ⚠️
db/python/layers/analysis.py 33.33% 6 Missing ⚠️
api/graphql/filters.py 66.66% 2 Missing ⚠️
... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #615      +/-   ##
==========================================
- Coverage   76.47%   76.42%   -0.06%     
==========================================
  Files         148      155       +7     
  Lines       11919    12431     +512     
==========================================
+ Hits         9115     9500     +385     
- Misses       2804     2931     +127     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vivbak vivbak changed the title Custom cohorts Custom cohorts in metamist Nov 16, 2023
@vivbak vivbak requested review from illusional and milo-hyben April 22, 2024 06:53
Copy link
Contributor

@milo-hyben milo-hyben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent job!

The creation time of the record inserted by upsert_sample() will
be reported in UTC, so we need to compare against today in UTC.
Otherwise tests fail when run locally before lunchtimeish as it
is still "yesterday" in UTC, so lt=today unexpectedly returns
the just-created record.
Copy link
Contributor

@illusional illusional left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! A few minor comments, but happy for you to ignore / resolve most - and don't feel I need to see it again.

api/graphql/schema.py Show resolved Hide resolved
api/routes/cohort.py Outdated Show resolved Hide resolved
db/python/layers/cohort.py Show resolved Hide resolved
db/python/layers/cohort.py Outdated Show resolved Hide resolved
db/python/tables/cohort.py Outdated Show resolved Hide resolved
db/python/tables/cohort.py Show resolved Hide resolved
test/test_cohort.py Outdated Show resolved Hide resolved
test/test_cohort.py Outdated Show resolved Hide resolved
Fix the script's template_id type, reflecting CohortBody's corresponding
member's change from str to int in d705285.
@jmarshall
Copy link
Contributor

jmarshall commented Apr 23, 2024

I've added some basic tests for scripts/create_custom_cohort.py — which necessitated some minor changes to the script to catch up with some refactoring on the branch.

There seems to be still a mixture of singular/plural across the various parts of the code base. In particular, I don't really understand why on line 80 of test/test_cohort_builder.py this isn't sample_type (because criteria appears to be a metamist.model.cohort_criteria.CohortCriteria, which appears to have a sample_type field and no sample_types field!):

self.assertListEqual(criteria.sample_types, ['blood'])

And in all the other tests, use CohortCriteriaInternal directly.
@vivbak
Copy link
Contributor Author

vivbak commented Apr 29, 2024

Please pause merge -- awaiting approval from @jmarshall

@vivbak vivbak merged commit 6f46082 into dev Apr 30, 2024
5 checks passed
@vivbak vivbak deleted the custom-cohorts branch April 30, 2024 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants