Custom cohorts in metamist #615
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
Coverage Diff
              dev     #615     +/-
Coverage    76.47%   76.42%   -0.06%
Files          148      155       +7
Lines        11919    12431     +512
Hits          9115     9500     +385
Misses        2804     2931     +127
…-metadata into custom-cohorts
…m rich to raw, to be on the route. Handle raw ids only from layers onward. Update tests accordingly
Excellent job!
The creation time of the record inserted by upsert_sample() will be reported in UTC, so we need to compare against today in UTC. Otherwise tests fail when run locally before lunchtimeish as it is still "yesterday" in UTC, so lt=today unexpectedly returns the just-created record.
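The date mismatch described in this commit message can be reproduced with a minimal sketch. The UTC+10 offset, the timestamp, and the helper name utc_today are assumptions chosen for illustration; none of them come from the metamist code base.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def utc_today(now: Optional[datetime] = None) -> date:
    """Return the calendar date in UTC (hypothetical helper, not from metamist)."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current.astimezone(timezone.utc).date()


# Assumed scenario: a test runs at 09:00 local time in a UTC+10 zone.
local_tz = timezone(timedelta(hours=10))
run_time_local = datetime(2024, 3, 15, 9, 0, tzinfo=local_tz)

# The record's creation time is stored in UTC, where it is still the 14th.
created_utc_date = run_time_local.astimezone(timezone.utc).date()

local_today = run_time_local.date()            # local calendar date
utc_today_value = utc_today(run_time_local)    # UTC calendar date

# A "created before today" filter using the local date wrongly matches
# the just-created record; using the UTC date does not.
matches_with_local_today = created_utc_date < local_today
matches_with_utc_today = created_utc_date < utc_today_value
```

With the local date, the freshly created record counts as older than today; with the UTC date it does not, which is the behaviour the tests expect.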
Amazing work! A few minor comments, but happy for you to ignore / resolve most - and don't feel I need to see it again.
Fix the script's template_id type, reflecting CohortBody's corresponding member's change from str to int in d705285.
I've added some basic tests for scripts/create_custom_cohort.py, which necessitated some minor changes to the script to catch up with some refactoring on the branch. There still seems to be a mixture of singular/plural across the various parts of the code base. In particular, I don't really understand why on line 80 of test/test_cohort_builder.py this isn't self.assertListEqual(criteria.sample_types, ['blood'])
And in all the other tests, use CohortCriteriaInternal directly.
…n as no one will be able to use it at present
…-metadata into custom-cohorts
Please pause merge -- awaiting approval from @jmarshall
Actually call the underlying route so real data can be returned. In create_custom_cohort.py, add a return value for ease of testing and fix get_cohort_spec() type annotations.
…-metadata into custom-cohorts
Quick Links
Confluence Documentation
Original Google Docs Scoping Document
Task breakdown on Jira, Epic titled 'Custom Cohorts'
Overview
This PR aims to make a cohort an explicit entity in metamist. There are several benefits to this approach, including granular sequencing group selection, improved data security, streamlined reruns, handling of complex workflows, and improved reproducibility.
A cohort refers to a curated, immutable group of sequencing groups (SGs) that share common characteristics or criteria. These cohorts are explicitly defined and managed, allowing users to tailor their analyses to specific subsets of sequencing data. Users can create cohorts based on various criteria such as project/dataset names, inclusion/exclusion of specific sequencing groups, sample type, sequencing group type, or by referencing a previous cohort ID.
A cohort_template serves as a predefined set of criteria used for creating cohorts. It encapsulates the specific conditions or filters that define a cohort's composition. Templates are stored in the system and can be reused to generate cohorts with consistent criteria.
This PR creates both entities, as well as the endpoints to support their use. It also creates a cohort builder script, create_custom_cohort.py, so that analysts can create cohorts via the analysis-runner.
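The relationship between a template (a reusable set of criteria) and a cohort (a fixed list of sequencing groups) might be sketched as below. This is an illustration only: the class names, field names, IDs, and the build_cohort function are all hypothetical and do not reflect metamist's actual models or endpoints, and only project, sample type, sequencing group type, and exclusion criteria are modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SequencingGroup:
    """Hypothetical stand-in for a sequencing group record."""
    id: str
    project: str
    sample_type: str
    sg_type: str


@dataclass(frozen=True)
class CohortCriteria:
    """Hypothetical template: a reusable, immutable set of filters."""
    projects: Tuple[str, ...]
    sample_types: Tuple[str, ...] = ()
    sg_types: Tuple[str, ...] = ()
    excluded_sg_ids: Tuple[str, ...] = ()


def build_cohort(
    criteria: CohortCriteria, sgs: Iterable[SequencingGroup]
) -> Tuple[str, ...]:
    """Resolve criteria to a fixed tuple of SG IDs (assumed semantics)."""
    selected = []
    for sg in sgs:
        if sg.id in criteria.excluded_sg_ids:
            continue
        if sg.project not in criteria.projects:
            continue
        # An empty filter is treated as "no restriction" in this sketch.
        if criteria.sample_types and sg.sample_type not in criteria.sample_types:
            continue
        if criteria.sg_types and sg.sg_type not in criteria.sg_types:
            continue
        selected.append(sg.id)
    return tuple(sorted(selected))


# Made-up data for illustration.
all_sgs = [
    SequencingGroup("SG1", "proj-a", "blood", "genome"),
    SequencingGroup("SG2", "proj-a", "saliva", "genome"),
    SequencingGroup("SG3", "proj-a", "blood", "exome"),
    SequencingGroup("SG4", "proj-b", "blood", "genome"),
]
template = CohortCriteria(
    projects=("proj-a",), sample_types=("blood",), excluded_sg_ids=("SG3",)
)
cohort = build_cohort(template, all_sgs)
```

Storing the resolved IDs as a tuple, rather than re-running the filters on each use, is one way to keep a cohort's membership fixed after creation even when new sequencing groups are later added to a project.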