Custom cohorts in metamist #615

vivbak · 2023-11-16T10:01:28Z

Quick Links
Confluence Documentation
Original Google Docs Scoping Document
Task breakdown on Jira, Epic titled 'Custom Cohorts'

Overview
This PR makes a cohort an explicit entity in metamist. The benefits include granular sequencing group selection, improved data security, streamlined reruns, support for complex workflows, and improved reproducibility.

A cohort is a curated, immutable group of sequencing groups (SGs) that share common characteristics or criteria. Cohorts are explicitly defined and managed, so users can tailor their analyses to specific subsets of sequencing data. Users can create cohorts based on criteria such as project/dataset names, inclusion/exclusion of specific sequencing groups, sample type, sequencing group type, or by referencing a previous cohort ID.

A cohort_template serves as a predefined set of criteria used for creating cohorts. It encapsulates the specific conditions or filters that define a cohort's composition. Templates are stored in the system and can be reused to generate cohorts with consistent criteria.
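As a rough illustration of how a reusable template could resolve to an immutable set of SG IDs, here is a minimal sketch. It is not the code in this PR: the class names, field names, and the rule that an empty criterion means "no restriction" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequencingGroupSketch:
    """Hypothetical stand-in for a sequencing group record."""
    sg_id: str
    project: str
    sample_type: str
    sg_type: str


@dataclass(frozen=True)
class CohortTemplateSketch:
    """Hypothetical template: a named, reusable set of criteria."""
    name: str
    projects: tuple = ()
    sample_type: tuple = ()
    sg_type: tuple = ()
    excluded_sg_ids: tuple = ()


def build_cohort(template, candidates):
    """Return the SG IDs matching the template as an immutable tuple."""

    def allowed(value, permitted):
        # Assumption: an empty criterion places no restriction on that field.
        return not permitted or value in permitted

    return tuple(
        sg.sg_id
        for sg in candidates
        if allowed(sg.project, template.projects)
        and allowed(sg.sample_type, template.sample_type)
        and allowed(sg.sg_type, template.sg_type)
        and sg.sg_id not in template.excluded_sg_ids
    )
```

Applying the same template to the same candidates always yields the same tuple, which is the reproducibility property the overview describes.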

This PR creates both entities, as well as the endpoints to support their use.

It also adds a cohort builder script, create_custom_cohort.py, so that analysts can create cohorts via the analysis-runner.

codecov-commenter · 2023-11-16T10:07:04Z

Codecov Report

Attention: Patch coverage is 74.38650%, with 167 lines in your changes missing coverage. Please review.

Project coverage is 76.42%. Comparing base (8ee989f) to head (d6f4ac4).
Report is 1 commit behind head on dev.

❗ Current head d6f4ac4 differs from pull request most recent head dfe3c20. Consider uploading reports for the commit dfe3c20 to get more accurate results

Files	Patch %	Lines
api/graphql/schema.py	43.20%	46 Missing ⚠️
models/utils/cohort_id_format.py	34.37%	21 Missing ⚠️
db/python/tables/analysis.py	41.17%	20 Missing ⚠️
db/python/tables/cohort.py	69.69%	20 Missing ⚠️
models/models/cohort.py	71.15%	15 Missing ⚠️
api/routes/cohort.py	47.61%	11 Missing ⚠️
models/utils/cohort_template_id_format.py	65.62%	11 Missing ⚠️
db/python/layers/cohort.py	91.56%	7 Missing ⚠️
db/python/layers/analysis.py	33.33%	6 Missing ⚠️
api/graphql/filters.py	66.66%	2 Missing ⚠️
... and 5 more

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #615      +/-   ##
==========================================
- Coverage   76.47%   76.42%   -0.06%     
==========================================
  Files         148      155       +7     
  Lines       11919    12431     +512     
==========================================
+ Hits         9115     9500     +385     
- Misses       2804     2931     +127

milo-hyben

Excellent job!

The creation time of the record inserted by upsert_sample() will be reported in UTC, so we need to compare against today in UTC. Otherwise tests fail when run locally before lunchtimeish as it is still "yesterday" in UTC, so lt=today unexpectedly returns the just-created record.
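The date mismatch described here can be reproduced with the standard library alone. This is a small sketch, assuming a local zone of UTC+10 for the example; `today_utc` is a hypothetical helper name, not a function in metamist.

```python
from datetime import date, datetime, timedelta, timezone


def today_utc(now=None):
    """Return today's date in UTC (hypothetical helper for tests)."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).date()


# Assumed example: 09:00 on 23 April in a UTC+10 zone, i.e. before lunchtime.
local_morning = datetime(2024, 4, 23, 9, 0, tzinfo=timezone(timedelta(hours=10)))

# A record created at that instant is timestamped in UTC, where it is still 22 April.
created_utc = local_morning.astimezone(timezone.utc)

# Comparing against the local date makes the new record look older than "today".
matches_lt_local_today = created_utc.date() < local_morning.date()

# Comparing against today's date in UTC gives the intended result.
matches_lt_utc_today = created_utc.date() < today_utc(local_morning)
```

With the local date, the just-created record satisfies the lt=today filter; with the UTC date it does not.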

illusional

Amazing work! A few minor comments, but happy for you to ignore / resolve most - and don't feel I need to see it again.

api/graphql/schema.py

api/routes/cohort.py

db/python/layers/cohort.py

db/python/tables/cohort.py

test/test_cohort.py

jmarshall · 2024-04-23T07:15:04Z

I've added some basic tests for scripts/create_custom_cohort.py — which necessitated some minor changes to the script to catch up with some refactoring on the branch.

There seems to be still a mixture of singular/plural across the various parts of the code base. In particular, I don't really understand why on line 80 of test/test_cohort_builder.py this isn't sample_type (because criteria appears to be a metamist.model.cohort_criteria.CohortCriteria, which appears to have a sample_type field and no sample_types field!):

self.assertListEqual(criteria.sample_types, ['blood'])

vivbak · 2024-04-29T04:00:55Z

Please pause merge -- awaiting approval from @jmarshall

vivbak added 5 commits November 15, 2023 15:16

In progress

5ceb5f6

In progress

d7ef222

Cohort GraphQL Skeleton -- functional now

da2f9db

Merge branch 'dev' into custom-cohorts

c2ff6c9

This should go in another PR

6f1d9da

vivbak changed the title from "Custom cohorts" to "Custom cohorts in metamist" Nov 16, 2023

Merge branch 'dev' into custom-cohorts

a956cee

vivbak assigned daniaki Nov 21, 2023

daniaki and others added 21 commits November 23, 2023 18:00

ignore venv

25155ab

Updated docker mariadb setup instructions

7a5d978

support for name, derived_from & other fields

d5103a1

Fix error if applying ord to int

9d655e9

Add all fields to model

f69cde0

Set empty dict to fix GraphQL non-null error

5d24669

Fix error applying ord to int

32fc088

Add name to cohort table

7dc373d

version bump

312c6f8

Formatting; cohort GQL schema

9fb6a06

cohort layer update

57afc9f

cohort db table update

203a49b

CohortBuilder route

60798c1

Single SG ID form

5af95cc

SG ID from project(s) form

c5cdb17

Cohort Builder page

58c80d4

Re-usable SG scrolling table

003ffff

types definition

4cdfde0

Add link to nav bar to cohort builder

3f27185

Add substring search

f153df2

Merge branch 'custom-cohorts' of github.com:populationgenomics/sample…

4db1ca7

…-metadata into custom-cohorts

vivbak added 3 commits April 22, 2024 08:48

Fetch assay external ids too

7db7f7b

Move escape_like_terms to utils, apply to contains filter

1b7632b

Implement Internal and External models for all objects. Move transfor…

3c2af5f

…m rich to raw, to be on the route. Handle raw ids only from layers onward. Update tests accordingly

vivbak requested review from illusional and milo-hyben April 22, 2024 06:53

milo-hyben approved these changes Apr 22, 2024

illusional approved these changes Apr 23, 2024

Add basic tests for scripts/create_custom_cohort.py

b10f10e

Fix the script's template_id type, reflecting CohortBody's corresponding member's change from str to int in d705285.

Add separate CohortCriteria/Template.to_internal() tests

6f78b9b

And in all the other tests, use CohortCriteriaInternal directly.

illusional mentioned this pull request Apr 24, 2024

Refactor families + add family participants to graphql #740

Merged

vivbak mentioned this pull request Apr 28, 2024

Use data loader when querying analyses for cohort #745

Open

vivbak added 5 commits April 29, 2024 09:27

Switch get_project_write_connection to get_project_readonly_connectio…

9aaaa10

…n as noone will be able to use it at present

Merge branch 'custom-cohorts' of github.com:populationgenomics/sample…

ed2d840

…-metadata into custom-cohorts

Return [] if no template meets criteria, switch return order of proje…

b7b9d1a

…ct and templates

Cohort ID should be None in dry-run mode

f52af30

sample_types -> sample_type

c347309

jmarshall added 5 commits April 29, 2024 16:16

Merge dev (in particular, the empty list SQL fix) into custom-cohorts

49b146d

Only run _query_cohort_ids query if :analysis_ids will be non-empty

5f5cf23

Improve mocking in test_cohort_builder.py tests

0d1757f

Actually call the underlying route so real data can be returned. In create_custom_cohort.py, add a return value for ease of testing and fix get_cohort_spec() type annotations.

Escape metacharacters in icontains query string (and add tests)

f793927

Make --dry-run an argumentless flag option

b8e1252
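For readers unfamiliar with argumentless flags, the last commit above, together with the earlier "Cohort ID should be None in dry-run mode" commit, could look roughly like the sketch below. It assumes the script uses argparse, which this page does not confirm; `build_parser`, `create_cohort`, the `--project` option, and the placeholder ID are all hypothetical.

```python
import argparse


def build_parser():
    """Build a minimal parser (hypothetical; not the script's real options)."""
    parser = argparse.ArgumentParser(description='Sketch of a cohort builder CLI')
    parser.add_argument('--project', required=True)
    # store_true makes --dry-run a flag that takes no argument:
    # absent means False, present means True.
    parser.add_argument('--dry-run', action='store_true')
    return parser


def create_cohort(project, dry_run):
    """Pretend to create a cohort; no ID is assigned in dry-run mode."""
    cohort_id = None if dry_run else 'PLACEHOLDER-ID'  # hypothetical ID
    return {'project': project, 'dry_run': dry_run, 'cohort_id': cohort_id}
```

Returning a value from the creation function, as one of the commits above does for the real script, makes both modes easy to assert on in tests.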

jmarshall approved these changes Apr 30, 2024

vivbak added 2 commits April 30, 2024 12:30

Bump version: 6.9.1 → 6.10.0

9fcc2a9

Merge branch 'custom-cohorts' of github.com:populationgenomics/sample…

dfe3c20

…-metadata into custom-cohorts

vivbak merged commit 6f46082 into dev Apr 30, 2024
5 checks passed

vivbak deleted the custom-cohorts branch April 30, 2024 02:38

jmarshall mentioned this pull request May 1, 2024

Forbid fields other than those defined by each model class #752

Merged
