[Pipeline integration] Meta-data persisting #4030

Vince-janv · 2024-12-18T10:55:58Z

Description

Pipeline-specific meta-data needs to be stored in the database. This can be data such as reference genome, family relations, tissue_type, etc. Pipelines require this information to perform their analysis, and it should be possible to fetch it unambiguously from statusDB.

Acceptance criteria

Meta-data can be added per case
Meta-data can be added per sample, within a case
Adding 1-2 fields for a pipeline should not make the database significantly more complex

Notes

Current examples are:

Case level:

Reference genome

Sample level:

organism/organism_id
_phenotype_groups
_phenotype_terms
is_tumour?
status
mother_id
father_id

Options

Single table inheritance:

We need a discriminator column.

Things needed when adding a new pipeline

A new Case class in models.py
A new CaseSample class in models.py
A new Sample (maybe?) class in models.py
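
To make the single-table option concrete, here is a minimal sketch using only the Python standard library. It is not the statusDB schema: the `workflow` discriminator name, the example classes and the `CASE_CLASSES` registry are assumptions for illustration, and an ORM would normally maintain the discriminator-to-class mapping itself.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# One table holds every case type; `workflow` is the discriminator column.
# Pipeline-specific columns must be nullable, since other case types leave them empty.
SCHEMA = """
CREATE TABLE "case" (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    workflow TEXT NOT NULL,   -- discriminator (hypothetical column name)
    panel TEXT,
    reference_genome TEXT
)
"""


@dataclass
class Case:
    id: int
    name: str
    workflow: str


@dataclass
class RarediseaseCase(Case):
    panel: Optional[str] = None
    reference_genome: Optional[str] = None


@dataclass
class RnafusionCase(Case):
    reference_genome: Optional[str] = None


# Hypothetical registry: discriminator value -> class to instantiate.
CASE_CLASSES = {"raredisease": RarediseaseCase, "rnafusion": RnafusionCase}


def load_case(connection, case_id):
    """Load one row and build the class selected by the discriminator."""
    connection.row_factory = sqlite3.Row
    row = connection.execute(
        'SELECT * FROM "case" WHERE id = ?', (case_id,)
    ).fetchone()
    case_class = CASE_CLASSES.get(row["workflow"], Case)
    # Keep only the columns that the chosen class declares.
    fields = case_class.__dataclass_fields__
    return case_class(**{key: row[key] for key in row.keys() if key in fields})


connection = sqlite3.connect(":memory:")
connection.execute(SCHEMA)
connection.execute(
    'INSERT INTO "case" (id, name, workflow, panel, reference_genome) '
    "VALUES (?, ?, ?, ?, ?)",
    (1, "case-a", "raredisease", "panel-x", "hg38"),  # placeholder values
)
print(load_case(connection, 1))
```

Each new pipeline would add one class and, if it needs new fields, one nullable column on the shared table.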

No inheritance

Just add necessary columns

Joined table inheritance

Things needed when adding a new pipeline

New table containing the columns unique to that pipeline
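
As a rough illustration of the joined-table option, the sketch below keeps the shared columns on the base table and puts the pipeline-specific ones in a separate table that shares its primary key. The table name `raredisease_case`, the helper function and the inserted values are hypothetical, and `sqlite3` only stands in for the real database.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE "case" (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        workflow TEXT NOT NULL
    );
    -- One extra table per pipeline, holding only that pipeline's columns.
    CREATE TABLE raredisease_case (
        id INTEGER PRIMARY KEY REFERENCES "case" (id),
        panel TEXT,
        reference_genome TEXT
    );
    """
)
connection.execute(
    'INSERT INTO "case" (id, name, workflow) VALUES (?, ?, ?)',
    (1, "case-a", "raredisease"),
)
connection.execute(
    "INSERT INTO raredisease_case (id, panel, reference_genome) VALUES (?, ?, ?)",
    (1, "panel-x", "hg38"),  # placeholder values
)


def get_raredisease_case(connection, case_id):
    """Join the base row with its pipeline-specific row."""
    return connection.execute(
        "SELECT c.id, c.name, r.panel, r.reference_genome "
        'FROM "case" AS c JOIN raredisease_case AS r ON r.id = c.id '
        "WHERE c.id = ?",
        (case_id,),
    ).fetchone()


print(get_raredisease_case(connection, 1))
```

The base table stays untouched when a pipeline is added, at the cost of one join per pipeline-specific read.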

JSON storage on cases. Discarded

Just add a "meta-data" json column in each table and pass a dict to it

Things needed when adding a new pipeline

Nothing
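
For reference, the discarded JSON option would look roughly like the sketch below. The column name `meta_data` and both helper functions are assumptions for illustration; the dict is serialised to text because `sqlite3` is used here as a stand-in.

```python
import json
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    'CREATE TABLE "case" (id INTEGER PRIMARY KEY, name TEXT NOT NULL, meta_data TEXT)'
)


def save_case(connection, case_id, name, meta_data):
    """Store the pipeline-specific dict as a JSON string."""
    connection.execute(
        'INSERT INTO "case" (id, name, meta_data) VALUES (?, ?, ?)',
        (case_id, name, json.dumps(meta_data)),
    )


def load_meta_data(connection, case_id):
    """Read the JSON string back into a dict."""
    row = connection.execute(
        'SELECT meta_data FROM "case" WHERE id = ?', (case_id,)
    ).fetchone()
    return json.loads(row[0])


save_case(connection, 1, "case-a", {"reference_genome": "hg38"})  # placeholder value
print(load_meta_data(connection, 1))
```

Note that the database does not validate the keys or value types in such a column; any dict is accepted as-is.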

"Flex columns"

Vince-janv · 2025-01-27T10:00:20Z

All meta-data fetched for pipelines

BALSAMIC

is_tumour
bed_file_path (currently fetched from LIMS)
prep_category
sex

BALSAMIC_UMI

is_tumour
bed_file_path (currently fetched from LIMS)
prep_category
sex

FLUFFY

lane (in sequencing)
flow cell id
index1
index2
control
library_nM (from LIMS, I think)
sequencing_date

MICROSALT

project_id (ACCXXXX) from LIMS
ticket_id
organism
reference genome
date_arrival
date_sequencing
date_libprep (currently fetched from LIMS)
method_libprep (currently fetched from LIMS)
method_sequencing (currently fetched from LIMS)

MIP_DNA

panel
bed_file_path (currently fetched from LIMS)
mother
father
status

MIP_RNA

panel
bed_file_path (currently fetched from LIMS)
status

MUTANT

customer
ticket
method_libprep (currently fetched from LIMS)
method_sequencing (currently fetched from LIMS)
date_arrival
date_libprep
date_sequencing
selection_criteria (currently fetched from LIMS)
region_code (currently fetched from LIMS)
lab_code (currently fetched from LIMS)
primer (currently fetched from LIMS)

RAREDISEASE

bed_file_path (currently fetched from LIMS)
prep_category
sex
father
mother
status
panel

RNAFUSION

reference_genome

TAXPROFILER

instrument_platform
priority

TOMTE

reference_genome
tissue_type
panel

Vince-janv · 2025-02-04T12:54:54Z

Fields to add to models:

Case

MIP-DNA

panel

Raredisease

panel
reference_genome

RNAFUSION

reference_genome

Tomte

reference_genome
panel

CaseSample

Microsalt

reference_genome (microbial)

MIP-DNA

mother
father
status

MIP_RNA

status

Raredisease

father
mother
status

Tomte

Look into mother/father
Status

Organised per field:

panel

•	Case: MIP-DNA, Raredisease, Tomte

reference_genome

•	Case: Raredisease, Tomte, RNAFUSION
•	CaseSample: Microsalt (microbial)

organism

•	CaseSample: Microsalt

mother

•	CaseSample: MIP-DNA, Raredisease

father

•	CaseSample: MIP-DNA, Raredisease

status

•	CaseSample: MIP-DNA, MIP_RNA, Raredisease
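
The per-field overview above can also be written down as a small lookup table, which makes it easy to ask the reverse question: which fields does a given pipeline need? The sketch below simply transcribes the list; the names `FIELD_USAGE` and `fields_for` are made up for this example.

```python
# (level, field) -> pipelines, transcribed from the "Organised per field" list.
FIELD_USAGE = {
    ("Case", "panel"): {"MIP-DNA", "Raredisease", "Tomte"},
    ("Case", "reference_genome"): {"Raredisease", "Tomte", "RNAFUSION"},
    ("CaseSample", "reference_genome"): {"Microsalt"},  # microbial
    ("CaseSample", "organism"): {"Microsalt"},
    ("CaseSample", "mother"): {"MIP-DNA", "Raredisease"},
    ("CaseSample", "father"): {"MIP-DNA", "Raredisease"},
    ("CaseSample", "status"): {"MIP-DNA", "MIP_RNA", "Raredisease"},
}


def fields_for(pipeline):
    """Return {level: [fields]} for one pipeline, in the order listed above."""
    result = {}
    for (level, field), pipelines in FIELD_USAGE.items():
        if pipeline in pipelines:
            result.setdefault(level, []).append(field)
    return result


print(fields_for("Raredisease"))
```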

Vince-janv · 2025-02-04T14:29:21Z

There is a clear split between some pipelines aimed at genetic disorders (mip-dna, mip-rna, tomte, raredisease), which need quite a few fields, and many others that don't need any new fields.

Questions

Should we have all models inherit from the base-case?
Should we use the "existing_column" option in the polymorphism?

Vince-janv · 2025-02-06T14:07:16Z

Discussion during meeting 2025-02-05

The discussion started with how, in practice, we want to implement single-table inheritance. Questions like:
- How do we type-hint multiple case types sharing a field (like reference genome)?
- Which modules would this affect?
This led to the question: what would polymorphism using single-table inheritance solve?
- Looking at the model definitions, it is easier to understand which fields are used for what
- When working with a child model, you know which fields are present
The points above are not something we currently perceive as an issue and, more importantly, they are not part of the acceptance criteria.

Conclusion

We will not introduce any polymorphism
To satisfy 3NF we will add a column reference_genome to the Case table
The existing reference_genome column on Sample will be renamed microbial_reference_genome, and all non-microbial samples will have it set to null
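
A minimal sketch of what the conclusion above could look like as a schema change, run against an in-memory SQLite database as a stand-in. The `is_microbial` flag is a hypothetical way to tell microbial samples apart (the thread does not say how this is decided), and the stored values are placeholders. `RENAME COLUMN` needs SQLite 3.25 or newer.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE "case" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sample (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_microbial INTEGER NOT NULL,  -- hypothetical flag, illustration only
        reference_genome TEXT
    );
    INSERT INTO sample (id, name, is_microbial, reference_genome) VALUES
        (1, 'sample-microbial', 1, 'placeholder-microbial-reference'),
        (2, 'sample-other', 0, 'placeholder-other-reference');
    """
)


def migrate(connection):
    """Apply the three steps from the conclusion, in order."""
    connection.executescript(
        """
        ALTER TABLE "case" ADD COLUMN reference_genome TEXT;
        ALTER TABLE sample
            RENAME COLUMN reference_genome TO microbial_reference_genome;
        UPDATE sample
            SET microbial_reference_genome = NULL
            WHERE is_microbial = 0;
        """
    )


migrate(connection)
print(
    connection.execute(
        "SELECT id, microbial_reference_genome FROM sample ORDER BY id"
    ).fetchall()
)
```

The new `reference_genome` column on the case table starts out empty, which is exactly the open question under TBD.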

TBD

What do we populate the new reference_genome field with?

Vince-janv added this to the Standardised pipeline integration milestone Dec 18, 2024

Vince-janv changed the title ~~[Pipeline integration] Ordering & and meta-data persisting (DRAFT)~~ [Pipeline integration] Ordering & and meta-data persisting Dec 18, 2024

Vince-janv changed the title ~~[Pipeline integration] Ordering & and meta-data persisting~~ [Pipeline integration] Meta-data persisting Jan 15, 2025
