[Pipeline integration] Meta-data persisting #4030

Open
3 tasks
Vince-janv opened this issue Dec 18, 2024 · 4 comments

@Vince-janv

Vince-janv commented Dec 18, 2024

Description

Pipeline-specific meta-data needs to be stored in the database. This can be data such as reference genome, family relations, tissue_type, etc. Pipelines require this information to perform their analysis, and it should be fetched unambiguously from statusDB.

Acceptance criteria

  • Meta-data can be added per case
  • Meta-data can be added per sample, within a case
  • Adding 1-2 fields for a pipeline should not make the database significantly more complex

Notes

Current examples are:

Case level:

  • Reference genome

Sample level

  • organism/organism_id
  • _phenotype_groups
  • _phenotype_terms
  • is_tumour?
  • status
  • mother_id
  • father_id

Options

Single table inheritance:

  • Requires a discriminator column.

Things needed when adding a new pipeline

  • A new Case class in models.py
  • A new CaseSample class in models.py
  • A new Sample (maybe?) class in models.py
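A minimal sketch of what single-table inheritance means at the row level, assuming a hypothetical cases table whose workflow column acts as the discriminator. It uses Python's built-in sqlite3 and dataclasses instead of SQLAlchemy models so it runs standalone. In SQLAlchemy, the dispatch done by hand in load_case would be handled by the mapper via polymorphic_on / polymorphic_identity. All table, class, and value names are illustrative.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical single-table schema: every case type shares one table.
# "workflow" is the discriminator; pipeline-specific columns are nullable
# because rows belonging to other workflows leave them empty.
SCHEMA = """
CREATE TABLE cases (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    workflow TEXT NOT NULL,
    panel TEXT,
    reference_genome TEXT
)
"""


@dataclass
class Case:
    id: int
    name: str


@dataclass
class MipDnaCase(Case):
    panel: Optional[str]


@dataclass
class RnafusionCase(Case):
    reference_genome: Optional[str]


def load_case(row: sqlite3.Row) -> Case:
    """Build the subclass matching the discriminator value, as an ORM would."""
    if row["workflow"] == "mip-dna":
        return MipDnaCase(row["id"], row["name"], row["panel"])
    if row["workflow"] == "rnafusion":
        return RnafusionCase(row["id"], row["name"], row["reference_genome"])
    return Case(row["id"], row["name"])


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO cases (name, workflow, panel, reference_genome) VALUES (?, ?, ?, ?)",
    [
        ("case_a", "mip-dna", "PANEL_A", None),
        ("case_b", "rnafusion", None, "GRCh38"),
        ("case_c", "fluffy", None, None),
    ],
)
cases = [load_case(row) for row in conn.execute("SELECT * FROM cases ORDER BY id")]
```

Every new pipeline adds nullable columns to the shared table plus a subclass keyed on a new discriminator value, which matches the "new Case class" item listed above.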

No inheritance

  • Just add the necessary columns

Joined table inheritance

Things needed when adding a new pipeline

  • New table containing the columns unique to that pipeline
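A minimal sketch of joined-table inheritance, assuming a hypothetical base cases table and one extra table per pipeline (here mip_dna_cases) whose primary key is also a foreign key to the base row. It uses plain sqlite3 rather than SQLAlchemy; the table names and values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cases (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        workflow TEXT NOT NULL
    );
    -- One extra table per pipeline, holding only that pipeline's columns.
    -- Its primary key is also a foreign key to the shared base row.
    CREATE TABLE mip_dna_cases (
        id INTEGER PRIMARY KEY REFERENCES cases (id),
        panel TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO cases (id, name, workflow) VALUES (?, ?, ?)",
    [(1, "case_a", "mip-dna"), (2, "case_b", "fluffy")],
)
conn.execute("INSERT INTO mip_dna_cases (id, panel) VALUES (1, 'PANEL_A')")

# Loading a MIP-DNA case joins the base row with its pipeline row;
# cases of other workflows never touch mip_dna_cases.
mip_dna_cases = conn.execute(
    """
    SELECT c.id, c.name, m.panel
    FROM cases AS c
    JOIN mip_dna_cases AS m ON m.id = c.id
    """
).fetchall()
```

Because the pipeline table only ever holds MIP-DNA cases, panel can be declared NOT NULL there; the cost is one extra table and one join per pipeline.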

JSON storage on cases. Discarded

  • Just add a "meta-data" json column in each table and pass a dict to it

Things needed when adding a new pipeline

  • Nothing
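The issue marks this option as discarded without recording why. The sketch below illustrates one commonly cited trade-off, assuming a hypothetical metadata TEXT column holding a JSON-encoded dict: no migration is needed for a new pipeline, but the database cannot catch a misspelt or missing key.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, name TEXT NOT NULL, metadata TEXT)")

# Any pipeline can store any keys without a schema change...
conn.execute(
    "INSERT INTO cases (name, metadata) VALUES (?, ?)",
    ("case_a", json.dumps({"panel": "PANEL_A"})),
)
# ...but nothing in the database rejects a misspelt key.
conn.execute(
    "INSERT INTO cases (name, metadata) VALUES (?, ?)",
    ("case_b", json.dumps({"pannel": "PANEL_A"})),
)


def get_panel(case_name: str) -> Optional[str]:
    """Read the panel from the JSON blob; a missing or misspelt key gives None."""
    (raw,) = conn.execute(
        "SELECT metadata FROM cases WHERE name = ?", (case_name,)
    ).fetchone()
    return json.loads(raw).get("panel")
```

Here get_panel("case_b") silently returns None instead of failing at insert time, which conflicts with the goal of fetching meta-data unambiguously from statusDB.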

"Flex columns"

@Vince-janv Vince-janv changed the title [Pipeline integration] Ordering & and meta-data persisting (DRAFT) [Pipeline integration] Ordering & and meta-data persisting Dec 18, 2024
@Vince-janv Vince-janv changed the title [Pipeline integration] Ordering & and meta-data persisting [Pipeline integration] Meta-data persisting Jan 15, 2025
@Vince-janv

Vince-janv commented Jan 27, 2025

All meta-data fetched for pipelines

BALSAMIC

  • is_tumour
  • bed_file_path (currently fetched from LIMS)
  • prep_category
  • sex

BALSAMIC_UMI

  • is_tumour
  • bed_file_path (currently fetched from LIMS)
  • prep_category
  • sex

FLUFFY

  • lane (in sequencing)
  • flow cell id
  • index1
  • index2
  • control
  • library_nM (from LIMS, I think)
  • sequencing_date

MICROSALT

  • project_id (ACCXXXX) from LIMS
  • ticket_id
  • organism
  • reference genome
  • date_arrival
  • date_sequencing
  • date_libprep (currently fetched from LIMS)
  • method_libprep (currently fetched from LIMS)
  • method_sequencing (currently fetched from LIMS)

MIP_DNA

  • panel
  • bed_file_path (currently fetched from LIMS)
  • mother
  • father
  • status

MIP_RNA

  • panel
  • bed_file_path (currently fetched from LIMS)
  • status

MUTANT

  • customer
  • ticket
  • method_libprep (currently fetched from LIMS)
  • method_sequencing (currently fetched from LIMS)
  • date_arrival
  • date_libprep
  • date_sequencing
  • selection_criteria (currently fetched from LIMS)
  • region_code (currently fetched from LIMS)
  • lab_code (currently fetched from LIMS)
  • primer (currently fetched from LIMS)

RAREDISEASE

  • bed_file_path (currently fetched from LIMS)
  • prep_category
  • sex
  • father
  • mother
  • status
  • panel

RNAFUSION

  • reference_genome

TAXPROFILER

  • instrument_platform
  • priority

TOMTE

  • reference_genome
  • tissue_type
  • panel

@Vince-janv

Vince-janv commented Feb 4, 2025

Fields to add to models:

Case

MIP-DNA

  • panel

Raredisease

  • panel
  • reference_genome

RNAFUSION

  • reference_genome

Tomte

  • reference_genome
  • panel

CaseSample

Microsalt

  • reference_genome (microbial)

MIP-DNA

  • mother
  • father
  • status

MIP_RNA

  • status

Raredisease

  • father
  • mother
  • status

Tomte

  • Look into mother/father
  • Status

Organised per field:

panel

  • Case: MIP-DNA, Raredisease, Tomte

reference_genome

  • Case: Raredisease, Tomte, RNAFUSION
  • CaseSample: Microsalt (microbial)

organism

  • CaseSample: Microsalt

mother

  • CaseSample: MIP-DNA, Raredisease

father

  • CaseSample: MIP-DNA, Raredisease

status

  • CaseSample: MIP-DNA, MIP_RNA, Raredisease
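The per-field view is the per-pipeline list inverted. A small helper like the hypothetical one below (not part of the codebase) can regenerate it as pipelines are added. Only the Case-level fields from the list above are encoded here.

```python
from collections import defaultdict

# Case-level fields per pipeline, copied from the "Fields to add to models" list.
CASE_FIELDS = {
    "MIP-DNA": ["panel"],
    "Raredisease": ["panel", "reference_genome"],
    "RNAFUSION": ["reference_genome"],
    "Tomte": ["reference_genome", "panel"],
}


def pipelines_per_field(fields_per_pipeline: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert {pipeline: [fields]} into {field: [pipelines]}."""
    inverted: defaultdict[str, list[str]] = defaultdict(list)
    for pipeline, fields in fields_per_pipeline.items():
        for field in fields:
            inverted[field].append(pipeline)
    return dict(inverted)


by_field = pipelines_per_field(CASE_FIELDS)
```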

@Vince-janv

There is a clear split between the pipelines aimed at genetic disorders (mip-dna, mip-rna, tomte, raredisease), which need quite a few fields, and the many others, which don't need any new fields.

Questions

  • Should all models inherit from the base Case model?
  • Should we use the "existing_column" option in the polymorphism?

@Vince-janv

Discussion during meeting 2025-02-05

  • The discussion started with how we would implement single-table inheritance in practice, with questions like:
    • How to type-hint multiple case types sharing a field (like reference genome)
    • Which modules would this affect?
  • This led to the question: what would polymorphism using single-table inheritance solve?
    • Looking at the model definitions, it would be easier to understand which fields are used for what
    • When working with a child model, you know which fields are present
  • The points above are not something we currently perceive as issues and, more importantly, they are not part of the acceptance criteria.
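For reference, one way the type-hinting question could have been answered, had single-table inheritance been adopted, is structural typing with typing.Protocol instead of a Union of every case subclass. The class names below are hypothetical and only mirror pipelines from the lists above.

```python
from dataclasses import dataclass
from typing import Protocol


class HasPanel(Protocol):
    """Any case type that carries a gene panel."""

    panel: str


@dataclass
class MipDnaCase:
    name: str
    panel: str


@dataclass
class RarediseaseCase:
    name: str
    panel: str
    reference_genome: str


def panel_of(case: HasPanel) -> str:
    # Type checkers accept any object with a str "panel" attribute, so the
    # two classes need no common base class to satisfy this signature.
    return case.panel
```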

Conclusion

  • We will not introduce any polymorphism
  • To satisfy 3NF, we will add a reference_genome column to the Case table
  • The existing reference_genome column on Sample will be renamed microbial_reference_genome, and all non-microbial samples will have it set to null
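A minimal sketch of the agreed schema change, expressed as plain SQLite statements through Python's sqlite3 rather than through the project's own migration tooling. The is_microbial flag is a hypothetical stand-in for however microbial samples are actually identified, and the sample values are illustrative. RENAME COLUMN requires SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cases (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        is_microbial INTEGER NOT NULL,  -- hypothetical flag
        reference_genome TEXT
    );
    INSERT INTO cases (name) VALUES ('case_a');
    INSERT INTO samples (name, is_microbial, reference_genome)
    VALUES ('sample_1', 1, 'microbial_ref'), ('sample_2', 0, 'GRCh38');
    """
)

# 1. New case-level column; existing rows start as NULL because what to
#    populate it with is still TBD.
conn.execute("ALTER TABLE cases ADD COLUMN reference_genome TEXT")
# 2. Rename the sample-level column so its scope is explicit.
conn.execute(
    "ALTER TABLE samples RENAME COLUMN reference_genome TO microbial_reference_genome"
)
# 3. Only microbial samples keep a value.
conn.execute(
    "UPDATE samples SET microbial_reference_genome = NULL WHERE is_microbial = 0"
)

sample_refs = conn.execute(
    "SELECT name, microbial_reference_genome FROM samples ORDER BY id"
).fetchall()
case_refs = conn.execute("SELECT name, reference_genome FROM cases").fetchall()
```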

TBD

  • What do we populate the new reference_genome field with?
