Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raredisease configurator - Create services for file creation #4260

Draft
wants to merge 18 commits into
base: dev-start-pipelines
Choose a base branch
from

Conversation

diitaz93
Copy link
Contributor

@diitaz93 diitaz93 commented Mar 3, 2025

Description

Alternate version of #4255

Need to solve: The Nextflow configurators need to create the following files:

  1. Sample sheet
  2. Config file
  3. Params file

Some pipelines need to create more files.

The process of file creation follows a common structure:

  1. Define a Path
  2. Create content
  3. Write content to a file in the path

The content creation is specific for each pipeline and for each file type.

The file creation process was defined in the 3 steps mentioned above. In its abstract form, content creation (step 2) requires a class called FileContentCreator, which is implemented for every pipeline and file type.

Added

Changed

Fixed

How to prepare for test

  • Ssh to relevant server (depending on type of change)
  • Use stage: us
  • Paxa the environment: paxa
  • Install on stage (example for Hasta):
    bash /home/proj/production/servers/resources/hasta.scilifelab.se/update-tool-stage.sh -e S_cg -t cg -b [THIS-BRANCH-NAME] -a

How to test

  • Do ...

Expected test outcome

  • Check that ...
  • Take a screenshot and attach or copy/paste the output.

Review

  • Tests executed by
  • "Merge and deploy" approved by
    Thanks for filling in who performed the code review and the test!

This version is a

  • MAJOR - when you make incompatible API changes
  • MINOR - when you add functionality in a backwards compatible manner
  • PATCH - when you make backwards compatible bug fixes or documentation/instructions

Implementation Plan

  • Document in ...
  • Deploy this branch on ...
  • Inform to ...

from cg.store.store import Store


class NextflowConfigFileContentCreator(FileContentCreator):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this structure is the same for all (Nextflow) pipelines

from cg.store.models import BedVersion, Case, Sample
from cg.store.store import Store


Copy link
Contributor Author

@diitaz93 diitaz93 Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The params file needs an implementation per pipeline because each one uses a different pydantic model to parse parameters (L47)

import copy
import re


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are common functions that all params_content_creators will use


LOG = logging.getLogger(__name__)


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file requires one implementation per pipeline because uses a different base model for sample sheet entries (L46)

from cg.store.models import Case
from cg.store.store import Store


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to create another layer of inheritance, as Nextflow pipelines have a big common functionality. The Nextflow pipeline configurations will inherit from this class

)

@abstractmethod
def _do_pipeline_specific_actions(self, case_id: str) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has a clear implementation for Tomte, Nallo, Raredisease. For Taxprofiler and RNAFusion, we just need to implement it with pass

Copy link
Contributor

@islean islean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will continue looking after lunch!

def __init__(self, store: Store, lims: LimsAPI, params: str):
self.store = store
self.lims = lims
self.params = params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "params"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a path to a file in hasta. It is pipeline-specific. We get it from the RareDiseaseConfig class in the CGConfig

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR and probably not something we can change without cooperation from biodev, but it would be nice to have it type hinted as a Path and a name called parameters_file or something.

Comment on lines +5 to +8
class FileContentCreator(ABC):

@abstractmethod
def create(self, case_path: Path) -> any:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not quite get this interface. In part, I thought the motivation for having the very general "FileContentCreator" was that it should be reuseable outside of the whole analysis starter context? But then specifying a case_path as input is a bit too specific for the purpose. It also seems to create issues where we have to get the case_id from the path name, where we really should know the case name already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a bit more on it - what is a FileContentCreator really? Anything that outputs anything that can be put into a file? That seems a bit like having class called "StringWriter" or something? Is it not a bit too abstract to carry meaning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good point. It could be too abstract to use a template method. I will consider this

Comment on lines +87 to +90
def _get_case_priority(self, case_id: str) -> str:
"""Get case priority."""
case: Case = self.store.get_case_by_internal_id(case_id)
return case.slurm_priority
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about whether this method returns the case priority or the slurm QOS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should this be in the nextflowconfigurator? It feels like a store method or something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also confused about the return type. I will deal with that later. Maybe it is a good idea to make it a store method

def _create_sample_sheet(self, case_id: str) -> None:
"""Create sample sheet for case."""
create_file(
content_creator=self.sample_sheet_content_creator,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sample sheet content creator seems to be an unresolved attribute

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, as it is pipeline-specific, it will be defined in the children

@@ -14,6 +14,7 @@ class RarediseaseQCMetrics(QCMetrics):
total_reads: int


# TODO: Move these to models folder in appropriate service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great idea for moving them!

@@ -362,8 +362,8 @@ def create_params_file(self, case_id: str, dry_run: bool) -> None:
)

def replace_values_in_params_file(self, workflow_parameters: dict) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand why we continue using the NfAnalysisApi interhiting from the AnalysisApi. I was under the assumption that this whole pattern got reworked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be reworked and we will not be using this code anymore. The deprecation is the last step and we still need this code for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be called nextflow_config_creator.py? and in its own folder like params and samplesheet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the renaming makes sense. The folder may not be needed as there is no other config_content_creator

@@ -1,8 +1,10 @@
from abc import ABC

from cg.services.analysis_starter.configurator.abstract_model import CaseConfig


class Configurator(ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this now I think the configurator can be a wrapper around the detailed implementations of the FileContentCreators perhaps.
Maybe it is worthwhile to evaluate this approach as it might be able to reduce multi-inheritiance of classes which can be cumbersome to go through.

Suggested change
class Configurator(ABC):
class Configurator:
def __init__(self, path_creator: analysis_path_manager, config_creator: FileContentCreator, params_creator: FileContentCreator, sample_sheet_creator: FileContentCreator):
self.path_creator = path_creator
self.config_creator = config_creator
self.params_creator = params_creator
self.sample_sheet_creator = sample_sheet_creator
# other required things such a store, lims etc.
# you can give an optional service in case there is something mega special to setup for a pipeline
def create_configurations(self, case_id) -> CaseConfig:
self.config_creator.create(case_id) # let them internally figure out what they need based on the case
self.params_creator.create(case_id)
self.sample_sheet_creator(case_id)
return CaseConfig #however this gets created

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configurator is also the parent of all the non-Nextflow pipelines, which unfortunately don't follow this pattern. There are no config, sample sheets nor params files in MicroSALT, for example

Comment on lines +92 to +95
def _get_case_workflow(self, case_id: str) -> Workflow:
"""Get case workflow."""
case: Case = self.store.get_case_by_internal_id(case_id)
return Workflow(case.data_analysis)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link

sonarqubecloud bot commented Mar 5, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants