-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raredisease configurator - Create services for file creation #4260
base: dev-start-pipelines
Are you sure you want to change the base?
Conversation
from cg.store.store import Store | ||
|
||
|
||
class NextflowConfigFileContentCreator(FileContentCreator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this structure is the same for all (Nextflow) pipelines
from cg.store.models import BedVersion, Case, Sample | ||
from cg.store.store import Store | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The params file needs an implementation per pipeline because each one uses a different pydantic model to parse parameters (L47)
import copy | ||
import re | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are common functions that all params_content_creators will use
|
||
LOG = logging.getLogger(__name__) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file requires one implementation per pipeline because uses a different base model for sample sheet entries (L46)
from cg.store.models import Case | ||
from cg.store.store import Store | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to create another layer of inheritance, as Nextflow pipelines have a big common functionality. The Nextflow pipeline configurations will inherit from this class
cg/services/analysis_starter/configurator/implementations/raredisease.py
Show resolved
Hide resolved
) | ||
|
||
@abstractmethod | ||
def _do_pipeline_specific_actions(self, case_id: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has a clear implementation for Tomte, Nallo, Raredisease. For Taxprofiler and RNAFusion, we just need to implement it with pass
cg/services/analysis_starter/configurator/implementations/raredisease.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will continue looking after lunch!
def __init__(self, store: Store, lims: LimsAPI, params: str): | ||
self.store = store | ||
self.lims = lims | ||
self.params = params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "params"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a path to a file in hasta. It is pipeline-specific. We get it from the RareDiseaseConfig class in the CGConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope for this PR and probably not something we can change without cooperation from biodev, but it would be nice to have it type hinted as a Path and a name called parameters_file or something.
class FileContentCreator(ABC): | ||
|
||
@abstractmethod | ||
def create(self, case_path: Path) -> any: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not quite get this interface. In part, I thought the motivation for having the very general "FileContentCreator" was that it should be reuseable outside of the whole analysis starter context? But then specifying a case_path
as input is a bit too specific for the purpose. It also seems to create issues where we have to get the case_id from the path name, where we really should know the case name already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking a bit more on it - what is a FileContentCreator really? Anything that outputs anything that can be put into a file? That seems a bit like having class called "StringWriter" or something? Is it not a bit too abstract to carry meaning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very good point. It could be too abstract to use a template method. I will consider this
def _get_case_priority(self, case_id: str) -> str: | ||
"""Get case priority.""" | ||
case: Case = self.store.get_case_by_internal_id(case_id) | ||
return case.slurm_priority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about whether this method returns the case priority or the slurm QOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should this be in the nextflowconfigurator? It feels like a store method or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also confused about the return type. I will deal with that later. Maybe it is a good idea to make it a store method
def _create_sample_sheet(self, case_id: str) -> None: | ||
"""Create sample sheet for case.""" | ||
create_file( | ||
content_creator=self.sample_sheet_content_creator, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sample sheet content creator seems to be an unresolved attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, as it is pipeline-specific, it will be defined in the children
@@ -14,6 +14,7 @@ class RarediseaseQCMetrics(QCMetrics): | |||
total_reads: int | |||
|
|||
|
|||
# TODO: Move these to models folder in appropriate service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great idea for moving them!
cg/meta/workflow/nf_analysis.py
Outdated
@@ -362,8 +362,8 @@ def create_params_file(self, case_id: str, dry_run: bool) -> None: | |||
) | |||
|
|||
def replace_values_in_params_file(self, workflow_parameters: dict) -> dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand why we continue using the NfAnalysisApi interhiting from the AnalysisApi. I was under the assumption that this whole pattern got reworked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be reworked and we will not be using this code anymore. The deprecation is the last step and we still need this code for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be called nextflow_config_creator.py
? and in its own folder like params and samplesheet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, the renaming makes sense. The folder may not be needed as there is no other config_content_creator
@@ -1,8 +1,10 @@ | |||
from abc import ABC | |||
|
|||
from cg.services.analysis_starter.configurator.abstract_model import CaseConfig | |||
|
|||
|
|||
class Configurator(ABC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this now I think the configurator can be a wrapper around the detailed implementations of the FileContentCreators perhaps.
Maybe it is worthwhile to evaluate this approach as it might be able to reduce multi-inheritiance of classes which can be cumbersome to go through.
class Configurator(ABC): | |
class Configurator: | |
def __init__(self, path_creator: analysis_path_manager, config_creator: FileContentCreator, params_creator: FileContentCreator, sample_sheet_creator: FileContentCreator): | |
self.path_creator = path_creator | |
self.config_creator = config_creator | |
self.params_creator = params_creator | |
self.sample_sheet_creator = sample_sheet_creator | |
# other required things such a store, lims etc. | |
# you can give an optional service in case there is something mega special to setup for a pipeline | |
def create_configurations(self, case_id) -> CaseConfig: | |
self.config_creator.create(case_id) # let them internally figure out what they need based on the case | |
self.params_creator.create(case_id) | |
self.sample_sheet_creator(case_id) | |
return CaseConfig #however this gets created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This configurator is also the parent of all the non-Nextflow pipelines, which unfortunately don't follow this pattern. There are no config, sample sheets nor params files in MicroSALT, for example
def _get_case_workflow(self, case_id: str) -> Workflow: | ||
"""Get case workflow.""" | ||
case: Case = self.store.get_case_by_internal_id(case_id) | ||
return Workflow(case.data_analysis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
|
Description
Alternate version of #4255
Need to solve: The Nextflow configurators need to create the following files:
Some pipelines need to create more files.
The process of file creation follows a common structure:
The content creation is specific for each pipeline and for each file type.
The file creation process was defined in the 3 steps mentioned above. In its abstract form, content creation (step 2) requires a class called FileContentCreator, which is implemented for every pipeline and file type.
Added
Changed
Fixed
How to prepare for test
us
paxa
How to test
Expected test outcome
Review
Thanks for filling in who performed the code review and the test!
This version is a
Implementation Plan