Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some methods and initialize task_config in task class #29

Closed

Conversation

DavidNew-NOAA
Copy link
Contributor

@DavidNew-NOAA DavidNew-NOAA commented Jun 4, 2024

Description

This PR causes the the "task_config" member AttrDict object to be created in the init method of the "Task" class. Previously, "task_config" has been created separately, but all applications that utilize the "Task" class expect "task_config" to exist, so it might as well be created at initialization. Example of task_config creation: https://github.com/NOAA-EMC/global-workflow/blob/67b833e0c7bc390865d453588b4609a1a7ede981/ush/python/pygfs/task/atm_analysis.py#L59

This PR also adds two class methods to the "Task" class:

  1. j2yaml_to_filehandler: Parses a Jinja2-templated YAML and then passes the resulting dictionary to the file wxflow FileHandler. This is a common task when staging files for various jobs that can be consolidated into a class method. Example: https://github.com/NOAA-EMC/global-workflow/blob/67b833e0c7bc390865d453588b4609a1a7ede981/ush/python/pygfs/task/atm_analysis.py#L78
  2. extend_task_config: Takes a dictionary or AttrDict object and merges it with the "task_config". This is a common task done when initializing Task objects. Example: https://github.com/NOAA-EMC/global-workflow/blob/67b833e0c7bc390865d453588b4609a1a7ede981/ush/python/pygfs/task/atm_analysis.py#L59

This PR is part of a broader refactoring of pygfs in the Global Workflow.

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • pynorms
  • pytests

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

Copy link

codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 25.00000% with 12 lines in your changes missing coverage. Please review.

Project coverage is 47.91%. Comparing base (8406bee) to head (7b5f330).
Report is 8 commits behind head on develop.

Files with missing lines Patch % Lines
src/wxflow/task.py 25.00% 12 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #29      +/-   ##
===========================================
- Coverage    47.93%   47.91%   -0.03%     
===========================================
  Files           18       18              
  Lines         1650     1657       +7     
  Branches       335      336       +1     
===========================================
+ Hits           791      794       +3     
- Misses         799      803       +4     
  Partials        60       60              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@DavidNew-NOAA DavidNew-NOAA marked this pull request as ready for review June 4, 2024 18:22
Comment on lines 107 to 112
def stage_files(self, path: str) -> None:
"""
Pass dictionary, created by parsing Jinja2-templated YAML, to file handler
"""

FileHandler(parse_j2yaml(path, self.task_config)).sync()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this belongs here in this form.
If we want a stage method in the task, its input should be the a fully formed list and not something that needs to be rendered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's basically what FileHandler does. Theres so many places where we parse a jinja2-templated yaml and then pass to the filehandler, I figured it would be useful to make a class method out of it. I can remove it though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But, if there is an issue in the parsing, and one needs to print the resolved yaml, one will have to come in wxflow and do debugging.
Besides, passing only the necessary keys to the yaml path, rather than the entire task_config. would help in debugging.

@@ -60,6 +62,9 @@ def __init__(self, config: Dict, *args, **kwargs):
self.runtime_config['previous_cycle'] = add_to_datetime(self.runtime_config.current_cycle, -to_timedelta(f"{self.config['assim_freq']}H"))
logger.debug(f"previous cycle: {self.runtime_config['previous_cycle']}")

# Combine config and runtime_config into single task_config attribute-dictionary
self.task_config = AttrDict(**self.config, **self.runtime_config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not force users to strictly use only self.task_config
I think we should make config and runtime_config internal (private) and only return self.task_config.

Change:
L35:

self._config = AttrDict(config)

and L44

self._runtime_config = AttrDict()

and then update the script to use these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, good idea. I will have to make a companion PR for the Global Workflow though since config and runtime_config are referenced in several places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Unfortunately, it wasn't caught in the global-workflow at the time this happened, but this is necessary before we promote too many kinds of _config's.

Comment on lines 100 to 105
def extend_task_config(self, local_dict: Union[Dict, AttrDict]) -> None:
"""
Extend task_config attribute-dictionary with another dictionary
"""

self.task_config = AttrDict(**self.task_config, **local_dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks provenance.
The intention is to create a task_config in the init. By allowing this, users can add/edit/remove items from task_config and it gets increasingly difficult to find out where what was added/removed.
As much as possible, we should extract only required pieces of information. If we turn these into global variables, it can get complicated quickly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users are already doing this when they create task_config in places like the atm_analysis.py (https://github.com/NOAA-EMC/global-workflow/blob/205d0c2b13e2d7755cec75bf8c978ab20d453862/ush/python/pygfs/task/atm_analysis.py#L59) where config, runtime_config, and local_dict are combined into one task config. Even if I remove this method, there's nothing stopping someone from modifying task_config.

One possibility is to make task_config also private but accessible (read-only) using a public method, something like mytask.get_task_config()['foo']. In the case of the atm_analysis.py subclass __init__, one can just initialize it with like so:

super().__init__( AttrDict( **config, **local_dict ) )

and instead just create local_dict first before calling the above. Thus the user has one shot to add to task_config before it becomes read-only.

@DavidNew-NOAA DavidNew-NOAA marked this pull request as draft June 6, 2024 15:09
@DavidNew-NOAA DavidNew-NOAA deleted the feature/task_class_utils branch June 7, 2024 13:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants