🐛 Inaccurate full generated pipeline configs #1779

shnizzedy · 2022-09-23T21:22:34Z

Describe the bug

This function

Lines 11 to 316 in 9586d71

    
           def create_yaml_from_template( 
        
               d, template=DEFAULT_PIPELINE_FILE, include_all=False 
        
           ): 
        
               """Save dictionary to a YAML file, keeping the structure 
        
               (such as first level comments and ordering) from the template 
        
               It may not be fully robust to YAML structures, but it works 
        
               for C-PAC config files! 
        
               Parameters 
        
               ---------- 
        
               d : dict or Configuration 
        
               template : str 
        
                   path to template or name of preconfig 
        
               include_all : bool 
        
                   include every key, even those that are unchanged 
        
               Examples 
        
               -------- 
        
               >>> import yaml 
        
               >>> from CPAC.utils.configuration import Configuration 
        
               >>> Configuration(yaml.safe_load(create_yaml_from_template({}))).dict( 
        
               ...     ) == Configuration({}).dict() 
        
               True 
        
               >>> Configuration(yaml.safe_load(create_yaml_from_template({}, 
        
               ...     template='Lil BUB'))) 
        
               Traceback (most recent call last): 
        
                   ... 
        
               ValueError: 'Lil BUB' is not a valid path nor a defined preconfig. 
        
               """  # noqa: E501 # pylint: disable=line-too-long 
        
               def _count_indent(line): 
        
                   '''Helper method to determine indentation level 
        
                   Parameters 
        
                   ---------- 
        
                   line : str 
        
                   Returns 
        
                   ------- 
        
                   number_of_indents : int 
        
                   Examples 
        
                   -------- 
        
                   >>> _count_indent('No indent') 
        
                   0 
        
                   >>> _count_indent('    Four spaces') 
        
                   2 
        
                   ''' 
        
                   return (len(line) - len(line.lstrip())) // 2 
        
               def _create_import_dict(diff): 
        
                   '''Method to return a dict of only changes given a nested dict 
        
                   of (dict1_value, dict2_value) tuples 
        
                   Parameters 
        
                   ---------- 
        
                   diff : dict 
        
                       output of `dct_diff` 
        
                   Returns 
        
                   ------- 
        
                   dict 
        
                       dict of only changed values 
        
                   Examples 
        
                   -------- 
        
                   >>> _create_import_dict({'anatomical_preproc': { 
        
                   ...     'brain_extraction': {'extraction': { 
        
                   ...         'run': ([True], False), 
        
                   ...         'using': (['3dSkullStrip'], ['niworkflows-ants'])}}}}) 
        
                   {'anatomical_preproc': {'brain_extraction': {'extraction': {'run': False, 'using': ['niworkflows-ants']}}}} 
        
                   '''  # noqa: E501  # pylint: disable=line-too-long 
        
                   if isinstance(diff, tuple) and len(diff) == 2: 
        
                       return diff[1] 
        
                   if isinstance(diff, dict): 
        
                       i = {} 
        
                       for k in diff: 
        
                           try: 
        
                               j = _create_import_dict(diff[k]) 
        
                               if j != {}: 
        
                                   i[k] = j 
        
                           except KeyError: 
        
                               continue 
        
                       return i 
        
                   return diff 
        
               def _format_key(key, level): 
        
                   '''Helper method to format YAML keys 
        
                   Parameters 
        
                   ---------- 
        
                   key : str 
        
                   level : int 
        
                   Returns 
        
                   ------- 
        
                   yaml : str 
        
                   Examples 
        
                   -------- 
        
                   >>> _format_key('base', 0) 
        
                   '\nbase: ' 
        
                   >>> _format_key('indented', 2) 
        
                   '\n    indented:' 
        
                   ''' 
        
                   return f'\n{" " * level * 2}{key}: ' 
        
               def _format_list_items(l,  # noqa: E741  # pylint:disable=invalid-name 
        
                                      line_level): 
        
                   '''Helper method to handle lists in the YAML 
        
                   Parameters 
        
                   ---------- 
        
                   l : list 
        
                   line_level : int 
        
                   Returns 
        
                   ------- 
        
                   yaml : str 
        
                   Examples 
        
                   -------- 
        
                   >>> _format_list_items([1, 2, {'nested': 3}], 0) 
        
                   '  - 1\n  - 2\n  - nested: 3' 
        
                   >>> _format_list_items([1, 2, {'nested': [3, {'deep': [4]}]}], 1) 
        
                   '    - 1\n    - 2\n    - nested:\n      - 3\n      - deep:\n        - 4' 
        
                   '''  # noqa: E501  # pylint: disable=line-too-long 
        
                   # keep short, simple lists in square brackets 
        
                   if all(isinstance(item, (str, bool, int, float)) for item in l): 
        
                       if len(str(l)) < 50: 
        
                           return str(l).replace("'", '').replace('"', '') 
        
                   # list long or complex lists on lines with indented '-' lead-ins 
        
                   return '\n' + '\n'.join([ 
        
                       f'{indent(line_level)}{li}' for li in yaml.dump( 
        
                           yaml_bool(l) 
        
                       ).replace("'On'", 'On').replace("'Off'", 'Off').split('\n') 
        
                   ]).rstrip() 
        
               # set starting values 
        
               output = '' 
        
               comment = '' 
        
               space_match = r'^\s+.*' 
        
               level = 0 
        
               nest = [] 
        
               list_item = False 
        
               list_level = 0 
        
               line_level = 0 
        
               template_name = template 
        
               if isinstance(d, Configuration): 
        
                   d = d.dict() 
        
               try: 
        
                   template = load_preconfig(template) 
        
               except BadParameter as bad_parameter: 
        
                   if 'default' in template.lower(): 
        
                       template = DEFAULT_PIPELINE_FILE 
        
                   if not os.path.exists(template) or os.path.islink(template): 
        
                       raise ValueError(f'\'{template_name}\' is not a valid path nor a ' 
        
                                        'defined preconfig.') from bad_parameter 
        
               template_included = False 
        
               # load default values 
        
               d_default = Configuration(yaml.safe_load(open(template, 'r'))).dict() 
        
               if ( 
        
                   template == DEFAULT_PIPELINE_FILE or 
        
                   not dct_diff( 
        
                       yaml.safe_load(open(DEFAULT_PIPELINE_FILE, 'r')), d_default) 
        
               ): 
        
                   template_name = 'default' 
        
               # update values 
        
               if include_all: 
        
                   d_default.update(d) 
        
                   d = _create_import_dict(dct_diff({}, d_default)) 
        
               else: 
        
                   d = _create_import_dict(dct_diff(d_default, d)) 
        
               # generate YAML from template with updated values 
        
               template_dict = yaml.safe_load(open(template, 'r')) 
        
               with open(template, 'r') as f: 
        
                   for line in f: 
        
                       # persist comments and frontmatter 
        
                       if line.startswith('%') or line.startswith('---') or re.match( 
        
                           r'^\s*#.*$', line 
        
                       ): 
        
                           list_item = False 
        
                           line = line.strip('\n') 
        
                           comment += f'\n{line}' 
        
                       elif len(line.strip()): 
        
                           if re.match(space_match, line): 
        
                               line_level = _count_indent(line) 
        
                           else: 
        
                               line_level = 0 
        
                           # handle lists as a unit 
        
                           if list_item: 
        
                               if line_level < list_level - 1: 
        
                                   list_item = False 
        
                                   level = list_level 
        
                                   list_level = 0 
        
                           elif line.lstrip().startswith('-'): 
        
                               list_item = True 
        
                               list_level = line_level - 1 
        
                           else: 
        
                               # extract dict key 
        
                               key_group = re.match( 
        
                                   r'^\s*(([a-z0-9A-Z_]+://){0,1}' 
        
                                   r'[a-z0-9A-Z_/][\sa-z0-9A-Z_/\.-]+)\s*:', line) 
        
                               if key_group: 
        
                                   if not template_included: 
        
                                       # prepend comment from template 
        
                                       if len(comment.strip()): 
        
                                           comment = re.sub( 
        
                                               r'(?<=# based on )(.* pipeline)', 
        
                                               f'{template_name} pipeline', 
        
                                               comment 
        
                                           ) 
        
                                           output += comment 
        
                                           output += f'\nFROM: {template_name}\n' 
        
                                           comment = '' 
        
                                       template_included = True 
        
                                   key = key_group.group(1).strip() 
        
                                   # calculate key depth 
        
                                   if line_level == level: 
        
                                       if level > 0: 
        
                                           nest = nest[:-1] + [key] 
        
                                       else: 
        
                                           nest = [key] 
        
                                   elif line_level == level + 1: 
        
                                       nest += [key] 
        
                                   elif line_level < level: 
        
                                       nest = nest[:line_level] + [key] 
        
                                   # only include updated and new values 
        
                                   try: 
        
                                       # get updated value for key 
        
                                       value = lookup_nested_value(d, nest) 
        
                                       orig_value = lookup_nested_value(d_default, nest) 
        
                                       # Use 'On' and 'Off' for bools 
        
                                       if (isinstance(orig_value, bool) or ( 
        
                                           isinstance(orig_value, str) and 
        
                                           orig_value in {'On', 'Off'} 
        
                                       ) or (isinstance(orig_value, list) and all([( 
        
                                           isinstance(orig_item, bool) or ( 
        
                                               isinstance(orig_item, str) and 
        
                                               orig_item in {'On', 'Off'} 
        
                                           ) 
        
                                       ) for orig_item in orig_value]) 
        
                                       )): 
        
                                           value = yaml_bool(value) 
        
                                       # prepend comment from template 
        
                                       if len(comment.strip()): 
        
                                           output += comment 
        
                                       else: 
        
                                           output += '\n' 
        
                                       # write YAML 
        
                                       output += _format_key(key, line_level) 
        
                                       if isinstance(value, list): 
        
                                           output += _format_list_items( 
        
                                               value, line_level) 
        
                                       elif isinstance(value, dict): 
        
                                           for k in value.keys(): 
        
                                               try: 
        
                                                   lookup_nested_value(template_dict, 
        
                                                                       nest + [k]) 
        
                                               # include keys not in template 
        
                                               except KeyError: 
        
                                                   output += _format_key( 
        
                                                       k, line_level + 1) 
        
                                                   parsed = _format_list_items( 
        
                                                       value[k], 
        
                                                       line_level + 1 
        
                                                   ) if isinstance( 
        
                                                       value[k], list) else yaml_bool( 
        
                                                           value[k]) 
        
                                                   if isinstance(parsed, (int, float)): 
        
                                                       parsed = str(parsed) 
        
                                                   elif parsed is None: 
        
                                                       parsed = '' 
        
                                                   output += parsed if ( 
        
                                                       isinstance(parsed, str) 
        
                                                   ) else ( 
        
                                                       '\n' + indent(line_level + 1) + 
        
                                                       f'\n{indent(line_level + 1)}' + 
        
                                                       yaml.dump(parsed) + '\n') 
        
                                       else: 
        
                                           output += str(value) 
        
                                   except KeyError: 
        
                                       # clear comment for excluded key 
        
                                       comment = '\n' 
        
                                   # reset variables for loop 
        
                                   comment = '\n' 
        
                                   level = line_level 
        
                       elif len(comment) > 1 and comment[-2] != '\n': 
        
                           comment += '\n' 
        
               while '\n\n\n' in output: 
        
                   output = output.replace('\n\n\n', '\n\n') 
        
               return output.lstrip('\n').replace('null', '')

that generates a YAML file with comments from a template is not working fully correctly.

As far as I've been able to tell:

C-PAC/CPAC/utils/utils.py

Lines 2301 to 2425 in 9586d71

    
           def update_pipeline_values_1_8(d_old): 
        
               '''Function to update pipeline config values that changed from 
        
               C-PAC 1.7 to 1.8. 
        
               Parameters 
        
               ---------- 
        
               d_old : dict 
        
               Returns 
        
               ------- 
        
               d : dict 
        
                  updated 
        
               Examples 
        
               -------- 
        
               >>> update_pipeline_values_1_8({'segmentation': {'tissue_segmentation': { 
        
               ...     'using': ['FSL-FAST Thresholding', 'Customized Thresholding']}}}) 
        
               {'segmentation': {'tissue_segmentation': {'using': ['FSL-FAST'], 'FSL-FAST': {'thresholding': {'use': 'Custom'}}}}} 
        
               >>> update_pipeline_values_1_8({'segmentation': {'tissue_segmentation': { 
        
               ...     'using': ['FSL-FAST Thresholding']}}}) 
        
               {'segmentation': {'tissue_segmentation': {'using': ['FSL-FAST'], 'FSL-FAST': {'thresholding': {'use': 'Auto'}}}}} 
        
               '''  # noqa: E501  # pylint: disable=line-too-long 
        
               from CPAC.pipeline.schema import valid_options \ 
        
                   # pylint: disable=import-outside-toplevel 
        
               d = replace_in_strings(d_old.copy()) 
        
               d = _replace_changed_values( 
        
                   d, 
        
                   ['anatomical_preproc', 'brain_extraction', 'using'], 
        
                   [('AFNI', '3dSkullStrip'), ('FSL', 'BET'), ('unet', 'UNet')] 
        
               ) 
        
               d = _replace_changed_values( 
        
                   d, 
        
                   ['functional_preproc', 'func_masking', 'using'], 
        
                   [('3dAutoMask', 'AFNI'), ('BET', 'FSL')] 
        
               ) 
        
               try: 
        
                   seg_use_threshold = lookup_nested_value(d, [ 
        
                       'segmentation', 'tissue_segmentation', 'using']) 
        
               except KeyError: 
        
                   seg_use_threshold = [] 
        
               if not isinstance(seg_use_threshold, list): 
        
                   seg_use_threshold = [seg_use_threshold] 
        
               if 'FSL-FAST Thresholding' in seg_use_threshold: 
        
                   if 'using' in d['segmentation'].get( 
        
                       'tissue_segmentation', {} 
        
                   ): 
        
                       d['segmentation'][ 
        
                           'tissue_segmentation' 
        
                       ]['using'].append('FSL-FAST') 
        
                   else: 
        
                       d = set_nested_value(d, [ 
        
                           'segmentation', 'tissue_segmentation', 
        
                           'using'], ['FSL-FAST']) 
        
                   seg_use_threshold.remove('FSL-FAST Thresholding') 
        
               if 'Customized Thresholding' in seg_use_threshold: 
        
                   seg_use_threshold.remove('Customized Thresholding') 
        
                   d = set_nested_value(d, [ 
        
                       'segmentation', 'tissue_segmentation', 
        
                       'FSL-FAST', 'thresholding', 'use'], 'Custom') 
        
               else: 
        
                   d = set_nested_value(d, [ 
        
                       'segmentation', 'tissue_segmentation', 
        
                       'FSL-FAST', 'thresholding', 'use'], 'Auto') 
        
               for centr in ['degree_centrality', 'eigenvector_centrality', 
        
                             'local_functional_connectivity_density']: 
        
                   centr_keys = ['network_centrality', centr, 'weight_options'] 
        
                   try: 
        
                       centr_value = lookup_nested_value(d, centr_keys) 
        
                       if any(isinstance(v, bool) for v in centr_value): 
        
                           for i in range(2): 
        
                               if centr_value[i] is True: 
        
                                   centr_value[i] = valid_options['centrality'][ 
        
                                       'weight_options'][i] 
        
                           while False in centr_value: 
        
                               centr_value.remove(False) 
        
                           d = set_nested_value(d, centr_keys, centr_value) 
        
                   except KeyError: 
        
                       continue 
        
               seg_template_key = [ 
        
                   'segmentation', 'tissue_segmentation', 
        
                   'Template_Based', 'template_for_segmentation'] 
        
               try: 
        
                   seg_template = lookup_nested_value(d, seg_template_key) 
        
                   for replacement in [ 
        
                       ('EPI_template', valid_options['segmentation']['template'][0]), 
        
                       ('T1_template', valid_options['segmentation']['template'][1]) 
        
                   ]: 
        
                       seg_template = list_item_replace(seg_template, *replacement) 
        
                   while 'Off' in seg_template: 
        
                       seg_template.remove('Off') 
        
                   while False in seg_template: 
        
                       seg_template.remove(False) 
        
                   d = set_nested_value(d, seg_template_key, seg_template) 
        
                   d = remove_None(d, seg_template_key) 
        
               except KeyError: 
        
                   pass 
        
               distcor_key = ['functional_preproc', 'distortion_correction', 'using'] 
        
               try: 
        
                   lookup_nested_value(d, distcor_key) 
        
                   d = remove_None(d, distcor_key) 
        
               except KeyError: 
        
                   pass 
        
               if 'functional_registration' in d and isinstance( 
        
                   d['functional_registration'], dict 
        
               ): 
        
                   if '1-coregistration' in d['functional_registration']: 
        
                       coreg = d['functional_registration'].pop('1-coregistration') 
        
                       d = set_nested_value( 
        
                           d, ['registration_workflows', 'functional_registration', 
        
                               'coregistration'], 
        
                           coreg 
        
                       ) 
        
                   if not(bool(d['functional_registration'])): 
        
                       d.pop('functional_registration') 
        
               return update_values_from_list(d)

seems to compensate for most invalid configs that arise from the bug

The bug seems to occur when a list is followed by a top-level key (or maybe whenever a list is followed by a key more than one level higher in the hierarchy)

To reproduce

No response

Preconfig

Custom pipeline configuration

No response

Run command

No response

Expected behavior

Generated pipeline YAML matches the pipeline config that was run exactly

Acceptance criteria

Generated pipeline YAML matches the imported pipeline config exactly
CI test added to check for 👆

Screenshots

C-PAC version

v1.8.5-dev

Container platform

Singularity

Docker and/or Singularity version(s)

Singularity v3.8.7-1.el8

Additional context

Longterm, I think it'll be better to move the comments into the node blocks for a SSOT and draw the config comments from the node block comments, rather than generating them from the default template, but I think for now we can just patch this function and add a CI test.

shnizzedy added bug user-reported labels Sep 23, 2022

shnizzedy added 2 - High Priority RBC https://github.com/PennLINC/RBC labels Oct 7, 2022

shnizzedy added this to RBC to-dos Oct 7, 2022

shnizzedy moved this to Todo in RBC to-dos Oct 7, 2022

shnizzedy mentioned this issue Oct 17, 2022

🐛 Fix minified pipelines #1801

Merged

8 tasks

shnizzedy mentioned this issue Nov 22, 2022

✅ Add unit tests for SCA config parsing #1850

Merged

8 tasks

sgiavasis added this to the 1.8.5 release milestone Dec 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🐛 Inaccurate full generated pipeline configs #1779

🐛 Inaccurate full generated pipeline configs #1779

shnizzedy commented Sep 23, 2022 •

edited

Loading

🐛 Inaccurate full generated pipeline configs #1779

🐛 Inaccurate full generated pipeline configs #1779

Comments

shnizzedy commented Sep 23, 2022 • edited Loading

Describe the bug

To reproduce

Preconfig

Custom pipeline configuration

Run command

Expected behavior

Acceptance criteria

Screenshots

C-PAC version

Container platform

Docker and/or Singularity version(s)

Additional context

shnizzedy commented Sep 23, 2022 •

edited

Loading