Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Inaccurate full generated pipeline configs #1779

Open
16 tasks
shnizzedy opened this issue Sep 23, 2022 · 0 comments
Open
16 tasks

🐛 Inaccurate full generated pipeline configs #1779

shnizzedy opened this issue Sep 23, 2022 · 0 comments
Labels
2 - High Priority bug RBC https://github.com/PennLINC/RBC user-reported
Milestone

Comments

@shnizzedy
Copy link
Member

shnizzedy commented Sep 23, 2022

Describe the bug

This function

def create_yaml_from_template(
d, template=DEFAULT_PIPELINE_FILE, include_all=False
):
"""Save dictionary to a YAML file, keeping the structure
(such as first level comments and ordering) from the template
It may not be fully robust to YAML structures, but it works
for C-PAC config files!
Parameters
----------
d : dict or Configuration
template : str
path to template or name of preconfig
include_all : bool
include every key, even those that are unchanged
Examples
--------
>>> import yaml
>>> from CPAC.utils.configuration import Configuration
>>> Configuration(yaml.safe_load(create_yaml_from_template({}))).dict(
... ) == Configuration({}).dict()
True
>>> Configuration(yaml.safe_load(create_yaml_from_template({},
... template='Lil BUB')))
Traceback (most recent call last):
...
ValueError: 'Lil BUB' is not a valid path nor a defined preconfig.
""" # noqa: E501 # pylint: disable=line-too-long
def _count_indent(line):
'''Helper method to determine indentation level
Parameters
----------
line : str
Returns
-------
number_of_indents : int
Examples
--------
>>> _count_indent('No indent')
0
>>> _count_indent(' Four spaces')
2
'''
return (len(line) - len(line.lstrip())) // 2
def _create_import_dict(diff):
'''Method to return a dict of only changes given a nested dict
of (dict1_value, dict2_value) tuples
Parameters
----------
diff : dict
output of `dct_diff`
Returns
-------
dict
dict of only changed values
Examples
--------
>>> _create_import_dict({'anatomical_preproc': {
... 'brain_extraction': {'extraction': {
... 'run': ([True], False),
... 'using': (['3dSkullStrip'], ['niworkflows-ants'])}}}})
{'anatomical_preproc': {'brain_extraction': {'extraction': {'run': False, 'using': ['niworkflows-ants']}}}}
''' # noqa: E501 # pylint: disable=line-too-long
if isinstance(diff, tuple) and len(diff) == 2:
return diff[1]
if isinstance(diff, dict):
i = {}
for k in diff:
try:
j = _create_import_dict(diff[k])
if j != {}:
i[k] = j
except KeyError:
continue
return i
return diff
def _format_key(key, level):
'''Helper method to format YAML keys
Parameters
----------
key : str
level : int
Returns
-------
yaml : str
Examples
--------
>>> _format_key('base', 0)
'\nbase: '
>>> _format_key('indented', 2)
'\n indented:'
'''
return f'\n{" " * level * 2}{key}: '
def _format_list_items(l, # noqa: E741 # pylint:disable=invalid-name
line_level):
'''Helper method to handle lists in the YAML
Parameters
----------
l : list
line_level : int
Returns
-------
yaml : str
Examples
--------
>>> _format_list_items([1, 2, {'nested': 3}], 0)
' - 1\n - 2\n - nested: 3'
>>> _format_list_items([1, 2, {'nested': [3, {'deep': [4]}]}], 1)
' - 1\n - 2\n - nested:\n - 3\n - deep:\n - 4'
''' # noqa: E501 # pylint: disable=line-too-long
# keep short, simple lists in square brackets
if all(isinstance(item, (str, bool, int, float)) for item in l):
if len(str(l)) < 50:
return str(l).replace("'", '').replace('"', '')
# list long or complex lists on lines with indented '-' lead-ins
return '\n' + '\n'.join([
f'{indent(line_level)}{li}' for li in yaml.dump(
yaml_bool(l)
).replace("'On'", 'On').replace("'Off'", 'Off').split('\n')
]).rstrip()
# set starting values
output = ''
comment = ''
space_match = r'^\s+.*'
level = 0
nest = []
list_item = False
list_level = 0
line_level = 0
template_name = template
if isinstance(d, Configuration):
d = d.dict()
try:
template = load_preconfig(template)
except BadParameter as bad_parameter:
if 'default' in template.lower():
template = DEFAULT_PIPELINE_FILE
if not os.path.exists(template) or os.path.islink(template):
raise ValueError(f'\'{template_name}\' is not a valid path nor a '
'defined preconfig.') from bad_parameter
template_included = False
# load default values
d_default = Configuration(yaml.safe_load(open(template, 'r'))).dict()
if (
template == DEFAULT_PIPELINE_FILE or
not dct_diff(
yaml.safe_load(open(DEFAULT_PIPELINE_FILE, 'r')), d_default)
):
template_name = 'default'
# update values
if include_all:
d_default.update(d)
d = _create_import_dict(dct_diff({}, d_default))
else:
d = _create_import_dict(dct_diff(d_default, d))
# generate YAML from template with updated values
template_dict = yaml.safe_load(open(template, 'r'))
with open(template, 'r') as f:
for line in f:
# persist comments and frontmatter
if line.startswith('%') or line.startswith('---') or re.match(
r'^\s*#.*$', line
):
list_item = False
line = line.strip('\n')
comment += f'\n{line}'
elif len(line.strip()):
if re.match(space_match, line):
line_level = _count_indent(line)
else:
line_level = 0
# handle lists as a unit
if list_item:
if line_level < list_level - 1:
list_item = False
level = list_level
list_level = 0
elif line.lstrip().startswith('-'):
list_item = True
list_level = line_level - 1
else:
# extract dict key
key_group = re.match(
r'^\s*(([a-z0-9A-Z_]+://){0,1}'
r'[a-z0-9A-Z_/][\sa-z0-9A-Z_/\.-]+)\s*:', line)
if key_group:
if not template_included:
# prepend comment from template
if len(comment.strip()):
comment = re.sub(
r'(?<=# based on )(.* pipeline)',
f'{template_name} pipeline',
comment
)
output += comment
output += f'\nFROM: {template_name}\n'
comment = ''
template_included = True
key = key_group.group(1).strip()
# calculate key depth
if line_level == level:
if level > 0:
nest = nest[:-1] + [key]
else:
nest = [key]
elif line_level == level + 1:
nest += [key]
elif line_level < level:
nest = nest[:line_level] + [key]
# only include updated and new values
try:
# get updated value for key
value = lookup_nested_value(d, nest)
orig_value = lookup_nested_value(d_default, nest)
# Use 'On' and 'Off' for bools
if (isinstance(orig_value, bool) or (
isinstance(orig_value, str) and
orig_value in {'On', 'Off'}
) or (isinstance(orig_value, list) and all([(
isinstance(orig_item, bool) or (
isinstance(orig_item, str) and
orig_item in {'On', 'Off'}
)
) for orig_item in orig_value])
)):
value = yaml_bool(value)
# prepend comment from template
if len(comment.strip()):
output += comment
else:
output += '\n'
# write YAML
output += _format_key(key, line_level)
if isinstance(value, list):
output += _format_list_items(
value, line_level)
elif isinstance(value, dict):
for k in value.keys():
try:
lookup_nested_value(template_dict,
nest + [k])
# include keys not in template
except KeyError:
output += _format_key(
k, line_level + 1)
parsed = _format_list_items(
value[k],
line_level + 1
) if isinstance(
value[k], list) else yaml_bool(
value[k])
if isinstance(parsed, (int, float)):
parsed = str(parsed)
elif parsed is None:
parsed = ''
output += parsed if (
isinstance(parsed, str)
) else (
'\n' + indent(line_level + 1) +
f'\n{indent(line_level + 1)}' +
yaml.dump(parsed) + '\n')
else:
output += str(value)
except KeyError:
# clear comment for excluded key
comment = '\n'
# reset variables for loop
comment = '\n'
level = line_level
elif len(comment) > 1 and comment[-2] != '\n':
comment += '\n'
while '\n\n\n' in output:
output = output.replace('\n\n\n', '\n\n')
return output.lstrip('\n').replace('null', '')
that generates a YAML file with comments from a template is not working fully correctly.

As far as I've been able to tell:

  1. C-PAC/CPAC/utils/utils.py

    Lines 2301 to 2425 in 9586d71

    def update_pipeline_values_1_8(d_old):
    '''Function to update pipeline config values that changed from
    C-PAC 1.7 to 1.8.
    Parameters
    ----------
    d_old : dict
    Returns
    -------
    d : dict
    updated
    Examples
    --------
    >>> update_pipeline_values_1_8({'segmentation': {'tissue_segmentation': {
    ... 'using': ['FSL-FAST Thresholding', 'Customized Thresholding']}}})
    {'segmentation': {'tissue_segmentation': {'using': ['FSL-FAST'], 'FSL-FAST': {'thresholding': {'use': 'Custom'}}}}}
    >>> update_pipeline_values_1_8({'segmentation': {'tissue_segmentation': {
    ... 'using': ['FSL-FAST Thresholding']}}})
    {'segmentation': {'tissue_segmentation': {'using': ['FSL-FAST'], 'FSL-FAST': {'thresholding': {'use': 'Auto'}}}}}
    ''' # noqa: E501 # pylint: disable=line-too-long
    from CPAC.pipeline.schema import valid_options \
    # pylint: disable=import-outside-toplevel
    d = replace_in_strings(d_old.copy())
    d = _replace_changed_values(
    d,
    ['anatomical_preproc', 'brain_extraction', 'using'],
    [('AFNI', '3dSkullStrip'), ('FSL', 'BET'), ('unet', 'UNet')]
    )
    d = _replace_changed_values(
    d,
    ['functional_preproc', 'func_masking', 'using'],
    [('3dAutoMask', 'AFNI'), ('BET', 'FSL')]
    )
    try:
    seg_use_threshold = lookup_nested_value(d, [
    'segmentation', 'tissue_segmentation', 'using'])
    except KeyError:
    seg_use_threshold = []
    if not isinstance(seg_use_threshold, list):
    seg_use_threshold = [seg_use_threshold]
    if 'FSL-FAST Thresholding' in seg_use_threshold:
    if 'using' in d['segmentation'].get(
    'tissue_segmentation', {}
    ):
    d['segmentation'][
    'tissue_segmentation'
    ]['using'].append('FSL-FAST')
    else:
    d = set_nested_value(d, [
    'segmentation', 'tissue_segmentation',
    'using'], ['FSL-FAST'])
    seg_use_threshold.remove('FSL-FAST Thresholding')
    if 'Customized Thresholding' in seg_use_threshold:
    seg_use_threshold.remove('Customized Thresholding')
    d = set_nested_value(d, [
    'segmentation', 'tissue_segmentation',
    'FSL-FAST', 'thresholding', 'use'], 'Custom')
    else:
    d = set_nested_value(d, [
    'segmentation', 'tissue_segmentation',
    'FSL-FAST', 'thresholding', 'use'], 'Auto')
    for centr in ['degree_centrality', 'eigenvector_centrality',
    'local_functional_connectivity_density']:
    centr_keys = ['network_centrality', centr, 'weight_options']
    try:
    centr_value = lookup_nested_value(d, centr_keys)
    if any(isinstance(v, bool) for v in centr_value):
    for i in range(2):
    if centr_value[i] is True:
    centr_value[i] = valid_options['centrality'][
    'weight_options'][i]
    while False in centr_value:
    centr_value.remove(False)
    d = set_nested_value(d, centr_keys, centr_value)
    except KeyError:
    continue
    seg_template_key = [
    'segmentation', 'tissue_segmentation',
    'Template_Based', 'template_for_segmentation']
    try:
    seg_template = lookup_nested_value(d, seg_template_key)
    for replacement in [
    ('EPI_template', valid_options['segmentation']['template'][0]),
    ('T1_template', valid_options['segmentation']['template'][1])
    ]:
    seg_template = list_item_replace(seg_template, *replacement)
    while 'Off' in seg_template:
    seg_template.remove('Off')
    while False in seg_template:
    seg_template.remove(False)
    d = set_nested_value(d, seg_template_key, seg_template)
    d = remove_None(d, seg_template_key)
    except KeyError:
    pass
    distcor_key = ['functional_preproc', 'distortion_correction', 'using']
    try:
    lookup_nested_value(d, distcor_key)
    d = remove_None(d, distcor_key)
    except KeyError:
    pass
    if 'functional_registration' in d and isinstance(
    d['functional_registration'], dict
    ):
    if '1-coregistration' in d['functional_registration']:
    coreg = d['functional_registration'].pop('1-coregistration')
    d = set_nested_value(
    d, ['registration_workflows', 'functional_registration',
    'coregistration'],
    coreg
    )
    if not(bool(d['functional_registration'])):
    d.pop('functional_registration')
    return update_values_from_list(d)
    seems to compensate for most invalid configs that arise from the bug
  2. The bug seems to occur when a list is followed by a top-level key (or maybe whenever a list is followed by a key more than one level higher in the hierarchy)

To reproduce

No response

Preconfig

  • default
  • abcd-options
  • anat-only
  • blank
  • ccs-options
  • fmriprep-options
  • fx-options
  • monkey
  • monkey-ABCD
  • ndmg
  • nhp-macaque
  • preproc
  • rbc-options
  • rodent

Custom pipeline configuration

No response

Run command

No response

Expected behavior

Generated pipeline YAML matches the pipeline config that was run exactly

Acceptance criteria

  • Generated pipeline YAML matches the imported pipeline config exactly
  • CI test added to check for 👆

Screenshots

missing seed_based_correlation_analysis:

including seed_based_correlation_analysis:

C-PAC version

v1.8.5-dev

Container platform

Singularity

Docker and/or Singularity version(s)

Singularity v3.8.7-1.el8

Additional context

Longterm, I think it'll be better to move the comments into the node blocks for a SSOT and draw the config comments from the node block comments, rather than generating them from the default template, but I think for now we can just patch this function and add a CI test.

@shnizzedy shnizzedy added 2 - High Priority RBC https://github.com/PennLINC/RBC labels Oct 7, 2022
@shnizzedy shnizzedy moved this to Todo in RBC to-dos Oct 7, 2022
@sgiavasis sgiavasis added this to the 1.8.5 release milestone Dec 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - High Priority bug RBC https://github.com/PennLINC/RBC user-reported
Projects
None yet
Development

No branches or pull requests

2 participants