[joss] Inconsistent naming causes cli arguments to be ignored #157

sneakers-the-rat · 2023-06-29T06:07:41Z

Trying to write a simple example test using the config and found a bug

Argument in argparse is named tau

Line 21 in 858cd1a

    
parser.add_argument("--tau", default=0.5, type=float, help=f"{argdoc.TAU}. Defaults to 0.5")

passed to from_dict, which looks for tau_active instead, replacing whatever i feed to it with the default:

diart/src/diart/blocks/config.py

Line 110 in 858cd1a

tau = utils.get(data, "tau_active", None)

Same is true of rho -> rho_update and delta -> delta_new and max-speakers -> max_speakers and cpu -> device

Some additional things i noticed in these modules that aren't bugs per se but just some tips:

The initialization logic is written twice, once in __init__ and once in from_dict, without clear purpose. This makes for very fragile and bug-prone code. I'm not sure what the from_dict method is supposed to do differently than just calling the class like PipelineConfig(**args) - my expectation as a user would be that they would be identical. In fact this seems to be the cause of the bug at hand - the **kwargs in __init__ is superfluous since they are ignored, but if it weren't there and I called PipelineConfig(**args) then I would have gotten an exception from passing an unexpected parameter.
The default values are actually defined three times - another time in the argparser itself. Definitely recommend having a single place these are defined.
you probably want @classmethod here:

diart/src/diart/blocks/config.py

Line 95 in 858cd1a

@staticmethod
try using the abc module for abstract classes, e.g. here:

diart/src/diart/blocks/config.py

Line 12 in 858cd1a

class BasePipelineConfig:
this signature should probably be data: dict since the method is named "from_dict" -

diart/src/diart/blocks/config.py

Line 30 in 858cd1a

def from_dict(data: Any) -> 'BasePipelineConfig':
dicts already have a get method, eg. {'a': 'dict'}.get('a', None) that works the same as utils.get
the __init__ methods are very heavy for a config class - instantiating two models. If this is just a config file you should probably just store the string values and instantiate the models elsewhere.
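
A minimal sketch of three of the tips above (@classmethod, the abc module, and the built-in dict.get), not diart's actual code. The class names BaseConfig and ExampleConfig, the duration property, and the chunk_seconds parameter are hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod


class BaseConfig(ABC):
    """Hypothetical abstract config; it cannot be instantiated directly."""

    @property
    @abstractmethod
    def duration(self) -> float:
        """Subclasses must report a duration in seconds."""

    @classmethod
    def from_dict(cls, data: dict) -> "BaseConfig":
        # cls is the class the method was called on, so subclasses
        # get instances of themselves without overriding this method.
        return cls(**data)


class ExampleConfig(BaseConfig):
    def __init__(self, tau: float = 0.5, chunk_seconds: float = 5.0):
        self.tau = tau
        self.chunk_seconds = chunk_seconds

    @property
    def duration(self) -> float:
        return self.chunk_seconds


config = ExampleConfig.from_dict({"tau": 0.7})

# The built-in dict.get already returns a fallback for a missing key.
missing = {"a": "dict"}.get("b", None)
```

With a @staticmethod the method has no access to cls, so each subclass would have to hard-code its own class name. Declaring the base with ABC also makes an accidental BaseConfig() fail with a TypeError instead of producing a half-working object.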

part of openjournals/joss-reviews#5266


juanmc2005 · 2023-06-30T10:24:36Z

Hi @sneakers-the-rat,

tau_active and tau are aliases: the following lines check for tau_active first and, if that fails, look for tau, defaulting to 0.6 (see here). The same applies to the other arguments you mention.
Notice that the cli argument max-speakers is converted to max_speakers automatically by argparse.
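
The alias lookup and the argparse name conversion described in this reply can be sketched as follows. This is an illustration only, not the code in config.py: the helper get_with_alias is hypothetical, and the default for --max-speakers is a placeholder value.

```python
import argparse


def get_with_alias(data: dict, primary: str, alias: str, default):
    """Return data[primary] if present, else data[alias], else default."""
    if primary in data:
        return data[primary]
    return data.get(alias, default)


parser = argparse.ArgumentParser()
parser.add_argument("--tau", default=0.5, type=float)
parser.add_argument("--max-speakers", default=20, type=int)  # placeholder default

args = vars(parser.parse_args(["--tau", "0.7", "--max-speakers", "3"]))

# The CLI only defines "tau", so the lookup falls through to the alias.
tau = get_with_alias(args, "tau_active", "tau", 0.6)

# argparse stores "--max-speakers" under the key "max_speakers".
max_speakers = args["max_speakers"]
```

Because the parser always supplies a value for tau, the fallback default of 0.6 is only reached when the dictionary contains neither key.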

I do agree on the comments about the initialization code and the repetitiveness. Do you have any suggestions for a pythonic way of addressing this?

sneakers-the-rat · 2023-06-30T22:34:18Z

Ah, my bad, something was causing my test to fail and it was late so i must not have read carefully.

Yes! I am unsure what the from_dict method is supposed to do differently than the __init__ method - i test that they are identical here: https://github.com/sneakers-the-rat/diart/blob/b1a0ccaa35f8b36aa30f978a4bcb16db69652a42/tests/test_config.py#L35

usually when you have a from_x method it's because the thing you're instantiating from is different in form than how you usually instantiate it - eg. a from_json method might take a path to a .json file with instantiation arguments. In this case, instantiating like MyClass(**{'arg1':'val1', 'arg2':'val2'}) is the same as MyClass(arg1='val1', arg2='val2')
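
A small sketch of that equivalence, and of how a catch-all **kwargs in __init__ (as mentioned in the original report) can hide a misnamed key. The classes Strict and Lenient are hypothetical and exist only for this illustration.

```python
class Strict:
    def __init__(self, arg1: str = "default1", arg2: str = "default2"):
        self.arg1 = arg1
        self.arg2 = arg2


class Lenient(Strict):
    def __init__(self, arg1="default1", arg2="default2", **kwargs):
        # Extra keyword arguments are accepted and then dropped.
        super().__init__(arg1, arg2)


params = {"arg1": "val1", "arg2": "val2"}
a = Strict(**params)
b = Strict(arg1="val1", arg2="val2")
same = (a.arg1, a.arg2) == (b.arg1, b.arg2)

# A misspelled key is silently ignored by Lenient, leaving the default...
lenient = Lenient(**{"arg_1": "val1"})

# ...but Strict reports it immediately.
try:
    Strict(**{"arg_1": "val1"})
    strict_error = None
except TypeError as exc:
    strict_error = exc
```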

So my suggestion would be to remove the from_dict method and use the same variable names everywhere.
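
One possible way to act on this suggestion, sketched under assumptions: declare the defaults once in a dataclass and derive the CLI flags from its fields, so the parser and the config cannot drift apart. ExampleConfig and build_parser are hypothetical names, the field names reuse the config-side names from this thread, and the default values are placeholders. This is not necessarily how #189 implemented it.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class ExampleConfig:
    # Defaults are declared once, here. Values are placeholders.
    tau_active: float = 0.5
    rho_update: float = 0.3
    max_speakers: int = 20


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for f in fields(ExampleConfig):
        # "tau_active" becomes "--tau-active"; argparse maps it back
        # to the attribute name "tau_active" when parsing.
        flag = "--" + f.name.replace("_", "-")
        parser.add_argument(flag, type=f.type, default=f.default)
    return parser


args = build_parser().parse_args(["--tau-active", "0.7"])
config = ExampleConfig(**vars(args))
```

Since the parsed names match the constructor parameters exactly, no from_dict translation step is needed, and a mismatch would surface as a TypeError at construction time.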

I don't mean to harp on you about a minor thing, I am more focused on showing you how to make the code more testable by removing special cases, minimizing where things can go wrong, etc. I won't be able to review the entire package, so this is just one example of a strategy for avoiding bugs.

juanmc2005 · 2023-10-19T16:21:31Z

Implemented in #189

juanmc2005 added the "refactoring" label (Internal design improvements that don't change the API) Jun 30, 2023

juanmc2005 added this to the Version 0.8 milestone Jun 30, 2023

sneakers-the-rat mentioned this issue Oct 4, 2023

[REVIEW]: Diart: A Python Library for Real-Time Speaker Diarization openjournals/joss-reviews#5266

Closed

juanmc2005 mentioned this issue Oct 19, 2023

Remove PipelineConfig.from_dict() #189

Merged

juanmc2005 closed this as completed Oct 19, 2023

juanmc2005 mentioned this issue Oct 26, 2023

Version 0.8 #192

Merged
