Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[joss] Inconsistent naming causes cli arguments to be ignored #157

Closed
sneakers-the-rat opened this issue Jun 29, 2023 · 3 comments
Closed
Labels
refactoring Internal design improvements that don't change the API
Milestone

Comments

@sneakers-the-rat
Copy link
Contributor

sneakers-the-rat commented Jun 29, 2023

Trying to write a simple example test using the config and found a bug

Argument in argparse is named tau

parser.add_argument("--tau", default=0.5, type=float, help=f"{argdoc.TAU}. Defaults to 0.5")

passed to from_dict, which looks for tau_active instead, replacing whatever i feed to it with the default:

tau = utils.get(data, "tau_active", None)

Same is true of rho -> rho_update and delta -> delta_new and max-speakers -> max_speakers and cpu -> device

Some additional things i noticed in these modules that aren't bugs per se but just some tips:

  • The initialization logic is written twice, once in __init__ and once in from_dict without clear purpose. this makes for very fragile and bug-prone code. I'm not sure what the from_dict method is supposed to do differently than just calling the class like PipelineConfig(**args) - my expectation as a user would be that they would be identical. In fact this seems to be the cause of the bug at hand - the **kwargs in __init__ is superfluous since they are ignored, but if it wasn't there and I called Pipelineconfig(**args) then I would have gotten an exception from passing an unexpected parameter.
  • The default values are actually defined three times - another time in the argparser itself. Definitely recommend having a single place these are defined.
  • you probably want @classmethod here:
    @staticmethod
  • try using the abc module for abstract classes like eg. -
    class BasePipelineConfig:
  • this signature should probably be data: dict since the method is named "from_dict" -
    def from_dict(data: Any) -> 'BasePipelineConfig':
  • dicts already have a get method, eg. {'a': 'dict'}.get('a', None) that works the same as utils.get
  • the __init__ methods are very heavy for a config class - instantiating two models. If this is just a config file you should probably just store the string values and instantiate the models elsewhere.

part of openjournals/joss-reviews#5266

@juanmc2005
Copy link
Owner

Hi @sneakers-the-rat,

tau_active and tau are aliases, you can see in the following lines that it checks for tau_active first, if that fails it looks for tau defaulting to 0.6 (see here). The same applies for the other arguments you mention.
Notice that the cli argument max-speakers is converted to max_speakers automatically by argparse.

I do agree on the comments about the initialization code and the repetitiveness. Do you have any suggestions for a pythonic way of addressing this?

@juanmc2005 juanmc2005 added the refactoring Internal design improvements that don't change the API label Jun 30, 2023
@juanmc2005 juanmc2005 added this to the Version 0.8 milestone Jun 30, 2023
@sneakers-the-rat
Copy link
Contributor Author

Ah, my bad, something was causing my test to fail and it was late so i must not have read carefully.

Yes! I am unsure what the from_dict method is supposed to do differently than the __init__ method - i test that they are identical here: https://github.com/sneakers-the-rat/diart/blob/b1a0ccaa35f8b36aa30f978a4bcb16db69652a42/tests/test_config.py#L35

usually when you have a from_x method it's because the thing you're instantiating from is different in form than how you usually instantiate it - eg. a from_json method might take a path to a .json file with instantiation arguments. In this case, instantiating like MyClass(**{'arg1':'val1', 'arg2':'val2'}) is the same as MyClass(arg1='val1', arg2='val2')

So my suggestion would be to remove the from_dict method and use the same variable names everywhere.

I don't mean to harp on you about a minor thing, I am more focused on showing you how to make the code more testable by removing special cases, minimizing where things can go wrong, etc. I won't be able to review the entire package, so this is just one example of a strategy for avoiding bugs.

@juanmc2005
Copy link
Owner

Implemented in #189

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Internal design improvements that don't change the API
Projects
None yet
Development

No branches or pull requests

2 participants