More Mature Datamodels #88

Open
DropD opened this issue Jan 14, 2025 · 0 comments
Labels
enhancement New feature or request refactor

Comments

Collaborator

DropD commented Jan 14, 2025

Problem

The typing in the pydantic models created from user config allows configs to exist that cannot be turned into a workflow successfully. In addition, None is often used as a default value where "information not provided in the YAML file" seems to carry a deeper meaning. Some of the downsides are:

  • Looking at the models, it is not immediately clear which information is required, which has to come from the user, and which can be read from elsewhere.
  • Looking at the models, it is not immediately clear what it means if certain values are (apparently) legitimately None. For example, does period=None, start_date != end_date != None mean "run at start date", "never run", or "run at start and end date"?
  • Any code that operates on model instances has to check for a large number of None values (and sometimes further possible types) before static analysis can confirm that it is doing nothing wrong. Many of those checks are unnecessary, because validators have already taken care of those cases.
  • It is very hard to infer from the code what a meaningful unit test should be.

All of these things block us from making use of unit tests and static type analysis, which are major factors in keeping up velocity in the mid-to-long term.
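To make the None ambiguity concrete, here is a minimal sketch. It uses plain dataclasses rather than pydantic to stay dependency-free, and `CycleConfig` and `run_dates` are hypothetical names for illustration, not code from this project.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CycleConfig:
    """Hypothetical config model, not taken from the project."""

    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None
    # None here could mean "run once", "never run", or "not given".
    period: Optional[timedelta] = None


def run_dates(cfg: CycleConfig) -> list[datetime]:
    # Every consumer has to re-check for None to satisfy a type checker,
    # even if a validator elsewhere already guaranteed these are set.
    if cfg.start_date is None or cfg.end_date is None:
        return []
    if cfg.period is None:
        # The model does not say which reading is intended; this
        # function silently picks "run once at start_date".
        return [cfg.start_date]
    if cfg.period <= timedelta(0):
        raise ValueError("period must be positive")
    dates = []
    current = cfg.start_date
    while current <= cfg.end_date:
        dates.append(current)
        current += cfg.period
    return dates
```

A second consumer could just as plausibly read `period=None` as "never run", and nothing in the model's types would flag the disagreement.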

Proposed solution

I propose to solve the problem incrementally by doing the following for each of the data models:

  1. write a doctest example which demonstrates how to obtain an instance of the class from YAML
  2. write more extensive unit tests for the desired state after all validators have run
  3. add unit tests for how they are used in workflow.Workflow creation
  4. ask the hard questions about whether None or an empty container is meaningful in attributes and, if so, what that meaning is
  5. insert more meaningful sentinel values (enums, types) and additional validators as required
  6. insert additional "finalized" data models and canonicalization functionality if validators are not enough
  7. test everything that was added
  8. switch workflow.Workflow building to the new canonicalized types and adapt the unit tests
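As a rough sketch of what steps 1, 5 and 6 could look like for one model: it uses plain dataclasses rather than pydantic, all names (`Schedule`, `RawCycle`, `FinalCycle`, `canonicalize`) are hypothetical, and the meaning it assigns to a missing period is an assumption chosen only for illustration. Settling that meaning is exactly the question step 4 asks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Union


class Schedule(Enum):
    """Hypothetical sentinels that replace an ambiguous period=None."""

    ONCE_AT_START = "once_at_start"
    NEVER = "never"  # alternative reading, unused in this sketch


@dataclass(frozen=True)
class RawCycle:
    """What the user wrote; fields may be missing (None)."""

    start_date: Union[datetime, None] = None
    end_date: Union[datetime, None] = None
    period: Union[timedelta, None] = None


@dataclass(frozen=True)
class FinalCycle:
    """Canonical form: no None left, so consumers need no None checks."""

    start_date: datetime
    end_date: datetime
    period: Union[timedelta, Schedule]


def canonicalize(raw: RawCycle) -> FinalCycle:
    """Turn a RawCycle into a FinalCycle or fail loudly.

    >>> raw = RawCycle(datetime(2025, 1, 1), datetime(2025, 2, 1))
    >>> canonicalize(raw).period
    <Schedule.ONCE_AT_START: 'once_at_start'>
    """
    if raw.start_date is None or raw.end_date is None:
        raise ValueError("start_date and end_date are required")
    if raw.end_date < raw.start_date:
        raise ValueError("end_date must not precede start_date")
    # Assumed meaning: a missing period means "run once at start_date".
    period = raw.period if raw.period is not None else Schedule.ONCE_AT_START
    return FinalCycle(raw.start_date, raw.end_date, period)
```

With a split like this, workflow-building code would accept only `FinalCycle`, so a type checker can verify it without None checks, and the unit tests for `canonicalize` document the intended meaning of each missing value.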

If at any point any of this requires changes to the existing integration tests, everyone should agree that it is necessary.

Projects
None yet
Development

No branches or pull requests

3 participants