Refactor CLI hint code and types, more CLI hint arguments #66
@@ -1,29 +1,43 @@ | |||
from typing_inspect import is_literal_type, get_args |
Let's teach the Hint type to have descriptions and be able to generate JSON schema information!
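To make that concrete, here is a minimal sketch of a hint object that carries a description and can emit a JSON schema fragment. `LiteralHint`, its constructor arguments, and the `Compression` values are hypothetical names chosen for illustration, not the classes in this PR, and it uses the standard library's `typing.get_args` rather than `typing_inspect`.

```python
from typing import Any, Dict, List, Literal, get_args

# Illustrative value list only; the real list of supported values lives in the spec.
Compression = Literal["GZIP", "BZIP", "LZO", None]


class LiteralHint:
    """Hypothetical hint wrapper; a sketch, not the records-mover class."""

    def __init__(self, literal_type: Any, hint_name: str,
                 default: Any, description: str) -> None:
        self.hint_name = hint_name
        self.default = default
        self.description = description
        # get_args(Literal["a", "b"]) returns ("a", "b")
        self.valid_values: List[Any] = list(get_args(literal_type))

    def json_schema(self) -> Dict[str, Any]:
        # Emit the description and allowed values as a JSON schema fragment.
        return {
            "description": self.description,
            "enum": self.valid_values,
            "default": self.default,
        }

    def validate(self, value: Any) -> Any:
        # Reject anything outside the values declared in the Literal type.
        if value not in self.valid_values:
            raise ValueError(
                f"Invalid value {value!r} for hint {self.hint_name!r}; "
                f"expected one of {self.valid_values!r}"
            )
        return value


compression_hint = LiteralHint(
    Compression, "compression", None,
    "Compression applied to the file, if any.",
)
```

Keeping the description and the allowed values on one object means the CLI help text, the JSON schema, and validation all read from a single definition.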
@@ -1,66 +0,0 @@ | |||
"""Defines hints supported by the job config parser.""" |
These are the duplicate hint descriptions, which are no longer needed!
93.6800 |
If you scroll down to the tests, you'll see I had to disable a few of them - we had code and tests supporting date/time hint values which don't actually exist in the spec. Now that we validate against the correct list of supported hint values, those tests can't be run.
Adding those hint values makes sense, for sure, though. Adding them and making sure all the different drivers support them correctly is likely to spiral just a bit from this already 56-file PR, so I'm proposing to comment out those tests and leave those hints as unsupported for now. I have a high-priority follow-up backlog task to handle these. Passing in date/time hints isn't normal, at least in internal use, so I'm not terribly concerned about breaking any existing out-of-spec code with this.
Commented out tests:
- https://github.com/bluelabsio/records-mover/pull/66/files#diff-b9d3d8d84af11c791565c585049c18be
- https://github.com/bluelabsio/records-mover/pull/66/files#diff-4abbf34edf14477a63b4fe26b409b96e
Backlog task: https://app.asana.com/0/1128138765527694/1178155809546696
There are three major changes in this PR:
Some background:
Up until now, the only command line parameters for hints that we've accepted are those in the 'BootstrappingHints' type, which is typed as a TypedDict with a limited selection of keys.
We have an 'initial_hints' argument to various factory methods that accept this type. Having these separate from an optional 'records_format' argument allowed us to be told whether to do hint inference or not and still be able to override hint inference when we need to.
Because we didn't have Literal[] types for the relevant hints yet, these arguments were hand-crafted, so I didn't build all of them out. The ones I hand-crafted in were motivated by things the hint sniffing code didn't address and which commonly caused loading errors (generally empirically as folks ran into issues).
Later on I provided a 'variant' CLI argument which, if provided, would combine with the hints provided to fill in the 'records_format' argument to the factory methods. This was a little limited, as we still only had the hand-selected arguments available, but it was nice for dealing with, say, an uncompressed version of a common variant, or a slight delta on a commonly used and abused variant like 'csv'.
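As a rough sketch of how a variant and explicit hints can combine: start from the variant's defaults and let any hint given on the command line override them. `VARIANT_DEFAULTS`, `resolve_hints`, and the default values shown are hypothetical and illustrative only; they are not the records-mover API or the actual variant definitions.

```python
from typing import Dict, Optional

# Hypothetical per-variant defaults; the values here are illustrative only.
VARIANT_DEFAULTS: Dict[str, Dict[str, object]] = {
    "csv": {"compression": "GZIP", "quoting": "minimal"},
}


def resolve_hints(variant: Optional[str],
                  cli_hints: Dict[str, object]) -> Dict[str, object]:
    """Overlay explicitly provided hints on top of a variant's defaults."""
    if variant is None:
        resolved: Dict[str, object] = {}
    elif variant in VARIANT_DEFAULTS:
        # Copy so the shared defaults are never mutated.
        resolved = dict(VARIANT_DEFAULTS[variant])
    else:
        raise ValueError(f"Unknown variant: {variant!r}")
    # Anything the user passed explicitly wins over the variant default.
    resolved.update(cli_hints)
    return resolved


# An uncompressed version of the 'csv' variant:
uncompressed_csv = resolve_hints("csv", {"compression": None})
```

Under these assumptions, `uncompressed_csv` keeps the variant's quoting default while the explicit `compression` value replaces the variant's.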
Now that we have the ability to easily generate the additional hints, and have types for everything, I'd like to go ahead and fully extend out the hints supported by the command line.
I made this a TypedDict with all of the hint keys included to help people who use type checkers, but we do have to be careful, as this is an external interface - just because we declared a type doesn't mean people will send us well-typed things! I'm casting those back down to our untyped dictionary type once we receive them so we don't make any bad assumptions when handling that data.
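The cast-then-validate idea could look roughly like this. `PartialHints`, `SUPPORTED_VALUES`, and `validate_hints` are hypothetical names, and the value lists are illustrative; this is a sketch of the pattern rather than the code in this PR.

```python
from typing import Dict, Literal, Mapping, Optional, Tuple, TypedDict, cast


class PartialHints(TypedDict, total=False):
    """Hypothetical public type: helps callers who run a type checker."""
    compression: Optional[Literal["GZIP", "BZIP", "LZO"]]
    quoting: Optional[Literal["all", "minimal", "nonnumeric"]]


# Illustrative lists of supported values, keyed by hint name.
SUPPORTED_VALUES: Dict[str, Tuple[object, ...]] = {
    "compression": ("GZIP", "BZIP", "LZO", None),
    "quoting": ("all", "minimal", "nonnumeric", None),
}


def validate_hints(hints: PartialHints) -> Dict[str, object]:
    # The annotation is a promise from the caller, not a runtime guarantee,
    # so drop to an untyped mapping before inspecting anything.
    untyped = cast(Mapping[str, object], hints)
    validated: Dict[str, object] = {}
    for name, value in untyped.items():
        if name not in SUPPORTED_VALUES:
            raise ValueError(f"Unknown hint: {name!r}")
        if value not in SUPPORTED_VALUES[name]:
            raise ValueError(f"Invalid value {value!r} for hint {name!r}")
        validated[name] = value
    return validated
```

After the cast, the type checker no longer assumes the values are well-formed, so every key and value has to pass through the explicit checks before it is used.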
Before:
After: