-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loosen validation and mypy type requirements for date/time types, start increased testing #144
Conversation
'YYYY-MM-DD HH:MI:SS']: | ||
pass | ||
else: | ||
cant_handle_hint(fail_if_cant_handle_hint, 'datetimeformattz', hints) | ||
quiet_remove(unhandled_hints, 'datetimeformattz') | ||
|
||
if hints.timeonlyformat == 'HH24:MI:SS': | ||
if hints.timeonlyformat in ['HH24:MI:SS', 'HH:MI:SS']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes issues found during testing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, the Redshift spec for date/time formats defines HH as equivalent to HH24 - so there's no reason now to handle it similarly to the HH24 cases. These were reproduced with test cases defined in datetime_cases.py in tests/component below.
"YYYY-MM-DD HH:MI:SS", | ||
"YYYY-MM-DD HH12:MI AM", | ||
"MM/DD/YY HH24:MI", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all of this file is probably used in this PR - it's the full set of things needed to define the test cases for #140
if should_raise[dateformat]: | ||
pass | ||
else: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general intent with these tests is that as we add new things to the below datetime_cases.py module, tests will blow up and we'll need to go back and hand-add expectations of what each driver should produce for each input.
@@ -324,13 +324,14 @@ def add_load_job_csv_config(unhandled_hints: Set[str], | |||
quiet_remove(unhandled_hints, 'datetimeformat') | |||
|
|||
if hints.datetimeformattz in ['YYYY-MM-DD HH:MI:SSOF', | |||
'YYYY-MM-DD HH24:MI:SSOF', | |||
'YYYY-MM-DD HH:MI:SS']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YYYY-MM-DD HH24:MI:SS
would be a good additional item - I'll write this down and address it in #140 if not before, probably by adding it as an explicit test case in my list of test cases below.
Looks like I have a test failure, which is annoying. I’ll open up another PR related to that and probably merge master back into this branch afterwards. If you could review the PR in the meantime ignoring the test failure for the moment, I’ll hold off merging until that fixes the passing tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are a bit of a logical brain-twister here but I like the consistency and the comments on datatypes definitely help!
Nice cleanup here and helpful context, as always.
…om:bluelabsio/records-mover into loosen_date_requirements_1
Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Vertica databases. Along the way some places for improvement were found!
Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Pandas. Along the way some places for improvement were found!
…147) * Improve date/time hint handling for Vertica based on better testing Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Vertica databases. Along the way some places for improvement were found! * Ratchet coverage * Ratchet mypy * Ratchet coverage * Ratchet mypy coverage
Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Redshift databases. Along the way some places for improvement were found!
Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Redshift databases. Along the way some places for improvement were found!
Based on the new testing approach in #144, this PR adds component tests for date/time types while loading CSV files in MySQL. The expectations for those test are based on the results of the integration tests in #140, which shows our MySQL driver is way too optimistic about what date/time hints it can import!
) * Improve date/time hint handling for Pandas based on better testing Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Pandas. Along the way some places for improvement were found! * Ratchet test coverage * Ratchet mypy coverage * Update records_mover/records/pandas/to_csv_options.py Co-authored-by: Chris Wegrzyn <[email protected]> Co-authored-by: Chris Wegrzyn <[email protected]>
* Improve date/time hint handling for MySQL based on better testing Based on the new testing approach in #144, this PR adds component tests for date/time types while loading CSV files in MySQL. The expectations for those test are based on the results of the integration tests in #140, which shows our MySQL driver is way too optimistic about what date/time hints it can import! * Make comment in complete sentences * Quick flake8 fix * csv variant proved to not be possible * Ratchet * Ratchet
…149) * Improve date/time hint handling for Redshift based on better testing Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Redshift databases. Along the way some places for improvement were found! * Improve date/time hint handling for Redshift based on better testing Based on the new testing approach in #144, this PR adds component tests for date/time types while loading/exporting CSV files in Redshift databases. Along the way some places for improvement were found! * Ratchet coverage * Remove original * Ratchet
The Records Spec uses hints to describe how dates, times, datetime, and datetimetz types are formatted in CSVs.
It does this by describing a set of allowed values. The format of the values were originally cribbed from Redshift's COPY statement.
The current set of actually allowed values for those types (dateformat/timeonlyformat/datetimeformattz/datetimeformat) are fairly arbitrary, incomplete, inconsistent with each other, and don't hold up as terribly consistent under scrutiny. They've also been a source of bugs over time.
To make those complete even insomuch as the patterns they currently try to support would involve the start of some combinatorial explosion:
I think this does not bode well for maintaining an allowlist of formats - instead, I think we're going to need to (like Redshift) support a general formatting language in the spec, and do translations as needed to the formatting languages of the different databases/tools that Records Mover integrates with.
In the face of that, I'd like to expand the allowed values in a future PR. The motivation there is to ensure we allow likely DMY-flavored formats, to more generally support more things users may need in the future.
This PR is step one of that. It adds one additional test as a beginning of comprehensive testing of the existing supported values, and makes it easy to add more testable values in the future.
To deal with the combinatorial explosion problem for the time being, I've loosened the internal constraints by changing the Literal types to strings, and fixed up the various things based on what the testing found.
This PR does not yet loosen up the spec in any way.
The rest of the tests can be found in #140, which is probably too big to review right now--this is the first PR chipped off of that boulder.