Option in SQL Targets to coerce types based on observed record shape #1561
From what I can tell, the SDK does type validation by default. Given an input stream where an integer column contains a string, the target fails with a JSON validation error.

I hope to add functionality, behind a tentatively named setting, that coerces types based on the observed record shape. The pragmatic approach would be to alter the column to varchar (possibly via add column/drop column) and coerce the input, but we could conceivably build a more complex ruleset of int -> bigint -> number -> varchar -> clob. The other half of the solution is that during validation of the target schema (typically on subsequent runs), when the target finds varchar in the target column it should accept the target type and keep coercing the integer (or whatever) to varchar. Type checking brings a performance hit, and many users might prefer that the target fail when something is wrong, so this must be a configurable setting. It would also not be feasible to do during batch loads.
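The widening ruleset mentioned above could be sketched as a simple ladder lookup. This is purely illustrative: the function name, the ladder contents, and the fallback behavior are assumptions, not part of the SDK.

```python
# Hypothetical widening ladder: each type can be widened to any type
# to its right. Names and ordering are illustrative assumptions.
WIDENING_ORDER = ["int", "bigint", "number", "varchar", "clob"]


def widen_type(current: str, observed: str) -> str:
    """Return the narrowest ladder type that can hold both inputs.

    Types not on the ladder fall back to the widest type (clob).
    """
    try:
        current_rank = WIDENING_ORDER.index(current)
        observed_rank = WIDENING_ORDER.index(observed)
    except ValueError:
        return WIDENING_ORDER[-1]
    return WIDENING_ORDER[max(current_rank, observed_rank)]
```

For example, `widen_type("int", "varchar")` would resolve to `varchar`, matching the pragmatic approach described above; a stricter one-step ruleset would instead walk the ladder one rung at a time.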
I have run into the JSON validation errors with a couple of data types. @edgarrmondragon tracked down the root cause of the issue and created a draft PR that contains a possible way for developers to control the JSON validation settings. Here is the link to the PR in case it is helpful for this conversation: feat: Support custom JSON schema validation and string format checkers in targets
This has been marked as stale because it is unassigned and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the appropriate label.
Still relevant, right @edgarrmondragon?

@tayloramurphy yeah, this'd be nice to have
This would be a simple-to-use option for end users who are trying to deal with "rogue" or incorrect type declarations in the upstream tap.
For instance, if the tap incorrectly defines one of its fields as an integer, but it receives a string, then we could give the user an option of auto-expanding the data type to be inclusive of the declared type and also the observed type. Since a string column can hold integers as well as strings, expanding the data type to a string type will allow the load to complete successfully.
Implementation-wise, if built within the tap or mapper layer, this would normally result in a new `SCHEMA` message being emitted upon observing a record that does not fit the declared schema. However, if built in the target, there's no need to emit a `SCHEMA` message. Instead, the `Sink` class would, per batch, expand data type negotiation to be inclusive of (1) the declared type, (2) the target column's already-existing type, and (3) the observed data type in the records. Currently this negotiation exists, but it only considers the first two factors.
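The three-way negotiation described above might look something like the following. This is a minimal sketch, not the SDK's actual API: `merge_types`, `negotiate_column_type`, and the rank table are all hypothetical names introduced here for illustration.

```python
# Illustrative only: the SDK's real negotiation logic works on full JSON
# Schema objects, not bare type names. Ranks are an assumption.
TYPE_RANK = {"integer": 0, "number": 1, "string": 2}


def merge_types(*candidates: str) -> str:
    """Pick the widest of the candidate JSON Schema type names.

    Unknown types are treated as widest so the load can still succeed.
    """
    return max(candidates, key=lambda t: TYPE_RANK.get(t, len(TYPE_RANK)))


def negotiate_column_type(declared: str, existing: str, observed: str) -> str:
    # Today's negotiation considers only `declared` and `existing`;
    # the proposal adds `observed` (from the batch's records) as a
    # third input, widening the column when records don't fit.
    return merge_types(declared, existing, observed)
```

For instance, a column declared `integer` whose records contain strings would negotiate to `string`, letting the batch load complete instead of failing validation.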
cc @radbrt