Establish batch semantics #655
Comments
Do I understand correctly that the problem might happen when transforming data? IMO we should align with the ClickHouse default logic for such cases. ClickHouse implements the fail-fast pattern by default, even though it's configurable. From the input_format_allow_errors_num docs:
I'd suggest avoiding the complexity of making this configurable (at least for now) and throwing an exception if a value cannot be parsed. In this case, the user can be informed about the problem and fix the data source.
Fail-fast will invariably slow down inserts, since the client will need to parse the data twice. Could we make this a switch that can be disabled?
IMO just failing the whole batch is OK.
This occurs outside of columnar API appends and standard appends. Currently, an invalid append to a batch means the batch can't be sent, even if previous rows were appended successfully. We had a bug related to this, see #798.
Noting what we considered: one proposal is that an Append of invalid data should fail but the batch should remain valid, i.e. a subsequent Send() should succeed. This is trickier because columns are added one by one: if the error occurs in a column other than the first, we have to undo the previous changes. I suspect this isn't trivial, as it means reversing the changes to the buffer. We could check that all values are valid beforehand, but this would incur an overhead.
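To make the "check all values beforehand" option concrete, here is a minimal sketch in Go. It does not use the clickhouse-go API: `toyBatch`, its two columns, and its `Append` and `Rows` methods are hypothetical stand-ins invented for illustration. The point is only that if every value of a row is parsed before any column buffer is touched, a failure in the second column needs no undo of the first.

```go
package main

import (
	"fmt"
	"net/netip"
)

// toyBatch is a hypothetical two-column batch (name, IP address).
// It is NOT the clickhouse-go Batch type; it only illustrates the idea.
type toyBatch struct {
	names []string
	ips   []netip.Addr
}

// Append validates every value of the row first and writes to the column
// buffers only when all of them parsed, so a rejected row leaves the batch
// exactly as it was and nothing has to be rolled back.
func (b *toyBatch) Append(name, ip string) error {
	addr, err := netip.ParseAddr(ip) // the second column is the one that can fail
	if err != nil {
		return fmt.Errorf("row rejected, batch unchanged: %w", err)
	}
	// All values are valid: commit them together.
	b.names = append(b.names, name)
	b.ips = append(b.ips, addr)
	return nil
}

// Rows reports how many complete rows the batch holds.
func (b *toyBatch) Rows() int { return len(b.ips) }

func main() {
	b := &toyBatch{}
	fmt.Println(b.Append("a", "10.0.0.1"))  // valid row
	fmt.Println(b.Append("b", "not-an-ip")) // rejected, no partial write
	fmt.Println(b.Append("c", "10.0.0.2"))  // batch is still usable
	fmt.Println("rows:", b.Rows())
}
```

The cost mentioned above shows up in a real driver as a separate validation pass over each row before the column buffers are written, which is the overhead this option trades for simpler recovery.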
This needs to be resolved and made consistent. @mshustov Let's discuss.
Giving my thoughts also on (2) (appending to a batch might fail): my preference would be to just invalidate the entire batch and trying to
Agree with @mshustov:
Fail-fast is the right approach here, but I don't want to go too far and introduce a configurable threshold. Will submit a PR with a test that validates this behavior.
Some challenges with batches:
1. For columnar insertion we provide Append methods on columns, e.g. https://github.com/ClickHouse/clickhouse-go/blob/v2/examples/native/write-columnar/main.go. For some types, Append can fail, e.g. when parsing Strings as IP addresses. Currently, in the event of failure we have inconsistent semantics. Consider: The former is faster - it saves iterations, at the cost of trickier error recovery for the user.
2. Append to a batch can fail, e.g. when the enum doesn't map (see "Batch.Send panics if AppendStruct couldn't map Enum", #703). This causes a release of the connection and means subsequent use of the batch fails with a panic.
We should decide on the semantics, document them and be consistent.
@ernado @genzgd