Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(clickhouse sink)!: make skip_unknown_fields optional #22020

Conversation

PriceHiller
Copy link
Contributor

Summary

Copied from the body of the commit:

Problem: The Clickhouse sink's skip_unknown_fields doesn't follow the Clickhouse server default. It always sends a value for skip_unknown_fields; furthermore, there is also no way to disable skip_unknown_fields setting a "strict" mode for Clickhouse. We really want to permit either a default value from the Clickhouse server, meaning we shouldn't specify skip_unknown_fields by default. Otherwise, if a user wants to specify the strict mode for unknown fields, they should then pass either true or false for click house.

Solution: Change the skip_unknown_fields value to be of an Option<bool> instead of just a bool. This permits using the defaults provided by the Clickhouse server and doesn't send the skip_unknown_fields value to the server if left unspecified.

See #22013 for the original issue report.

Closes #22013

Allow the skip_unknown_fields setting to be optional, thereby allowing use of the defaults provided by the ClickHouse server. Setting it to true will permit skipping unknown fields and false will make ClickHouse strict on what fields it accepts.

NOTE: I am unfamiliar with ClickHouse and Vector's code as a whole, so it is very possible I have missed something.

Also, does this qualify as a Bug fix or a New feature? Technically this does add functionality that didn't exist previously, but does so by encoding a fix.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

By running:

  1. cargo test --lib --no-default-features --features "sinks-console,sinks-clickhouse,clickhouse-integration-tests" 'sinks::clickhouse::service'
  2. cargo test --lib --no-default-features --features "sinks-console,sinks-clickhouse,clickhouse-integration-tests" 'sinks::clickhouse::config'
  3. make test-integration-clickhouse

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
  • If this PR introduces changes Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Dec 11, 2024
@PriceHiller PriceHiller marked this pull request as ready for review December 11, 2024 20:52
@PriceHiller PriceHiller requested a review from a team as a code owner December 11, 2024 20:52
@PriceHiller PriceHiller force-pushed the clickhouse-skip_unknown_fields-optional branch from 2f1edf0 to c25c533 Compare December 11, 2024 21:00
@PriceHiller
Copy link
Contributor Author

Modified the changelog from fix -> breaking.

@pront
Copy link
Member

pront commented Dec 11, 2024

Modified the changelog from fix -> breaking.

I wonder if we can avoid a breaking change, if we set it to Some(false) by default. Which is what we have today: https://vector.dev/docs/reference/configuration/sinks/clickhouse/#skip_unknown_fields

@PriceHiller
Copy link
Contributor Author

PriceHiller commented Dec 11, 2024

I wonder if we can avoid a breaking change, if we set it to Some(false) by default. Which is what we have today: https://vector.dev/docs/reference/configuration/sinks/clickhouse/#skip_unknown_fields

The issue is that the false value wasn't actually doing the right thing. It should have put ClickHouse into "strict" mode and reject any unknown fields for that request.

Instead, Vector was incorrectly using the false value to not send input_format_skip_unknown_fields at all and instead rely on the ClickHouse's server setting default.

See

if skip_unknown {
uri.push_str("input_format_skip_unknown_fields=1&");
}

What needs to happen is if skip_unknown_fields is specified at all we need to set the request's preference on whether or not to skip the unknown fields, ignoring the server's default setting for that.

There's 3 cases:

  1. skip_unknown_fields is true: we need to allow skipping unknown fields, overriding the server's default
  2. skip_unknown_fields is false: we need to enable "strict" mode and not allow skipping unknown fields for the request, overriding the server's default
  3. skip_unknown_fields is None: we need to follow the server's default by not specifying input_format_skip_unknown_fields

At least, that's based on my understanding of #22013. If I'm in error, I'll gladly make the change, that's just my interpretation.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you @PriceHiller . This is a breaking change, but I'd argue that the behavior you have here is what users would have reasonably expected: if they explicitly set false it should set input_format_skip_unknown_fields=0.

Problem: The Clickhouse sink's `skip_unknown_fields` doesn't follow the Clickhouse
server default. It always sends a value for `skip_unknown_fields`;
furthermore, there is also no way to _disable_ `skip_unknown_fields`
setting a "strict" mode for Clickhouse. We really want to permit either
a default value from the Clickhouse server, meaning we shouldn't specify
`skip_unknown_fields` by default. Otherwise, if a user _wants_ to
specify the strict mode for unknown fields, they should then pass either
`true` or `false` for click house.

Solution: Change the `skip_unknown_fields` value to be of an
`Option<bool>` instead of just a `bool`. This permits using the defaults
provided by the Clickhouse server and doesn't send the
`skip_unknown_fields` value to the server if left unspecified.

See vectordotdev#22013 for the original
issue report.

Closes vectordotdev#22013
@PriceHiller PriceHiller force-pushed the clickhouse-skip_unknown_fields-optional branch from c25c533 to 2e22b55 Compare December 12, 2024 22:34
@PriceHiller PriceHiller requested review from a team as code owners December 12, 2024 22:34
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Dec 12, 2024
@PriceHiller
Copy link
Contributor Author

Sorry about the delay, just updated the cue files on the latest force push.

@jszwedko jszwedko enabled auto-merge December 13, 2024 08:18
@jszwedko jszwedko added this pull request to the merge queue Dec 13, 2024
Merged via the queue into vectordotdev:master with commit f8f33b5 Dec 13, 2024
56 checks passed
@PriceHiller PriceHiller deleted the clickhouse-skip_unknown_fields-optional branch December 13, 2024 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: external docs Anything related to Vector's external, public documentation domain: sinks Anything related to the Vector's sinks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clickhouse skip_unknown_fields has no effect
4 participants