-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancement(sinks): Allow Encoding config to only/except list fields #1915
Conversation
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Here's a list to make this easier:
|
I think so since one of the use cases is to discard private data before writing downstream. (ex: dropping a |
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a reasonable approach 👍
The main downside seems to be requiring the validate
and apply_rules
calls. I think the eventual way around that is to centralize encoders so that it all can be pushed into the "library" portion of the code. That's a more invasive change to the sinks, so it makes sense to put it off for now. Could be worth opening an issue though.
@lukesteensen Yeah, I think that is a good future step. Having better usability was kind of hampered by each sink having it's own |
@Hoverbear yep, exactly. We'd need to find a good way to let each sink specify the types of encoding it supported and then delegate the actual functionality to a shared implementation. |
@lukesteensen One thing we can do is have an |
@Hoverbear Oooo I like that. Sinks could still define their own encoders for config purposes, but maybe delegate to shared utils internally? Would also be possible to make that change incrementally. |
Yup! How about as a follow up to this so we don't scope creep? |
Sounds good! |
@binarylogic No, we settled on |
Yep, I understand. So how are we handling backward compatibility? That's a very big consideration here, especially since this is a popular option. |
@binarylogic it looks like using |
Sounds good, |
Signed-off-by: Lucio Franco <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really really good! Left some comments in line, let me know if you have any questions :)
src/sinks/aws_kinesis_firehose.rs
Outdated
} | ||
|
||
#[typetag::serde(name = "aws_kinesis_firehose")] | ||
impl SinkConfig for KinesisFirehoseSinkConfig { | ||
fn build(&self, cx: SinkContext) -> crate::Result<(super::RouterSink, super::Healthcheck)> { | ||
let config = self.clone(); | ||
config.encoding.validate()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it looks like we call this on each build we could just nest this into our de impl for EncodingConfig?
src/sinks/clickhouse.rs
Outdated
pub encoding: EncodingConfig, | ||
#[serde( | ||
deserialize_with = "EncodingConfigWithDefault::from_deserializer", | ||
skip_serializing_if = "skip_serializing_if_default", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something but why do some sinks have this and others don't? My assumption is that this allows a "default" encoding?
To avoid further bikeshedding this should in theory be possible to do within a custom impl but happy to leave that out for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to make sure a user never has to see encoding.format
if there is only one option. So this makes sure we skip this field when it's not required in vector generate
calls
#[typetag::serde(name = "humio_logs")] | ||
impl SinkConfig for HumioLogsConfig { | ||
fn build(&self, cx: SinkContext) -> crate::Result<(super::RouterSink, super::Healthcheck)> { | ||
if self.encoding.codec != Encoding::Json { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is a better way around this, in theory could we just implement a custom encoding enum here that doesn't have Json
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't think this is the nicest way either but it's very straightforward
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one thing that worries me is I've seen serde sometimes enumerate all the possible options but in this case a user would see json as being available but would fail when you configure the sink. Just something we should check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, hm... The only way to fix that would be to give this a custom encoding type and transmute it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not write a custom enum here that doesn't include json?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
Just to verify: Do we have a behavior test with the root level |
That link doesn't go anywhere :( |
gh links for lines in PRs have been broken for me for a while now... |
Signed-off-by: Ana Hobden <[email protected]>
Signed-off-by: Ana Hobden <[email protected]>
@LucioFranco Our powers combined, this seems ready once the tests pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work! 🔥
I'm slightly concerned this is not correct given:
How can we simply test for this? It seems silly to test for something so trivial, but I feel like we need to at this point. Should we create configs and lint them? Maybe fuzz testing? |
@binarylogic Let's not stack anything more on this PR, we already chose to unnecessarily stack the fixes for #894 in this. (Which I think was a good idea!) So, please create a separate issue! I'd love to share my thoughts there! Prior to merging I'll be doing some manual user acceptance testing. Our exiting unit tests already serve as a fairly good configuration test bank. Note how our clickhouse tests tend to use configuration fragments. I think this is a good way to do things since it lets us safely do these changes and catch mistakes. It'd be nice if we trended towards doing this more over using structs. |
I'm going to test this out locally. I've identified a few cases in particular I'm checking:
|
Signed-off-by: Ana Hobden <[email protected]>
Okay those examples work (Except #1987 ) so I'm going to merge this under the impression that any remaining fixes will be (very) minor and will most likely be related to setting |
This should fix #1448 .
So far
Things should work according to how #1448 (comment):
Please help me verify:
Encoding
at all) has been updatedencoding.validate()?
encoding.apply_rules(_)
on all sunk events._encoding_
partial in it's.meta/_partials
folder. Eg:To decide
I need some help from folks to determine the following:
clickhouse
,blackhole
, ordatadog
?