Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement!: Remove default encoding.codec where appropriate #5281

Merged
merged 18 commits into from
Dec 10, 2020

Conversation

fanatid
Copy link
Contributor

@fanatid fanatid commented Nov 29, 2020

Closes #5196

Affected components (remove text value by default, not backward compatible):

  • aws_s3, text -> null
  • file, text -> null
  • humio, json -> null
  • kafka, text -> null
  • nats, text -> null
  • new_relic_logs, json -> null
  • pulsar, text -> null
  • splunk_hec, text -> null

@fanatid fanatid added the type: enhancement A value-adding code change that enhances its existing functionality. label Nov 29, 2020
@fanatid fanatid self-assigned this Nov 29, 2020
@binarylogic
Copy link
Contributor

@kirillt shouldn't this include .cue changes?

@fanatid
Copy link
Contributor Author

fanatid commented Nov 29, 2020

@binarylogic yep, I'll need to update the docs too

Signed-off-by: Kirill Fomichev <[email protected]>
Signed-off-by: Kirill Fomichev <[email protected]>
Signed-off-by: Kirill Fomichev <[email protected]>
Signed-off-by: Kirill Fomichev <[email protected]>
#[derive(Clone, Copy, Debug, Derivative, Deserialize, Eq, PartialEq, Serialize)]
#[derivative(Default)]
#[serde(rename_all = "snake_case")]
pub enum EncodingText {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I guess the alternative here would be to have a single Encoding type with no default, then make each component implement a serde default function? It might adapt better as we add things like #5021

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think I like this better too.

Copy link
Contributor

@Hoverbear Hoverbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing necessarily wrong with this implementation persay, it's just very inelegant and it will become quite burdensome as our system develops and adds support for more encoding types.

I think this is fine to merge, but if you have a different implementation idea for accomplishing this, I'd encourage you to investigate it. :)

@binarylogic
Copy link
Contributor

I tend to agree with @Hoverbear. I appreciate the effort, but coupling the various types in this way is likely to be difficult to manage in the future. We plan to expand support in the future.

Signed-off-by: Kirill Fomichev <[email protected]>
Signed-off-by: Kirill Fomichev <[email protected]>
@fanatid
Copy link
Contributor Author

fanatid commented Dec 1, 2020

Yeah, I think now that extend enums will be harder. I returned the original Encoding enums back.

src/sources/splunk_hec.rs Outdated Show resolved Hide resolved
src/sinks/splunk_hec.rs Outdated Show resolved Hide resolved
src/sinks/splunk_hec.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@Hoverbear Hoverbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted some spots we might prefer a default in a generated config, the rest looks good. :)

Signed-off-by: Kirill Fomichev <[email protected]>
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question below, but otherwise this looks good to me!

It is making me wonder if we even want to default the Humio and New Relic sinks though. Should we just require it there as well for consistency?

src/sinks/util/buffer/compression.rs Outdated Show resolved Hide resolved
@fanatid
Copy link
Contributor Author

fanatid commented Dec 8, 2020

It is making me wonder if we even want to default the Humio and New Relic sinks though. Should we just require it there as well for consistency?

New Relic has only one encoding (json) so I think it's okay to have it by default. https://github.com/timberio/vector/blob/efef1185c743ed6ad2ceed82694af5abd4210859/src/sinks/new_relic_logs.rs#L70-L73

Why we need to remove the default value in Humio (json)?

@jszwedko
Copy link
Member

jszwedko commented Dec 8, 2020

It is making me wonder if we even want to default the Humio and New Relic sinks though. Should we just require it there as well for consistency?

New Relic has only one encoding (json) so I think it's okay to have it by default.

https://github.com/timberio/vector/blob/efef1185c743ed6ad2ceed82694af5abd4210859/src/sinks/new_relic_logs.rs#L70-L73

Why we need to remove the default value in Humio (json)?

I was just thinking, as a user of vector, it would be nice to have the sinks be consistent in their requirement of encoding.codec than to have exceptions for only one or two sinks.

For New Relic, I think we could easily add support for text encoding by setting message in the payload (https://docs.newrelic.com/docs/logs/log-management/log-api/introduction-log-api) to the message field in vector rather than encoding the event as JSON. Again, I think this consistency across sinks would make vector easier to use.

@fanatid
Copy link
Contributor Author

fanatid commented Dec 8, 2020

@binarylogic what do you think about Humio and New Relic? Should we change default encoding?

@binarylogic
Copy link
Contributor

binarylogic commented Dec 8, 2020

They should default to JSON. We should aim to reduce decisions imposed on the user, and I cannot think of a reason JSON should not be the default for those sinks (and most sinks).

@jszwedko
Copy link
Member

jszwedko commented Dec 8, 2020

They should default to JSON. We should aim to reduce decisions imposed on the user, and I cannot think of a reason JSON should not be the default for those sinks (and most sinks).

I don't disagree with this, but that is a different direction than this PR is taking. If we want to prefer to default to JSON, we should default to JSON in this PR everywhere feasible rather than removing default encodings.

@binarylogic
Copy link
Contributor

I agree, we should default to JSON where possible. I can't think of a scenario where we would not default to JSON if JSON is supported by the sink.

@lukesteensen
Copy link
Member

I agree, we should default to JSON where possible. I can't think of a scenario where we would not default to JSON if JSON is supported by the sink.

We've had this discussion many times. There are situations like syslog -> syslog and kafka -> kafka where the user would not necessarily expect their message to be wrapped in a JSON envelope before being forwarded. For that reason, we decided to avoid setting a default on sinks that support multiple encodings.

Humio, New Relic, and similar will often have their own JSON wrappers, but that doesn't mean it always makes sense to wrap each individual event in further JSON. For example, consider a file -> humio config with no parsing. With a "raw" encoding, each event will look (roughly) like this:

{
  "event": "I am a raw log line from the input file",
  "fields": [],
  "host": "bens-laptop.local",
  "time": "2006-08-14T02:34:56-06:00"
 }

If we default to JSON, it'll look like the following:

{
  "event": {
    "message": "I am a raw log line from the input file",
    "timestamp": "2006-08-14T02:34:56-06:00",
    "host": "bens-laptop.local"
  },
  "fields": [],
  "host": "bens-laptop.local",
  "time": "2006-08-14T02:34:56-06:00"
}

As you can see, in a pipeline with no real parsing or enrichment, the end result of encoding with JSON is a potentially confusing duplication of metadata that's already handled by the sink's normal protocol.

@fanatid fanatid changed the title enhancement!: Remove text as default encoding.codec where appropriate enhancement!: Remove default encoding.codec where appropriate Dec 9, 2020
@fanatid
Copy link
Contributor Author

fanatid commented Dec 9, 2020

Default encoding in humio / new-relic-logs removed.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fanatid !

It looks like the loki sink still has a default, but multiple options, should we remove that one too? That was the only other one I could find.

As mentioned in my in-line comment, I think we should add support for text encoding for New Relic as well.

pub enum Encoding {
#[derivative(Default)]
Json,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add Text here as an encoding option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you okay with adding in another PR? This will not be breaking change because we already removed Default for Encoding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 a separate PR is fine, but it'd be nice to do it before this change is released in a new version as it'd be confusing to require users to specify the encoding.codec option when there is only one valid value.

@@ -67,6 +67,22 @@ where
}
}

#[cfg(any(feature = "sinks-new_relic_logs", feature = "sinks-humio"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused why this type only is used by these two sinks and not the others. Should it be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's should be here. Only used in New Relic / Humio because we need to convert EncodingConfig<sink1::Encoding> to EncodingConfig<sink2::Encoding>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Gotcha, I see.

Signed-off-by: Kirill Fomichev <[email protected]>
@fanatid
Copy link
Contributor Author

fanatid commented Dec 9, 2020

@jszwedko default in loki removed.

@jszwedko
Copy link
Member

jszwedko commented Dec 9, 2020

The Windows build failure looks related:

---- sinks::humio::logs::tests::humio_valid_time_field stdout ----
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { inner: ErrorInner { kind: Custom, line: Some(0), col: 0, at: Some(0), message: "missing field `encoding`", key: [] } }', src\sinks\humio\logs.rs:123:10

Whenever CI is green this looks good to me!

Signed-off-by: Kirill Fomichev <[email protected]>
@fanatid fanatid merged commit 3a48a0a into master Dec 10, 2020
@fanatid fanatid deleted the encoding-codec-default branch December 10, 2020 08:59
@binarylogic
Copy link
Contributor

Nice work, thanks @kirillt .

jszwedko added a commit that referenced this pull request Jun 15, 2021
This is the only supported encoding for the New Relic sink.

There was some discussion about this in
#5281 when the default was
dropped. It was dropped as we thought `text` was also a valid encoding,
that we would be adding shortly, in which case it would make sense to
make the user choose; however, I found that `text` is not a valid
encoding for the New Relic logs API
(https://docs.newrelic.com/docs/logs/log-management/log-api/introduction-log-api/).

Also document the undocumented region field.

Signed-off-by: Jesse Szwedko <[email protected]>
jszwedko added a commit that referenced this pull request Jun 16, 2021
This is the only supported encoding for the New Relic sink.

There was some discussion about this in
#5281 when the default was
dropped. It was dropped as we thought `text` was also a valid encoding,
that we would be adding shortly, in which case it would make sense to
make the user choose; however, I found that `text` is not a valid
encoding for the New Relic logs API
(https://docs.newrelic.com/docs/logs/log-management/log-api/introduction-log-api/).

Also document the undocumented region field.

Signed-off-by: Jesse Szwedko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A value-adding code change that enhances its existing functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make encoding.codec non-optional or default to json/ndjson
5 participants