Global options for schema field names (message, host, timestamp, etc) #1446

Closed
3 tasks
binarylogic opened this issue Dec 27, 2019 · 5 comments
Labels
domain: config Anything related to configuring Vector type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@binarylogic
Contributor

This is a simple change that allows a user to change the defaults for common field names, such as host, timestamp, and message.

Example

In the vector.toml file, a user could specify these as top-level options:

```toml
[log_schema]
host_key = "instance" # default "host"
message_key = "info" # default "message"
timestamp_key = "datetime" # default "timestamp"
```
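To make the defaulting behaviour concrete, here is a minimal Rust sketch of how the `[log_schema]` table might be modelled once parsed. The `LogSchema` struct and its `from_pairs` constructor are hypothetical (not Vector's actual config types). The sketch assumes the TOML parsing step has already produced plain key/value pairs.

```rust
/// Hypothetical in-memory form of the proposed `[log_schema]` table.
#[derive(Debug, Clone, PartialEq)]
pub struct LogSchema {
    pub host_key: String,
    pub message_key: String,
    pub timestamp_key: String,
}

impl Default for LogSchema {
    // Defaults match the comments in the example config.
    fn default() -> Self {
        LogSchema {
            host_key: "host".to_string(),
            message_key: "message".to_string(),
            timestamp_key: "timestamp".to_string(),
        }
    }
}

impl LogSchema {
    /// Applies already-parsed `key = "value"` pairs on top of the defaults.
    /// Omitted keys keep their default; unknown keys are rejected.
    pub fn from_pairs(pairs: &[(&str, &str)]) -> Result<Self, String> {
        let mut schema = LogSchema::default();
        for &(key, value) in pairs {
            match key {
                "host_key" => schema.host_key = value.to_string(),
                "message_key" => schema.message_key = value.to_string(),
                "timestamp_key" => schema.timestamp_key = value.to_string(),
                other => return Err(format!("unknown log_schema option: {}", other)),
            }
        }
        Ok(schema)
    }
}
```

Rejecting unknown keys is a choice made in this sketch. If they were silently ignored, a typo such as `hots_key` would leave the default in place without any warning.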

Requirements

  • All sources should use these settings.
  • All transforms that implicitly operate on the message field should use the message_key setting.
  • All sinks that implicitly use the timestamp key for partitioning or other purposes should use the timestamp_key setting.
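The three requirements can be sketched in Rust as follows. The sketch assumes a simplified event modelled as a `BTreeMap<String, String>`. The `LogSchema` struct and the `build_event`, `message_len`, and `partition_key` functions are hypothetical stand-ins for a source, a transform, and a sink, not Vector's actual types.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified log event: a flat map of string fields.
type LogEvent = BTreeMap<String, String>;

/// Hypothetical global schema settings.
#[derive(Debug, Clone)]
struct LogSchema {
    host_key: String,
    message_key: String,
    timestamp_key: String,
}

impl Default for LogSchema {
    fn default() -> Self {
        LogSchema {
            host_key: "host".to_string(),
            message_key: "message".to_string(),
            timestamp_key: "timestamp".to_string(),
        }
    }
}

/// Source: writes fields under the configured names instead of hard-coded ones.
fn build_event(schema: &LogSchema, host: &str, line: &str, timestamp: &str) -> LogEvent {
    let mut event = LogEvent::new();
    event.insert(schema.host_key.clone(), host.to_string());
    event.insert(schema.message_key.clone(), line.to_string());
    event.insert(schema.timestamp_key.clone(), timestamp.to_string());
    event
}

/// Transform: implicitly operates on the message field via `message_key`.
fn message_len(schema: &LogSchema, event: &LogEvent) -> Option<usize> {
    event.get(&schema.message_key).map(|m| m.len())
}

/// Sink: partitions by date, reading the timestamp via `timestamp_key`.
/// Assumes RFC 3339 strings, so the date is the first 10 bytes;
/// returns None if the field is missing or too short.
fn partition_key(schema: &LogSchema, event: &LogEvent) -> Option<String> {
    event
        .get(&schema.timestamp_key)
        .and_then(|ts| ts.get(..10))
        .map(str::to_string)
}
```

An event built with the default schema has no `datetime` field, so `partition_key` returns `None` for it under a custom schema. The setting only works if every component reads it consistently.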
@binarylogic binarylogic added type: enhancement A value-adding code change that enhances its existing functionality. domain: config Anything related to configuring Vector labels Dec 27, 2019
@binarylogic
Contributor Author

@a-rodin I'm moving this into your data processing project as a low-priority item. If we can get to it, great; otherwise, we can defer it. But I do think this is related to data processing, since it deals with the data model.

@Hoverbear
Contributor

So this change has some interesting user experience implications!


Let's say a user has a service feeding them data where the message arrives in a field called data (the exact name doesn't matter). Here this setting is quite useful: it lets them map that field via this handy global option.

Now consider what happens when that user adds a new source: they'll find that even though it is the well-known Journald source, their log_schema setting actually breaks it!

In this case, they'll need to configure that specific source and remove the global option. Since source configurations do not always expose these three settings, they'll run into inconsistencies.


Now say a user with this configuration outputs to a sink whose upstream service expects this format. They then add a new sink that expects timestamp to be present, but because of the log_schema setting, it isn't. They need to apply the same workaround as above.


On a more technical note, we do have code like this that would definitely become more challenging:

https://github.com/timberio/vector/blob/2ee1b8867e5c79057ed4467d9d4ce3984dd73599/src/event/mod.rs#L481-L504

But I don't think that code poses a huge barrier.

I can definitely see valid use cases for this, but I can also see that this is essentially a rename_fields transform (#377) stuck at both ends of a pipeline.
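For comparison, a rename of this kind fits in a few lines of Rust. The `rename_fields` function and the map-based `LogEvent` type are hypothetical illustrations, not Vector's transform. Mapping `data` to `message` on the way in and back on the way out amounts to two such renames.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified log event: a flat map of string fields.
type LogEvent = BTreeMap<String, String>;

/// Moves each `from` field to `to`; events lacking `from` are left untouched.
fn rename_fields(event: &mut LogEvent, renames: &[(&str, &str)]) {
    for &(from, to) in renames {
        if let Some(value) = event.remove(from) {
            event.insert(to.to_string(), value);
        }
    }
}
```

Applying `rename_fields(&mut event, &[("data", "message")])` near the source and the reverse pair near the sink gives the same effect as a global `message_key = "data"`, but only for the pipelines that opt in.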

Or maybe I'm not getting something?

@binarylogic
Contributor Author

That makes sense, and for fields that the user controls, a rename_fields transform makes total sense. I think my specification for this issue overstepped a bit and could use some clarification:

I was mostly referring to fields we set, not fields we receive as part of a structured event.

For example, notice that in the file source we set the host field via the host_key option. I imagined these global options would set the default for that key name. We do the same thing for the event timestamp field. I just want to control these defaults. If a user sends a payload through any source and its keys are named a certain way, then the rename_fields transform should be used.
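The precedence described here can be sketched in Rust with an `Option` fallback: a source's own `host_key` wins, and otherwise the global default applies. `GlobalLogSchema` and `FileSourceConfig` are hypothetical names, not Vector's actual config structs.

```rust
/// Hypothetical global `[log_schema]` defaults (only the host key shown).
#[derive(Debug)]
struct GlobalLogSchema {
    host_key: String,
}

impl Default for GlobalLogSchema {
    fn default() -> Self {
        GlobalLogSchema { host_key: "host".to_string() }
    }
}

/// Hypothetical file source options; `None` means the user did not set
/// `host_key` in this source's own config block.
#[derive(Debug, Default)]
struct FileSourceConfig {
    host_key: Option<String>,
}

impl FileSourceConfig {
    /// Source-level setting wins; otherwise fall back to the global default.
    fn effective_host_key<'a>(&'a self, global: &'a GlobalLogSchema) -> &'a str {
        self.host_key.as_deref().unwrap_or(global.host_key.as_str())
    }
}
```

With this shape, the global option only changes keys that the source itself sets. Fields arriving inside a structured payload are untouched and remain a job for rename_fields.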

Does that help?

@Hoverbear
Contributor

@binarylogic So like in #1769?

@binarylogic
Contributor Author

Closed via #1446


2 participants