Private fields used for processing that are not encoded #1448
Ah, that makes sense, thanks for letting us know. We have discussed exactly this feature in the past. To help us understand the full scope of this feature, would you mind sharing your exact example? Specifically, how the field is initially inserted into the event and how you plan to use it in the |
@a-rodin I'm adding this to the "Improve data processing" milestone since you're re-approaching the data model. I think it would be worthwhile to consider private fields as you make those changes. |
Noting, we need a spec that addresses the following:
|
It is not a spec yet, but one idea is to approach this problem from a slightly different angle. If we implement proper nesting of fields, it would be possible to move all public fields into a dedicated subfield with a configurable name. For example, the
{
  "data": {
    "message": "message",
    "timestamp": "2019-12-28T12:34:56.789Z",
    "host": "localhost"
  }
}
Here the
Then making fields public/private could be as easy as renaming them (#377):
[transforms.hide_host]
  type = "rename_fields"
  [transforms.hide_host.fields]
    host = "data.host"
All sinks would take the name of the field to send, which by default would be again
Benefits of this approach:
|
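The nested-subfield idea above can be sketched as follows. This is only an illustration under stated assumptions, not Vector's implementation: the function `encode_public` and its `public_key` parameter are hypothetical names, and the event is modelled as a plain dictionary. The point is that a sink serializes only the subtree under the configured key, so anything stored outside it stays available for processing but is never sent.

```python
import json


def encode_public(event, public_key="data"):
    """Serialize only the subtree stored under `public_key`.

    Hypothetical helper: fields outside that subtree are treated as
    private and are never written to the output.
    """
    return json.dumps(event.get(public_key, {}), sort_keys=True)


event = {
    "data": {
        "message": "message",
        "timestamp": "2019-12-28T12:34:56.789Z",
    },
    # Lives outside "data", so it is usable by transforms but not encoded.
    "host": "localhost",
}

encoded = encode_public(event)
```

With this layout, hiding or exposing a field is a matter of moving it out of or into the `data` subtree, which is what the rename-based approach relies on.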
That is an interesting approach, although I think the behavior should be inverted: public by default. I believe that to be the path of least surprise. We could always provide a global, or source-level, option to change this default. To demonstrate with your example, the
{
  "message": "message",
  "timestamp": "2019-12-28T12:34:56.789Z",
  "host": "localhost"
}
And then a special
[transforms.hide_host]
  type = "rename_fields"
  [transforms.hide_host.fields]
    host = "_private.host"
Alternatively, the |
I agree that at least for existing users it would be much less surprising. I'm somewhat concerned about possible collisions, as a user might already have fields with names starting with |
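A minimal sketch of the public-by-default variant, assuming the `_private` name from the rename example above is the reserved namespace. The helper `encode_without_private` is hypothetical and the event is modelled as a plain dictionary. It also makes the collision concern concrete: any user field that happens to be called `_private` would be silently dropped by the same rule.

```python
import json

# Reserved top-level key, taken from the rename example (`_private.host`).
PRIVATE_KEY = "_private"


def encode_without_private(event):
    """Serialize every top-level field except the reserved private namespace."""
    public = {key: value for key, value in event.items() if key != PRIVATE_KEY}
    return json.dumps(public, sort_keys=True)


event = {
    "message": "message",
    "timestamp": "2019-12-28T12:34:56.789Z",
    # Moved here by the rename transform, so it is skipped when encoding.
    "_private": {"host": "localhost"},
}

encoded = encode_without_private(event)
```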
@binarylogic we are trying to reduce S3 storage volume by using near-static fields from parsed_json to template S3 key prefixes, rather than including this data in every log line.
|
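The use case above can be sketched like this. Everything here is an assumption made for illustration: the `{{ field }}` placeholder syntax, the field names `env` and `app`, and the helpers `render_prefix` and `encode_body` are hypothetical stand-ins rather than Vector's actual code. The sketch shows the two halves of the request: the near-static fields select the object key, and the same fields are left out of every encoded line.

```python
import json
import re


def render_prefix(template, event):
    """Replace each {{ field }} placeholder with the event's value for that field."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(event[match.group(1)]),
        template,
    )


def encode_body(event, drop):
    """Serialize the event without the fields that were only needed for templating."""
    kept = {key: value for key, value in event.items() if key not in drop}
    return json.dumps(kept, sort_keys=True)


# Hypothetical event: "env" and "app" are the near-static fields.
event = {"app": "billing", "env": "prod", "message": "started"}

prefix = render_prefix("{{ env }}/{{ app }}/", event)
body = encode_body(event, drop={"app", "env"})
```

Since the dropped values are already part of the key prefix, no information is lost, while each stored line gets smaller.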
To avoid repeating the explicit/implicit fields split that we only recently ripped out, I'd suggest that we consider handling this at the encoder level. That seems to be where the real payoff is and it lets us avoid mucking around too much in the data model. For example, we could expand the |
Totally agree with @lukesteensen: extending the encoder with include/exclude lists of properties would solve this issue. Another thought: "metadata" for one sink could be ordinary data for another. |
I wanted to propose a few configuration examples and agree on them before we begin work:
1. Backward compatibility
If possible, I'd like to preserve and deprecate the current syntax since this is a popular option:
[sinks.my_sink]
  type = "clickhouse"
  encoding = "json"
2. New syntax
[sinks.my_sink]
  type = "clickhouse"
  encoding.format = "json"
  encoding.only_fields = ["timestamp", "message"]
  encoding.except_fields = ["_meta"]
Where
[sinks.my_sink]
  type = "clickhouse"
  [sinks.my_sink.encoding]
    format = "json"
    only_fields = ["timestamp", "message"]
    except_fields = ["_meta"]
3. Embedding the
|
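To make the proposed `only_fields` and `except_fields` options concrete, here is a rough sketch of how an encoder might apply them. The semantics are assumptions, since the thread does not spell them out: filtering is applied to top-level fields only, `only_fields` acts as an allow-list, `except_fields` acts as a deny-list, and the allow-list is applied first if both are given. The function `encode_event` is a hypothetical name.

```python
import json


def encode_event(event, only_fields=None, except_fields=None):
    """Encode an event as JSON after applying the optional field filters.

    Assumed semantics: `only_fields` keeps just the listed top-level
    fields, then `except_fields` removes any listed fields that remain.
    """
    fields = dict(event)
    if only_fields is not None:
        fields = {key: value for key, value in fields.items() if key in only_fields}
    if except_fields is not None:
        fields = {key: value for key, value in fields.items() if key not in except_fields}
    return json.dumps(fields, sort_keys=True)


event = {
    "timestamp": "2019-12-28T12:34:56.789Z",
    "message": "message",
    "_meta": {"bucket": "logs"},  # hypothetical processing-only field
}

allow_listed = encode_event(event, only_fields=["timestamp", "message"])
deny_listed = encode_event(event, except_fields=["_meta"])
```

For this event both calls produce the same output, which suggests a given sink would typically need only one of the two options.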
Let's support option 2 first! |
Problem:
Ideas: