-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New processor: truncate_fields #11297
New processor: truncate_fields #11297
Conversation
2642cbe
to
b0a8760
Compare
@kvch Any chance to split this up into at least 2 PR's? One per processor? |
@ruflin sure |
b0a8760
to
835e02a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides what andrew mentioned LGTM
835e02a
to
69b3201
Compare
8edd7af
to
375a21d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening a separate PR. Is the plan to rebase this one as soon as the other one is merged. If yes, please ignore my reviews.
375a21d
to
6624d24
Compare
07078ee
to
75bd386
Compare
This should be ready for another round of review. |
} | ||
|
||
func (f *truncateFields) addTruncatedString(field, value string, event *beat.Event) (*beat.Event, error) { | ||
truncated, err := f.truncate(f, []byte(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting from string
-> []byte
-> string
going to do two separate copies of the string data, but since it's only truncating I think this could be avoided with some refactoring. It would be good if someone could double-check my assumption about the two copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is indeed copying twice. But after thinking a bit more about the conversion and copying data I am not sure if it makes sense to eliminate them. First I need to copy and convert the string, as it is inmutable type, in order to modify it. Then I need to create a new truncated copy, so the previous long array can be freed. Then as I need to return a string, a new copy is needed to get the inmutable type again.
Without converting the string to []byte, I can only do slicing which does not touch the underlying data, thus the long line stays in memory as long there is a reference to it. I think it would take away the advantage of decreasing the size of the data in memory.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that we would mostly truncate []byte
fields (e.g. message) and rarely on string. Thus, this code path would be less frequently run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I can reduce the number of copies by not passing around the arrays. Will do that.
return nil | ||
} | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unneeded space above?
maybe adding a unit test?
_, err = event.PutValue(field, truncated) | ||
if err != nil { | ||
return event, fmt.Errorf("could not add truncated byte slice value for key: %s, Error: %+v", field, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to add the log.flags
flag?
"truncate characters of too long byte line": { | ||
MaxChars: 10, | ||
Input: common.MapStr{ | ||
"message": []byte("ez egy túl hosszú sor"), // this is a too long line (hungarian) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
101 hungarian :)
36acbb0
to
991a4da
Compare
This PR introduces a new processor `truncate_fields`. To keep raw messages use this processor with `copy_fields`. ### `truncate_fields` This processor truncates configured fields. Example configuration is below: ```yaml processors: - truncate_fields: fields: - message max_bytes: 1024 fail_on_error: false ignore_missing: true ``` ### Keep raw events This preserves the orignal field and truncates it, if it's too long. ```yaml processors: - copy_fields: fields: - from: message to: event.original fail_on_error: false ignore_missing: true - truncate_fields: fields: - event.original max_bytes: 1024 fail_on_error: false ignore_missing: true ```
This PR introduces a new processor
truncate_fields
. To keep raw messages use this processor withcopy_fields
.truncate_fields
This processor truncates configured fields. Example configuration is below:
Keep raw events
This preserves the orignal field and truncates it, if it's too long.
Depends on #11303