data_dog: partially revert recent datadog PR to avoid provider ecs segfault #6785
base: 1.9
Conversation
Signed-off-by: Matthew Fala <[email protected]>
aws/aws-for-fluent-bit#491 (comment)
I think this issue is caused by the following condition case. On current master it was fixed by #6750, which has not been released yet.
```diff
@@ -179,7 +179,7 @@ static int datadog_format(struct flb_config *config,
             return -1;
         }
     } else if (flb_sds_len(remapped_tags) < byte_cnt) {
-        tmp = flb_sds_increase(remapped_tags, flb_sds_len(remapped_tags) - byte_cnt);
+        tmp = flb_sds_increase(remapped_tags, byte_cnt - flb_sds_len(remapped_tags));
```
Wait, the 2nd arg to flb_sds_increase is the new size, so shouldn't that be byte_cnt? And we shouldn't subtract anything from it?
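For context, here is a minimal sketch of the capacity idiom under discussion, assuming (per `flb_sds.c`) that `flb_sds_increase(sds, len)` grows the allocation *by* `len` bytes, i.e. the second argument is a delta rather than a new total size. The helper `ensure_tags_capacity` is hypothetical, introduced only for illustration:

```c
#include <fluent-bit/flb_sds.h>

/* Hypothetical helper: ensure `*tags` can hold at least `needed` bytes.
 * Assumption: flb_sds_increase(sds, len) grows the allocation BY `len`
 * bytes (a delta) and returns NULL on failure, which is why the fixed
 * line computes byte_cnt - flb_sds_len(remapped_tags). */
static int ensure_tags_capacity(flb_sds_t *tags, size_t needed)
{
    flb_sds_t tmp;

    if (flb_sds_len(*tags) < needed) {
        /* grow by the shortfall only */
        tmp = flb_sds_increase(*tags, needed - flb_sds_len(*tags));
        if (!tmp) {
            return -1;  /* allocation failed; caller keeps the old buffer */
        }
        *tags = tmp;
    }
    return 0;
}
```

If that reading of `flb_sds_increase` is right, the fixed line requests exactly the shortfall rather than the new total, which would answer the question above.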
```diff
-static int dd_remap_container_name(const char *tag_name,
-                                   msgpack_object attr_value, flb_sds_t *dd_tags_buf)
+static void dd_remap_container_name(const char *tag_name,
+                                    msgpack_object attr_value, flb_sds_t dd_tags)
```
Help me understand again: why are we reverting these changes, which probably still are a real fix? Also, if we are going to revert them, why not put them in a revert commit? Why the same commit as this new fix?
I tested the individual parts of the Datadog PR, and it turns out that completely unrelated error-handling code was triggering a segfault somewhere random in the code (like on a network call).
We decided that this PR would keep only the essential portions of the recent PR and drop the lower-priority ones to avoid introducing the segfault.
```diff
@@ -179,7 +179,7 @@ static int datadog_format(struct flb_config *config,
             return -1;
         }
     } else if (flb_sds_len(remapped_tags) < byte_cnt) {
-        tmp = flb_sds_increase(remapped_tags, flb_sds_len(remapped_tags) - byte_cnt);
+        tmp = flb_sds_increase(remapped_tags, byte_cnt - flb_sds_len(remapped_tags));
```
This is the data buffer resize fix from here: #6570
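A side note on why the original operand order was dangerous: assuming `flb_sds_len` returns an unsigned `size_t`, inside this branch the current length is strictly smaller than `byte_cnt`, so the old expression wraps around to a huge value. A standalone sketch with plain `size_t` stand-ins:

```c
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    size_t len = 100;    /* stand-in for flb_sds_len(remapped_tags) */
    size_t needed = 150; /* stand-in for byte_cnt */

    /* unsigned subtraction wraps: prints a value near SIZE_MAX */
    printf("wrong order: %zu\n", len - needed);
    /* fixed order: prints 50, the actual shortfall */
    printf("fixed order: %zu\n", needed - len);
    return 0;
}
```

Passing the wrapped value to a resize routine would request a near-`SIZE_MAX` allocation, which is consistent with the buffer resize fix referenced in #6570.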
```diff
-static int dd_remap_container_name(const char *tag_name,
-                                   msgpack_object attr_value, flb_sds_t *dd_tags_buf)
+static void dd_remap_container_name(const char *tag_name,
+                                    msgpack_object attr_value, flb_sds_t dd_tags)
```
I tested the individual parts of the Datadog PR, and it turns out that completely unrelated error-handling code was triggering a segfault somewhere random in the code (like on a network call).
We decided that this PR would keep only the essential portions of the recent PR and drop the lower-priority ones to avoid introducing the segfault.
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.
We noticed that the recent Datadog PR triggers a segfault when the `provider` option is set to `ecs`. After a lot of investigation, we were unable to find the root cause of the segfault, but we discovered that it occurs during some random network call, which has nothing to do with the error-handling code added in the PR; yet removing that code resolves the segfault.

As a solution, we partially revert the recent Datadog PR that mysteriously triggers this segfault. The reverted code is just some simple error-handling code that was recently added. We also add the data buffer resize fix from here: #6570
Partially reverts the provider ecs code from the recent Datadog PRs that trigger segfaults:
#5930
#5929
See the segfault reports in aws-for-fluent-bit repo here:
aws/aws-for-fluent-bit#491
Signed-off-by: Matthew Fala <[email protected]>
Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets. Set the ok-package-test label to test for all targets (requires maintainer to do).

Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.