-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
out_datadog: fix/add error handling for all flb_sds calls #5929
Conversation
Valgrind passes:
With config:
|
plugins/out_datadog/datadog.c
Outdated
@@ -321,7 +358,7 @@ static void cb_datadog_flush(struct flb_event_chunk *event_chunk, | |||
ret = datadog_format(config, i_ins, | |||
ctx, NULL, | |||
event_chunk->tag, flb_sds_len(event_chunk->tag), | |||
event_chunk->data, event_chunk->size, | |||
event_chunk->data, event_chunk->size, event_chunk->total_events, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is causing the build to give a warning... will fix it... this is used in the test_formatter.callback
5a0655d
to
0685187
Compare
ret = remapping[ind].remap_to_tag(remapping[ind].remap_tag_name, v, | ||
&remapped_tags); | ||
if (ret < 0) { | ||
flb_plg_error(ctx->ins, "Failed to remap tag: %s, skipping", remapping[ind].remap_tag_name); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If someone from datadog could comment on this bit specifically, that'd be awesome.
So this code adds the ECS task metadata (cluster name, task arn, etc) to the datadog tags. And its very unlikely for this code to fail... that only happens if there is an alloc failure. But, if there is a failure, what should we do?
I was thinking that just skipping and applying the next tag is best. Technically I think the tag string has to be in a nice format like key:val,key2:val2
. When it fails, we don't know if it failed in the middle of adding a tag, so at the continue here you could have an incomplete string like key:val,key2:
. I do not know how bad this is.
My guess was that continuing and risking that the tags are mis-formatted is better than just discarding the tags or discarding the record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! The code looks much safer now. Made some small comments.
It seems like the convention you are following for flb_errno() is to call that only on failed allocations and reallocations, not on failed frees.
plugins/out_datadog/datadog.c
Outdated
|
||
/* Count number of records */ | ||
array_size = flb_mp_count(data, bytes); | ||
array_size = (int) total_events; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why downcast from a size_t to an int? Could you make array_size a size_t, or total_events an int?
plugins/out_datadog/datadog.c
Outdated
@@ -110,13 +111,14 @@ static int datadog_format(struct flb_config *config, | |||
msgpack_object k; | |||
msgpack_object v; | |||
struct flb_out_datadog *ctx = plugin_context; | |||
struct flb_event_chunk *event_chunk = flush_ctx; | |||
|
|||
array_size = (int) event_chunk->total_events; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to make this a size_t rather than an int to avoid downcasting issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea this is because I thought the msgpack API wants an int but actually it does want a size_t so this cast is very silly...
if (!remapped_tags) { | ||
remapped_tags = flb_sds_create_size(byte_cnt); | ||
if (!remapped_tags) { | ||
flb_errno(); | ||
msgpack_sbuffer_destroy(&mp_sbuf); | ||
msgpack_unpacked_destroy(&result); | ||
return -1; | ||
} | ||
} else if (flb_sds_len(remapped_tags) < byte_cnt) { | ||
tmp = flb_sds_increase(remapped_tags, flb_sds_len(remapped_tags) - byte_cnt); | ||
if (!tmp) { | ||
flb_errno(); | ||
flb_sds_destroy(remapped_tags); | ||
msgpack_sbuffer_destroy(&mp_sbuf); | ||
msgpack_unpacked_destroy(&result); | ||
return -1; | ||
} | ||
remapped_tags = tmp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, to confirm:
This new section of code addresses the following problem
- a remapped_tags buffer is created for the first log event, based on it's size estimations
- the remapped_tags buffer is cleared and reused for the second log event, disregarding the second logevent's tag size estimation
- the remapped_tags buffer is too small which triggers a segfault.
The solution is as follows:
Reuse the remapped_tags unless the new size estimate is larger, in which case reinitialize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seem, though, that this section of code is an optimization and not necessary.
If there is too little space for the contents of the string, then the flb_sds_cat code will reinitialize the buffer with more space for all future use. Reallocating the memory upfront is more efficient, which is why this solution seems like a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seem, though, that this section of code is an optimization and not necessary.
If there is too little space for the contents of the string, then the flb_sds_cat code will reinitialize the buffer with more space for all future use. Reallocating the memory upfront is more efficient, which is why this solution seems like a good idea.
Yea this is what I'm going for... trying to make sure this buffer is reallocated as few times as needed.
0685187
to
b0b7163
Compare
Signed-off-by: Wesley Pettit <[email protected]>
b0b7163
to
794acb1
Compare
Can we please prioritise this PR ? Due to this issue we have been facing regular fluentbit crashes while sending logs to datadog. |
@edsiper CI passes except for CIFuzz. This is a bug fix which was causing crashes for a csutomer. Can we please merge? |
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
…uent#5929)" This reverts commit 300206a.
Signed-off-by: Wesley Pettit [email protected]
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.