out_datadog: fix/add error handling for all flb_sds calls #5929

PettitWesley · 2022-08-22T23:31:45Z

Signed-off-by: Wesley Pettit [email protected]

Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

Example configuration file for the change
Debug log output from testing the change

Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Attached local packaging test output showing all targets (including any new ones) build.

Documentation

Documentation required for this feature

Backporting

Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

PettitWesley · 2022-08-22T23:32:32Z

Valgrind passes:

==26291==
==26291== HEAP SUMMARY:
==26291==     in use at exit: 110,402 bytes in 3,763 blocks
==26291==   total heap usage: 33,404 allocs, 29,641 frees, 6,283,769 bytes allocated
==26291==
==26291== LEAK SUMMARY:
==26291==    definitely lost: 0 bytes in 0 blocks
==26291==    indirectly lost: 0 bytes in 0 blocks
==26291==      possibly lost: 0 bytes in 0 blocks
==26291==    still reachable: 110,402 bytes in 3,763 blocks
==26291==         suppressed: 0 bytes in 0 blocks
==26291== Rerun with --leak-check=full to see details of leaked memory
==26291==
==26291== For lists of detected and suppressed errors, rerun with: -s
==26291== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

With config:

[SERVICE]
    Log_Level debug

[INPUT]
    Name dummy
    Tag dummy

[OUTPUT]
    Name        datadog
    Match       *
    Host        http-intake.logs.datadoghq.com
    TLS         on
    compress    gzip
    apikey      SECRET
    dd_service  wesley
    dd_source   test
    dd_tags     team:logs,foo:bar

PettitWesley · 2022-08-22T23:35:34Z

plugins/out_datadog/datadog.c

@@ -321,7 +358,7 @@ static void cb_datadog_flush(struct flb_event_chunk *event_chunk,
    ret = datadog_format(config, i_ins,
                         ctx, NULL,
                         event_chunk->tag, flb_sds_len(event_chunk->tag),
-                         event_chunk->data, event_chunk->size,
+                         event_chunk->data, event_chunk->size, event_chunk->total_events,


This change causes a build warning; I'll fix it. This function is also used as the test_formatter.callback.

PettitWesley · 2022-08-22T23:49:23Z

plugins/out_datadog/datadog.c

+                ret = remapping[ind].remap_to_tag(remapping[ind].remap_tag_name, v,
+                                                  &remapped_tags);
+                if (ret < 0) {
+                    flb_plg_error(ctx->ins, "Failed to remap tag: %s, skipping", remapping[ind].remap_tag_name);
+                }


If someone from datadog could comment on this bit specifically, that'd be awesome.

So this code adds the ECS task metadata (cluster name, task ARN, etc.) to the Datadog tags. It's very unlikely for this code to fail; that only happens on an allocation failure. But if it does fail, what should we do?

I was thinking that just skipping and applying the next tag is best. Technically, I think the tag string has to be well formed, like key:val,key2:val2. When it fails, we don't know whether it failed in the middle of adding a tag, so at the continue here you could be left with an incomplete string like key:val,key2:. I don't know how bad this is.

My guess was that continuing and risking that the tags are mis-formatted is better than just discarding the tags or discarding the record.
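A minimal sketch of one way to sidestep the half-written-tag risk: record the string's length before appending a tag and, if any append fails, truncate back to that length so the failed tag is dropped whole instead of leaving key2: behind. This is plain C with a hypothetical tagbuf type and hypothetical helper names standing in for flb_sds_t; it is not the plugin's code and does not use the flb_sds API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string standing in for flb_sds_t. */
struct tagbuf {
    char  *data;
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated */
};

/* Append n bytes; returns 0 on success, -1 if growing the buffer fails. */
static int tagbuf_append(struct tagbuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t new_cap = (b->len + n + 1) * 2;
        char *tmp = realloc(b->data, new_cap);
        if (!tmp) {
            return -1;          /* b->data is still valid and unchanged */
        }
        b->data = tmp;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/*
 * Append "key:val" (preceded by ',' unless the buffer is empty). If any
 * step fails, truncate back to the length recorded on entry so the buffer
 * never holds a half-written tag such as "key2:".
 */
static int tagbuf_add_tag(struct tagbuf *b, const char *key, const char *val)
{
    size_t saved_len = b->len;

    if ((b->len > 0 && tagbuf_append(b, ",", 1) < 0) ||
        tagbuf_append(b, key, strlen(key)) < 0 ||
        tagbuf_append(b, ":", 1) < 0 ||
        tagbuf_append(b, val, strlen(val)) < 0) {
        b->len = saved_len;
        if (b->data) {
            b->data[saved_len] = '\0';
        }
        return -1;
    }
    return 0;
}
```

With this approach the caller can log the error and move on to the next tag, as proposed above, while the tag list stays a valid key:val,key2:val2 sequence.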

matthewfala

Looks good! The code looks much safer now. Made some small comments.

It seems like the convention you are following for flb_errno() is to call that only on failed allocations and reallocations, not on failed frees.
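For context on that convention: flb_errno() prints the current errno value together with the source location, and errno is only meaningful right after a call that sets it, such as a failed malloc() or realloc() (POSIX specifies ENOMEM there). free() returns void, so there is no failure to report. The sketch below uses a hypothetical LOG_ERRNO() macro as a stand-in for flb_errno(); it is an illustration, not Fluent Bit's implementation.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for flb_errno(): print errno with file and line. */
#define LOG_ERRNO() \
    fprintf(stderr, "[%s:%d] errno=%d (%s)\n", \
            __FILE__, __LINE__, errno, strerror(errno))

/* Duplicate a string, logging errno only when the allocation fails. */
static char *dup_or_log(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (!copy) {
        LOG_ERRNO();    /* errno was set by the failed malloc() */
        return NULL;
    }
    memcpy(copy, s, n);
    return copy;        /* caller frees; free() has nothing to report */
}
```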

matthewfala · 2022-08-22T23:35:53Z

plugins/out_datadog/datadog.c


-    /* Count number of records */
-    array_size = flb_mp_count(data, bytes);
+    array_size = (int) total_events;


Why downcast from a size_t to an int? Could you make array_size a size_t, or total_events an int?

matthewfala · 2022-08-23T00:45:30Z

plugins/out_datadog/datadog.c

@@ -110,13 +111,14 @@ static int datadog_format(struct flb_config *config,
    msgpack_object k;
    msgpack_object v;
    struct flb_out_datadog *ctx = plugin_context;
+    struct flb_event_chunk *event_chunk = flush_ctx;
+
+    array_size = (int) event_chunk->total_events;


Is it possible to make this a size_t rather than an int to avoid downcasting issues?

Yeah, I did this because I thought the msgpack API wanted an int, but it actually takes a size_t, so this cast is pointless.
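Since the msgpack API takes a size_t, as the reply notes, the simplest fix is to keep array_size as a size_t throughout. Where an int genuinely is required, a checked conversion avoids the problem the reviewer raises: converting an out-of-range value to a signed int is implementation-defined in C and usually yields a wrong or negative count. The helper below is a hypothetical illustration, not code from the PR.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Convert a size_t count to int only when it fits. A plain (int) cast of a
 * value above INT_MAX gives an implementation-defined result.
 * Returns 0 and stores the value on success, -1 (leaving *out untouched)
 * when the value does not fit.
 */
static int size_to_int_checked(size_t n, int *out)
{
    if (n > (size_t) INT_MAX) {
        return -1;
    }
    *out = (int) n;
    return 0;
}
```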

matthewfala · 2022-08-23T01:22:09Z

plugins/out_datadog/datadog.c

            if (!remapped_tags) {
                remapped_tags = flb_sds_create_size(byte_cnt);
+                if (!remapped_tags) {
+                    flb_errno();
+                    msgpack_sbuffer_destroy(&mp_sbuf);
+                    msgpack_unpacked_destroy(&result);
+                    return -1;
+                }
+            } else if (flb_sds_len(remapped_tags) < byte_cnt) {
+                tmp = flb_sds_increase(remapped_tags, byte_cnt - flb_sds_len(remapped_tags));
+                if (!tmp) {
+                    flb_errno();
+                    flb_sds_destroy(remapped_tags);
+                    msgpack_sbuffer_destroy(&mp_sbuf);
+                    msgpack_unpacked_destroy(&result);
+                    return -1;
+                }
+                remapped_tags = tmp;
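The hunk above follows the usual C pattern for growing a buffer: store the result in a temporary so that, if the reallocation fails, the original pointer is still available to free on the error path. Below is a sketch of the same pattern with plain realloc(), using hypothetical helpers grow_by() (increase-by-N semantics, in the spirit of flb_sds_increase()) and reserve(). It also shows the order of the subtraction when computing the increment: needed - cap is the shortfall, whereas cap - needed wraps around to a huge size_t value whenever cap is smaller.

```c
#include <stdlib.h>

/*
 * Hypothetical "increase by extra bytes" helper. The result of realloc()
 * goes into a temporary first, so on failure the caller still owns the
 * original block and can free it.
 */
static int grow_by(char **buf, size_t *cap, size_t extra)
{
    char *tmp = realloc(*buf, *cap + extra);

    if (!tmp) {
        return -1;              /* *buf is untouched and must still be freed */
    }
    *buf = tmp;
    *cap += extra;
    return 0;
}

/* Make room for at least `needed` bytes, passing the shortfall as increment. */
static int reserve(char **buf, size_t *cap, size_t needed)
{
    if (*cap >= needed) {
        return 0;               /* already large enough */
    }
    /* needed - *cap is positive here; *cap - needed would wrap around */
    if (grow_by(buf, cap, needed - *cap) < 0) {
        free(*buf);             /* error path: release what we still own */
        *buf = NULL;
        *cap = 0;
        return -1;
    }
    return 0;
}
```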


So, to confirm, this new section of code addresses the following problem:

1. A remapped_tags buffer is created for the first log event, based on its size estimate.
2. The remapped_tags buffer is cleared and reused for the second log event, disregarding the second log event's tag size estimate.
3. The remapped_tags buffer is too small, which triggers a segfault.

The solution is as follows: reuse remapped_tags unless the new size estimate is larger, in which case grow it first.

It seems, though, that this section of code is an optimization and not strictly necessary.

If there is too little space for the contents of the string, flb_sds_cat will reallocate the buffer with more space, and that larger buffer is kept for all future use. Reallocating the memory upfront is more efficient, which is why this solution seems like a good idea.


Yeah, this is what I'm going for: making sure this buffer is reallocated as few times as needed.
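The reuse strategy discussed above can be sketched as a buffer that is reset between log events and grown only when the next event's estimate exceeds what is already allocated. This is an illustration with a hypothetical reuse_buf type and a reallocation counter, not the plugin's code. The size check compares against the allocated capacity, because the length is always 0 right after a reset.

```c
#include <stdlib.h>

/* Hypothetical reusable buffer; `reallocs` counts how often it had to grow. */
struct reuse_buf {
    char  *data;
    size_t len;       /* bytes in use for the current event */
    size_t cap;       /* bytes allocated */
    int    reallocs;
};

/* Start a new event: keep the allocation, forget the contents. */
static void rb_reset(struct reuse_buf *b)
{
    b->len = 0;
    if (b->data) {
        b->data[0] = '\0';
    }
}

/*
 * Size the buffer for this event's estimate up front, growing only if the
 * estimate (plus a terminator) does not fit in the current allocation.
 */
static int rb_reserve(struct reuse_buf *b, size_t estimate)
{
    char *tmp;

    if (estimate + 1 <= b->cap) {
        return 0;                           /* big enough: reuse as is */
    }
    tmp = realloc(b->data, estimate + 1);   /* +1 for the terminator */
    if (!tmp) {
        return -1;                          /* b->data is still valid */
    }
    b->data = tmp;
    b->cap = estimate + 1;
    b->reallocs++;
    return 0;
}
```

For example, a buffer first sized for a 32-byte estimate is reused for a 16-byte event without reallocating, and grows only once more when a 64-byte event arrives.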

plugins/out_datadog/datadog_remap.h

Signed-off-by: Wesley Pettit <[email protected]>

rajeev-netomi · 2022-11-11T05:34:43Z

Can we please prioritise this PR? Due to this issue we have been facing regular Fluent Bit crashes while sending logs to Datadog.

PettitWesley · 2022-11-14T22:20:14Z

@edsiper CI passes except for CIFuzz. This is a fix for a bug that was causing crashes for a customer. Can we please merge?


PettitWesley requested review from nokute78 and edsiper as code owners August 22, 2022 23:31

github-actions bot added the docs-required label Aug 22, 2022

PettitWesley commented Aug 22, 2022

PettitWesley force-pushed the flb_sds_cat_datadog branch 2 times, most recently from 5a0655d to 0685187 Compare August 22, 2022 23:43

PettitWesley commented Aug 22, 2022

matthewfala reviewed Aug 23, 2022

PettitWesley force-pushed the flb_sds_cat_datadog branch from 0685187 to b0b7163 Compare September 8, 2022 23:45

PettitWesley temporarily deployed to pr September 8, 2022 23:45 Inactive

PettitWesley temporarily deployed to pr September 9, 2022 00:05 Inactive

PettitWesley added this to the Fluent Bit v1.9.9 milestone Sep 16, 2022

lecaros modified the milestones: Fluent Bit v1.9.9, Fluent Bit v1.9.10 Sep 27, 2022

out_datadog: fix/add error handling for all flb_sds calls

794acb1

Signed-off-by: Wesley Pettit <[email protected]>

PettitWesley force-pushed the flb_sds_cat_datadog branch from b0b7163 to 794acb1 Compare October 5, 2022 16:24

PettitWesley temporarily deployed to pr October 5, 2022 16:47 Inactive

PettitWesley temporarily deployed to pr October 5, 2022 16:50 Inactive

PettitWesley temporarily deployed to pr October 5, 2022 17:07 Inactive

edsiper merged commit 300206a into fluent:1.9 Nov 16, 2022

Claych added a commit to Claych/fluent-bit that referenced this pull request Dec 9, 2022

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

7be0f4b

This reverts commit 300206a.

matthewfala added a commit to matthewfala/fluent-bit that referenced this pull request Jan 7, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

98313eb

This reverts commit 300206a.

This was referenced Feb 4, 2023

data_dog: partially revert recent datadog PR to avoid provider ecs segfault #6785

Open

data_dog: partially revert recent datadog PR to avoid segfault #6786

Open

matthewfala added a commit to matthewfala/fluent-bit that referenced this pull request Feb 6, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

0538e05

This reverts commit 300206a.

matthewfala added a commit to matthewfala/fluent-bit that referenced this pull request Feb 23, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

64caec7

This reverts commit 300206a.

PettitWesley pushed a commit to PettitWesley/fluent-bit that referenced this pull request Mar 13, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

c6bd594

This reverts commit 300206a.

PettitWesley pushed a commit to PettitWesley/fluent-bit that referenced this pull request Apr 25, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

49c0e8e

This reverts commit 300206a.

PettitWesley pushed a commit to PettitWesley/fluent-bit that referenced this pull request May 2, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

610d47b

This reverts commit 300206a.

PettitWesley pushed a commit to PettitWesley/fluent-bit that referenced this pull request Jun 2, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

96ce72a

This reverts commit 300206a.

PettitWesley pushed a commit to PettitWesley/fluent-bit that referenced this pull request Jun 8, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

3227b04

This reverts commit 300206a.

matthewfala added a commit to matthewfala/fluent-bit that referenced this pull request Sep 23, 2023

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

cd6313b

This reverts commit 300206a.

PettitWesley pushed a commit to PettitWesley/fluent-bit that referenced this pull request May 22, 2024

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

a7fe5a3

This reverts commit 300206a.

zhihonl pushed a commit to zhihonl/fluent-bit that referenced this pull request Aug 20, 2024

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

efed0b6

This reverts commit 300206a.

swapneils pushed a commit to amazon-contributing/upstream-to-fluent-bit that referenced this pull request Oct 3, 2024

Revert "out_datadog: fix/add error handling for all flb_sds calls (fluent#5929)"

52e1656

This reverts commit 300206a.
