Duplicated log entry fields when using Kubernetes filter with Merge_Log #1051

fhoy · 2019-01-23T11:59:30Z

With a particular combination of filters, I end up with duplicated fields in the log entry, without understanding how. Detailed, minimal config is attached, but a short description is as follows:

Read log line 2019-01-18T08:44:02,099 message=Some message using input plugin "dummy".
Filter log line through filter "parser", using a regex parser extracting fields prefix_timestamp and prefix_message.
Filter log object through filter "nest", moving extracted fields to be subfields of field log, and removing prefix prefix_.
Filter log object through filter "kubernetes", using Merge_Log to store Kubernetes fields under the log field.
Print log object using output plugin "stdout".

The result of this is the following:

{"log"=>{"timestamp"=>"2019-01-18T08:44:02,099", "message"=>"Some message"}, "timestamp"=>"2019-01-18T08:44:02,099", "message"=>"Some message", "kubernetes"=>{"dummy"=>"Wed Jan 23 10:24:00 2019"}}

, i.e. fields timestamp and message occur twice; at the root of the JSON object, and nested within field log. Furthermore, Kubernetes data is not nested within the log field.

Setting the Merge_Log attribute of the Kubernetes filter to Off gives

{"log"=>{"timestamp"=>"2019-01-18T08:44:02,099", "message"=>"Some message"}, "kubernetes"=>{"dummy"=>"Wed Jan 23 10:35:15 2019"}}

, i.e. no duplication.

I haven't been able to figure out what goes wrong in the Fluent Bit code doing the merge, or if I have misunderstood how the filters cooperate.

To Reproduce

Save attached files to directory ${dir_containing_attached_files}.
Rename files to remove .txt suffix.
Run docker run -i -t -v ${dir_containing_attached_files}:/fluent-bit/test fluent/fluent-bit:0.14.9 /fluent-bit/bin/fluent-bit -c /fluent-bit/test/fluent-bit.conf

I've also tested with the latest available image, 1.0.3, with the same results.

Config files:

The text was updated successfully, but these errors were encountered:

encircled · 2019-09-12T13:30:06Z

Hi,
I've faced similar issue. When a json log entry contains duplicate attributes and Merge_Log is On, fluent_d does:

An entry with duplicate is completely skipped without any error in fluent d logs
Sends a few duplicated log entries (other ones)

Our internal serialization format allows to have several keys with the same name in a map. When converting to JSON the case is quite dangerous, despite the JSON spec is not mandatory about having unique keys at the same level in a map, we got tons of reports that backend services that receives JSON payload raises exception due to duplicated keys. On this patch now the JSON formatter will check for duplicated keys, if found, it preserves the latest one found. Signed-off-by: Eduardo Silva <[email protected]>

…#1835 #1051) Signed-off-by: Eduardo Silva <[email protected]>

edsiper · 2020-09-30T17:58:09Z

FYI:

I've pushed fix 1d14886 to handle duplicated keys in a map. The workaround is basically: when converting the data to JSON, if a duplicated key is found, only pack the last value found for that key.

This will be part of v1.6 release next week.

…1051) Our internal serialization format allows to have several keys with the same name in a map. When converting to JSON the case is quite dangerous, despite the JSON spec is not mandatory about having unique keys at the same level in a map, we got tons of reports that backend services that receives JSON payload raises exception due to duplicated keys. On this patch now the JSON formatter will check for duplicated keys, if found, it preserves the latest one found. Signed-off-by: Eduardo Silva <[email protected]>

…fluent#1835 fluent#1051) Signed-off-by: Eduardo Silva <[email protected]>

…1051) Our internal serialization format allows to have several keys with the same name in a map. When converting to JSON the case is quite dangerous, despite the JSON spec is not mandatory about having unique keys at the same level in a map, we got tons of reports that backend services that receives JSON payload raises exception due to duplicated keys. On this patch now the JSON formatter will check for duplicated keys, if found, it preserves the latest one found. Signed-off-by: Eduardo Silva <[email protected]>

…fluent#1835 fluent#1051) Signed-off-by: Eduardo Silva <[email protected]>

* ra: fix typo of comment Signed-off-by: Takahiro YAMASHITA <[email protected]> * build: add an option for OSS-Fuzz builds (fluent#2502) This will make things a lot easier from the OSS-Fuzz side and also make it easier to construct new fuzzers. * aws: utils: fix mem leak in flb_imds_request (fluent#2532) Signed-off-by: Wesley Pettit <[email protected]> * io: fix EINPROGRESS check, also check that errno is not zero (fluent#2497) Signed-off-by: Eduardo Silva <[email protected]> * upstream: new 'destroy_queue' to defer connections context destroy (fluent#2497) A connection context might be destroyed while the event loop still has some pending event to be processed, in some cases a network exception. Destroying the context might lead to a corruption. The following patch implements a new queue to store temporary the connection context so the 'destroy' process is defered until all events from the event loop has been processed. Signed-off-by: Eduardo Silva <[email protected]> * engine: do upstream cleanup after the event loop finished processing events (fluent#2497) Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: parse tag that matches kubernetes default regex Signed-off-by: Jeff Luo <[email protected]> * test: runtime: out_stackdriver: add unit test for tag regex matching Signed-off-by: Jeff Luo <[email protected]> * upstream: expose flb_upstream_conn_pending_destroy() Signed-off-by: Eduardo Silva <[email protected]> * tests: runtime: filter_modify: fix sleep() usage and others Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: fix header guard Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: add missing ctype.h header on http_request Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: add cast for regex_match() variable Signed-off-by: Eduardo Silva <[email protected]> * out_azure_blob: add support for blockblob type and new blob_type property Signed-off-by: Eduardo Silva <[email protected]> * upstream: fix deletion of destroy_queue Signed-off-by: Eduardo Silva <[email protected]> * io: remove unnecessary errno message Signed-off-by: Eduardo Silva <[email protected]> * upstream: remove unnecessary errno message Signed-off-by: Eduardo Silva <[email protected]> * out_syslog: move configuration to a config_map Signed-off-by: Eduardo Silva <[email protected]> * pack: print timestamp formatting timespec fields Signed-off-by: Eduardo Silva <[email protected]> * filter_lua: fix handling of integer/double values (fluent#1932 fluent#1647) This patch makes to recognize the preferred numeric data type returned by the Lua script in the record fields. Signed-off-by: Eduardo Silva <[email protected]> * filter_lua: new 'time_as_table' property for timestamp handling (fluent#641 fluent#2519 fluent#2015) Long time ago we got some reports that using timestamps as double/floats might lost precision when the values are converted back from Lua. Actually there is no exact way to have 100% precision in doubles in our use case 'C > Lua > C'. Community suggested that we might workaround this with another solution. This patch considering backward compatibility, implements a new configuration property called 'time_as_table', which passes the timestamp as a Lua table with the following keys ['sec'] => timestamp seconds ['nsec'] => timestamp nanoseconds for users looking for 100% timestamp precision and specifically when dealing with nanoseconds, this option ensures timestamp integrity. If the option is enabled the user just need to adjust the script to use the new format if they touch or use the timestamp value. By default the option 'time_as_table' is disabled and in the future we might consider to enable it by default. Signed-off-by: Eduardo Silva <[email protected]> * out_file: use flb_time to format timestamp Signed-off-by: Eduardo Silva <[email protected]> * pack: remove unused variable Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: mp: include 'flb_compat.h' <unistd.h> is not POSIX standard and not provided by Microsoft Visual C++. Include "flb_compat.h" to avoid the build failure. Signed-off-by: Fujimoto Seiji <[email protected]> * utils: add `flb_utils_hex2int()` function mk_utils_hex2int() is included in mk_server. For this reason, this function only exists when FLB_HTTP_SERVER is enabled. Bring the function to flb_utils.c, so that we can use the functionality regardless of the compile-time option. Signed-off-by: Fujimoto Seiji <[email protected]> * Port Azure BLOB Output plugin to Windows With this patch out_azure_blob can be compiled and linked on Windows Server. This patch also fixes CI builds on AppVeyor. Signed-off-by: Fujimoto Seiji <[email protected]> * pack: Use PRIu64/PRId64 to format 64 bit integers. Do not use `%lu` to format 64-bit int; A long can be 32-bit depending on the system, so this line can implicitly truncate the target value. This should fix the "timestamps are truncated inproperly" bug on out_datadog. Signed-off-by: Fujimoto Seiji <[email protected]> * tests: internal: fuzzer: added three new fuzzer for msgpack handling (fluent#2541) Signed-off-by: davkor <[email protected]> * tests: runtime: out_cloudwatch_logs: Disable net.keepalive in test (fluent#2533) Signed-off-by: Wesley Pettit <[email protected]> * in_forward: fix header declaration (copy paste) (fluent#2559) Use FLB_IN_FW_CONFIG_H instead the MQTT one * ci: travis: remove deprecated key (fluent#2454) See: https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration Signed-off-by: Zero King <[email protected]> * out_es: support nanosecond timestamp precision (fluent#2544) Starting in Elasticsearch 7, a "date_nanos" data type was added, increasing timestamp precision from milliseconds to nanoseconds. This patch adds a "Time_Key_Nanos" option which tells the ElasticSearch output plugin to send 9 decimal places instead of 3 to ElasticSearch. Tests are included, and a patch to document the new option will be submitted shortly. Signed-off-by: Neal Turett <[email protected]> * doc: Add missing plugins to README (fluent#2534) Signed-off-by: zihengCat <[email protected]> * lib: sqlite: upgrade from 3.31.0 to 3.33.0 (fluent#2552) Signed-off-by: Eduardo Silva <[email protected]> * random: Introduce a random generator (fluent#2555) This adds flb_randombytes(), which fills the given buffer with random numbers using each OS's built-in Crypt API. The most common use scenario is: unsigned char buf[64] if (get_randombytes(buf, 64)) { flb_error("cannot get random bytes"); } This function supports both UNIX and Windows. You can use this function without caring the underlying OS. Signed-off-by: Fujimoto Seiji <[email protected]> * lib: mbedtls: upgrade from 2.16.5 to 2.24.0 Signed-off-by: Eduardo Silva <[email protected]> * lib: mbedtls: comment out link_to_source call Signed-off-by: Eduardo Silva <[email protected]> * lib: mbedtls: comment out Python3 check and 3dparty dir Signed-off-by: Eduardo Silva <[email protected]> * pack: gelf: validate incoming object is a map (oss-fuzz 25754) Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: fuzzer: remove test message Signed-off-by: Eduardo Silva <[email protected]> * lib: luajit: upgrade from 2.1.0-beta3 to 2.1.0-dd5032e Signed-off-by: Eduardo Silva <[email protected]> * build: adjust luajit path Signed-off-by: Eduardo Silva <[email protected]> * pack: gelf: format timestamp as seconds.milliseconds Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: gelf: adjust test case for expected records and timestamp Signed-off-by: Eduardo Silva <[email protected]> * in_cpu: normalize per-process CPU stats by number of cores (fluent#2543) Signed-off-by: yang-padawan <[email protected]> * pack: json_sds: validate unpacking Signed-off-by: Eduardo Silva <[email protected]> * utils: fix bad handling of invalid utf-8 bytes (oss-fuzz 25785) Signed-off-by: Eduardo Silva <[email protected]> * strptime: Add a fallback macro for `timezone` (fluent#2493) According to the UNIX standard: The external variable timezone shall be set to the difference, in seconds, between Coordinated Universal Time (UTC) and local standard time FreeBSD is incompatible with this standard. In particular, since it exposes a function symbol `char* timezone(int, int)`, expressions like `-(timezone)` causes a compile error. Fix it by adding a compat macro for FreeBSD. Signed-off-by: Fujimoto Seiji <[email protected]> * out_forward: always initialize salt with random numbers (fluent#2575) There is an initialization bug that leaves the shared key salt being uninitialized when SSL is not enabled. This might allow attackers to guess the shared key by looking at the hash. Let's always initialize the salt buffer securely using flb_randombytes(). Signed-off-by: Fujimoto Seiji <[email protected]> * out_gelf: port the plugin to Windows (fluent#2574) With this patch, the GELF Output plugin can be compiled and linked on Windows. This commit also fixes a few bugs in GELF plugin: * The message header was not constructed properly. In particular, 11-12th bytes were filled in the reversed order. * The message id generation was bogus (e.g. it did "tv_nsec*1000000 + tm.tm.tv_nsec" to generate a timestamp) Fix these glitches and add detailed documentation to each function. Signed-off-by: Fujimoto Seiji <[email protected]> * out_pgsql: add support for CockroachDB (fluent#2512) The SQL query will change when new option `cockroachdb` it's set to true Signed-off-by: Jonathan Gonzalez V <[email protected]> Co-authored-by: Jonathan Gonzalez V <[email protected]> * lib: chunkio: upgrade to v1.0.6 (dev changes) Signed-off-by: Eduardo Silva <[email protected]> * fstore: new interface to manage local files storage Signed-off-by: Eduardo Silva <[email protected]> * random: rename function flb_randombytes() to flb_random_bytes() Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: random: fix function name Signed-off-by: Eduardo Silva <[email protected]> * out_forward: remove unused variable and adjust random api name Signed-off-by: Eduardo Silva <[email protected]> * build: on FLB_SMALL mode, do not use 'strip' flag for non-gcc compiler Signed-off-by: Eduardo Silva <[email protected]> * strptime: initialize 'len' variable Signed-off-by: Eduardo Silva <[email protected]> * pack: gelf: initialize 'val_len' variable Signed-off-by: Eduardo Silva <[email protected]> * str: use memcpy to silent gcc warning Signed-off-by: Eduardo Silva <[email protected]> * io: tls: remove unused conditional and initialize variable Signed-off-by: Eduardo Silva <[email protected]> * filter_lua: initialize variable Signed-off-by: Eduardo Silva <[email protected]> * filter_parser: initialize variable Signed-off-by: Eduardo Silva <[email protected]> * filter_kubernetes: validate metadata is a map Signed-off-by: Eduardo Silva <[email protected]> * in_systemd: initialize variable Signed-off-by: Eduardo Silva <[email protected]> * out_azure_blob: add cast Signed-off-by: Eduardo Silva <[email protected]> * out_gelf: initialize variable Signed-off-by: Eduardo Silva <[email protected]> * out_syslog: initialize variables Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: signv4: initialize variable Signed-off-by: Eduardo Silva <[email protected]> * in stdin: add buffer_size parameter (fluent#2364) * in sdtin: add buffer_size parameter Signed-off-by: Martin Dojcak <[email protected]> * lib: mbedtls: force compiler c99 mode Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: disable gelf test in Win32 Signed-off-by: Eduardo Silva <[email protected]> * aws: s3 key format option Signed-off-by: Meghna Prabhu <[email protected]> * signv4: Support S3 Signed Payload Signed-off-by: Wesley Pettit <[email protected]> * out_es: update with new signv4 api Signed-off-by: Wesley Pettit <[email protected]> * aws: add new S3 local buffering library Signed-off-by: Wesley Pettit <[email protected]> * aws: s3 object key formatting and request signing Signed-off-by: Wesley Pettit <[email protected]> * out_s3: new output plugin for Amazon S3 Signed-off-by: Wesley Pettit <[email protected]> * out_s3: use new fstore API plus other adjustments - use new fstore API for buffer management - register plugin flags - config: change configmap for TIME and SIZE value types - config: rename chunk_buffer_dir -> store_dir - fix leaks on exceptions - use field to 'lock' chunks, instead of temporary node deletion - others adjustments note: still work in process. Signed-off-by: Eduardo Silva <[email protected]> * aws: fix broken s3 key test Signed-off-by: Wesley Pettit <[email protected]> * out_kinesis_firehose: new high performance core plugin for Kinesis Firehose (fluent#2572) Signed-off-by: Wesley Pettit <[email protected]> * in_tail: implement large-file support for Windows Previously it was not possible to handle files larger than 2GB on Windows. The fix is archived by migrating each vairable and function to be explicitly 64-bit ready. Signed-off-by: Fujimoto Seiji <[email protected]> * in_tail: do a small cleanup of the Windows port We have added a number of helpers to make it easier to support Windows (e.g. `flb_pipefd_t` as a portable type for pipes). This introduces these helpers to in_tail. Signed-off-by: Fujimoto Seiji <[email protected]> * fstore: return on memory exception (CID 309436) Signed-off-by: Eduardo Silva <[email protected]> * fstore: release context on exception (CID 309435) Signed-off-by: Eduardo Silva <[email protected]> * fstore: fix error message parameter (CID 309433) Signed-off-by: Eduardo Silva <[email protected]> * signv4: fix sds_printf parameters (CID 306765) Signed-off-by: Eduardo Silva <[email protected]> * out_azure_blob: fix use-after-free on exception (CID 306763 306762) Signed-off-by: Eduardo Silva <[email protected]> * out_azure_blob: do not use http context before validation (CID 306663) Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: release pattern on exception (CID 305802) Signed-off-by: Eduardo Silva <[email protected]> * in_tail: fix debug message params (CID 305255) Signed-off-by: Eduardo Silva <[email protected]> * out_cloudwatch_logs: add support for custom sts endpoint Signed-off-by: Meghna Prabhu <[email protected]> * out_es: add support for custom STS endpoint Signed-off-by: Meghna Prabhu <[email protected]> * aws: add support for custom STS endpoint in flb_aws_sts_provider Signed-off-by: Meghna Prabhu <[email protected]> * aws: add support for custom sts endpoint in flb_eks_provider Signed-off-by: Meghna Prabhu <[email protected]> * aws: sts: set custom_endpoint flag correctly Signed-off-by: Wesley Pettit <[email protected]> * out_s3: Add sts_endpoint and role_arn options Signed-off-by: Wesley Pettit <[email protected]> * out_kinesis_firehose: Add sts_endpoint option Signed-off-by: Wesley Pettit <[email protected]> * out_cloudwatch_logs: truncate events that are larger than 256KiB Signed-off-by: Wesley Pettit <[email protected]> * out_cloudwatch_logs: fix off by one error in payload creation Signed-off-by: Wesley Pettit <[email protected]> * out_cloudwatch_logs: Use documented max payload size and other fixes Signed-off-by: Wesley Pettit <[email protected]> * out_kinesis_firehose: fix type of event_bytes (CID 309457) Signed-off-by: Wesley Pettit <[email protected]> * out_kinesis_firehose: Use correct printf format for time_t (CID 309456) Signed-off-by: Wesley Pettit <[email protected]> * fstore: fix null dereference (CID 309455) Signed-off-by: Wesley Pettit <[email protected]> * aws: util: fix double free issues in get_s3_key (CID 309453 & 309443) (fluent#2599) Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix possible null dereference (CID 309448) Signed-off-by: Wesley Pettit <[email protected]> * out_s3: always cast fstore meta_buf to char * (CID 309442) Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix mem leak (CID 309438) Signed-off-by: Wesley Pettit <[email protected]> * lib: jansson: add jansson library for upcoming Avro support (fluent#2568) Signed-off-by: xmcqueen <[email protected]> * doc: add users: 'Microsoft', 'Linked In' and 'Trend Micro' Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: use the correct project ID The value for project ID of monitored resources of a LogEntry should be project ID, not the project number, which is in numeric format. The value of the project ID of Cloud Logging's Monitored Resources are documented in https://cloud.google.com/logging/docs/api/v2/resource-list. The project ID to be retrived from instance metadata is documented in https://cloud.google.com/compute/docs/storing-retrieving-metadata#default. Signed-off-by: Yen-Cheng Chou <[email protected]> * pack: fix json floating point format regression (fluent#2592) The recent change to the JSON floating point formatting to use "%.16g" caused a regression where values that have no fractional part are formatted as integers. For example, "10.0" gets formatted as "10". This patch uses the same approach as https://github.com/ohler55/oj/blob/v3.10.13/ext/oj/dump_strict.c#L100-L101, which is used in Fluentd. It checks if the double value is equal to the integer part, and if so, will use "%.1f" as the format to ensure the decimal part is still rendered (with a single decimal place of ".0"). This prevents downstream datastores from having data type conflicts. This was tested by building locally and running through different value using the dummy input plugin and stdout output plugin with json_lines formatting. Will include example outputs of tests in Pull Request. Signed-off-by: Joey Paskhay <[email protected]> * aws_util: fix failing sanitizer builds (fluent#2604) Signed-off-by: Wesley Pettit <[email protected]> * out_s3: use correct printf format for MAX_UPLOAD_ERRORS (CID 309440) (fluent#2602) Signed-off-by: Wesley Pettit <[email protected]> * out_stackdriver: fix project_id length for testing (fluent#2611) Signed-off-by: Yen-Cheng Chou <[email protected]> * out_http: on error, print HTTP response payload if available (fluent#2593) Signed-off-by: Eduardo Silva <[email protected]> * config: new flag to determinate if the ingestion is active Signed-off-by: Eduardo Silva <[email protected]> * engine: upon exit, turn ingestion flag off Signed-off-by: Eduardo Silva <[email protected]> * input: if ingestion is disable, do not resume it Signed-off-by: Eduardo Silva <[email protected]> * input: chunk: if ingestion is disable, do not resume it Signed-off-by: Eduardo Silva <[email protected]> * lib: rename worker thread to 'flb-pipeline' Signed-off-by: Eduardo Silva <[email protected]> * log: rename worker thread to 'flb-logger' Signed-off-by: Eduardo Silva <[email protected]> * bin: spawn pipeline using library mode (fluent#1496 fluent#2610) Signed-off-by: Eduardo Silva <[email protected]> * task: fix counter of running tasks, use 'users' counter (fluent#2411) Signed-off-by: Eduardo Silva <[email protected]> * input_chunk: drop oldest buffer chunks when reaching the limit Signed-off-by: Jeff Luo <[email protected]> * output: add new buffer option: storage.total_limit_size Signed-off-by: Jeff Luo <[email protected]> * router: add function to get routes_mask by tag Signed-off-by: Jeff Luo <[email protected]> * scheduler: clean up request in request_wait queue Signed-off-by: Jeff Luo <[email protected]> * in_tail: add checking to avoid deleting fd with value -1 Signed-off-by: Jeff Luo <[email protected]> * pipe: add checking to avoid closing fd with value -1 Signed-off-by: Jeff Luo <[email protected]> * task: creat task_routes by routes_mask Signed-off-by: Jeff Luo <[email protected]> * test: internal: input_chunk: add unit tests for buffering mechanism Signed-off-by: Jeff Luo <[email protected]> * in_tail: inotify: use proper logger API Signed-off-by: Eduardo Silva <[email protected]> * lib: new flb_loop() to wait for main thread exit Signed-off-by: Eduardo Silva <[email protected]> * bin: use new flb_loop() for clear exit Signed-off-by: Eduardo Silva <[email protected]> * out_gelf: port the random seed generation to Windows (fluent#2614) Windows does not have /dev/urandom. For this reason, it was always using a less secure value (= UNIX time) as an entropy source. Use flb_randombytes() to use a good entropy source, and thus, reduce the possibility of message collision. Signed-off-by: Fujimoto Seiji <[email protected]> * in_random: use the new API to generate a random seed (fluent#2613) Simplify the random seed generation by using flb_random_bytes() instead of trying to read entropy sources manually. Signed-off-by: Fujimoto Seiji <[email protected]> * aws: util: initialize key buffer to zero Signed-off-by: Eduardo Silva <[email protected]> * pack: on JSON formatting, remove duplicated keys (fluent#1835 fluent#1051) Our internal serialization format allows to have several keys with the same name in a map. When converting to JSON the case is quite dangerous, despite the JSON spec is not mandatory about having unique keys at the same level in a map, we got tons of reports that backend services that receives JSON payload raises exception due to duplicated keys. On this patch now the JSON formatter will check for duplicated keys, if found, it preserves the latest one found. Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: pack: add check for handling of duplicated JSON keys (fluent#1835 fluent#1051) Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: pack: release buffers Signed-off-by: Eduardo Silva <[email protected]> * in_tail: Prevent concatenation of no mulitline logs Signed-off-by: Dominik Rosiek <[email protected]> * tests: runtime: in_tail: Add test for dmode_firstline for docker mode Signed-off-by: Dominik Rosiek <[email protected]> * in_tail: read from tail (duh!) (fluent#1667 fluent#1761 fluent#474 fluent#1645 fluent#1330) From now on, Tail plugin implements the following behavior: 1. If a file is already registered in the database and contains an offset, the file is consumed from that offset position. 2. Upon start, if a file is not known by the database, read from it tail. 3. If the new 'read_from_head' property (default: false) is enabled, for newly discovered files read from the beginning. This flag don't override the behavior of a file that already exists in the database. Additional fix: When a file is being monitored in 'static mode', handle truncation properly. Signed-off-by: Eduardo Silva <[email protected]> * tests: runtime: filter_kubernetes: use tail + 'read_from_tail' Signed-off-by: Eduardo Silva <[email protected]> * tests: runtime: out_stackdriver: use tail + 'read_from_tail' Signed-off-by: Eduardo Silva <[email protected]> * tests: runtime: in_tail: use 'read_from_tail' Signed-off-by: Eduardo Silva <[email protected]> * input: chunk: comments cleanup Signed-off-by: Eduardo Silva <[email protected]> * bin: win32: fix 'error LNK2001: unresolved external symbol config' (fluent#2627) f303cdb has removed the global symbol "config" on which winsvc.c depends. This ended up producing the following error: error LNK2001: unresolved external symbol config Bring back the symbol to make Windows Service working again. Signed-off-by: Fujimoto Seiji <[email protected]> * in_tail: db: reset bindings after file insertion (fluent#2576) Signed-off-by: Eduardo Silva <[email protected]> * in_tail: always read from head for new discovered files Signed-off-by: Eduardo Silva <[email protected]> * out_cloudwatch_logs: bug fix: self throttle if too many calls per flush (fluent#2618) Signed-off-by: Wesley Pettit <[email protected]> * out_kinesis_firehose: add support for log_key (fluent#2619) * out_kinesis_firehose: add support for log_key Signed-off-by: Wesley Pettit <[email protected]> * aws_util: increase the jsmn default token size Signed-off-by: Drew Zhang <[email protected]> * filter_aws: expand EC2 metadata fields Signed-off-by: Drew Zhang <[email protected]> * ci: Skip 'tests/internal/input_chunk.c' on Windows This unittest relies on chunkio's file storage, which Windows port does not support yet. Let's skip it for now. Signed-off-by: Fujimoto Seiji <[email protected]> * filter_aws: fix return in get_vpc_metadata Signed-off-by: Wesley Pettit <[email protected]> * out_s3: add scaffolding for runtime tests Signed-off-by: Wesley Pettit <[email protected]> * tests: runtime: out_s3: add runtime tests Signed-off-by: Wesley Pettit <[email protected]> * input: on socket collector creation, return collector id Signed-off-by: Eduardo Silva <[email protected]> * in_forward: partial workaround to close connections on SIGTERM (fluent#2610) This patch adds some extra handling for service termination, so when the engine paused ingestion, the server socket is close to avoid having incoming connection and not-processed records. This patch works when the service is not in a scenario with back-pressure, so it can be considered a partial workaround. There is an upcoming re-resign of server side API for scalability that will address all these issues. Signed-off-by: Eduardo Silva <[email protected]> * fstore: extend API to map all chunks Signed-off-by: Eduardo Silva <[email protected]> * out_s3: migrate uploads to fstore API Signed-off-by: Eduardo Silva <[email protected]> * out_s3: update file size Signed-off-by: Eduardo Silva <[email protected]> * out_cloudwatch_logs: don't include <unistd.h> on Windows (fluent#2631) MSVC++ does not provide <unistd.h>. For this reason, including this header breaks the compilation on Windows. Fix it by adding an ifdef block to the header. Signed-off-by: Fujimoto Seiji <[email protected]> * cmake: define S3 and Kinesis Firehose plugin for Windows (fluent#2632) Turn off plugins that do not work on Windows yet, so that we can compile the master HEAD on MSVC. This patch is required for v1.6 release. Signed-off-by: Fujimoto Seiji <[email protected]> * in_tail: new 'db.locking' option and other perf improvements (fluent#2564) This patch adjust sqlite synchronization mode by default to 'normal', it sets journal mode to WAL and it adds a new option called 'db.locking' (default: false) which helps to reduce the number of syscalls on every commit but at the price of locking the access to the database file to third party programs. The new adjustments makes a significant performance increase reducing database I/O in about 40%. Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: clean up buffer before return (fluent#2640) Signed-off-by: Yen-Cheng Chou <[email protected]> * task: fix typo messages and variables name 'attemps' to 'attempts' (fluent#2636) Signed-off-by: Tanapol Rattanapichetkul <[email protected]> * out_stackdriver: update error message (fluent#2624) Signed-off-by: Peijia Luo <[email protected]> * aws: remove old s3 local buffer lib Signed-off-by: Wesley Pettit <[email protected]> * out_s3: add bucket name to store dir Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix build warnings Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix help text for upload_timeout Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix UploadPart response parsing Signed-off-by: Wesley Pettit <[email protected]> * out_s3: make iso8601 the default date format Signed-off-by: Wesley Pettit <[email protected]> * fstore: fix build warning Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix build failure Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix multipart upload buffering (fluent#2645) Signed-off-by: Wesley Pettit <[email protected]> * in_forward: handle 'compressed text' option (fluent#2644) Signed-off-by: Eduardo Silva <[email protected]> * out_stackdriver: support monitored resource ingestion from log entry Signed-off-by: Jie Wu <[email protected]> * tests: runtime: out_stackdriver: monitored resource ingestion Signed-off-by: Jie Wu <[email protected]> * signv4: fix uri encoding to allow '=' in the uri path Signed-off-by: Wesley Pettit <[email protected]> * out_s3: use UPLOAD_TIMER_MIN_WAIT correctly Signed-off-by: Wesley Pettit <[email protected]> * out_es: make aws_sts_endpoint optional Signed-off-by: Wesley Pettit <[email protected]> * lib: invalidate context on destroy Signed-off-by: Eduardo Silva <[email protected]> * out_es: increase default response buffer size to 512k Signed-off-by: Eduardo Silva <[email protected]> * aws: add eks, sts, and env credential providers Added three credentials providers: - EKS: Obtain credentials via sts:AssumeRoleWithWebIdentity using a k8s OIDC token - STS: Given a base set of credentials, obtain a new set with sts:AssumeRole - ENV: Standard environment variables for credentials Signed-off-by: Wesley Pettit <[email protected]> * out_cloudwatch_logs: add support for cpu and mem metrics using cloudwatch emf Signed-off-by: Rayhan Hossain <[email protected]> * out_cloudwatch_logs: rabse with master and address error handling This change rebases the emf support code with master. Additionally, it adds null memory allocation check and better error handling. Signed-off-by: Rayhan Hossain <[email protected]> * out_cloudwatch_logs: remove redundant code blocks and fix spelling mistake Signed-off-by: Rayhan Hossain <[email protected]> * out_cloudwatch_logs: fix spelling mistake Signed-off-by: Rayhan Hossain <[email protected]> * lib: chunkio: sync dev changes v1.0.6 #1c59044 Signed-off-by: Eduardo Silva <[email protected]> * fstore: delete empty streams on exit Signed-off-by: Eduardo Silva <[email protected]> * out_kafka: new option 'queue_full_retries' (fluent#2553) When the librdkafka queue is full and Fluent Bit cannot ingest more data, it does a 'local retry' of maximum 10 times waiting a second between each retry. But in some cases like the exposed in fluent#2553 is not enough. This patch exposes a new configuration property called 'queue_full_retries' that configure the maximum number of times the retry must be done. Note that this limit now can be disabled setting a value of '0' or 'false'. Signed-off-by: Eduardo Silva <[email protected]> * filter_tensorflow: new Tensorflow Lite filter plugin! (fluent#2511) Signed-off-by: Masoud Koleini <[email protected]> * tests: internal: aws: do not declare variables inside for loops Signed-off-by: Eduardo Silva <[email protected]> * aws: util: remove unneeded flb_s3_endpoint function Signed-off-by: Wesley Pettit <[email protected]> * out_s3: use path based endpoints to support bucket names with dots Signed-off-by: Wesley Pettit <[email protected]> * out_s3: s3_store: skip locked chunks in s3_store_file_get Signed-off-by: Wesley Pettit <[email protected]> * out_s3: fix mem leak Signed-off-by: Wesley Pettit <[email protected]> * out_s3: store: validate contexts on exit Signed-off-by: Eduardo Silva <[email protected]> * fstore: remove unused dump Signed-off-by: Eduardo Silva <[email protected]> * bin: on SIGTERM exit properly Signed-off-by: Eduardo Silva <[email protected]> * conf: update typo in fluent-bit.conf (fluent#2641) Simple typo in the default config for fluent-bit * fstore: fix invalid reference on exception (CID 310820) Signed-off-by: Eduardo Silva <[email protected]> * bin: on SIGTERM, wait for child thread Signed-off-by: Eduardo Silva <[email protected]> * out_s3: release 'uri' on exception (CID 310852) Signed-off-by: Eduardo Silva <[email protected]> * tests: runtime_shell: tail: set 'read_from_head' Signed-off-by: Eduardo Silva <[email protected]> * fstore: support chunkio backend type Signed-off-by: Eduardo Silva <[email protected]> * out_s3: store: detect Travis environment, if so, use fstore memory backend Signed-off-by: Eduardo Silva <[email protected]> * lib: mbedtls: CMakeLists.txt: fix a typo for CFLAGS (fluent#2662) CMAKE_C_CFLAGS -> CMAKE_C_FLAGS, this avoids C_FLAGS from environment being overridden. Signed-off-by: Ming Liu <[email protected]> * build: bump to v1.7.0 Signed-off-by: Eduardo Silva <[email protected]> * Dockerfile: bump to v1.7.0 Signed-off-by: Eduardo Silva <[email protected]> * bitbake: bump to v1.7.0 Signed-off-by: Eduardo Silva <[email protected]> * doc: add users: DigitalOcean Signed-off-by: Eduardo Silva <[email protected]> * tests: fuzzers: added a genera parser fuzzer (fluent#2665) Signed-off-by: davkor <[email protected]> * tests: internal: fuzzers: fix typo in cmakelists (fluent#2675) Signed-off-by: davkor <[email protected]> * scheduler: use flb_random_bytes() instead of /dev/urandom (fluent#2679) Windows does not support /dev/urandom. For this reason, it ended up using a weak entropy source for task scheduling. Avoid this issue by using flb_random_bytes() instead. Signed-off-by: Fujimoto Seiji <[email protected]> * pack: gelf: on flatten, only process string key types (oss-fuzz 26294) Signed-off-by: Eduardo Silva <[email protected]> * parser: on exception, use flb_parser_destroy() (oss-fuzz 26308) Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: fuzzer: initialize timestamp Signed-off-by: Eduardo Silva <[email protected]> * tests: internal: fuzzer: fix missing data type declaration Signed-off-by: Eduardo Silva <[email protected]> * Import libmaxminddb-1.3.2 Signed-off-by: okkez <[email protected]> * Add files to implement filter geoip2 Signed-off-by: okkez <[email protected]> * Add missing header Signed-off-by: okkez <[email protected]> * Implement filter_geoip2 Signed-off-by: okkez <[email protected]> * Add simple test for filter_geoip2 Signed-off-by: okkez <[email protected]> * Duplicate string Signed-off-by: okkez <[email protected]> * Use make install as INSTALL_COMMAND Signed-off-by: okkez <[email protected]> * Remove unused variables Signed-off-by: okkez <[email protected]> * Free lookup_key Signed-off-by: okkez <[email protected]> * Reset counter Signed-off-by: okkez <[email protected]> * Add more test for filter_geoip2 Signed-off-by: okkez <[email protected]> * Plug memory leaks reported by valgrind Signed-off-by: okkez <[email protected]> * Improve log messages Signed-off-by: okkez <[email protected]> * Move libmaxminddb-1.3.2 to plugins/filter_geoip/ Signed-off-by: okkez <[email protected]> * add cmake Co-authored-by: Takahiro YAMASHITA <[email protected]> Co-authored-by: DavidKorczynski <[email protected]> Co-authored-by: Wesley Pettit <[email protected]> Co-authored-by: Eduardo Silva <[email protected]> Co-authored-by: Jeff Luo <[email protected]> Co-authored-by: Fujimoto Seiji <[email protected]> Co-authored-by: Aleks <[email protected]> Co-authored-by: Zero King <[email protected]> Co-authored-by: turettn <[email protected]> Co-authored-by: ziheng <[email protected]> Co-authored-by: Matej Staroň <[email protected]> Co-authored-by: Jonathan Gonzalez V <[email protected]> Co-authored-by: Jonathan Gonzalez V <[email protected]> Co-authored-by: Martin Dojcak <[email protected]> Co-authored-by: Meghna Prabhu <[email protected]> Co-authored-by: Yen-Cheng Chou <[email protected]> Co-authored-by: Joey Paskhay <[email protected]> Co-authored-by: Yen-Cheng Chou <[email protected]> Co-authored-by: Dominik Rosiek <[email protected]> Co-authored-by: Drew Zhang <[email protected]> Co-authored-by: pugkung <[email protected]> Co-authored-by: Jeff Luo <[email protected]> Co-authored-by: Jie WU <[email protected]> Co-authored-by: Rayhan Hossain <[email protected]> Co-authored-by: Masoud Koleini <[email protected]> Co-authored-by: André <[email protected]> Co-authored-by: Ming <[email protected]> Co-authored-by: okkez <[email protected]> Co-authored-by: Vincent Auclair <[email protected]>

pierluigilenoci mentioned this issue Nov 6, 2019

Kiam logs break fluentbit uswitch/kiam#323

Open

edsiper added a commit that referenced this issue Sep 30, 2020

tests: internal: pack: add check for handling of duplicated JSON keys (…

e052741

…#1835 #1051) Signed-off-by: Eduardo Silva <[email protected]>

edsiper self-assigned this Sep 30, 2020

edsiper added enhancement fixed labels Sep 30, 2020

edsiper closed this as completed Sep 30, 2020

xmcqueen pushed a commit to xmcqueen/fluent-bit that referenced this issue Oct 6, 2020

tests: internal: pack: add check for handling of duplicated JSON keys (…

ba5cbc5

…fluent#1835 fluent#1051) Signed-off-by: Eduardo Silva <[email protected]>

sirwio pushed a commit to sirwio/fluent-bit that referenced this issue Oct 7, 2020

tests: internal: pack: add check for handling of duplicated JSON keys (…

236aa50

…fluent#1835 fluent#1051) Signed-off-by: Eduardo Silva <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duplicated log entry fields when using Kubernetes filter with Merge_Log #1051

Duplicated log entry fields when using Kubernetes filter with Merge_Log #1051

fhoy commented Jan 23, 2019 •

edited

Loading

encircled commented Sep 12, 2019

edsiper commented Sep 30, 2020

Duplicated log entry fields when using Kubernetes filter with Merge_Log #1051

Duplicated log entry fields when using Kubernetes filter with Merge_Log #1051

Comments

fhoy commented Jan 23, 2019 • edited Loading

encircled commented Sep 12, 2019

edsiper commented Sep 30, 2020

fhoy commented Jan 23, 2019 •

edited

Loading