Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

input forward: plugin doesn't close TCP port when it caught SIGTERM #2610

Closed
minamijoyo opened this issue Sep 29, 2020 · 31 comments
Closed

input forward: plugin doesn't close TCP port when it caught SIGTERM #2610

minamijoyo opened this issue Sep 29, 2020 · 31 comments

Comments

@minamijoyo
Copy link

Bug Report

Describe the bug

The input forward plugin doesn't close TCP port when it caught SIGTERM. This causes losing some logs on shutdown.

To Reproduce

  1. Run a fluent-bit container with Docker:
$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.5.6 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
  1. Run a nc command (port scan) container with Docker fluent log driver and forward its log to the fluent-bit container:
$ docker run -it --rm --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --link flb busybox sh -c 'while true; do date; nc -vz flb 24224; sleep 1; done'
  1. Stop the fluent-bit container:
$ date -u; time docker stop -t 10 flb; date -u

The outputs of command are as follows:

$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.5.6 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.5.6
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/09/29 00:09:55] [ info] Configuration:
[2020/09/29 00:09:55] [ info]  flush time     | 1.000000 seconds
[2020/09/29 00:09:55] [ info]  grace          | 5 seconds
[2020/09/29 00:09:55] [ info]  daemon         | 0
[2020/09/29 00:09:55] [ info] ___________
[2020/09/29 00:09:55] [ info]  inputs:
[2020/09/29 00:09:55] [ info]      forward
[2020/09/29 00:09:55] [ info] ___________
[2020/09/29 00:09:55] [ info]  filters:
[2020/09/29 00:09:55] [ info] ___________
[2020/09/29 00:09:55] [ info]  outputs:
[2020/09/29 00:09:55] [ info]      stdout.0
[2020/09/29 00:09:55] [ info] ___________
[2020/09/29 00:09:55] [ info]  collectors:
[2020/09/29 00:09:55] [ info] [engine] started (pid=1)
[2020/09/29 00:09:55] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/09/29 00:09:55] [debug] [storage] [cio stream] new stream registered: forward.0
[2020/09/29 00:09:55] [ info] [storage] version=1.0.5, initializing...
[2020/09/29 00:09:55] [ info] [storage] in-memory
[2020/09/29 00:09:55] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/09/29 00:09:55] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2020/09/29 00:09:55] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2020/09/29 00:09:55] [debug] [router] default match rule forward.0:stdout.0
[2020/09/29 00:09:55] [ info] [sp] stream processor started
[0] e9ed8c030c0a: [1601338201.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"Tue Sep 29 00:10:01 UTC 2020
[1] e9ed8c030c0a: [1601338201.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/09/29 00:10:02] [debug] [task] created task=0x7f4257a2f3a0 id=0 OK
[2020/09/29 00:10:02] [debug] [task] destroy task=0x7f4257a2f3a0 (task_id=0)
[0] e9ed8c030c0a: [1601338202.000000000, {"container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"Tue Sep 2", "container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b3e973"}]
[1] e9ed8c030c0a: [1601338202.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/09/29 00:10:03] [debug] [task] created task=0x7f4257a2f3a0 id=0 OK
[2020/09/29 00:10:03] [debug] [task] destroy task=0x7f4257a2f3a0 (task_id=0)
", "container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b3e973", "container_name"=>"/laughing_feistel", "source"=>"stdout"}]
", "container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b3e973", "container_name"=>"/laughing_feistel", "source"=>"stdout"}]
[2020/09/29 00:10:04] [debug] [task] created task=0x7f4257a2f3a0 id=0 OK
[2020/09/29 00:10:04] [debug] [task] destroy task=0x7f4257a2f3a0 (task_id=0)
[engine] caught signal (SIGTERM)
", "container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b3e973", "container_name"=>"/laughing_feistel", "source"=>"stdout"}]
[1] e9ed8c030c0a: [1601338204.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/09/29 00:10:04] [debug] [task] created task=0x7f4257a2f3a0 id=0 OK
[2020/09/29 00:10:04] [ warn] [engine] service will stop in 5 seconds
[2020/09/29 00:10:04] [debug] [task] destroy task=0x7f4257a2f3a0 (task_id=0)
[0] e9ed8c030c0a: [1601338205.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"Tue Sep 29 00:10:05 UTC 2020
", "container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b3e973", "container_name"=>"/laughing_feistel"}]
[2020/09/29 00:10:06] [debug] [task] created task=0x7f4257a2f3a0 id=0 OK
[2020/09/29 00:10:06] [debug] [task] destroy task=0x7f4257a2f3a0 (task_id=0)
[0] e9ed8c030c0a: [1601338206.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"Tue Sep 29 00:10:06 UTC 2020
[1] e9ed8c030c0a: [1601338206.000000000, {"container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"flb (172.", "container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b3e973"}]
[2020/09/29 00:10:07] [debug] [task] created task=0x7f4257a2f3a0 id=0 OK
[2020/09/29 00:10:07] [debug] [task] destroy task=0x7f4257a2f3a0 (task_id=0)
[0] e9ed8c030c0a: [1601338207.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"Tue Sep 29 00:10:07 UTC 2020
[1] e9ed8c030c0a: [1601338207.000000000, {"container_id"=>"e9ed8c030c0af0df30f15bad9a5e62c49e78dfc581738c6ccf480457e0b"}]73", "container_name"=>"/laughing_feistel", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/09/29 00:10:08] [debug] [task] created task=0x7f4257a2f3a0 id=0 OK
[2020/09/29 00:10:08] [debug] [task] destroy task=0x7f4257a2f3a0 (task_id=0)
[2020/09/29 00:10:09] [ info] [engine] service stopped
$ docker run -it --rm --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --link flb busybox sh -c 'while true; do date; nc -vz flb 24224; sleep 1; done'
Tue Sep 29 00:10:01 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:02 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:03 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:04 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:05 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:06 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:07 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:08 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:10:09 UTC 2020
$ date -u; time docker stop -t 10 flb; date -u
Tue Sep 29 00:10:04 UTC 2020
flb
docker stop -t 10 flb  0.04s user 0.02s system 1% cpu 4.409 total
Tue Sep 29 00:10:09 UTC 2020

According to the above output,
the Docker tried to stop the fluent-bit container at 00:10:04.
The engine caught signal (SIGTERM) at 00:10:04, but some logs are still ingested and the timestamp of the last log entry is 1601338207(=2020/09/29 00:10:07) and the result of nc command shows the TCP port was still open.
The engine completed shutdown at 00:10:09 after grace period (5 sec)

I tested multiple times and I found its behavior seems to be inconsistent.
The input was sometimes paused (but TCP port was still open).

[0] cfedccd0310b: [1601337750.000000000, {"container_id"=>"cfedccd0310b0b4b747d4bd6b8362d8d009ff26b24cdd59d3b58faa3ea3"}]42", "container_name"=>"/vigilant_aryabhata", "source"=>"stdout", "log"=>"Tue Sep 29 00:02:30 UTC 2020
", "container_id"=>"cfedccd0310b0b4b747d4bd6b8362d8d009ff26b24cdd59d3b58faa3ea300b42", "container_name"=>"/vigilant_aryabhata", "source"=>"stdout"}]
[2020/09/29 00:02:30] [debug] [task] created task=0x7f8ee0e2f3a0 id=0 OK
[2020/09/29 00:02:30] [debug] [task] destroy task=0x7f8ee0e2f3a0 (task_id=0)
[engine] caught signal (SIGTERM)
[2020/09/29 00:02:30] [ warn] [engine] service will stop in 5 seconds
[2020/09/29 00:02:31] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:31] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:32] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:32] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:33] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:33] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:34] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:34] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:35] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:35] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/09/29 00:02:35] [ info] [engine] service stopped
Tue Sep 29 00:02:28 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:29 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:30 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:31 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:32 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:33 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:34 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:35 UTC 2020
flb (172.17.0.2:24224) open
Tue Sep 29 00:02:36 UTC 2020
nc: flb (172.17.0.2:24224): Connection timed out
Tue Sep 29 00:04:46 UTC 2020
$ date -u; time docker stop -t 10 flb; date -u
Tue Sep 29 00:02:30 UTC 2020
flb
docker stop -t 10 flb  0.04s user 0.02s system 1% cpu 5.190 total
Tue Sep 29 00:02:35 UTC 2020

Expected behavior

The input forward plugin should close TCP port whent it caught SIGTERM not to lose any logs on shutdown.

Your Environment

  • Version used: v1.5.6
  • Configuration: (passed with arguments described above the reproduction case)
  • Environment name and version (e.g. Kubernetes? What version?): AWS ECS Fargate (platform version: 1.4.0)
  • Server type and version: AWS
  • Operating System and version: Docker (fluent/fluent-bit:1.5.6)
  • Filters and plugins: input forward

Additional context

I'm running fluent-bit containers on AWS ECS Fargate behind AWS NLB and forwarding other container's log with the Docker fluentd log driver.
When I need to update my configuration for fluent-bit, I'm using the Blue-Green deployment strategy not to lose any logs during the deployment.
However, AWS NLB doesn't reset the existing TCP connection even if NLB's deregistration delay passed. The deregistration delay block a new connection, but it doesn't reset the existing one. If my understanding is correct, the grace period doesn't ensure to flush all buffers. This means that if fluent-bit ingest new log entries during shutdown they will be lost.

I don't think it only affects fluent-bit behind AWS NLB. I can reproduce a simple docker container setup described above.
Blocking input on shutdown consistently may be a minimum solution, but it would be great if the fluent-bit close TCP port on shutdown because it allows sender to detect quickly receiver no longer available.

Related issue
#1496

edsiper added a commit that referenced this issue Sep 29, 2020
@edsiper edsiper self-assigned this Sep 29, 2020
@edsiper
Copy link
Member

edsiper commented Sep 29, 2020

Fixed in f303cdb.

Binary now uses library mode so it can do graceful shutdown (in the grace period)

@edsiper edsiper closed this as completed Sep 29, 2020
@minamijoyo
Copy link
Author

minamijoyo commented Oct 1, 2020

@edsiper Thank you for the fix!
But I cannot confirm the fix was resolved the original issue. I assumed blocking the input during shutdown consistently would be the minimum solution, however I might misunderstand the behavior during blocking.

I tested with the latest master (13735a5) and I could see the input forward was paused consistently during shutdown. However when I started a new container after shutdown the blocked records during shutdown seem to be simply discarded.

Build a docker image from the latest master (13735a5)

[fluent-bit@master|✔]$ git rev-parse HEAD
13735a509906b3fd089cfb5df9e04f6322089e39

[fluent-bit@master|✔]$ docker version
Client: Docker Engine - Community
 Azure integration  0.1.7
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:41:33 2020
 OS/Arch:           darwin/amd64
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:49:27 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

[fluent-bit@master|✔]$ docker build -t fluent-bit:debug ./

Run a fluent-bit container:

[fluent-bit@master|✔]$ docker run -it --rm --name flb -p 24224:24224 fluent-bit:debug /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.6.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/10/01 07:05:28] [ info] Configuration:
[2020/10/01 07:05:28] [ info]  flush time     | 1.000000 seconds
[2020/10/01 07:05:28] [ info]  grace          | 5 seconds
[2020/10/01 07:05:28] [ info]  daemon         | 0
[2020/10/01 07:05:28] [ info] ___________
[2020/10/01 07:05:28] [ info]  inputs:
[2020/10/01 07:05:28] [ info]      forward
[2020/10/01 07:05:28] [ info] ___________
[2020/10/01 07:05:28] [ info]  filters:
[2020/10/01 07:05:28] [ info] ___________
[2020/10/01 07:05:28] [ info]  outputs:
[2020/10/01 07:05:28] [ info]      stdout.0
[2020/10/01 07:05:28] [ info] ___________
[2020/10/01 07:05:28] [ info]  collectors:
[2020/10/01 07:05:28] [ info] [engine] started (pid=1)
[2020/10/01 07:05:28] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/10/01 07:05:28] [debug] [storage] [cio stream] new stream registered: forward.0
[2020/10/01 07:05:28] [ info] [storage] version=1.0.6, initializing...
[2020/10/01 07:05:28] [ info] [storage] in-memory
[2020/10/01 07:05:28] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/10/01 07:05:28] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2020/10/01 07:05:28] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2020/10/01 07:05:28] [debug] [router] default match rule forward.0:stdout.0
[2020/10/01 07:05:28] [ info] [sp] stream processor started
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183", "container_name"=>"/bold_dhawan", "source"=>"stdout"}]
[2020/10/01 07:05:38] [debug] [task] created task=0x7fc98e44a0a0 id=0 OK
[1] 061221f1625e: [1601535938.000000000, {"container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d945"}]83", "container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/01 07:05:38] [debug] [task] destroy task=0x7fc98e44a0a0 (task_id=0)
[0] 061221f1625e: [1601535939.000000000, {"container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"Thu Oct  1 07:", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183"}]
[1] 061221f1625e: [1601535939.000000000, {"container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"flb (172.17.0.", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183"}]
[2020/10/01 07:05:39] [debug] [task] created task=0x7fc98e44a0a0 id=0 OK
[2020/10/01 07:05:39] [debug] [task] destroy task=0x7fc98e44a0a0 (task_id=0)
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183", "container_name"=>"/bold_dhawan", "source"=>"stdout"}]
[1] 061221f1625e: [1601535940.000000000, {"container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d945"}]83", "container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/01 07:05:40] [debug] [task] created task=0x7fc98e44a0a0 id=0 OK
[2020/10/01 07:05:40] [debug] [task] destroy task=0x7fc98e44a0a0 (task_id=0)
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183", "container_name"=>"/bold_dhawan"}]
[1] 061221f1625e: [1601535941.000000000, {"container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d945"}]83", "container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/01 07:05:41] [debug] [task] created task=0x7fc98e44a0a0 id=0 OK
[2020/10/01 07:05:41] [debug] [task] destroy task=0x7fc98e44a0a0 (task_id=0)
[0] 061221f1625e: [1601535942.000000000, {"container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d945"}]83", "container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"Thu Oct  1 07:05:42 UTC 2020
[1] 061221f1625e: [1601535942.000000000, {"container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"flb (172.17.0.", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183"}]
[2020/10/01 07:05:42] [debug] [task] created task=0x7fc98e44a0a0 id=0 OK
[2020/10/01 07:05:42] [debug] [task] destroy task=0x7fc98e44a0a0 (task_id=0)
[0] 061221f1625e: [1601535943.000000000, {"container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"Thu Oct  1 07:", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183"}]
[1] 061221f1625e: [1601535943.000000000, {"container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d945"}]83", "container_name"=>"/bold_dhawan", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/01 07:05:43] [debug] [task] created task=0x7fc98e44a0a0 id=0 OK
[2020/10/01 07:05:43] [debug] [task] destroy task=0x7fc98e44a0a0 (task_id=0)
[2020/10/01 07:05:43] [engine] caught signal (SIGTERM)
[2020/10/01 07:05:43] [ warn] [engine] service will stop in 5 seconds
[2020/10/01 07:05:44] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:44] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:45] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:45] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:46] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:46] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:47] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:47] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:48] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:48] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/01 07:05:48] [ info] [engine] service stopped

Run a port scan container:

[fluent-bit@:0580dc47|✔]$ docker run -it --rm --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --link flb busybox sh -c 'while true; do date; nc -vz flb 24224; sleep 1; done'
Thu Oct  1 07:05:38 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:39 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:40 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:41 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:42 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:43 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:44 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:45 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:46 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:47 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:48 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:49 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:50 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:51 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:52 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:53 UTC 2020
flb (172.17.0.2:24224) open
Thu Oct  1 07:05:54 UTC 2020
nc: flb (172.17.0.2:24224): No route to host
Thu Oct  1 07:07:02 UTC 2020
nc: flb (172.17.0.2:24224): No route to host
Thu Oct  1 07:07:06 UTC 2020
^Cpunt!
[fluent-bit@master|✔]$

Stop the fluent-bit container:

[fluent-bit@:0580dc47|✔]$ date -u; time docker stop -t 10 flb; date -u
Thu Oct  1 07:05:43 UTC 2020
flb
docker stop -t 10 flb  0.05s user 0.03s system 0% cpu 10.298 total
Thu Oct  1 07:05:54 UTC 2020

Start a new fluent-bit container:

[fluent-bit@master|✔]$ docker run -it --rm --name flb -p 24224:24224 fluent-bit:debug /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.6.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/10/01 07:08:22] [ info] Configuration:
[2020/10/01 07:08:22] [ info]  flush time     | 1.000000 seconds
[2020/10/01 07:08:22] [ info]  grace          | 5 seconds
[2020/10/01 07:08:22] [ info]  daemon         | 0
[2020/10/01 07:08:22] [ info] ___________
[2020/10/01 07:08:22] [ info]  inputs:
[2020/10/01 07:08:22] [ info]      forward
[2020/10/01 07:08:22] [ info] ___________
[2020/10/01 07:08:22] [ info]  filters:
[2020/10/01 07:08:22] [ info] ___________
[2020/10/01 07:08:22] [ info]  outputs:
[2020/10/01 07:08:22] [ info]      stdout.0
[2020/10/01 07:08:22] [ info] ___________
[2020/10/01 07:08:22] [ info]  collectors:
[2020/10/01 07:08:22] [ info] [engine] started (pid=1)
[2020/10/01 07:08:22] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/10/01 07:08:22] [debug] [storage] [cio stream] new stream registered: forward.0
[2020/10/01 07:08:22] [ info] [storage] version=1.0.6, initializing...
[2020/10/01 07:08:22] [ info] [storage] in-memory
[2020/10/01 07:08:22] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/10/01 07:08:22] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2020/10/01 07:08:22] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2020/10/01 07:08:22] [debug] [router] default match rule forward.0:stdout.0
[2020/10/01 07:08:22] [ info] [sp] stream processor started
[2020/10/01 07:08:52] [debug] [task] created task=0x7f2c11e4a0a0 id=0 OK
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183", "container_name"=>"/bold_dhawan", "source"=>"stdout"}]
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183", "container_name"=>"/bold_dhawan", "source"=>"stdout"}]
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183", "container_name"=>"/bold_dhawan", "source"=>"stdout"}]
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183", "container_name"=>"/bold_dhawan", "source"=>"stdout"}]
", "container_id"=>"061221f1625eb90bcee17ffb4fbb3a6b16d84da6005404799533e40d9455f183"}]stdout", "log"=>"^Cpunt!
[2020/10/01 07:08:52] [debug] [task] destroy task=0x7f2c11e4a0a0 (task_id=0)
^C[2020/10/01 07:09:46] [engine] caught signal (SIGINT)
[2020/10/01 07:09:46] [ warn] [engine] service will stop in 5 seconds
[2020/10/01 07:09:51] [ info] [engine] service stopped
[fluent-bit@master|✔]$

I expect the blocked records during shutdown to be retried to send by docker engine after restart, but they were actually dropped.
What do you think about it?

@edsiper edsiper reopened this Oct 1, 2020
@edsiper
Copy link
Member

edsiper commented Oct 1, 2020

troubleshooting...

edsiper added a commit that referenced this issue Oct 2, 2020
This patch adds some extra handling for service termination, so when
the engine paused ingestion, the server socket is close to avoid
having incoming connection and not-processed records.

This patch works when the service is not in a scenario with back-pressure,
so it can be considered a partial workaround.

There is an upcoming re-resign of server side API for scalability that
will address all these issues.

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper
Copy link
Member

edsiper commented Oct 2, 2020

I've pushed a workaround for the issue: 4e0e687

The only downside is that if the plugin is under high-backpressure, the server socket won't be terminated before the grade period on shutdown/SIGTERM.

Note that after 1.6 release we will refactor the server side API to address this things.

@edsiper
Copy link
Member

edsiper commented Oct 2, 2020

for now can you provide some feedback on 4e0e687 ?

@minamijoyo
Copy link
Author

@edsiper thank you for the fix! I'll test it.

@minamijoyo
Copy link
Author

minamijoyo commented Oct 5, 2020

@edsiper I tested the 4e0e687, I could see the port was closed during shutdown, but some records were still dropped.

Build a docker image from 4e0e687 :

$ git checkout 4e0e687098836fd0cbe99944b13ac2ccb8058678
$ docker build -t fluent-bit:4e0e6870 ./

Run a fluent-bit container:

[fluent-bit@:4e0e6870|✔]$ docker run -it --rm --name flb -p 24224:24224 fluent-bit:4e0e6870 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.6.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/10/05 07:12:16] [ info] Configuration:
[2020/10/05 07:12:16] [ info]  flush time     | 1.000000 seconds
[2020/10/05 07:12:16] [ info]  grace          | 5 seconds
[2020/10/05 07:12:16] [ info]  daemon         | 0
[2020/10/05 07:12:16] [ info] ___________
[2020/10/05 07:12:16] [ info]  inputs:
[2020/10/05 07:12:16] [ info]      forward
[2020/10/05 07:12:16] [ info] ___________
[2020/10/05 07:12:16] [ info]  filters:
[2020/10/05 07:12:16] [ info] ___________
[2020/10/05 07:12:16] [ info]  outputs:
[2020/10/05 07:12:16] [ info]      stdout.0
[2020/10/05 07:12:16] [ info] ___________
[2020/10/05 07:12:16] [ info]  collectors:
[2020/10/05 07:12:16] [ info] [engine] started (pid=1)
[2020/10/05 07:12:16] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/10/05 07:12:16] [debug] [storage] [cio stream] new stream registered: forward.0
[2020/10/05 07:12:16] [ info] [storage] version=1.0.6, initializing...
[2020/10/05 07:12:16] [ info] [storage] in-memory
[2020/10/05 07:12:16] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/10/05 07:12:16] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2020/10/05 07:12:16] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2020/10/05 07:12:16] [debug] [router] default match rule forward.0:stdout.0
[2020/10/05 07:12:16] [ info] [sp] stream processor started
[2020/10/05 07:12:25] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis", "source"=>"stdout"}]
[1] da81522ed489: [1601881945.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/05 07:12:25] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
[0] da81522ed489: [1601881946.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"Mon Oct  5 07:12:26 UTC 2020
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis"}]
[2020/10/05 07:12:26] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
[2020/10/05 07:12:26] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis", "source"=>"stdout"}]
[1] da81522ed489: [1601881947.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/05 07:12:27] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
[2020/10/05 07:12:27] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
[0] da81522ed489: [1601881948.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"Mon Oct  5 07:12:28 UTC 2020
[1] da81522ed489: [1601881948.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/05 07:12:28] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
[2020/10/05 07:12:28] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis"}]
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis"}]
[2020/10/05 07:12:29] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
[2020/10/05 07:12:29] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
[0] da81522ed489: [1601881950.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"Mon Oct  5 07:12:30 UTC 2020
[1] da81522ed489: [1601881950.000000000, {"container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844"}]
[2020/10/05 07:12:30] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
[2020/10/05 07:12:30] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis", "source"=>"stdout"}]
[1] da81522ed489: [1601881951.000000000, {"container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844"}]
[2020/10/05 07:12:31] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
[2020/10/05 07:12:31] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
[2020/10/05 07:12:32] [engine] caught signal (SIGTERM)
[2020/10/05 07:12:32] [ info] [input] pausing forward.0
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis"}]
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis", "source"=>"stdout"}]
[2020/10/05 07:12:32] [debug] [task] created task=0x7ffa2ae4d0a0 id=0 OK
[2020/10/05 07:12:32] [ warn] [engine] service will stop in 5 seconds
[2020/10/05 07:12:32] [debug] [task] destroy task=0x7ffa2ae4d0a0 (task_id=0)
[2020/10/05 07:12:33] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/05 07:12:34] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/05 07:12:35] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/05 07:12:36] [debug] [input chunk] forward.0 is paused, cannot append records
[2020/10/05 07:12:36] [ info] [engine] service stopped

Run a port scan container:

$ docker run -it --rm --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --log-opt fluentd-async-connect="true" --link flb busybox sh -c 'while true; do date; nc -vz flb 24224; sleep 1; done'
Mon Oct  5 07:12:25 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:26 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:27 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:28 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:29 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:30 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:31 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:32 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:33 UTC 2020
Mon Oct  5 07:12:34 UTC 2020
Mon Oct  5 07:12:35 UTC 2020
Mon Oct  5 07:12:36 UTC 2020
Mon Oct  5 07:12:37 UTC 2020
Mon Oct  5 07:12:38 UTC 2020
Mon Oct  5 07:12:39 UTC 2020
Mon Oct  5 07:12:40 UTC 2020
Mon Oct  5 07:12:41 UTC 2020
Mon Oct  5 07:12:42 UTC 2020
Mon Oct  5 07:12:43 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:51 UTC 2020
flb (172.17.0.2:24224) open
Mon Oct  5 07:12:52 UTC 2020
flb (172.17.0.2:24224) open

Stop the fluent-bit container:

[fluent-bit@:4e0e6870|✔]$ date -u; time docker stop -t 10 flb; date -u
Mon Oct  5 07:12:58 UTC 2020
flb
docker stop -t 10 flb  0.04s user 0.02s system 0% cpu 10.325 total
Mon Oct  5 07:13:08 UTC 2020

Start a new fluent-bit container:

[fluent-bit@:4e0e6870|✔]$ docker run -it --rm --name flb -p 24224:24224 fluent-bit:4e0e6870 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.6.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/10/05 07:12:49] [ info] Configuration:
[2020/10/05 07:12:49] [ info]  flush time     | 1.000000 seconds
[2020/10/05 07:12:49] [ info]  grace          | 5 seconds
[2020/10/05 07:12:49] [ info]  daemon         | 0
[2020/10/05 07:12:49] [ info] ___________
[2020/10/05 07:12:49] [ info]  inputs:
[2020/10/05 07:12:49] [ info]      forward
[2020/10/05 07:12:49] [ info] ___________
[2020/10/05 07:12:49] [ info]  filters:
[2020/10/05 07:12:49] [ info] ___________
[2020/10/05 07:12:49] [ info]  outputs:
[2020/10/05 07:12:49] [ info]      stdout.0
[2020/10/05 07:12:49] [ info] ___________
[2020/10/05 07:12:49] [ info]  collectors:
[2020/10/05 07:12:49] [ info] [engine] started (pid=1)
[2020/10/05 07:12:49] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/10/05 07:12:49] [debug] [storage] [cio stream] new stream registered: forward.0
[2020/10/05 07:12:49] [ info] [storage] version=1.0.6, initializing...
[2020/10/05 07:12:49] [ info] [storage] in-memory
[2020/10/05 07:12:49] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/10/05 07:12:49] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2020/10/05 07:12:49] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2020/10/05 07:12:49] [debug] [router] default match rule forward.0:stdout.0
[2020/10/05 07:12:49] [ info] [sp] stream processor started
[0] da81522ed489: [1601881963.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"Mon Oct  5 07:12:43 UTC 2020
", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844", "container_name"=>"/busy_ellis"}]
[2020/10/05 07:12:50] [debug] [task] created task=0x7fc661e4d0a0 id=0 OK
[2020/10/05 07:12:50] [debug] [task] destroy task=0x7fc661e4d0a0 (task_id=0)
[0] da81522ed489: [1601881971.000000000, {"container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"Mon Oct  5 07:1", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844"}]
[1] da81522ed489: [1601881971.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2020/10/05 07:12:51] [debug] [task] created task=0x7fc661e4d0a0 id=0 OK
[2020/10/05 07:12:51] [debug] [task] destroy task=0x7fc661e4d0a0 (task_id=0)
[0] da81522ed489: [1601881972.000000000, {"container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"Mon Oct  5 07:1", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844"}]
[1] da81522ed489: [1601881972.000000000, {"container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844"}]
[2020/10/05 07:12:52] [debug] [task] created task=0x7fc661e4d0a0 id=0 OK
[2020/10/05 07:12:52] [debug] [task] destroy task=0x7fc661e4d0a0 (task_id=0)
[0] da81522ed489: [1601881973.000000000, {"container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"Mon Oct  5 07:1", "container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6ad6844"}]
[1] da81522ed489: [1601881973.000000000, {"container_id"=>"da81522ed489d3602dc56e27b5d5dd3aa881d1961328ae25fd7d0943d6a"}]44", "container_name"=>"/busy_ellis", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open

The engine caught SIGTERM at 07:12:32 and the result of port scan was not open during shutdown. However, the first timestamp after restart was 1601881963 (Mon Oct 5 07:12:43 UTC 2020).

I'm reading the implementation of Docker's fluentd log driver. If my understanding is correct, it seems to check an error returned by net.Conn.Write().

https://github.com/moby/moby/blob/v19.03.12/daemon/logger/fluentd/fluentd.go#L175-L177
https://github.com/fluent/fluent-logger-golang/blob/v1.4.0/fluent/fluent.go#L384

I assumed it would retry if the connection was closed. I'm not sure why this doesn't work as intended.

A good news: It will check the response if Ack mode is enabled.
A bad news: Unfortunately, the fluentd-request-ack feature is part of Docker CE 20.03, which has not been released yet for now.
moby/moby#39086

Even though without the fluentd-request-ack feature, I think closing the connection should cause an error. What am I misunderstanding?

@minamijoyo
Copy link
Author

I captured packets with tcpdump to figure out what's happened here.

$ echo 'FROM alpine\nRUN apk add --no-cache tcpdump' | docker build -t debug -f - . && docker run -it --rm --net container:flb debug tcpdump -nn -X port 24224

In the below test case, SIGTERM was sent at 08:24:22:

[2020/10/05 08:24:22] [engine] caught signal (SIGTERM)

tcpdump:

08:24:20.412946 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 2905:3268, ack 1, win 229, options [nop,nop,TS val 2530955343 ecr 992119111], length 363
        0x0000:  4500 019f 75f8 4000 4006 6b3b ac11 0001  E...u.@[email protected];....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 1d76 4072 6d58  .....~^..[.v@rmX
        0x0020:  8018 00e5 59b7 0000 0101 080a 96db 504f  ....Y.........PO
        0x0030:  3b22 8947 94ac 6565 3862 3266 3439 3939  ;".G..ee8b2f4999
        0x0040:  6232 d25f 7ad8 3484 ae63 6f6e 7461 696e  b2._z.4..contain
        0x0050:  6572 5f6e 616d 65b2 2f6d 6167 6963 616c  er_name./magical
        0x0060:  5f61 6c62 6174 7461 6e69 a673 6f75 7263  _albattani.sourc
        0x0070:  65a6 7374 646f 7574 a36c 6f67 bd4d 6f6e  e.stdout.log.Mon
        0x0080:  204f 6374 2020 3520 3038 3a32 343a 3230  .Oct..5.08:24:20
        0x0090:  2055 5443 2032 3032 300d ac63 6f6e 7461  .UTC.2020..conta
        0x00a0:  696e 6572 5f69 64d9 4065 6538 6232 6634  iner_id.@ee8b2f4
        0x00b0:  3939 3962 3239 6236 6232 3061 3034 6265  999b29b6b20a04be
        0x00c0:  6462 3437 6165 6131 3330 3665 3962 3865  db47aea1306e9b8e
        0x00d0:  3462 3362 6439 3333 6161 3031 3062 3732  4b3bd933aa010b72
        0x00e0:  3866 3465 6133 6664 6180 94ac 6565 3862  8f4ea3fda...ee8b
        0x00f0:  3266 3439 3939 6232 d25f 7ad8 3484 ac63  2f4999b2._z.4..c
        0x0100:  6f6e 7461 696e 6572 5f69 64d9 4065 6538  ontainer_id.@ee8
        0x0110:  6232 6634 3939 3962 3239 6236 6232 3061  b2f4999b29b6b20a
        0x0120:  3034 6265 6462 3437 6165 6131 3330 3665  04bedb47aea1306e
        0x0130:  3962 3865 3462 3362 6439 3333 6161 3031  9b8e4b3bd933aa01
        0x0140:  3062 3732 3866 3465 6133 6664 61ae 636f  0b728f4ea3fda.co
        0x0150:  6e74 6169 6e65 725f 6e61 6d65 b22f 6d61  ntainer_name./ma
        0x0160:  6769 6361 6c5f 616c 6261 7474 616e 69a6  gical_albattani.
        0x0170:  736f 7572 6365 a673 7464 6f75 74a3 6c6f  source.stdout.lo
        0x0180:  67bc 666c 6220 2831 3732 2e31 372e 302e  g.flb.(172.17.0.
        0x0190:  323a 3234 3232 3429 206f 7065 6e0d 80    2:24224).open..
08:24:20.412993 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 3268, win 361, options [nop,nop,TS val 992120121 ecr 2530955343], length 0
        0x0000:  4500 0034 0ab1 4000 4006 d7ed ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 1ee1  ....^..~@rmX.[..
        0x0020:  8010 0169 584c 0000 0101 080a 3b22 8d39  ...iXL......;".9
        0x0030:  96db 504f                                ..PO
08:24:21.414651 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 3268:3450, ack 1, win 229, options [nop,nop,TS val 2530956344 ecr 992120121], length 182
        0x0000:  4500 00ea 75f9 4000 4006 6bef ac11 0001  E...u.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 1ee1 4072 6d58  .....~^..[..@rmX
        0x0020:  8018 00e5 5902 0000 0101 080a 96db 5438  ....Y.........T8
        0x0030:  3b22 8d39 94ac 6565 3862 3266 3439 3939  ;".9..ee8b2f4999
        0x0040:  6232 d25f 7ad8 3584 ac63 6f6e 7461 696e  b2._z.5..contain
        0x0050:  6572 5f69 64d9 4065 6538 6232 6634 3939  er_id.@ee8b2f499
        0x0060:  3962 3239 6236 6232 3061 3034 6265 6462  9b29b6b20a04bedb
        0x0070:  3437 6165 6131 3330 3665 3962 3865 3462  47aea1306e9b8e4b
        0x0080:  3362 6439 3333 6161 3031 3062 3732 3866  3bd933aa010b728f
        0x0090:  3465 6133 6664 61ae 636f 6e74 6169 6e65  4ea3fda.containe
        0x00a0:  725f 6e61 6d65 b22f 6d61 6769 6361 6c5f  r_name./magical_
        0x00b0:  616c 6261 7474 616e 69a6 736f 7572 6365  albattani.source
        0x00c0:  a673 7464 6f75 74a3 6c6f 67bd 4d6f 6e20  .stdout.log.Mon.
        0x00d0:  4f63 7420 2035 2030 383a 3234 3a32 3120  Oct..5.08:24:21.
        0x00e0:  5554 4320 3230 3230 0d80                 UTC.2020..
08:24:21.414697 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 3450, win 369, options [nop,nop,TS val 992121122 ecr 2530956344], length 0
        0x0000:  4500 0034 0ab2 4000 4006 d7ec ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 1f97  ....^..~@rmX.[..
        0x0020:  8010 0171 584c 0000 0101 080a 3b22 9122  ...qXL......;"."
        0x0030:  96db 5438                                ..T8
08:24:21.414760 IP 172.17.0.3.45137 > 172.17.0.2.24224: Flags [S], seq 3055193332, win 29200, options [mss 1460,sackOK,TS val 2770144932 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c 07aa 4000 4006 daea ac11 0003  E..<..@.@.......
        0x0010:  ac11 0002 b051 5ea0 b61a 8cf4 0000 0000  .....Q^.........
        0x0020:  a002 7210 5856 0000 0204 05b4 0402 080a  ..r.XV..........
        0x0030:  a51d 0ea4 0000 0000 0103 0307            ............
08:24:21.414795 IP 172.17.0.2.24224 > 172.17.0.3.45137: Flags [S.], seq 4150699300, ack 3055193333, win 28960, options [mss 1460,sackOK,TS val 627047421 ecr 2770144932,nop,wscale 7], length 0
        0x0000:  4500 003c 0000 4000 4006 e294 ac11 0002  E..<..@.@.......
        0x0010:  ac11 0003 5ea0 b051 f766 a524 b61a 8cf5  ....^..Q.f.$....
        0x0020:  a012 7120 5856 0000 0204 05b4 0402 080a  ..q.XV..........
        0x0030:  255f fbfd a51d 0ea4 0103 0307            %_..........
08:24:21.414852 IP 172.17.0.3.45137 > 172.17.0.2.24224: Flags [.], ack 1, win 229, options [nop,nop,TS val 2770144932 ecr 627047421], length 0
        0x0000:  4500 0034 07ab 4000 4006 daf1 ac11 0003  E..4..@.@.......
        0x0010:  ac11 0002 b051 5ea0 b61a 8cf5 f766 a525  .....Q^......f.%
        0x0020:  8010 00e5 584e 0000 0101 080a a51d 0ea4  ....XN..........
        0x0030:  255f fbfd                                %_..
08:24:21.415041 IP 172.17.0.3.45137 > 172.17.0.2.24224: Flags [F.], seq 1, ack 1, win 229, options [nop,nop,TS val 2770144933 ecr 627047421], length 0
        0x0000:  4500 0034 07ac 4000 4006 daf0 ac11 0003  E..4..@.@.......
        0x0010:  ac11 0002 b051 5ea0 b61a 8cf5 f766 a525  .....Q^......f.%
        0x0020:  8011 00e5 584e 0000 0101 080a a51d 0ea5  ....XN..........
        0x0030:  255f fbfd                                %_..
08:24:21.415189 IP 172.17.0.2.24224 > 172.17.0.3.45137: Flags [F.], seq 1, ack 2, win 227, options [nop,nop,TS val 627047422 ecr 2770144933], length 0
        0x0000:  4500 0034 2e7b 4000 4006 b421 ac11 0002  E..4.{@.@..!....
        0x0010:  ac11 0003 5ea0 b051 f766 a525 b61a 8cf6  ....^..Q.f.%....
        0x0020:  8011 00e3 584e 0000 0101 080a 255f fbfe  ....XN......%_..
        0x0030:  a51d 0ea5                                ....
08:24:21.415265 IP 172.17.0.3.45137 > 172.17.0.2.24224: Flags [.], ack 2, win 229, options [nop,nop,TS val 2770144933 ecr 627047422], length 0
        0x0000:  4500 0034 07ad 4000 4006 daef ac11 0003  E..4..@.@.......
        0x0010:  ac11 0002 b051 5ea0 b61a 8cf6 f766 a526  .....Q^......f.&
        0x0020:  8010 00e5 584e 0000 0101 080a a51d 0ea5  ....XN..........
        0x0030:  255f fbfe                                %_..
08:24:21.417224 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 3450:3631, ack 1, win 229, options [nop,nop,TS val 2530956347 ecr 992121122], length 181
        0x0000:  4500 00e9 75fa 4000 4006 6bef ac11 0001  E...u.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 1f97 4072 6d58  .....~^..[..@rmX
        0x0020:  8018 00e5 5901 0000 0101 080a 96db 543b  ....Y.........T;
        0x0030:  3b22 9122 94ac 6565 3862 3266 3439 3939  ;"."..ee8b2f4999
        0x0040:  6232 d25f 7ad8 3584 ae63 6f6e 7461 696e  b2._z.5..contain
        0x0050:  6572 5f6e 616d 65b2 2f6d 6167 6963 616c  er_name./magical
        0x0060:  5f61 6c62 6174 7461 6e69 a673 6f75 7263  _albattani.sourc
        0x0070:  65a6 7374 646f 7574 a36c 6f67 bc66 6c62  e.stdout.log.flb
        0x0080:  2028 3137 322e 3137 2e30 2e32 3a32 3432  .(172.17.0.2:242
        0x0090:  3234 2920 6f70 656e 0dac 636f 6e74 6169  24).open..contai
        0x00a0:  6e65 725f 6964 d940 6565 3862 3266 3439  ner_id.@ee8b2f49
        0x00b0:  3939 6232 3962 3662 3230 6130 3462 6564  99b29b6b20a04bed
        0x00c0:  6234 3761 6561 3133 3036 6539 6238 6534  b47aea1306e9b8e4
        0x00d0:  6233 6264 3933 3361 6130 3130 6237 3238  b3bd933aa010b728
        0x00e0:  6634 6561 3366 6461 80                   f4ea3fda.
08:24:21.417344 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 3631, win 377, options [nop,nop,TS val 992121125 ecr 2530956347], length 0
        0x0000:  4500 0034 0ab3 4000 4006 d7eb ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 204c  ....^..~@rmX.[.L
        0x0020:  8010 0179 584c 0000 0101 080a 3b22 9125  ...yXL......;".%
        0x0030:  96db 543b                                ..T;
08:24:22.418717 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 3631:3813, ack 1, win 229, options [nop,nop,TS val 2530957348 ecr 992121125], length 182
        0x0000:  4500 00ea 75fb 4000 4006 6bed ac11 0001  E...u.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 204c 4072 6d58  .....~^..[.L@rmX
        0x0020:  8018 00e5 5902 0000 0101 080a 96db 5824  ....Y.........X$
        0x0030:  3b22 9125 94ac 6565 3862 3266 3439 3939  ;".%..ee8b2f4999
        0x0040:  6232 d25f 7ad8 3684 ac63 6f6e 7461 696e  b2._z.6..contain
        0x0050:  6572 5f69 64d9 4065 6538 6232 6634 3939  er_id.@ee8b2f499
        0x0060:  3962 3239 6236 6232 3061 3034 6265 6462  9b29b6b20a04bedb
        0x0070:  3437 6165 6131 3330 3665 3962 3865 3462  47aea1306e9b8e4b
        0x0080:  3362 6439 3333 6161 3031 3062 3732 3866  3bd933aa010b728f
        0x0090:  3465 6133 6664 61ae 636f 6e74 6169 6e65  4ea3fda.containe
        0x00a0:  725f 6e61 6d65 b22f 6d61 6769 6361 6c5f  r_name./magical_
        0x00b0:  616c 6261 7474 616e 69a6 736f 7572 6365  albattani.source
        0x00c0:  a673 7464 6f75 74a3 6c6f 67bd 4d6f 6e20  .stdout.log.Mon.
        0x00d0:  4f63 7420 2035 2030 383a 3234 3a32 3220  Oct..5.08:24:22.
        0x00e0:  5554 4320 3230 3230 0d80                 UTC.2020..
08:24:22.418794 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 3813, win 386, options [nop,nop,TS val 992122126 ecr 2530957348], length 0
        0x0000:  4500 0034 0ab4 4000 4006 d7ea ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 2102  ....^..~@rmX.[!.
        0x0020:  8010 0182 584c 0000 0101 080a 3b22 950e  ....XL......;"..
        0x0030:  96db 5824                                ..X$
08:24:22.419107 IP 172.17.0.3.42625 > 172.17.0.2.24224: Flags [S], seq 1200225671, win 29200, options [mss 1460,sackOK,TS val 2770145937 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c 72cd 4000 4006 6fc7 ac11 0003  E..<r.@[email protected].....
        0x0010:  ac11 0002 a681 5ea0 4789 fd87 0000 0000  ......^.G.......
        0x0020:  a002 7210 5856 0000 0204 05b4 0402 080a  ..r.XV..........
        0x0030:  a51d 1291 0000 0000 0103 0307            ............
08:24:22.419168 IP 172.17.0.2.24224 > 172.17.0.3.42625: Flags [S.], seq 3907279555, ack 1200225672, win 28960, options [mss 1460,sackOK,TS val 627048426 ecr 2770145937,nop,wscale 7], length 0
        0x0000:  4500 003c 0000 4000 4006 e294 ac11 0002  E..<..@.@.......
        0x0010:  ac11 0003 5ea0 a681 e8e4 5ac3 4789 fd88  ....^.....Z.G...
        0x0020:  a012 7120 5856 0000 0204 05b4 0402 080a  ..q.XV..........
        0x0030:  255f ffea a51d 1291 0103 0307            %_..........
08:24:22.419210 IP 172.17.0.3.42625 > 172.17.0.2.24224: Flags [.], ack 1, win 229, options [nop,nop,TS val 2770145937 ecr 627048426], length 0
        0x0000:  4500 0034 72ce 4000 4006 6fce ac11 0003  E..4r.@[email protected].....
        0x0010:  ac11 0002 a681 5ea0 4789 fd88 e8e4 5ac4  ......^.G.....Z.
        0x0020:  8010 00e5 584e 0000 0101 080a a51d 1291  ....XN..........
        0x0030:  255f ffea                                %_..
08:24:22.419370 IP 172.17.0.3.42625 > 172.17.0.2.24224: Flags [F.], seq 1, ack 1, win 229, options [nop,nop,TS val 2770145937 ecr 627048426], length 0
        0x0000:  4500 0034 72cf 4000 4006 6fcd ac11 0003  E..4r.@[email protected].....
        0x0010:  ac11 0002 a681 5ea0 4789 fd88 e8e4 5ac4  ......^.G.....Z.
        0x0020:  8011 00e5 584e 0000 0101 080a a51d 1291  ....XN..........
        0x0030:  255f ffea                                %_..
08:24:22.419567 IP 172.17.0.2.24224 > 172.17.0.3.42625: Flags [.], ack 2, win 227, options [nop,nop,TS val 627048426 ecr 2770145937], length 0
        0x0000:  4500 0034 d58e 4000 4006 0d0e ac11 0002  E..4..@.@.......
        0x0010:  ac11 0003 5ea0 a681 e8e4 5ac4 4789 fd89  ....^.....Z.G...
        0x0020:  8010 00e3 584e 0000 0101 080a 255f ffea  ....XN......%_..
        0x0030:  a51d 1291                                ....
08:24:22.419631 IP 172.17.0.2.24224 > 172.17.0.3.42625: Flags [F.], seq 1, ack 2, win 227, options [nop,nop,TS val 627048426 ecr 2770145937], length 0
        0x0000:  4500 0034 d58f 4000 4006 0d0d ac11 0002  E..4..@.@.......
        0x0010:  ac11 0003 5ea0 a681 e8e4 5ac4 4789 fd89  ....^.....Z.G...
        0x0020:  8011 00e3 584e 0000 0101 080a 255f ffea  ....XN......%_..
        0x0030:  a51d 1291                                ....
08:24:22.419714 IP 172.17.0.3.42625 > 172.17.0.2.24224: Flags [.], ack 2, win 229, options [nop,nop,TS val 2770145937 ecr 627048426], length 0
        0x0000:  4500 0034 0000 4000 4006 e29c ac11 0003  E..4..@.@.......
        0x0010:  ac11 0002 a681 5ea0 4789 fd89 e8e4 5ac5  ......^.G.....Z.
        0x0020:  8010 00e5 b2d8 0000 0101 080a a51d 1291  ................
        0x0030:  255f ffea                                %_..
08:24:22.427911 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 3813:3994, ack 1, win 229, options [nop,nop,TS val 2530957358 ecr 992122126], length 181
        0x0000:  4500 00e9 75fc 4000 4006 6bed ac11 0001  E...u.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 2102 4072 6d58  .....~^..[!.@rmX
        0x0020:  8018 00e5 5901 0000 0101 080a 96db 582e  ....Y.........X.
        0x0030:  3b22 950e 94ac 6565 3862 3266 3439 3939  ;"....ee8b2f4999
        0x0040:  6232 d25f 7ad8 3684 ac63 6f6e 7461 696e  b2._z.6..contain
        0x0050:  6572 5f69 64d9 4065 6538 6232 6634 3939  er_id.@ee8b2f499
        0x0060:  3962 3239 6236 6232 3061 3034 6265 6462  9b29b6b20a04bedb
        0x0070:  3437 6165 6131 3330 3665 3962 3865 3462  47aea1306e9b8e4b
        0x0080:  3362 6439 3333 6161 3031 3062 3732 3866  3bd933aa010b728f
        0x0090:  3465 6133 6664 61ae 636f 6e74 6169 6e65  4ea3fda.containe
        0x00a0:  725f 6e61 6d65 b22f 6d61 6769 6361 6c5f  r_name./magical_
        0x00b0:  616c 6261 7474 616e 69a6 736f 7572 6365  albattani.source
        0x00c0:  a673 7464 6f75 74a3 6c6f 67bc 666c 6220  .stdout.log.flb.
        0x00d0:  2831 3732 2e31 372e 302e 323a 3234 3232  (172.17.0.2:2422
        0x00e0:  3429 206f 7065 6e0d 80                   4).open..
08:24:22.427973 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 3994, win 394, options [nop,nop,TS val 992122136 ecr 2530957358], length 0
        0x0000:  4500 0034 0ab5 4000 4006 d7e9 ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 21b7  ....^..~@rmX.[!.
        0x0020:  8010 018a 584c 0000 0101 080a 3b22 9518  ....XL......;"..
        0x0030:  96db 582e                                ..X.
08:24:23.422948 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 3994:4176, ack 1, win 229, options [nop,nop,TS val 2530958353 ecr 992122136], length 182
        0x0000:  4500 00ea 75fd 4000 4006 6beb ac11 0001  E...u.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 21b7 4072 6d58  .....~^..[!.@rmX
        0x0020:  8018 00e5 5902 0000 0101 080a 96db 5c11  ....Y.........\.
        0x0030:  3b22 9518 94ac 6565 3862 3266 3439 3939  ;"....ee8b2f4999
        0x0040:  6232 d25f 7ad8 3784 a36c 6f67 bd4d 6f6e  b2._z.7..log.Mon
        0x0050:  204f 6374 2020 3520 3038 3a32 343a 3233  .Oct..5.08:24:23
        0x0060:  2055 5443 2032 3032 300d ac63 6f6e 7461  .UTC.2020..conta
        0x0070:  696e 6572 5f69 64d9 4065 6538 6232 6634  iner_id.@ee8b2f4
        0x0080:  3939 3962 3239 6236 6232 3061 3034 6265  999b29b6b20a04be
        0x0090:  6462 3437 6165 6131 3330 3665 3962 3865  db47aea1306e9b8e
        0x00a0:  3462 3362 6439 3333 6161 3031 3062 3732  4b3bd933aa010b72
        0x00b0:  3866 3465 6133 6664 61ae 636f 6e74 6169  8f4ea3fda.contai
        0x00c0:  6e65 725f 6e61 6d65 b22f 6d61 6769 6361  ner_name./magica
        0x00d0:  6c5f 616c 6261 7474 616e 69a6 736f 7572  l_albattani.sour
        0x00e0:  6365 a673 7464 6f75 7480                 ce.stdout.
08:24:23.423002 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 4176, win 403, options [nop,nop,TS val 992123131 ecr 2530958353], length 0
        0x0000:  4500 0034 0ab6 4000 4006 d7e8 ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 226d  ....^..~@rmX.["m
        0x0020:  8010 0193 584c 0000 0101 080a 3b22 98fb  ....XL......;"..
        0x0030:  96db 5c11                                ..\.
08:24:23.424909 IP 172.17.0.3.41581 > 172.17.0.2.24224: Flags [S], seq 781733910, win 29200, options [mss 1460,sackOK,TS val 2770146942 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c c321 4000 4006 1f73 ac11 0003  E..<.!@[email protected]....
        0x0010:  ac11 0002 a26d 5ea0 2e98 5016 0000 0000  .....m^...P.....
        0x0020:  a002 7210 5856 0000 0204 05b4 0402 080a  ..r.XV..........
        0x0030:  a51d 167e 0000 0000 0103 0307            ...~........
08:24:23.425216 IP 172.17.0.2.24224 > 172.17.0.3.41581: Flags [R.], seq 0, ack 781733911, win 0, length 0
        0x0000:  4500 0028 0000 4000 4006 e2a8 ac11 0002  E..(..@.@.......
        0x0010:  ac11 0003 5ea0 a26d 0000 0000 2e98 5017  ....^..m......P.
        0x0020:  5014 0000 d7eb 0000                      P.......
08:24:24.430950 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 4176:4358, ack 1, win 229, options [nop,nop,TS val 2530959361 ecr 992123131], length 182
        0x0000:  4500 00ea 75fe 4000 4006 6bea ac11 0001  E...u.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 226d 4072 6d58  .....~^..["m@rmX
        0x0020:  8018 00e5 5902 0000 0101 080a 96db 6001  ....Y.........`.
        0x0030:  3b22 98fb 94ac 6565 3862 3266 3439 3939  ;"....ee8b2f4999
        0x0040:  6232 d25f 7ad8 3884 ae63 6f6e 7461 696e  b2._z.8..contain
        0x0050:  6572 5f6e 616d 65b2 2f6d 6167 6963 616c  er_name./magical
        0x0060:  5f61 6c62 6174 7461 6e69 a673 6f75 7263  _albattani.sourc
        0x0070:  65a6 7374 646f 7574 a36c 6f67 bd4d 6f6e  e.stdout.log.Mon
        0x0080:  204f 6374 2020 3520 3038 3a32 343a 3234  .Oct..5.08:24:24
        0x0090:  2055 5443 2032 3032 300d ac63 6f6e 7461  .UTC.2020..conta
        0x00a0:  696e 6572 5f69 64d9 4065 6538 6232 6634  iner_id.@ee8b2f4
        0x00b0:  3939 3962 3239 6236 6232 3061 3034 6265  999b29b6b20a04be
        0x00c0:  6462 3437 6165 6131 3330 3665 3962 3865  db47aea1306e9b8e
        0x00d0:  3462 3362 6439 3333 6161 3031 3062 3732  4b3bd933aa010b72
        0x00e0:  3866 3465 6133 6664 6180                 8f4ea3fda.
08:24:24.431012 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 4358, win 411, options [nop,nop,TS val 992124139 ecr 2530959361], length 0
        0x0000:  4500 0034 0ab7 4000 4006 d7e7 ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 2323  ....^..~@rmX.[##
        0x0020:  8010 019b 584c 0000 0101 080a 3b22 9ceb  ....XL......;"..
        0x0030:  96db 6001                                ..`.
08:24:24.431077 IP 172.17.0.3.38779 > 172.17.0.2.24224: Flags [S], seq 1660570891, win 29200, options [mss 1460,sackOK,TS val 2770147949 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c da2d 4000 4006 0867 ac11 0003  E..<.-@[email protected]....
        0x0010:  ac11 0002 977b 5ea0 62fa 4d0b 0000 0000  .....{^.b.M.....
        0x0020:  a002 7210 5856 0000 0204 05b4 0402 080a  ..r.XV..........
        0x0030:  a51d 1a6d 0000 0000 0103 0307            ...m........
08:24:24.431132 IP 172.17.0.2.24224 > 172.17.0.3.38779: Flags [R.], seq 0, ack 1660570892, win 0, length 0
        0x0000:  4500 0028 0000 4000 4006 e2a8 ac11 0002  E..(..@.@.......
        0x0010:  ac11 0003 5ea0 977b 0000 0000 62fa 4d0c  ....^..{....b.M.
        0x0020:  5014 0000 b186 0000                      P.......
08:24:25.434570 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 4358:4540, ack 1, win 229, options [nop,nop,TS val 2530960364 ecr 992124139], length 182
        0x0000:  4500 00ea 75ff 4000 4006 6be9 ac11 0001  E...u.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 2323 4072 6d58  .....~^..[##@rmX
        0x0020:  8018 00e5 5902 0000 0101 080a 96db 63ec  ....Y.........c.
        0x0030:  3b22 9ceb 94ac 6565 3862 3266 3439 3939  ;"....ee8b2f4999
        0x0040:  6232 d25f 7ad8 3984 a36c 6f67 bd4d 6f6e  b2._z.9..log.Mon
        0x0050:  204f 6374 2020 3520 3038 3a32 343a 3235  .Oct..5.08:24:25
        0x0060:  2055 5443 2032 3032 300d ac63 6f6e 7461  .UTC.2020..conta
        0x0070:  696e 6572 5f69 64d9 4065 6538 6232 6634  iner_id.@ee8b2f4
        0x0080:  3939 3962 3239 6236 6232 3061 3034 6265  999b29b6b20a04be
        0x0090:  6462 3437 6165 6131 3330 3665 3962 3865  db47aea1306e9b8e
        0x00a0:  3462 3362 6439 3333 6161 3031 3062 3732  4b3bd933aa010b72
        0x00b0:  3866 3465 6133 6664 61ae 636f 6e74 6169  8f4ea3fda.contai
        0x00c0:  6e65 725f 6e61 6d65 b22f 6d61 6769 6361  ner_name./magica
        0x00d0:  6c5f 616c 6261 7474 616e 69a6 736f 7572  l_albattani.sour
        0x00e0:  6365 a673 7464 6f75 7480                 ce.stdout.
08:24:25.434625 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 4540, win 419, options [nop,nop,TS val 992125142 ecr 2530960364], length 0
        0x0000:  4500 0034 0ab8 4000 4006 d7e6 ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 23d9  ....^..~@rmX.[#.
        0x0020:  8010 01a3 584c 0000 0101 080a 3b22 a0d6  ....XL......;"..
        0x0030:  96db 63ec                                ..c.
08:24:25.434679 IP 172.17.0.3.36663 > 172.17.0.2.24224: Flags [S], seq 3331640399, win 29200, options [mss 1460,sackOK,TS val 2770148952 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c 575d 4000 4006 8b37 ac11 0003  E..<W]@[email protected]....
        0x0010:  ac11 0002 8f37 5ea0 c694 cc4f 0000 0000  .....7^....O....
        0x0020:  a002 7210 5856 0000 0204 05b4 0402 080a  ..r.XV..........
        0x0030:  a51d 1e58 0000 0000 0103 0307            ...X........
08:24:25.434714 IP 172.17.0.2.24224 > 172.17.0.3.36663: Flags [R.], seq 0, ack 3331640400, win 0, length 0
        0x0000:  4500 0028 0000 4000 4006 e2a8 ac11 0002  E..(..@.@.......
        0x0010:  ac11 0003 5ea0 8f37 0000 0000 c694 cc50  ....^..7.......P
        0x0020:  5014 0000 d6eb 0000                      P.......
08:24:26.439883 IP 172.17.0.3.36427 > 172.17.0.2.24224: Flags [S], seq 3898998966, win 29200, options [mss 1460,sackOK,TS val 2770149957 ecr 0,nop,wscale 7], length 0
        0x0000:  4500 003c 9e49 4000 4006 444b ac11 0003  E..<.I@[email protected]....
        0x0010:  ac11 0002 8e4b 5ea0 e866 00b6 0000 0000  .....K^..f......
        0x0020:  a002 7210 5856 0000 0204 05b4 0402 080a  ..r.XV..........
        0x0030:  a51d 2245 0000 0000 0103 0307            .."E........
08:24:26.439969 IP 172.17.0.2.24224 > 172.17.0.3.36427: Flags [R.], seq 0, ack 3898998967, win 0, length 0
        0x0000:  4500 0028 0000 4000 4006 e2a8 ac11 0002  E..(..@.@.......
        0x0010:  ac11 0003 5ea0 8e4b 0000 0000 e866 00b7  ....^..K.....f..
        0x0020:  5014 0000 819f 0000                      P.......
08:24:26.440410 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [P.], seq 4540:4722, ack 1, win 229, options [nop,nop,TS val 2530961370 ecr 992125142], length 182
        0x0000:  4500 00ea 7600 4000 4006 6be8 ac11 0001  E...v.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 23d9 4072 6d58  .....~^..[#.@rmX
        0x0020:  8018 00e5 5902 0000 0101 080a 96db 67da  ....Y.........g.
        0x0030:  3b22 a0d6 94ac 6565 3862 3266 3439 3939  ;"....ee8b2f4999
        0x0040:  6232 d25f 7ad8 3a84 ac63 6f6e 7461 696e  b2._z.:..contain
        0x0050:  6572 5f69 64d9 4065 6538 6232 6634 3939  er_id.@ee8b2f499
        0x0060:  3962 3239 6236 6232 3061 3034 6265 6462  9b29b6b20a04bedb
        0x0070:  3437 6165 6131 3330 3665 3962 3865 3462  47aea1306e9b8e4b
        0x0080:  3362 6439 3333 6161 3031 3062 3732 3866  3bd933aa010b728f
        0x0090:  3465 6133 6664 61ae 636f 6e74 6169 6e65  4ea3fda.containe
        0x00a0:  725f 6e61 6d65 b22f 6d61 6769 6361 6c5f  r_name./magical_
        0x00b0:  616c 6261 7474 616e 69a6 736f 7572 6365  albattani.source
        0x00c0:  a673 7464 6f75 74a3 6c6f 67bd 4d6f 6e20  .stdout.log.Mon.
        0x00d0:  4f63 7420 2035 2030 383a 3234 3a32 3620  Oct..5.08:24:26.
        0x00e0:  5554 4320 3230 3230 0d80                 UTC.2020..
08:24:26.440434 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [.], ack 4722, win 428, options [nop,nop,TS val 992126148 ecr 2530961370], length 0
        0x0000:  4500 0034 0ab9 4000 4006 d7e5 ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 248f  ....^..~@rmX.[$.
        0x0020:  8010 01ac 584c 0000 0101 080a 3b22 a4c4  ....XL......;"..
        0x0030:  96db 67da                                ..g.
08:24:26.749555 IP 172.17.0.2.24224 > 172.17.0.1.59006: Flags [F.], seq 1, ack 4722, win 428, options [nop,nop,TS val 992126457 ecr 2530961370], length 0
        0x0000:  4500 0034 0aba 4000 4006 d7e4 ac11 0002  E..4..@.@.......
        0x0010:  ac11 0001 5ea0 e67e 4072 6d58 bc5b 248f  ....^..~@rmX.[$.
        0x0020:  8011 01ac 584c 0000 0101 080a 3b22 a5f9  ....XL......;"..
        0x0030:  96db 67da                                ..g.
08:24:26.750562 IP 172.17.0.1.59006 > 172.17.0.2.24224: Flags [.], ack 2, win 229, options [nop,nop,TS val 2530961680 ecr 992126457], length 0
        0x0000:  4500 0034 7601 4000 4006 6c9d ac11 0001  E..4v.@[email protected].....
        0x0010:  ac11 0002 e67e 5ea0 bc5b 248f 4072 6d59  .....~^..[$.@rmY
        0x0020:  8010 00e5 584c 0000 0101 080a 96db 6910  ....XL........i.
        0x0030:  3b22 a5f9                                ;"..

From the above, IP address mappings are probably as follows:

  • 172.17.0.1.59006: docker engine (fluentd log driver)
  • 172.17.0.2.24224: fluent-bit
  • 172.17.0.3.*: nc(port scanner)

As you can see, new connections from 172.17.0.3.* were rejected, but the existing connection from 172.17.0.1.59006 was still sending packets. What's actually happened for the existing connection on the socket close operation?

xmcqueen pushed a commit to xmcqueen/fluent-bit that referenced this issue Oct 6, 2020
sirwio pushed a commit to sirwio/fluent-bit that referenced this issue Oct 7, 2020
…t#2610)

This patch adds some extra handling for service termination, so when
the engine paused ingestion, the server socket is close to avoid
having incoming connection and not-processed records.

This patch works when the service is not in a scenario with back-pressure,
so it can be considered a partial workaround.

There is an upcoming re-resign of server side API for scalability that
will address all these issues.

Signed-off-by: Eduardo Silva <[email protected]>
@minamijoyo
Copy link
Author

@edsiper I noticed that v1.6.0 release note said:

https://fluentbit.io/announcements/v1.6.0/

Properly handle of SIGTERM signal, graceful shutdown.

What does that mean for this issue?

Kingyo added a commit to Kingyo/fluent-bit that referenced this issue Nov 25, 2020
* ra: fix typo of comment

Signed-off-by: Takahiro YAMASHITA <[email protected]>

* build: add an option for OSS-Fuzz builds (fluent#2502)

This will make things a lot easier from the OSS-Fuzz side and also make it easier to construct new fuzzers.

* aws: utils: fix mem leak in flb_imds_request (fluent#2532)

Signed-off-by: Wesley Pettit <[email protected]>

* io: fix EINPROGRESS check, also check that errno is not zero (fluent#2497)

Signed-off-by: Eduardo Silva <[email protected]>

* upstream: new 'destroy_queue' to defer connections context destroy (fluent#2497)

A connection context might be destroyed while the event loop still has some
pending event to be processed, in some cases a network exception. Destroying
the context might lead to a corruption.

The following patch implements a new queue to store temporary the connection
context so the 'destroy' process is defered until all events from the event
loop has been processed.

Signed-off-by: Eduardo Silva <[email protected]>

* engine: do upstream cleanup after the event loop finished processing events (fluent#2497)

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: parse tag that matches kubernetes default regex

Signed-off-by: Jeff Luo <[email protected]>

* test: runtime: out_stackdriver: add unit test for tag regex matching

Signed-off-by: Jeff Luo <[email protected]>

* upstream: expose flb_upstream_conn_pending_destroy()

Signed-off-by: Eduardo Silva <[email protected]>

* tests: runtime: filter_modify: fix sleep() usage and others

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: fix header guard

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: add missing ctype.h header on http_request

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: add cast for regex_match() variable

Signed-off-by: Eduardo Silva <[email protected]>

* out_azure_blob: add support for blockblob type and new blob_type property

Signed-off-by: Eduardo Silva <[email protected]>

* upstream: fix deletion of destroy_queue

Signed-off-by: Eduardo Silva <[email protected]>

* io: remove unnecessary errno message

Signed-off-by: Eduardo Silva <[email protected]>

* upstream: remove unnecessary errno message

Signed-off-by: Eduardo Silva <[email protected]>

* out_syslog: move configuration to a config_map

Signed-off-by: Eduardo Silva <[email protected]>

* pack: print timestamp formatting timespec fields

Signed-off-by: Eduardo Silva <[email protected]>

* filter_lua: fix handling of integer/double values (fluent#1932 fluent#1647)

This patch makes to recognize the preferred numeric data type returned by
the Lua script in the record fields.

Signed-off-by: Eduardo Silva <[email protected]>

* filter_lua: new 'time_as_table' property for timestamp handling (fluent#641 fluent#2519 fluent#2015)

Long time ago we got some reports that using timestamps as double/floats might lost
precision when the values are converted back from Lua. Actually there is no exact
way to have 100% precision in doubles in our use case 'C > Lua > C'. Community
suggested that we might workaround this with another solution.

This patch considering backward compatibility, implements a new configuration
property called 'time_as_table', which passes the timestamp as a Lua table
with the following keys

  ['sec']  => timestamp seconds
  ['nsec'] => timestamp nanoseconds

for users looking for 100% timestamp precision and specifically when dealing
with nanoseconds, this option ensures timestamp integrity. If the option is enabled
the user just need to adjust the script to use the new format if they touch or
use the timestamp value.

By default the option 'time_as_table' is disabled and in the future we might
consider to enable it by default.

Signed-off-by: Eduardo Silva <[email protected]>

* out_file: use flb_time to format timestamp

Signed-off-by: Eduardo Silva <[email protected]>

* pack: remove unused variable

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: mp: include 'flb_compat.h'

<unistd.h> is not POSIX standard and not provided by Microsoft
Visual C++.

Include "flb_compat.h" to avoid the build failure.

Signed-off-by: Fujimoto Seiji <[email protected]>

* utils: add `flb_utils_hex2int()` function

mk_utils_hex2int() is included in mk_server. For this reason, this
function only exists when FLB_HTTP_SERVER is enabled.

Bring the function to flb_utils.c, so that we can use the
functionality regardless of the compile-time option.

Signed-off-by: Fujimoto Seiji <[email protected]>

* Port Azure BLOB Output plugin to Windows

With this patch out_azure_blob can be compiled and linked on
Windows Server.

This patch also fixes CI builds on AppVeyor.

Signed-off-by: Fujimoto Seiji <[email protected]>

* pack: Use PRIu64/PRId64 to format 64 bit integers.

Do not use `%lu` to format 64-bit int; A long can be 32-bit depending
on the system, so this line can implicitly truncate the target value.

This should fix the "timestamps are truncated inproperly" bug on
out_datadog.

Signed-off-by: Fujimoto Seiji <[email protected]>

* tests: internal: fuzzer: added three new fuzzer for msgpack handling (fluent#2541)

Signed-off-by: davkor <[email protected]>

* tests: runtime: out_cloudwatch_logs: Disable net.keepalive in test (fluent#2533)

Signed-off-by: Wesley Pettit <[email protected]>

* in_forward: fix header declaration (copy paste) (fluent#2559)

Use FLB_IN_FW_CONFIG_H instead the MQTT one

* ci: travis: remove deprecated key (fluent#2454)

See: https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration
Signed-off-by: Zero King <[email protected]>

* out_es: support nanosecond timestamp precision (fluent#2544)

Starting in Elasticsearch 7, a "date_nanos" data type was added, increasing
timestamp precision from milliseconds to nanoseconds.

This patch adds a "Time_Key_Nanos" option which tells the ElasticSearch output
plugin to send 9 decimal places instead of 3 to ElasticSearch.

Tests are included, and a patch to document the new option will be submitted
shortly.

Signed-off-by: Neal Turett <[email protected]>

* doc: Add missing plugins to README (fluent#2534)

Signed-off-by: zihengCat <[email protected]>

* lib: sqlite: upgrade from 3.31.0 to 3.33.0 (fluent#2552)

Signed-off-by: Eduardo Silva <[email protected]>

* random: Introduce a random generator (fluent#2555)

This adds flb_randombytes(), which fills the given buffer with random
numbers using each OS's built-in Crypt API.

The most common use scenario is:

    unsigned char buf[64]

    if (get_randombytes(buf, 64)) {
        flb_error("cannot get random bytes");
    }

This function supports both UNIX and Windows. You can use this
function without caring the underlying OS.

Signed-off-by: Fujimoto Seiji <[email protected]>

* lib: mbedtls: upgrade from 2.16.5 to 2.24.0

Signed-off-by: Eduardo Silva <[email protected]>

* lib: mbedtls: comment out link_to_source call

Signed-off-by: Eduardo Silva <[email protected]>

* lib: mbedtls: comment out Python3 check and 3dparty dir

Signed-off-by: Eduardo Silva <[email protected]>

* pack: gelf: validate incoming object is a map (oss-fuzz 25754)

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: fuzzer: remove test message

Signed-off-by: Eduardo Silva <[email protected]>

* lib: luajit: upgrade from 2.1.0-beta3 to 2.1.0-dd5032e

Signed-off-by: Eduardo Silva <[email protected]>

* build: adjust luajit path

Signed-off-by: Eduardo Silva <[email protected]>

* pack: gelf: format timestamp as seconds.milliseconds

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: gelf: adjust test case for expected records and timestamp

Signed-off-by: Eduardo Silva <[email protected]>

* in_cpu: normalize per-process CPU stats by number of cores (fluent#2543)

Signed-off-by: yang-padawan <[email protected]>

* pack: json_sds: validate unpacking

Signed-off-by: Eduardo Silva <[email protected]>

* utils: fix bad handling of invalid utf-8 bytes (oss-fuzz 25785)

Signed-off-by: Eduardo Silva <[email protected]>

* strptime: Add a fallback macro for `timezone` (fluent#2493)

According to the UNIX standard:

    The external variable timezone shall be set to the difference,
    in seconds, between Coordinated Universal Time (UTC) and local
    standard time

FreeBSD is incompatible with this standard. In particular, since it
exposes a function symbol `char* timezone(int, int)`, expressions
like `-(timezone)` causes a compile error.

Fix it by adding a compat macro for FreeBSD.

Signed-off-by: Fujimoto Seiji <[email protected]>

* out_forward: always initialize salt with random numbers (fluent#2575)

There is an initialization bug that leaves the shared key salt
being uninitialized when SSL is not enabled.

This might allow attackers to guess the shared key by looking at
the hash. Let's always initialize the salt buffer securely using
flb_randombytes().

Signed-off-by: Fujimoto Seiji <[email protected]>

* out_gelf: port the plugin to Windows (fluent#2574)

With this patch, the GELF Output plugin can be compiled and linked
on Windows.

This commit also fixes a few bugs in GELF plugin:

 * The message header was not constructed properly. In particular,
   11-12th bytes were filled in the reversed order.

 * The message id generation was bogus (e.g. it did "tv_nsec*1000000
   + tm.tm.tv_nsec" to generate a timestamp)

Fix these glitches and add detailed documentation to each function.

Signed-off-by: Fujimoto Seiji <[email protected]>

* out_pgsql: add support for CockroachDB (fluent#2512)

The SQL query will change when new option `cockroachdb` it's set to true

Signed-off-by: Jonathan Gonzalez V <[email protected]>

Co-authored-by: Jonathan Gonzalez V <[email protected]>

* lib: chunkio: upgrade to v1.0.6 (dev changes)

Signed-off-by: Eduardo Silva <[email protected]>

* fstore: new interface to manage local files storage

Signed-off-by: Eduardo Silva <[email protected]>

* random: rename function flb_randombytes() to flb_random_bytes()

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: random: fix function name

Signed-off-by: Eduardo Silva <[email protected]>

* out_forward: remove unused variable and adjust random api name

Signed-off-by: Eduardo Silva <[email protected]>

* build: on FLB_SMALL mode, do not use 'strip' flag for non-gcc compiler

Signed-off-by: Eduardo Silva <[email protected]>

* strptime: initialize 'len' variable

Signed-off-by: Eduardo Silva <[email protected]>

* pack: gelf: initialize 'val_len' variable

Signed-off-by: Eduardo Silva <[email protected]>

* str: use memcpy to silent gcc warning

Signed-off-by: Eduardo Silva <[email protected]>

* io: tls: remove unused conditional and initialize variable

Signed-off-by: Eduardo Silva <[email protected]>

* filter_lua: initialize variable

Signed-off-by: Eduardo Silva <[email protected]>

* filter_parser: initialize variable

Signed-off-by: Eduardo Silva <[email protected]>

* filter_kubernetes: validate metadata is a map

Signed-off-by: Eduardo Silva <[email protected]>

* in_systemd: initialize variable

Signed-off-by: Eduardo Silva <[email protected]>

* out_azure_blob: add cast

Signed-off-by: Eduardo Silva <[email protected]>

* out_gelf: initialize variable

Signed-off-by: Eduardo Silva <[email protected]>

* out_syslog: initialize variables

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: signv4: initialize variable

Signed-off-by: Eduardo Silva <[email protected]>

* in stdin: add buffer_size parameter (fluent#2364)

* in sdtin: add buffer_size parameter

Signed-off-by: Martin Dojcak <[email protected]>

* lib: mbedtls: force compiler c99 mode

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: disable gelf test in Win32

Signed-off-by: Eduardo Silva <[email protected]>

* aws: s3 key format option

Signed-off-by: Meghna Prabhu <[email protected]>

* signv4: Support S3 Signed Payload

Signed-off-by: Wesley Pettit <[email protected]>

* out_es: update with new signv4 api

Signed-off-by: Wesley Pettit <[email protected]>

* aws: add new S3 local buffering library

Signed-off-by: Wesley Pettit <[email protected]>

* aws: s3 object key formatting and request signing

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: new output plugin for Amazon S3

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: use new fstore API plus other adjustments

 - use new fstore API for buffer management
 - register plugin flags
 - config: change configmap for TIME and SIZE value types
 - config: rename chunk_buffer_dir -> store_dir
 - fix leaks on exceptions
 - use field to 'lock' chunks, instead of temporary node deletion
 - others adjustments

note: still work in process.

Signed-off-by: Eduardo Silva <[email protected]>

* aws: fix broken s3 key test

Signed-off-by: Wesley Pettit <[email protected]>

* out_kinesis_firehose: new high performance core plugin for Kinesis Firehose (fluent#2572)

Signed-off-by: Wesley Pettit <[email protected]>

* in_tail: implement large-file support for Windows

Previously it was not possible to handle files larger than 2GB on
Windows.

The fix is archived by migrating each vairable and function to be
explicitly 64-bit ready.

Signed-off-by: Fujimoto Seiji <[email protected]>

* in_tail: do a small cleanup of the Windows port

We have added a number of helpers to make it easier to support
Windows (e.g. `flb_pipefd_t` as a portable type for pipes).

This introduces these helpers to in_tail.

Signed-off-by: Fujimoto Seiji <[email protected]>

* fstore: return on memory exception (CID 309436)

Signed-off-by: Eduardo Silva <[email protected]>

* fstore: release context on exception (CID 309435)

Signed-off-by: Eduardo Silva <[email protected]>

* fstore: fix error message parameter (CID 309433)

Signed-off-by: Eduardo Silva <[email protected]>

* signv4: fix sds_printf parameters (CID 306765)

Signed-off-by: Eduardo Silva <[email protected]>

* out_azure_blob: fix use-after-free on exception (CID 306763 306762)

Signed-off-by: Eduardo Silva <[email protected]>

* out_azure_blob: do not use http context before validation (CID 306663)

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: release pattern on exception (CID 305802)

Signed-off-by: Eduardo Silva <[email protected]>

* in_tail: fix debug message params (CID 305255)

Signed-off-by: Eduardo Silva <[email protected]>

* out_cloudwatch_logs: add support for custom sts endpoint

Signed-off-by: Meghna Prabhu <[email protected]>

* out_es: add support for custom STS endpoint

Signed-off-by: Meghna Prabhu <[email protected]>

* aws: add support for custom STS endpoint in flb_aws_sts_provider

Signed-off-by: Meghna Prabhu <[email protected]>

* aws: add support for custom sts endpoint in flb_eks_provider

Signed-off-by: Meghna Prabhu <[email protected]>

* aws: sts: set custom_endpoint flag correctly

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: Add sts_endpoint and role_arn options

Signed-off-by: Wesley Pettit <[email protected]>

* out_kinesis_firehose: Add sts_endpoint option

Signed-off-by: Wesley Pettit <[email protected]>

* out_cloudwatch_logs: truncate events that are larger than 256KiB

Signed-off-by: Wesley Pettit <[email protected]>

* out_cloudwatch_logs: fix off by one error in payload creation

Signed-off-by: Wesley Pettit <[email protected]>

* out_cloudwatch_logs: Use documented max payload size and other fixes

Signed-off-by: Wesley Pettit <[email protected]>

* out_kinesis_firehose: fix type of event_bytes (CID 309457)

Signed-off-by: Wesley Pettit <[email protected]>

* out_kinesis_firehose: Use correct printf format for time_t (CID 309456)

Signed-off-by: Wesley Pettit <[email protected]>

* fstore: fix null dereference (CID 309455)

Signed-off-by: Wesley Pettit <[email protected]>

* aws: util: fix double free issues in get_s3_key (CID 309453 & 309443) (fluent#2599)

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix possible null dereference (CID 309448)

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: always cast fstore meta_buf to char * (CID 309442)

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix mem leak (CID 309438)

Signed-off-by: Wesley Pettit <[email protected]>

* lib: jansson: add jansson library for upcoming Avro support (fluent#2568)

Signed-off-by: xmcqueen <[email protected]>

* doc: add users: 'Microsoft', 'Linked In' and 'Trend Micro'

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: use the correct project ID

The value for project ID of monitored resources of a LogEntry should be
project ID, not the project number, which is in numeric format.

The value of the project ID of Cloud Logging's Monitored Resources are
documented in
https://cloud.google.com/logging/docs/api/v2/resource-list.

The project ID to be retrived from instance metadata is documented in
https://cloud.google.com/compute/docs/storing-retrieving-metadata#default.

Signed-off-by: Yen-Cheng Chou <[email protected]>

* pack: fix json floating point format regression (fluent#2592)

The recent change to the JSON floating point formatting to use "%.16g" caused a
regression where values that have no fractional part are formatted as integers.
For example, "10.0" gets formatted as "10". This patch uses the same approach
as https://github.com/ohler55/oj/blob/v3.10.13/ext/oj/dump_strict.c#L100-L101,
which is used in Fluentd. It checks if the double value is equal to the integer
part, and if so, will use "%.1f" as the format to ensure the decimal part is
still rendered (with a single decimal place of ".0"). This prevents downstream
datastores from having data type conflicts.

This was tested by building locally and running through different value using
the dummy input plugin and stdout output plugin with json_lines formatting.
Will include example outputs of tests in Pull Request.

Signed-off-by: Joey Paskhay <[email protected]>

* aws_util: fix failing sanitizer builds (fluent#2604)

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: use correct printf format for MAX_UPLOAD_ERRORS (CID 309440) (fluent#2602)

Signed-off-by: Wesley Pettit <[email protected]>

* out_stackdriver: fix project_id length for testing (fluent#2611)

Signed-off-by: Yen-Cheng Chou <[email protected]>

* out_http: on error, print HTTP response payload if available (fluent#2593)

Signed-off-by: Eduardo Silva <[email protected]>

* config: new flag to determinate if the ingestion is active

Signed-off-by: Eduardo Silva <[email protected]>

* engine: upon exit, turn ingestion flag off

Signed-off-by: Eduardo Silva <[email protected]>

* input: if ingestion is disable, do not resume it

Signed-off-by: Eduardo Silva <[email protected]>

* input: chunk: if ingestion is disable, do not resume it

Signed-off-by: Eduardo Silva <[email protected]>

* lib: rename worker thread to 'flb-pipeline'

Signed-off-by: Eduardo Silva <[email protected]>

* log: rename worker thread to 'flb-logger'

Signed-off-by: Eduardo Silva <[email protected]>

* bin: spawn pipeline using library mode (fluent#1496 fluent#2610)

Signed-off-by: Eduardo Silva <[email protected]>

* task: fix counter of running tasks, use 'users' counter (fluent#2411)

Signed-off-by: Eduardo Silva <[email protected]>

* input_chunk: drop oldest buffer chunks when reaching the limit

Signed-off-by: Jeff Luo <[email protected]>

* output: add new buffer option: storage.total_limit_size

Signed-off-by: Jeff Luo <[email protected]>

* router: add function to get routes_mask by tag

Signed-off-by: Jeff Luo <[email protected]>

* scheduler: clean up request in request_wait queue

Signed-off-by: Jeff Luo <[email protected]>

* in_tail: add checking to avoid deleting fd with value -1

Signed-off-by: Jeff Luo <[email protected]>

* pipe: add checking to avoid closing fd with value -1

Signed-off-by: Jeff Luo <[email protected]>

* task: creat task_routes by routes_mask

Signed-off-by: Jeff Luo <[email protected]>

* test: internal: input_chunk: add unit tests for buffering mechanism

Signed-off-by: Jeff Luo <[email protected]>

* in_tail: inotify: use proper logger API

Signed-off-by: Eduardo Silva <[email protected]>

* lib: new flb_loop() to wait for main thread exit

Signed-off-by: Eduardo Silva <[email protected]>

* bin: use new flb_loop() for clear exit

Signed-off-by: Eduardo Silva <[email protected]>

* out_gelf: port the random seed generation to Windows (fluent#2614)

Windows does not have /dev/urandom. For this reason, it was always
using a less secure value (= UNIX time) as an entropy source.

Use flb_randombytes() to use a good entropy source, and thus,
reduce the possibility of message collision.

Signed-off-by: Fujimoto Seiji <[email protected]>

* in_random: use the new API to generate a random seed (fluent#2613)

Simplify the random seed generation by using flb_random_bytes()
instead of trying to read entropy sources manually.

Signed-off-by: Fujimoto Seiji <[email protected]>

* aws: util: initialize key buffer to zero

Signed-off-by: Eduardo Silva <[email protected]>

* pack: on JSON formatting, remove duplicated keys (fluent#1835 fluent#1051)

Our internal serialization format allows to have several keys with
the same name in a map. When converting to JSON the case is quite
dangerous, despite the JSON spec is not mandatory about having
unique keys at the same level in a map, we got tons of reports that
backend services that receives JSON payload raises exception due
to duplicated keys.

On this patch now the JSON formatter will check for duplicated
keys, if found, it preserves the latest one found.

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: pack: add check for handling of duplicated JSON keys (fluent#1835 fluent#1051)

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: pack: release buffers

Signed-off-by: Eduardo Silva <[email protected]>

* in_tail: Prevent concatenation of no mulitline logs

Signed-off-by: Dominik Rosiek <[email protected]>

* tests: runtime: in_tail: Add test for dmode_firstline for docker mode

Signed-off-by: Dominik Rosiek <[email protected]>

* in_tail: read from tail (duh!) (fluent#1667 fluent#1761 fluent#474 fluent#1645 fluent#1330)

From now on, Tail plugin implements the following behavior:

1. If a file is already registered in the database and contains an offset,
   the file is consumed from that offset position.

2. Upon start, if a file is not known by the database, read from it tail.

3. If the new 'read_from_head' property (default: false) is enabled, for newly
   discovered files read from the beginning. This flag don't override the
   behavior of a file that already exists in the database.

Additional fix:

When a file is being monitored in 'static mode', handle truncation properly.

Signed-off-by: Eduardo Silva <[email protected]>

* tests: runtime: filter_kubernetes: use tail + 'read_from_tail'

Signed-off-by: Eduardo Silva <[email protected]>

* tests: runtime: out_stackdriver: use tail + 'read_from_tail'

Signed-off-by: Eduardo Silva <[email protected]>

* tests: runtime: in_tail: use 'read_from_tail'

Signed-off-by: Eduardo Silva <[email protected]>

* input: chunk: comments cleanup

Signed-off-by: Eduardo Silva <[email protected]>

* bin: win32: fix 'error LNK2001: unresolved external symbol config' (fluent#2627)

f303cdb has removed the global symbol "config" on which winsvc.c
depends. This ended up producing the following error:

    error LNK2001: unresolved external symbol config

Bring back the symbol to make Windows Service working again.

Signed-off-by: Fujimoto Seiji <[email protected]>

* in_tail: db: reset bindings after file insertion (fluent#2576)

Signed-off-by: Eduardo Silva <[email protected]>

* in_tail: always read from head for new discovered files

Signed-off-by: Eduardo Silva <[email protected]>

* out_cloudwatch_logs: bug fix: self throttle if too many calls per flush (fluent#2618)

Signed-off-by: Wesley Pettit <[email protected]>

* out_kinesis_firehose: add support for log_key (fluent#2619)

* out_kinesis_firehose: add support for log_key

Signed-off-by: Wesley Pettit <[email protected]>

* aws_util: increase the jsmn default token size

Signed-off-by: Drew Zhang <[email protected]>

* filter_aws: expand EC2 metadata fields

Signed-off-by: Drew Zhang <[email protected]>

* ci: Skip 'tests/internal/input_chunk.c' on Windows

This unittest relies on chunkio's file storage, which Windows port
does not support yet. Let's skip it for now.

Signed-off-by: Fujimoto Seiji <[email protected]>

* filter_aws: fix return in get_vpc_metadata

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: add scaffolding for runtime tests

Signed-off-by: Wesley Pettit <[email protected]>

* tests: runtime: out_s3: add runtime tests

Signed-off-by: Wesley Pettit <[email protected]>

* input: on socket collector creation, return collector id

Signed-off-by: Eduardo Silva <[email protected]>

* in_forward: partial workaround to close connections on SIGTERM (fluent#2610)

This patch adds some extra handling for service termination, so when
the engine paused ingestion, the server socket is close to avoid
having incoming connection and not-processed records.

This patch works when the service is not in a scenario with back-pressure,
so it can be considered a partial workaround.

There is an upcoming re-resign of server side API for scalability that
will address all these issues.

Signed-off-by: Eduardo Silva <[email protected]>

* fstore: extend API to map all chunks

Signed-off-by: Eduardo Silva <[email protected]>

* out_s3: migrate uploads to fstore API

Signed-off-by: Eduardo Silva <[email protected]>

* out_s3: update file size

Signed-off-by: Eduardo Silva <[email protected]>

* out_cloudwatch_logs: don't include <unistd.h> on Windows (fluent#2631)

MSVC++ does not provide <unistd.h>. For this reason, including
this header breaks the compilation on Windows.

Fix it by adding an ifdef block to the header.

Signed-off-by: Fujimoto Seiji <[email protected]>

* cmake: define S3 and Kinesis Firehose plugin for Windows (fluent#2632)

Turn off plugins that do not work on Windows yet, so that we
can compile the master HEAD on MSVC.

This patch is required for v1.6 release.

Signed-off-by: Fujimoto Seiji <[email protected]>

* in_tail: new 'db.locking' option and other perf improvements (fluent#2564)

This patch adjust sqlite synchronization mode by default
to 'normal', it sets journal mode to WAL and it adds a
new option called 'db.locking' (default: false) which helps
to reduce the number of syscalls on every commit but at the price
of locking the access to the database file to third party programs.

The new adjustments makes a significant performance increase reducing
database I/O in about 40%.

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: clean up buffer before return (fluent#2640)

Signed-off-by: Yen-Cheng Chou <[email protected]>

* task: fix typo messages and variables name 'attemps' to 'attempts' (fluent#2636)

Signed-off-by: Tanapol Rattanapichetkul <[email protected]>

* out_stackdriver: update error message (fluent#2624)

Signed-off-by: Peijia Luo <[email protected]>

* aws: remove old s3 local buffer lib

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: add bucket name to store dir

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix build warnings

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix help text for upload_timeout

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix UploadPart response parsing

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: make iso8601 the default date format

Signed-off-by: Wesley Pettit <[email protected]>

* fstore: fix build warning

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix build failure

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix multipart upload buffering (fluent#2645)

Signed-off-by: Wesley Pettit <[email protected]>

* in_forward: handle 'compressed text' option (fluent#2644)

Signed-off-by: Eduardo Silva <[email protected]>

* out_stackdriver: support monitored resource ingestion from log entry

Signed-off-by: Jie Wu <[email protected]>

* tests: runtime: out_stackdriver: monitored resource ingestion

Signed-off-by: Jie Wu <[email protected]>

* signv4: fix uri encoding to allow '=' in the uri path

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: use UPLOAD_TIMER_MIN_WAIT correctly

Signed-off-by: Wesley Pettit <[email protected]>

* out_es: make aws_sts_endpoint optional

Signed-off-by: Wesley Pettit <[email protected]>

* lib: invalidate context on destroy

Signed-off-by: Eduardo Silva <[email protected]>

* out_es: increase default response buffer size to 512k

Signed-off-by: Eduardo Silva <[email protected]>

* aws: add eks, sts, and env credential providers

Added three credentials providers:
 - EKS: Obtain credentials via sts:AssumeRoleWithWebIdentity using a k8s OIDC token
 - STS: Given a base set of credentials, obtain a new set with sts:AssumeRole
 - ENV: Standard environment variables for credentials

Signed-off-by: Wesley Pettit <[email protected]>

* out_cloudwatch_logs: add support for cpu and mem metrics using cloudwatch emf

Signed-off-by: Rayhan Hossain <[email protected]>

* out_cloudwatch_logs: rabse with master and address error handling

This change rebases the emf support code with master. Additionally,
it adds null memory allocation check and better error handling.

Signed-off-by: Rayhan Hossain <[email protected]>

* out_cloudwatch_logs: remove redundant code blocks and fix spelling mistake

Signed-off-by: Rayhan Hossain <[email protected]>

* out_cloudwatch_logs: fix spelling mistake

Signed-off-by: Rayhan Hossain <[email protected]>

* lib: chunkio: sync dev changes v1.0.6 #1c59044

Signed-off-by: Eduardo Silva <[email protected]>

* fstore: delete empty streams on exit

Signed-off-by: Eduardo Silva <[email protected]>

* out_kafka: new option 'queue_full_retries' (fluent#2553)

When the librdkafka queue is full and Fluent Bit cannot ingest more data,
it does a 'local retry' of maximum 10 times waiting a second between each
retry. But in some cases like the exposed in fluent#2553 is not enough.

This patch exposes a new configuration property called 'queue_full_retries'
that configure the maximum number of times the retry must be done.

Note that this limit now can be disabled setting a value of '0' or 'false'.

Signed-off-by: Eduardo Silva <[email protected]>

* filter_tensorflow: new Tensorflow Lite filter plugin! (fluent#2511)

Signed-off-by: Masoud Koleini <[email protected]>

* tests: internal: aws: do not declare variables inside for loops

Signed-off-by: Eduardo Silva <[email protected]>

* aws: util: remove unneeded flb_s3_endpoint function

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: use path based endpoints to support bucket names with dots

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: s3_store: skip locked chunks in s3_store_file_get

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: fix mem leak

Signed-off-by: Wesley Pettit <[email protected]>

* out_s3: store: validate contexts on exit

Signed-off-by: Eduardo Silva <[email protected]>

* fstore: remove unused dump

Signed-off-by: Eduardo Silva <[email protected]>

* bin: on SIGTERM exit properly

Signed-off-by: Eduardo Silva <[email protected]>

* conf: update typo in fluent-bit.conf (fluent#2641)

Simple typo in the default config for fluent-bit

* fstore: fix invalid reference on exception (CID 310820)

Signed-off-by: Eduardo Silva <[email protected]>

* bin: on SIGTERM, wait for child thread

Signed-off-by: Eduardo Silva <[email protected]>

* out_s3: release 'uri' on exception (CID 310852)

Signed-off-by: Eduardo Silva <[email protected]>

* tests: runtime_shell: tail: set 'read_from_head'

Signed-off-by: Eduardo Silva <[email protected]>

* fstore: support chunkio backend type

Signed-off-by: Eduardo Silva <[email protected]>

* out_s3: store: detect Travis environment, if so, use fstore memory backend

Signed-off-by: Eduardo Silva <[email protected]>

* lib: mbedtls: CMakeLists.txt: fix a typo for CFLAGS (fluent#2662)

CMAKE_C_CFLAGS -> CMAKE_C_FLAGS, this avoids C_FLAGS from environment
being overridden.

Signed-off-by: Ming Liu <[email protected]>

* build: bump to v1.7.0

Signed-off-by: Eduardo Silva <[email protected]>

* Dockerfile: bump to v1.7.0

Signed-off-by: Eduardo Silva <[email protected]>

* bitbake: bump to v1.7.0

Signed-off-by: Eduardo Silva <[email protected]>

* doc: add users: DigitalOcean

Signed-off-by: Eduardo Silva <[email protected]>

* tests: fuzzers: added a genera parser fuzzer  (fluent#2665)

Signed-off-by: davkor <[email protected]>

* tests: internal: fuzzers: fix typo in cmakelists  (fluent#2675)

Signed-off-by: davkor <[email protected]>

* scheduler: use flb_random_bytes() instead of /dev/urandom (fluent#2679)

Windows does not support /dev/urandom. For this reason, it ended
up using a weak entropy source for task scheduling.

Avoid this issue by using flb_random_bytes() instead.

Signed-off-by: Fujimoto Seiji <[email protected]>

* pack: gelf: on flatten, only process string key types (oss-fuzz 26294)

Signed-off-by: Eduardo Silva <[email protected]>

* parser: on exception, use flb_parser_destroy() (oss-fuzz 26308)

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: fuzzer: initialize timestamp

Signed-off-by: Eduardo Silva <[email protected]>

* tests: internal: fuzzer: fix missing data type declaration

Signed-off-by: Eduardo Silva <[email protected]>

* Import libmaxminddb-1.3.2

Signed-off-by: okkez <[email protected]>

* Add files to implement filter geoip2

Signed-off-by: okkez <[email protected]>

* Add missing header

Signed-off-by: okkez <[email protected]>

* Implement filter_geoip2

Signed-off-by: okkez <[email protected]>

* Add simple test for filter_geoip2

Signed-off-by: okkez <[email protected]>

* Duplicate string

Signed-off-by: okkez <[email protected]>

* Use make install as INSTALL_COMMAND

Signed-off-by: okkez <[email protected]>

* Remove unused variables

Signed-off-by: okkez <[email protected]>

* Free lookup_key

Signed-off-by: okkez <[email protected]>

* Reset counter

Signed-off-by: okkez <[email protected]>

* Add more test for filter_geoip2

Signed-off-by: okkez <[email protected]>

* Plug memory leaks reported by valgrind

Signed-off-by: okkez <[email protected]>

* Improve log messages

Signed-off-by: okkez <[email protected]>

* Move libmaxminddb-1.3.2 to plugins/filter_geoip/

Signed-off-by: okkez <[email protected]>

* add cmake

Co-authored-by: Takahiro YAMASHITA <[email protected]>
Co-authored-by: DavidKorczynski <[email protected]>
Co-authored-by: Wesley Pettit <[email protected]>
Co-authored-by: Eduardo Silva <[email protected]>
Co-authored-by: Jeff Luo <[email protected]>
Co-authored-by: Fujimoto Seiji <[email protected]>
Co-authored-by: Aleks <[email protected]>
Co-authored-by: Zero King <[email protected]>
Co-authored-by: turettn <[email protected]>
Co-authored-by: ziheng <[email protected]>
Co-authored-by: Matej Staroň <[email protected]>
Co-authored-by: Jonathan Gonzalez V <[email protected]>
Co-authored-by: Jonathan Gonzalez V <[email protected]>
Co-authored-by: Martin Dojcak <[email protected]>
Co-authored-by: Meghna Prabhu <[email protected]>
Co-authored-by: Yen-Cheng Chou <[email protected]>
Co-authored-by: Joey Paskhay <[email protected]>
Co-authored-by: Yen-Cheng Chou <[email protected]>
Co-authored-by: Dominik Rosiek <[email protected]>
Co-authored-by: Drew Zhang <[email protected]>
Co-authored-by: pugkung <[email protected]>
Co-authored-by: Jeff Luo <[email protected]>
Co-authored-by: Jie WU <[email protected]>
Co-authored-by: Rayhan Hossain <[email protected]>
Co-authored-by: Masoud Koleini <[email protected]>
Co-authored-by: André <[email protected]>
Co-authored-by: Ming <[email protected]>
Co-authored-by: okkez <[email protected]>
Co-authored-by: Vincent Auclair <[email protected]>
@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Mar 13, 2021
@minamijoyo
Copy link
Author

Any update on this?

@github-actions github-actions bot removed the Stale label Mar 16, 2021
@github-actions
Copy link
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Apr 20, 2021
@minamijoyo
Copy link
Author

I tested the fluentd-request-ack option in Docker 20.10 with the latest fluent-bit v1.7.4, but some records were lost during shutdown.

[fluent-bit@master|✔]$ docker version
Client: Docker Engine - Community
 Cloud integration: 1.0.12
 Version:           20.10.5
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        55c4c88
 Built:             Tue Mar  2 20:13:00 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.5
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       363e9a8
  Built:            Tue Mar  2 20:15:47 2021
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
[fluent-bit@master|✔]$

Start a fluent-bit v.1.7.4 container:

[fluent-bit@master|✔]$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.7.4 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.7.4
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/04/22 01:32:24] [ info] Configuration:
[2021/04/22 01:32:24] [ info]  flush time     | 1.000000 seconds
[2021/04/22 01:32:24] [ info]  grace          | 5 seconds
[2021/04/22 01:32:24] [ info]  daemon         | 0
[2021/04/22 01:32:24] [ info] ___________
[2021/04/22 01:32:24] [ info]  inputs:
[2021/04/22 01:32:24] [ info]      forward
[2021/04/22 01:32:24] [ info] ___________
[2021/04/22 01:32:24] [ info]  filters:
[2021/04/22 01:32:24] [ info] ___________
[2021/04/22 01:32:24] [ info]  outputs:
[2021/04/22 01:32:24] [ info]      stdout.0
[2021/04/22 01:32:24] [ info] ___________
[2021/04/22 01:32:24] [ info]  collectors:
[2021/04/22 01:32:24] [ info] [engine] started (pid=1)
[2021/04/22 01:32:24] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/04/22 01:32:24] [debug] [storage] [cio stream] new stream registered: forward.0
[2021/04/22 01:32:24] [ info] [storage] version=1.1.1, initializing...
[2021/04/22 01:32:24] [ info] [storage] in-memory
[2021/04/22 01:32:24] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/04/22 01:32:24] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2021/04/22 01:32:24] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2021/04/22 01:32:24] [debug] [stdout:stdout.0] created event channels: read=19 write=20
[2021/04/22 01:32:24] [debug] [router] default match rule forward.0:stdout.0
[2021/04/22 01:32:24] [ info] [sp] stream processor started
[0] 34d4a39228e2: [1619052762.000000000, {"container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"Thu Apr 22 00", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39"}]
[1] 34d4a39228e2: [1619055147.000000000, {"container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"nc: flb (172.", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39"}]
[2] 34d4a39228e2: [1619055147.000000000, {"container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdb"}]a39", "container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"Thu Apr 22 00:53:50 UTC 2021
[3] 34d4a39228e2: [1619055147.000000000, {"container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdb"}]a39", "container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"nc: flb (172.17.0.2:24224): No route to host
[2021/04/22 01:32:28] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39", "container_name"=>"/goofy_moser", "source"=>"stdout"}]
[5] 34d4a39228e2: [1619055147.000000000, {"container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"nc: flb (172.", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39"}]
[6] 34d4a39228e2: [1619055147.000000000, {"container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdb"}]a39", "container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"Thu Apr 22 00:53:57 UTC 2021
[7] 34d4a39228e2: [1619055147.000000000, {"container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"nc: flb (172.", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39"}]
", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39", "container_name"=>"/goofy_moser"}]
[9] 34d4a39228e2: [1619055147.000000000, {"container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"nc: flb (172.", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39"}]
[10] 34d4a39228e2: [1619055147.000000000, {"partial_message"=>"true", "partial_id"=>"f39cd50d8c59d51423eb6bb2193131d1cc3cc97ad2243bf86ea98b389454fe45", "partial_ordinal"=>"1", "partial_last"=>"false", "container_id"=>"34d4a39228e22524eb464e8e6f0cce7b06e791d762bd7d1d599a5cdfdbcd8a39", "container_name"=>"/goofy_moser", "source"=>"stdout", "log"=>"^C"}]
[2021/04/22 01:32:28] [debug] [out coro] cb_destroy coro_id=0
[2021/04/22 01:32:28] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
[1] b5a1534be2b3: [1619055149.000000000, {"container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (1", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee"}]
[2021/04/22 01:32:29] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
[2021/04/22 01:32:29] [debug] [out coro] cb_destroy coro_id=1
[2021/04/22 01:32:29] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout"}]
[2021/04/22 01:32:30] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
[2021/04/22 01:32:30] [debug] [out coro] cb_destroy coro_id=2
[2021/04/22 01:32:30] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
[0] b5a1534be2b3: [1619055151.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:32:31 UTC 2021
[1] b5a1534be2b3: [1619055151.000000000, {"container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (1", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee"}]
[2021/04/22 01:32:31] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
[2021/04/22 01:32:31] [debug] [out coro] cb_destroy coro_id=3
[2021/04/22 01:32:31] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
[0] b5a1534be2b3: [1619055152.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:32:32 UTC 2021
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
[2021/04/22 01:32:32] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
[2021/04/22 01:32:32] [debug] [out coro] cb_destroy coro_id=4
[2021/04/22 01:32:32] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
[0] b5a1534be2b3: [1619055153.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:32:33 UTC 2021
[2021/04/22 01:32:33] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
[2021/04/22 01:32:33] [debug] [out coro] cb_destroy coro_id=5
[2021/04/22 01:32:33] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout"}]
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout"}]
[2021/04/22 01:32:34] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
[2021/04/22 01:32:34] [debug] [out coro] cb_destroy coro_id=6
[2021/04/22 01:32:34] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout"}]
[2021/04/22 01:32:35] [debug] [task] created task=0x7fd3b8c371e0 id=0 OK
[2021/04/22 01:32:35] [debug] [out coro] cb_destroy coro_id=7
[2021/04/22 01:32:35] [debug] [task] destroy task=0x7fd3b8c371e0 (task_id=0)
[2021/04/22 01:32:35] [engine] caught signal (SIGTERM)
[2021/04/22 01:32:35] [ info] [input] pausing forward.0
[2021/04/22 01:32:35] [ warn] [engine] service will stop in 5 seconds
[2021/04/22 01:32:36] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:32:37] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:32:38] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:32:39] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:32:40] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:32:40] [ info] [engine] service stopped

Start a nc container with --log-opt fluentd-request-ack=true:

[fluent-bit@master|✔]$ docker run -it --rm --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --log-opt fluentd-request-ack=true --link flb busybox sh -c 'while true; do date; nc -vz flb 24224; sleep 1; done'
Thu Apr 22 01:32:29 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:30 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:31 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:32 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:33 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:34 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:35 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:36 UTC 2021
Thu Apr 22 01:32:37 UTC 2021
Thu Apr 22 01:32:38 UTC 2021
Thu Apr 22 01:32:39 UTC 2021
Thu Apr 22 01:32:40 UTC 2021
Thu Apr 22 01:32:41 UTC 2021
Thu Apr 22 01:32:42 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:58 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:32:59 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:33:00 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:33:01 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:33:02 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:33:03 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:33:04 UTC 2021
flb (172.17.0.2:24224) open
Thu Apr 22 01:33:05 UTC 2021
Thu Apr 22 01:33:06 UTC 2021
Thu Apr 22 01:33:07 UTC 2021
Thu Apr 22 01:33:08 UTC 2021
Thu Apr 22 01:33:09 UTC 2021
Thu Apr 22 01:33:10 UTC 2021
^Cpunt!
^C

Stop the fluent-bit container:

[fluent-bit@master|✔]$ date -u; time docker stop -t 10 flb; date -u
Thu Apr 22 01:32:35 UTC 2021
flb
docker stop -t 10 flb  0.12s user 0.06s system 2% cpu 6.090 total
Thu Apr 22 01:32:41 UTC 2021

Start a new fluent-bit container:

[fluent-bit@master|✔]$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.7.4 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.7.4
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/04/22 01:32:54] [ info] Configuration:
[2021/04/22 01:32:54] [ info]  flush time     | 1.000000 seconds
[2021/04/22 01:32:54] [ info]  grace          | 5 seconds
[2021/04/22 01:32:54] [ info]  daemon         | 0
[2021/04/22 01:32:54] [ info] ___________
[2021/04/22 01:32:54] [ info]  inputs:
[2021/04/22 01:32:54] [ info]      forward
[2021/04/22 01:32:54] [ info] ___________
[2021/04/22 01:32:54] [ info]  filters:
[2021/04/22 01:32:54] [ info] ___________
[2021/04/22 01:32:54] [ info]  outputs:
[2021/04/22 01:32:54] [ info]      stdout.0
[2021/04/22 01:32:54] [ info] ___________
[2021/04/22 01:32:54] [ info]  collectors:
[2021/04/22 01:32:54] [ info] [engine] started (pid=1)
[2021/04/22 01:32:54] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/04/22 01:32:54] [debug] [storage] [cio stream] new stream registered: forward.0
[2021/04/22 01:32:54] [ info] [storage] version=1.1.1, initializing...
[2021/04/22 01:32:54] [ info] [storage] in-memory
[2021/04/22 01:32:54] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/04/22 01:32:54] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2021/04/22 01:32:54] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2021/04/22 01:32:54] [debug] [stdout:stdout.0] created event channels: read=19 write=20
[2021/04/22 01:32:54] [debug] [router] default match rule forward.0:stdout.0
[2021/04/22 01:32:54] [ info] [sp] stream processor started
[0] b5a1534be2b3: [1619055161.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:32:41 UTC 2021
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout"}]
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
[3] b5a1534be2b3: [1619055180.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:32:58 UTC 2021
[4] b5a1534be2b3: [1619055180.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
[6] b5a1534be2b3: [1619055180.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/04/22 01:33:00] [debug] [task] created task=0x7fdce62371e0 id=0 OK
[2021/04/22 01:33:00] [debug] [out coro] cb_destroy coro_id=0
[2021/04/22 01:33:00] [debug] [task] destroy task=0x7fdce62371e0 (task_id=0)
[0] b5a1534be2b3: [1619055180.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:33:00 UTC 2021
", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee", "container_name"=>"/naughty_lichterman"}]
[2021/04/22 01:33:01] [debug] [task] created task=0x7fdce62371e0 id=0 OK
[2021/04/22 01:33:01] [debug] [out coro] cb_destroy coro_id=1
[2021/04/22 01:33:01] [debug] [task] destroy task=0x7fdce62371e0 (task_id=0)
[0] b5a1534be2b3: [1619055181.000000000, {"container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Ap", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee"}]
[1] b5a1534be2b3: [1619055181.000000000, {"container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (1", "container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32208dee"}]
[2021/04/22 01:33:02] [debug] [task] created task=0x7fdce62371e0 id=0 OK
[2021/04/22 01:33:02] [debug] [out coro] cb_destroy coro_id=2
[2021/04/22 01:33:02] [debug] [task] destroy task=0x7fdce62371e0 (task_id=0)
[0] b5a1534be2b3: [1619055182.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:33:02 UTC 2021
[1] b5a1534be2b3: [1619055182.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/04/22 01:33:03] [debug] [task] created task=0x7fdce62371e0 id=0 OK
[2021/04/22 01:33:03] [debug] [out coro] cb_destroy coro_id=3
[2021/04/22 01:33:03] [debug] [task] destroy task=0x7fdce62371e0 (task_id=0)
[0] b5a1534be2b3: [1619055183.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:33:03 UTC 2021
[1] b5a1534be2b3: [1619055183.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/04/22 01:33:04] [debug] [task] created task=0x7fdce62371e0 id=0 OK
[2021/04/22 01:33:04] [debug] [out coro] cb_destroy coro_id=4
[2021/04/22 01:33:04] [debug] [task] destroy task=0x7fdce62371e0 (task_id=0)
[2021/04/22 01:33:04] [engine] caught signal (SIGTERM)
[2021/04/22 01:33:04] [ info] [input] pausing forward.0
[0] b5a1534be2b3: [1619055184.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:33:04 UTC 2021
[1] b5a1534be2b3: [1619055184.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/04/22 01:33:04] [debug] [task] created task=0x7fdce62371e0 id=0 OK
[2021/04/22 01:33:04] [ warn] [engine] service will stop in 5 seconds
[2021/04/22 01:33:04] [debug] [out coro] cb_destroy coro_id=5
[2021/04/22 01:33:04] [debug] [task] destroy task=0x7fdce62371e0 (task_id=0)
[2021/04/22 01:33:05] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:33:06] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:33:07] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:33:08] [debug] [input chunk] forward.0 is paused, cannot append records
[2021/04/22 01:33:09] [ info] [engine] service stopped

Stop the 2nd fluent-bit container:

[fluent-bit@master|✔]$ date -u; time docker stop -t 10 flb; date -u
Thu Apr 22 01:33:04 UTC 2021
flb
docker stop -t 10 flb  0.14s user 0.06s system 3% cpu 5.577 total
Thu Apr 22 01:33:10 UTC 2021

As you can see, the last record processed by the 1st fluent-bit container:

[0] b5a1534be2b3: [1619055153.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:32:33 UTC 2021

The first record processed by the 2nd fluent-bit container:

[0] b5a1534be2b3: [1619055161.000000000, {"container_id"=>"b5a1534be2b3b8640a330f6198910f3226019c8fc9ff8b6689c4b5bd32"}]dee", "container_name"=>"/naughty_lichterman", "source"=>"stdout", "log"=>"Thu Apr 22 01:32:41 UTC 2021

This means some records were lost during shutdown. I expected that the fluentd-request-ack option would solve this issue, but am I misunderstanding something?

@PettitWesley
Copy link
Contributor

@edsiper Any update on this? I think we have seen reports that match the last comment from @minamijoyo

@PettitWesley
Copy link
Contributor

PettitWesley commented Aug 16, 2021

@edsiper Ping- we are still seeing reports that Fluent Bit stops accepting records on the forward input when it receives a SIGTERM, which causes log loss. Customers prefer fluent bit to continue accepting records until SIGKILL. If that behavior is not your preferred default, then we request that this alternate behavior be an opt-in thing.

aws/aws-for-fluent-bit#201

@zhonghui12
Copy link
Contributor

@edsiper could we have any update on this one? We have other customers also face this issue and hope to get a fix on this one.

@hossain-rayhan
Copy link
Contributor

@edsiper Do we have any update on this? We have other customer issues similar to this one. Expecting a response from you.

@DrewZhang13
Copy link
Contributor

@edsiper Ping for update. This issue is still in risk and could causing log loss to customers.

edsiper added a commit that referenced this issue Nov 18, 2021
When Fluent Bit receives a SIGTERM, internally it was only taking
care of the server socket leaving the active connections in an
open state.

This patch changes that behavior and if the plugin is paused or
if the engine receives SIGTERM (which will trigger a pause), now
the agent will drop the active TCP connections.

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper
Copy link
Member

edsiper commented Nov 18, 2021

I just pushed a fix on PR #4339

extra tests are welcome (so we can merge the change in the next release)

@edsiper
Copy link
Member

edsiper commented Nov 18, 2021

@zhonghui12 @hossain-rayhan @DrewZhang13 @PettitWesley

edsiper added a commit that referenced this issue Nov 18, 2021
When Fluent Bit receives a SIGTERM, internally it was only taking
care of the server socket leaving the active connections in an
open state.

This patch changes that behavior and if the plugin is paused or
if the engine receives SIGTERM (which will trigger a pause), now
the agent will drop the active TCP connections.

Signed-off-by: Eduardo Silva <[email protected]>
edsiper added a commit that referenced this issue Nov 18, 2021
When Fluent Bit receives a SIGTERM, internally it was only taking
care of the server socket leaving the active connections in an
open state.

This patch changes that behavior and if the plugin is paused or
if the engine receives SIGTERM (which will trigger a pause), now
the agent will drop the active TCP connections.

Signed-off-by: Eduardo Silva <[email protected]>
@PettitWesley
Copy link
Contributor

@edsiper Thanks. We tested this, and you merged it, so this issue can be closed?

@PettitWesley
Copy link
Contributor

@edsiper Oh I see, this hasn't been released yet? When is it coming? in 1.8 or 1.9?

@matthewfala
Copy link
Contributor

@PettitWesley, my bad. I said that I didn't see this in 1.8 because I was looking for a backport PR, and only saw a PR into master.

I just checked the commits to 1.8 and now see the one referenced above: ea0fba8

This looks good, and is in 1.8.10. I think this issue can be closed.

@minamijoyo
Copy link
Author

@edsiper Thank you for the fix!

I tested the latest fluent-bit v1.8.10 and confirmed that it now correctly closes port on shutdown. It's a great improvement. Thanks!

However, I can still see that some records were lost during shutdown even if with the fluentd-request-ack=true option enabled. I'm not sure where the root cause is, anyway I'll share the log.

$ docker version
Client:
 Cloud integration: 1.0.17
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:55:20 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:31 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Start a fluent-bit v.1.8.10 container:

$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.8.10 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.8.10
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/09 01:09:03] [ info] Configuration:
[2021/12/09 01:09:03] [ info]  flush time     | 1.000000 seconds
[2021/12/09 01:09:03] [ info]  grace          | 5 seconds
[2021/12/09 01:09:03] [ info]  daemon         | 0
[2021/12/09 01:09:03] [ info] ___________
[2021/12/09 01:09:03] [ info]  inputs:
[2021/12/09 01:09:03] [ info]      forward
[2021/12/09 01:09:03] [ info] ___________
[2021/12/09 01:09:03] [ info]  filters:
[2021/12/09 01:09:03] [ info] ___________
[2021/12/09 01:09:03] [ info]  outputs:
[2021/12/09 01:09:03] [ info]      stdout.0
[2021/12/09 01:09:03] [ info] ___________
[2021/12/09 01:09:03] [ info]  collectors:
[2021/12/09 01:09:03] [ info] [engine] started (pid=1)
[2021/12/09 01:09:03] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/12/09 01:09:03] [debug] [storage] [cio stream] new stream registered: forward.0
[2021/12/09 01:09:03] [ info] [storage] version=1.1.5, initializing...
[2021/12/09 01:09:03] [ info] [storage] in-memory
[2021/12/09 01:09:03] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/09 01:09:03] [ info] [cmetrics] version=0.2.2
[2021/12/09 01:09:03] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2021/12/09 01:09:03] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2021/12/09 01:09:03] [debug] [stdout:stdout.0] created event channels: read=20 write=21
[2021/12/09 01:09:03] [debug] [router] default match rule forward.0:stdout.0
[2021/12/09 01:09:03] [ info] [sp] stream processor started
[0] 89b1d0e1e2a5: [1639012158.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[2021/12/09 01:09:18] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[1] 89b1d0e1e2a5: [1639012158.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (17", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[2021/12/09 01:09:18] [debug] [out coro] cb_destroy coro_id=0
[2021/12/09 01:09:18] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
[0] 89b1d0e1e2a5: [1639012159.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia", "source"=>"stdout"}]
[2021/12/09 01:09:19] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[2021/12/09 01:09:19] [debug] [out coro] cb_destroy coro_id=1
[2021/12/09 01:09:19] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[1] 89b1d0e1e2a5: [1639012160.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (17", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[2021/12/09 01:09:20] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[2021/12/09 01:09:20] [debug] [out coro] cb_destroy coro_id=2
[2021/12/09 01:09:20] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
[0] 89b1d0e1e2a5: [1639012161.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[1] 89b1d0e1e2a5: [1639012161.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/12/09 01:09:21] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[2021/12/09 01:09:21] [debug] [out coro] cb_destroy coro_id=3
[2021/12/09 01:09:21] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
[0] 89b1d0e1e2a5: [1639012162.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:22 UTC 2021
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[2021/12/09 01:09:22] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[2021/12/09 01:09:22] [debug] [out coro] cb_destroy coro_id=4
[2021/12/09 01:09:22] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
[2021/12/09 01:09:23] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[0] 89b1d0e1e2a5: [1639012163.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:23 UTC 2021
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[2021/12/09 01:09:23] [debug] [out coro] cb_destroy coro_id=5
[2021/12/09 01:09:23] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia", "source"=>"stdout"}]
[1] 89b1d0e1e2a5: [1639012164.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/12/09 01:09:24] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[2021/12/09 01:09:24] [debug] [out coro] cb_destroy coro_id=6
[2021/12/09 01:09:24] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
[0] 89b1d0e1e2a5: [1639012165.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:25 UTC 2021
[1] 89b1d0e1e2a5: [1639012165.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/12/09 01:09:25] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[2021/12/09 01:09:25] [debug] [out coro] cb_destroy coro_id=7
[2021/12/09 01:09:25] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[1] 89b1d0e1e2a5: [1639012166.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (17", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[2021/12/09 01:09:26] [debug] [task] created task=0x7fc4e92371e0 id=0 OK
[2021/12/09 01:09:26] [debug] [out coro] cb_destroy coro_id=8
[2021/12/09 01:09:26] [debug] [task] destroy task=0x7fc4e92371e0 (task_id=0)
[2021/12/09 01:09:26] [engine] caught signal (SIGTERM)
[2021/12/09 01:09:26] [ info] [input] pausing forward.0
[2021/12/09 01:09:26] [ warn] [engine] service will stop in 5 seconds
[2021/12/09 01:09:31] [ info] [engine] service stopped

Start a nc container with --log-opt fluentd-request-ack=true:

$ docker run -it --rm --log-driver=fluentd --log-opt fluentd-address=127.0.0.1:24224 --log-opt fluentd-request-ack=true --link flb busybox sh -c 'while true; do date; nc -vz flb 24224; sleep 1; done'
Thu Dec  9 01:09:18 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:19 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:20 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:21 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:22 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:23 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:24 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:25 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:26 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:27 UTC 2021
Thu Dec  9 01:09:28 UTC 2021
Thu Dec  9 01:09:29 UTC 2021
Thu Dec  9 01:09:30 UTC 2021
Thu Dec  9 01:09:31 UTC 2021
Thu Dec  9 01:09:32 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:48 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:49 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:50 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:51 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:52 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:53 UTC 2021
flb (172.17.0.2:24224) open
Thu Dec  9 01:09:54 UTC 2021
Thu Dec  9 01:09:55 UTC 2021
Thu Dec  9 01:09:56 UTC 2021
Thu Dec  9 01:09:57 UTC 2021
Thu Dec  9 01:09:58 UTC 2021

Stop the fluent-bit container:

$ date -u; time docker stop -t 10 flb; date -u
Thu Dec  9 01:09:26 UTC 2021
flb
docker stop -t 10 flb  0.13s user 0.06s system 3% cpu 5.365 total
Thu Dec  9 01:09:31 UTC 2021

Start a new fluent-bit container:

$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.8.10 /fluent-bit/bin/fluent-bit -v -i forward -o stdout -f 1
Fluent Bit v1.8.10
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/09 01:09:42] [ info] Configuration:
[2021/12/09 01:09:42] [ info]  flush time     | 1.000000 seconds
[2021/12/09 01:09:42] [ info]  grace          | 5 seconds
[2021/12/09 01:09:42] [ info]  daemon         | 0
[2021/12/09 01:09:42] [ info] ___________
[2021/12/09 01:09:42] [ info]  inputs:
[2021/12/09 01:09:42] [ info]      forward
[2021/12/09 01:09:42] [ info] ___________
[2021/12/09 01:09:42] [ info]  filters:
[2021/12/09 01:09:42] [ info] ___________
[2021/12/09 01:09:42] [ info]  outputs:
[2021/12/09 01:09:42] [ info]      stdout.0
[2021/12/09 01:09:42] [ info] ___________
[2021/12/09 01:09:42] [ info]  collectors:
[2021/12/09 01:09:42] [ info] [engine] started (pid=1)
[2021/12/09 01:09:42] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/12/09 01:09:42] [debug] [storage] [cio stream] new stream registered: forward.0
[2021/12/09 01:09:42] [ info] [storage] version=1.1.5, initializing...
[2021/12/09 01:09:42] [ info] [storage] in-memory
[2021/12/09 01:09:42] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/09 01:09:42] [ info] [cmetrics] version=0.2.2
[2021/12/09 01:09:42] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2021/12/09 01:09:42] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2021/12/09 01:09:42] [debug] [stdout:stdout.0] created event channels: read=20 write=21
[2021/12/09 01:09:42] [debug] [router] default match rule forward.0:stdout.0
[2021/12/09 01:09:42] [ info] [sp] stream processor started
[0] 89b1d0e1e2a5: [1639012167.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:27 UTC 2021
[1] 89b1d0e1e2a5: [1639012186.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:28 UTC 2021
[2] 89b1d0e1e2a5: [1639012186.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:29 UTC 2021
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[4] 89b1d0e1e2a5: [1639012186.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[2021/12/09 01:09:46] [debug] [task] created task=0x7f2d59e371e0 id=0 OK
[2021/12/09 01:09:46] [debug] [out coro] cb_destroy coro_id=0
[2021/12/09 01:09:46] [debug] [task] destroy task=0x7f2d59e371e0 (task_id=0)
[0] 89b1d0e1e2a5: [1639012187.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (17", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[2021/12/09 01:09:48] [debug] [task] created task=0x7f2d59e371e0 id=0 OK
[2021/12/09 01:09:48] [debug] [out coro] cb_destroy coro_id=1
[2021/12/09 01:09:48] [debug] [task] destroy task=0x7f2d59e371e0 (task_id=0)
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[1] 89b1d0e1e2a5: [1639012188.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (17", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[2021/12/09 01:09:49] [debug] [task] created task=0x7f2d59e371e0 id=0 OK
[2021/12/09 01:09:49] [debug] [out coro] cb_destroy coro_id=2
[2021/12/09 01:09:49] [debug] [task] destroy task=0x7f2d59e371e0 (task_id=0)
[0] 89b1d0e1e2a5: [1639012189.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia", "source"=>"stdout"}]
[2021/12/09 01:09:50] [debug] [task] created task=0x7f2d59e371e0 id=0 OK
[2021/12/09 01:09:50] [debug] [out coro] cb_destroy coro_id=3
[2021/12/09 01:09:50] [debug] [task] destroy task=0x7f2d59e371e0 (task_id=0)
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia", "source"=>"stdout"}]
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia"}]
[2021/12/09 01:09:51] [debug] [task] created task=0x7f2d59e371e0 id=0 OK
[2021/12/09 01:09:51] [debug] [out coro] cb_destroy coro_id=4
[2021/12/09 01:09:51] [debug] [task] destroy task=0x7f2d59e371e0 (task_id=0)
[2021/12/09 01:09:52] [debug] [task] created task=0x7f2d59e371e0 id=0 OK
[0] 89b1d0e1e2a5: [1639012191.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[1] 89b1d0e1e2a5: [1639012191.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (17", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[2021/12/09 01:09:52] [debug] [out coro] cb_destroy coro_id=5
[2021/12/09 01:09:52] [debug] [task] destroy task=0x7f2d59e371e0 (task_id=0)
[0] 89b1d0e1e2a5: [1639012192.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[1] 89b1d0e1e2a5: [1639012192.000000000, {"container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (17", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4"}]
[0] 318f617f28df: [1639011606.000000000, {"container_name"=>"/frosty_easley", "source"=>"stdout", "log"=>"Thu Dec  9 01", "container_id"=>"318f617f28dfcdb77262f0192306bfedc1c32ebbccc61d34544d4890c68ff45b"}]
[1] 318f617f28df: [1639012193.000000000, {"container_name"=>"/frosty_easley", "source"=>"stdout", "log"=>"Thu Dec  9 01", "container_id"=>"318f617f28dfcdb77262f0192306bfedc1c32ebbccc61d34544d4890c68ff45b"}]
[2] 318f617f28df: [1639012193.000000000, {"container_name"=>"/frosty_easley", "source"=>"stdout", "log"=>"Thu Dec  9 01", "container_id"=>"318f617f28dfcdb77262f0192306bfedc1c32ebbccc61d34544d4890c68ff45b"}]
[3] 318f617f28df: [1639012193.000000000, {"container_id"=>"318f617f28dfcdb77262f0192306bfedc1c32ebbccc61d34544d4890c68f"}]b", "container_name"=>"/frosty_easley", "source"=>"stdout", "log"=>"Thu Dec  9 01:00:09 UTC 2021
", "container_id"=>"318f617f28dfcdb77262f0192306bfedc1c32ebbccc61d34544d4890c68ff45b", "container_name"=>"/frosty_easley"}]
[2021/12/09 01:09:53] [debug] [task] created task=0x7f2d59e371e0 id=0 OK
", "container_id"=>"318f617f28dfcdb77262f0192306bfedc1c32ebbccc61d34544d4890c68ff45b", "container_name"=>"/frosty_easley"}]
[2021/12/09 01:09:53] [debug] [task] created task=0x7f2d59e37280 id=1 OK
[2021/12/09 01:09:53] [debug] [out coro] cb_destroy coro_id=6
[2021/12/09 01:09:53] [debug] [out coro] cb_destroy coro_id=7
[2021/12/09 01:09:53] [debug] [task] destroy task=0x7f2d59e371e0 (task_id=0)
[2021/12/09 01:09:53] [debug] [task] destroy task=0x7f2d59e37280 (task_id=1)
[2021/12/09 01:09:54] [engine] caught signal (SIGTERM)
[2021/12/09 01:09:54] [ info] [input] pausing forward.0
", "container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a33809ff4", "container_name"=>"/interesting_hypatia", "source"=>"stdout"}]
[1] 89b1d0e1e2a5: [1639012193.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open
[2021/12/09 01:09:54] [debug] [task] created task=0x7f2d59e37280 id=0 OK
[2021/12/09 01:09:54] [ warn] [engine] service will stop in 5 seconds
[2021/12/09 01:09:54] [debug] [out coro] cb_destroy coro_id=8
[2021/12/09 01:09:54] [debug] [task] destroy task=0x7f2d59e37280 (task_id=0)
[2021/12/09 01:09:58] [ info] [engine] service stopped

As you can see, the last record processed by the 1st fluent-bit container:

[0] 89b1d0e1e2a5: [1639012165.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:25 UTC 2021
[1] 89b1d0e1e2a5: [1639012165.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"flb (172.17.0.2:24224) open

The first record processed by the 2nd fluent-bit container:

[0] 89b1d0e1e2a5: [1639012167.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:27 UTC 2021
[1] 89b1d0e1e2a5: [1639012186.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:28 UTC 2021
[2] 89b1d0e1e2a5: [1639012186.000000000, {"container_id"=>"89b1d0e1e2a51d04091551e4c9b3a3ab0cd765f52780623d2ba6bc0a3380"}]4", "container_name"=>"/interesting_hypatia", "source"=>"stdout", "log"=>"Thu Dec  9 01:09:29 UTC 2021

This means two records were lost during shutdown.

Thu Dec  9 01:09:26 UTC 2021
flb (172.17.0.2:24224) open

If you think this is another issue, please let me so. I'll open another issue.
Thanks again!

@edsiper
Copy link
Member

edsiper commented Dec 9, 2021

can you try out building a container from GIT master ?, we did other improvements when shutting down and also we added more verbose messages.

@minamijoyo
Copy link
Author

minamijoyo commented Dec 9, 2021

@edsiper Thank you for your reply. I built and tested the latest master.

[fluent-bit@master|✔]$ git rev-parse HEAD
3197e975143b3982d5ce214996e3595f6f12bab3

[fluent-bit@master|✔]$ docker build -t fluent-bit:master_debug -f ./dockerfiles/Dockerfile.x86_64-master_debug .

It looks immediately shutdown when it caught SIGTERM, but I can still see that some records were lost as the same as v1.8.10.

There may be a problem on the side of docker's log driver retry mechanism, how can we identify what is wrong?, or is it possible to prove innocence of the fluent-bit?

@edsiper
Copy link
Member

edsiper commented Dec 9, 2021

hmm you might try enabling debug mode in Fluent Bit side log_level debug . You can try also testing without docker, just vanilla Fluent Bit binary

@minamijoyo
Copy link
Author

@edsiper Thanks. I'll make a time to try other environments.

For anyone who interested in this topic, please help debugging and report your result what condition was success or fail.

@minamijoyo
Copy link
Author

I tested without docker fluentd log driver. In this case, I've still use docker as a runtime to reproduce the case easily for anyone, but sent records by fluent-bit output forward plugin.

$ docker version
Client:
 Cloud integration: v1.0.22
 Version:           20.10.11
 API version:       1.41
 Go version:        go1.16.10
 Git commit:        dea9396
 Built:             Thu Nov 18 00:36:09 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.11
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.9
  Git commit:       847da18
  Built:            Thu Nov 18 00:35:39 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Start a fluent-bit container as a receiver:

$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.8.11 /fluent-bit/bin/fluent-bit -i forward -o stdout -f 1
Fluent Bit v1.8.11
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/28 09:09:24] [ info] [engine] started (pid=1)
[2021/12/28 09:09:24] [ info] [storage] version=1.1.5, initializing...
[2021/12/28 09:09:24] [ info] [storage] in-memory
[2021/12/28 09:09:24] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/28 09:09:24] [ info] [cmetrics] version=0.2.2
[2021/12/28 09:09:24] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2021/12/28 09:09:24] [ info] [sp] stream processor started
[0] dummy.log: [1640682576.399975274, {"message"=>"dummy"}]
[0] dummy.log: [1640682577.402610815, {"message"=>"dummy"}]
[0] dummy.log: [1640682578.401480238, {"message"=>"dummy"}]
[0] dummy.log: [1640682579.401216837, {"message"=>"dummy"}]
[0] dummy.log: [1640682580.400747546, {"message"=>"dummy"}]
[0] dummy.log: [1640682581.404175655, {"message"=>"dummy"}]
[0] dummy.log: [1640682582.400892291, {"message"=>"dummy"}]
[0] dummy.log: [1640682583.401939930, {"message"=>"dummy"}]
[2021/12/28 09:09:46] [engine] caught signal (SIGTERM)
[2021/12/28 09:09:46] [ info] [input] pausing forward.0
[0] dummy.log: [1640682584.400569917, {"message"=>"dummy"}]
[2021/12/28 09:09:46] [ warn] [engine] service will shutdown in max 5 seconds
[2021/12/28 09:09:46] [ info] [engine] service has stopped (0 pending tasks)

Start another fluent-bit container as a sender with the following configuration:

$ cat dummy.conf
[SERVICE]
    Flush      1
    Log_Level  debug

[INPUT]
    Name   dummy
    Tag    dummy.log

[OUTPUT]
    Name                 forward
    Match                *
    Host                 flb
    Port                 24224
    Require_ack_response On
    Retry_Limit          10

$ docker run -it --rm --name dummy --link flb -v $(pwd):/tmp fluent/fluent-bit:1.8.11 /fluent-bit/bin/fluent-bit -c /tmp/dummy.conf
Fluent Bit v1.8.11
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/28 09:09:36] [ info] Configuration:
[2021/12/28 09:09:36] [ info]  flush time     | 1.000000 seconds
[2021/12/28 09:09:36] [ info]  grace          | 5 seconds
[2021/12/28 09:09:36] [ info]  daemon         | 0
[2021/12/28 09:09:36] [ info] ___________
[2021/12/28 09:09:36] [ info]  inputs:
[2021/12/28 09:09:36] [ info]      dummy
[2021/12/28 09:09:36] [ info] ___________
[2021/12/28 09:09:36] [ info]  filters:
[2021/12/28 09:09:36] [ info] ___________
[2021/12/28 09:09:36] [ info]  outputs:
[2021/12/28 09:09:36] [ info]      forward.0
[2021/12/28 09:09:36] [ info] ___________
[2021/12/28 09:09:36] [ info]  collectors:
[2021/12/28 09:09:36] [ info] [engine] started (pid=1)
[2021/12/28 09:09:36] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2021/12/28 09:09:36] [debug] [storage] [cio stream] new stream registered: dummy.0
[2021/12/28 09:09:36] [ info] [storage] version=1.1.5, initializing...
[2021/12/28 09:09:36] [ info] [storage] in-memory
[2021/12/28 09:09:36] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/28 09:09:36] [ info] [cmetrics] version=0.2.2
[2021/12/28 09:09:36] [debug] [forward:forward.0] created event channels: read=19 write=20
[2021/12/28 09:09:36] [debug] [router] match rule dummy.0:forward.0
[2021/12/28 09:09:36] [ info] [sp] stream processor started
[2021/12/28 09:09:37] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:37] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:37] [debug] [output:forward:forward.0] send options records=1 chunk='f88d408fbfa0e6e2d261b5295a4636da'
[2021/12/28 09:09:37] [debug] [output:forward:forward.0] protocol: received ACK f88d408fbfa0e6e2d261b5295a4636da
[2021/12/28 09:09:37] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:37] [debug] [out coro] cb_destroy coro_id=0
[2021/12/28 09:09:37] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:38] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:38] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:38] [debug] [output:forward:forward.0] send options records=1 chunk='c69dd65f8ea2125401baf32deffea629'
[2021/12/28 09:09:38] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:38] [debug] [output:forward:forward.0] protocol: received ACK c69dd65f8ea2125401baf32deffea629
[2021/12/28 09:09:38] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:38] [debug] [out coro] cb_destroy coro_id=1
[2021/12/28 09:09:38] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:39] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:39] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:39] [debug] [output:forward:forward.0] send options records=1 chunk='9da31558114036db3e783261db92cd6d'
[2021/12/28 09:09:39] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:39] [debug] [output:forward:forward.0] protocol: received ACK 9da31558114036db3e783261db92cd6d
[2021/12/28 09:09:39] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:39] [debug] [out coro] cb_destroy coro_id=2
[2021/12/28 09:09:39] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:40] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:40] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:40] [debug] [output:forward:forward.0] send options records=1 chunk='c6a3503f1eb7699eecf7856891a63f16'
[2021/12/28 09:09:40] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:40] [debug] [output:forward:forward.0] protocol: received ACK c6a3503f1eb7699eecf7856891a63f16
[2021/12/28 09:09:40] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:40] [debug] [out coro] cb_destroy coro_id=3
[2021/12/28 09:09:40] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:41] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:41] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:41] [debug] [output:forward:forward.0] send options records=1 chunk='07c9a7f0a71c3b58c3e787834ffe3977'
[2021/12/28 09:09:41] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:41] [debug] [output:forward:forward.0] protocol: received ACK 07c9a7f0a71c3b58c3e787834ffe3977
[2021/12/28 09:09:41] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:41] [debug] [out coro] cb_destroy coro_id=4
[2021/12/28 09:09:41] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:42] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:42] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:42] [debug] [output:forward:forward.0] send options records=1 chunk='419fa2a553615aa81c9e15fe2b663bb4'
[2021/12/28 09:09:42] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:42] [debug] [output:forward:forward.0] protocol: received ACK 419fa2a553615aa81c9e15fe2b663bb4
[2021/12/28 09:09:42] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:42] [debug] [out coro] cb_destroy coro_id=5
[2021/12/28 09:09:42] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:43] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:43] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:43] [debug] [output:forward:forward.0] send options records=1 chunk='34b00e6477e0e114338f0297bdeaeedd'
[2021/12/28 09:09:43] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:43] [debug] [output:forward:forward.0] protocol: received ACK 34b00e6477e0e114338f0297bdeaeedd
[2021/12/28 09:09:43] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:43] [debug] [out coro] cb_destroy coro_id=6
[2021/12/28 09:09:43] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:44] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:44] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:44] [debug] [output:forward:forward.0] send options records=1 chunk='3f2a19ad4304a35e16011836fbca3b6a'
[2021/12/28 09:09:44] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:44] [debug] [output:forward:forward.0] protocol: received ACK 3f2a19ad4304a35e16011836fbca3b6a
[2021/12/28 09:09:44] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:44] [debug] [out coro] cb_destroy coro_id=7
[2021/12/28 09:09:44] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:45] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:45] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:45] [debug] [output:forward:forward.0] send options records=1 chunk='841874f4941cfb85465872100747cc22'
[2021/12/28 09:09:45] [debug] [upstream] KA connection #25 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:45] [debug] [output:forward:forward.0] protocol: received ACK 841874f4941cfb85465872100747cc22
[2021/12/28 09:09:45] [debug] [upstream] KA connection #25 to flb:24224 is now available
[2021/12/28 09:09:45] [debug] [out coro] cb_destroy coro_id=8
[2021/12/28 09:09:45] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:46] [debug] [upstream] KA connection #25 to flb:24224 has been disconnected by the remote service
[2021/12/28 09:09:46] [debug] [task] created task=0x7fb6714371e0 id=0 OK
[2021/12/28 09:09:46] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:46] [debug] [output:forward:forward.0] send options records=1 chunk='12f264440a4f044473c5df1d455b34ef'
[2021/12/28 09:09:46] [debug] [net] socket #25 could not connect to 172.17.0.2:24224
[2021/12/28 09:09:46] [debug] [net] could not connect to flb:24224
[2021/12/28 09:09:46] [debug] [upstream] connection #-1 failed to flb:24224
[2021/12/28 09:09:46] [error] [output:forward:forward.0] no upstream connections available
[2021/12/28 09:09:46] [debug] [out coro] cb_destroy coro_id=9
[2021/12/28 09:09:46] [debug] [retry] new retry created for task_id=0 attempts=1
[2021/12/28 09:09:46] [ warn] [engine] failed to flush chunk '1-1640682585.401107928.flb', retry in 9 seconds: task_id=0, input=dummy.0 > output=forward.0 (out_id=0)
[2021/12/28 09:09:47] [debug] [task] created task=0x7fb671437280 id=1 OK
[2021/12/28 09:09:47] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:47] [debug] [output:forward:forward.0] send options records=1 chunk='a86c0d00ddf9278fa953e6e8b1055ee1'
[2021/12/28 09:09:48] [debug] [task] created task=0x7fb671437320 id=2 OK
[2021/12/28 09:09:48] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:48] [debug] [output:forward:forward.0] send options records=1 chunk='a1c04b4a832f7214704e82deb11987f7'
[2021/12/28 09:09:49] [debug] [task] created task=0x7fb6714373c0 id=3 OK
[2021/12/28 09:09:49] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:49] [debug] [output:forward:forward.0] send options records=1 chunk='d88b798dacbae53d5ddfab747175aad3'
[2021/12/28 09:09:50] [debug] [task] created task=0x7fb671437460 id=4 OK
[2021/12/28 09:09:50] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:50] [debug] [output:forward:forward.0] send options records=1 chunk='16b595a3fdef131db7228654c4745ed6'
[2021/12/28 09:09:51] [debug] [task] created task=0x7fb671437500 id=5 OK
[2021/12/28 09:09:51] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:51] [debug] [output:forward:forward.0] send options records=1 chunk='f6bcae198ea9b56e2526dab8dc5b3fdd'
[2021/12/28 09:09:52] [debug] [task] created task=0x7fb6714375a0 id=6 OK
[2021/12/28 09:09:52] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:52] [debug] [output:forward:forward.0] send options records=1 chunk='8c9328e3f731a6d709516bedfc4e3274'
[2021/12/28 09:09:52] [debug] [output:forward:forward.0] protocol: received ACK 8c9328e3f731a6d709516bedfc4e3274
[2021/12/28 09:09:52] [debug] [upstream] KA connection #31 to flb:24224 is now available
[2021/12/28 09:09:52] [debug] [out coro] cb_destroy coro_id=15
[2021/12/28 09:09:52] [debug] [task] destroy task=0x7fb6714375a0 (task_id=6)
[2021/12/28 09:09:52] [debug] [output:forward:forward.0] protocol: received ACK f6bcae198ea9b56e2526dab8dc5b3fdd
[2021/12/28 09:09:52] [debug] [upstream] KA connection #30 to flb:24224 is now available
[2021/12/28 09:09:52] [debug] [out coro] cb_destroy coro_id=14
[2021/12/28 09:09:52] [debug] [task] destroy task=0x7fb671437500 (task_id=5)
[2021/12/28 09:09:52] [debug] [output:forward:forward.0] protocol: received ACK d88b798dacbae53d5ddfab747175aad3
[2021/12/28 09:09:52] [debug] [upstream] KA connection #28 to flb:24224 is now available
[2021/12/28 09:09:52] [debug] [out coro] cb_destroy coro_id=12
[2021/12/28 09:09:52] [debug] [task] destroy task=0x7fb6714373c0 (task_id=3)
[2021/12/28 09:09:53] [debug] [task] created task=0x7fb6714373c0 id=3 OK
[2021/12/28 09:09:53] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:53] [debug] [output:forward:forward.0] send options records=1 chunk='8573a6bbcf2cdd94b22d8d570ed83597'
[2021/12/28 09:09:53] [debug] [upstream] KA connection #31 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:53] [debug] [output:forward:forward.0] protocol: received ACK 8573a6bbcf2cdd94b22d8d570ed83597
[2021/12/28 09:09:53] [debug] [upstream] KA connection #31 to flb:24224 is now available
[2021/12/28 09:09:53] [debug] [out coro] cb_destroy coro_id=16
[2021/12/28 09:09:53] [debug] [task] destroy task=0x7fb6714373c0 (task_id=3)
[2021/12/28 09:09:53] [debug] [output:forward:forward.0] protocol: received ACK 16b595a3fdef131db7228654c4745ed6
[2021/12/28 09:09:53] [debug] [upstream] KA connection #29 to flb:24224 is now available
[2021/12/28 09:09:53] [debug] [out coro] cb_destroy coro_id=13
[2021/12/28 09:09:53] [debug] [task] destroy task=0x7fb671437460 (task_id=4)
[2021/12/28 09:09:54] [debug] [task] created task=0x7fb671437460 id=3 OK
[2021/12/28 09:09:54] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:54] [debug] [output:forward:forward.0] send options records=1 chunk='37623b46d1a40e5fafdf6fa00ef3d6dc'
[2021/12/28 09:09:54] [debug] [upstream] KA connection #30 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:54] [debug] [output:forward:forward.0] protocol: received ACK 37623b46d1a40e5fafdf6fa00ef3d6dc
[2021/12/28 09:09:54] [debug] [upstream] KA connection #30 to flb:24224 is now available
[2021/12/28 09:09:54] [debug] [out coro] cb_destroy coro_id=17
[2021/12/28 09:09:54] [debug] [task] destroy task=0x7fb671437460 (task_id=3)
[2021/12/28 09:09:54] [debug] [output:forward:forward.0] protocol: received ACK a86c0d00ddf9278fa953e6e8b1055ee1
[2021/12/28 09:09:54] [debug] [upstream] KA connection #26 to flb:24224 is now available
[2021/12/28 09:09:54] [debug] [out coro] cb_destroy coro_id=10
[2021/12/28 09:09:54] [debug] [task] destroy task=0x7fb671437280 (task_id=1)
[2021/12/28 09:09:55] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:55] [debug] [output:forward:forward.0] send options records=1 chunk='12f264440a4f044473c5df1d455b34ef'
[2021/12/28 09:09:55] [debug] [upstream] KA connection #28 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:55] [debug] [task] created task=0x7fb671437280 id=1 OK
[2021/12/28 09:09:55] [debug] [output:forward:forward.0] request 26 bytes to flush
[2021/12/28 09:09:55] [debug] [output:forward:forward.0] send options records=1 chunk='c441254df2bb1f7659ec8a2822905c69'
[2021/12/28 09:09:55] [debug] [upstream] KA connection #31 to flb:24224 has been assigned (recycled)
[2021/12/28 09:09:55] [debug] [output:forward:forward.0] protocol: received ACK 12f264440a4f044473c5df1d455b34ef
[2021/12/28 09:09:55] [debug] [upstream] KA connection #28 to flb:24224 is now available
[2021/12/28 09:09:55] [debug] [output:forward:forward.0] protocol: received ACK c441254df2bb1f7659ec8a2822905c69
[2021/12/28 09:09:55] [debug] [upstream] KA connection #31 to flb:24224 is now available
[2021/12/28 09:09:55] [debug] [out coro] cb_destroy coro_id=18
[2021/12/28 09:09:55] [debug] [out coro] cb_destroy coro_id=19
[2021/12/28 09:09:55] [debug] [task] destroy task=0x7fb6714371e0 (task_id=0)
[2021/12/28 09:09:55] [debug] [task] destroy task=0x7fb671437280 (task_id=1)
[2021/12/28 09:09:55] [debug] [output:forward:forward.0] protocol: received ACK a1c04b4a832f7214704e82deb11987f7
[2021/12/28 09:09:55] [debug] [upstream] KA connection #27 to flb:24224 is now available
[2021/12/28 09:09:55] [debug] [out coro] cb_destroy coro_id=11
[2021/12/28 09:09:55] [debug] [task] destroy task=0x7fb671437320 (task_id=2)

Stop the receiver's fluent-bit container:

$ date -u; time docker stop -t 10 flb; date -u
Tue Dec 28 09:09:45 UTC 2021
flb
docker stop -t 10 flb  0.13s user 0.07s system 32% cpu 0.631 total
Tue Dec 28 09:09:46 UTC 2021

$ date -u -r 1640682585
Tue Dec 28 09:09:45 UTC 2021

Start a new receiver's fluent-bit container:

$ docker run -it --rm --name flb -p 24224:24224 fluent/fluent-bit:1.8.11 /fluent-bit/bin/fluent-bit -i forward -o stdout -f 1
Fluent Bit v1.8.11
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/12/28 09:09:52] [ info] [engine] started (pid=1)
[2021/12/28 09:09:52] [ info] [storage] version=1.1.5, initializing...
[2021/12/28 09:09:52] [ info] [storage] in-memory
[2021/12/28 09:09:52] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/12/28 09:09:52] [ info] [cmetrics] version=0.2.2
[2021/12/28 09:09:52] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
[2021/12/28 09:09:52] [ info] [sp] stream processor started
[0] dummy.log: [1640682591.403176775, {"message"=>"dummy"}]
[1] dummy.log: [1640682590.401135928, {"message"=>"dummy"}]
[2] dummy.log: [1640682588.401809275, {"message"=>"dummy"}]
[0] dummy.log: [1640682592.400777895, {"message"=>"dummy"}]
[1] dummy.log: [1640682589.401097982, {"message"=>"dummy"}]
[0] dummy.log: [1640682593.402738501, {"message"=>"dummy"}]
[1] dummy.log: [1640682586.401733685, {"message"=>"dummy"}]
[0] dummy.log: [1640682585.401103938, {"message"=>"dummy"}]
[1] dummy.log: [1640682594.400630681, {"message"=>"dummy"}]
[2] dummy.log: [1640682587.400374160, {"message"=>"dummy"}]
[0] dummy.log: [1640682595.404689874, {"message"=>"dummy"}]
[0] dummy.log: [1640682596.403182806, {"message"=>"dummy"}]
[0] dummy.log: [1640682597.400721050, {"message"=>"dummy"}]
[0] dummy.log: [1640682598.401246479, {"message"=>"dummy"}]

As you can see, the last record processed by the 1st receiver's fluent-bit container:

[0] dummy.log: [1640682584.400569917, {"message"=>"dummy"}]

The first record processed by the 2nd receiver's fluent-bit container:

[0] dummy.log: [1640682591.403176775, {"message"=>"dummy"}]

In this case, you can see that the records between 1640682585 and 1640682590 are also received. There were no lost records.

The result when the receiver is restarted depends on the retry logic of the sender, that is, it may be a problem of docker fluentd log driver.

0Delta pushed a commit to 0Delta/fluent-bit that referenced this issue Jan 20, 2022
When Fluent Bit receives a SIGTERM, internally it was only taking
care of the server socket leaving the active connections in an
open state.

This patch changes that behavior and if the plugin is paused or
if the engine receives SIGTERM (which will trigger a pause), now
the agent will drop the active TCP connections.

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper
Copy link
Member

edsiper commented Mar 23, 2023

This was fixed in 31d26a2

@minamijoyo
Copy link
Author

Hi @edsiper, Thank you for the fix!

Although I can still see some records being lost on shutdown even with the current latest Docker 23.0.1 + fluent-bit v2.0.10 with --log-opt fluentd-request-ack=true, I think there is another problem on the side of Docker's fluentd log driver. I found a related issue on the moby project. moby/moby#43690 (comment)

I linked this to the above moby's issue and I'm going to close this one. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants