cloudwatch_logs connection initialization error in versions 1.7 and 1.8 (1.6.10 works fine) #3966

Closed
KLForsythe opened this issue Aug 17, 2021 · 8 comments

@KLForsythe

Bug Report

Describe the bug

I have configured my output to send to cloudwatch_logs, but I am getting the following errors:
[aws_client] connection initialization error
[output:cloudwatch_logs:cloudwatch_logs.0] Failed to create log group

I'd appreciate any help in figuring out what the problem is.

To Reproduce

Expected behavior

The log output should be sent to a newly created CloudWatch log group named fluent-bit-cloudwatch.

Your Environment

  • Version used: 1.8.3
  • Configuration:

fluent-bit.conf

[SERVICE]
    # Flush
    # =====
    # Set an interval of seconds before to flush records to a destination
    Flush        5

    # Daemon
    # ======
    # Instruct Fluent Bit to run in foreground or background mode.
    Daemon       Off

    Log_Level    info

    # Parsers_File
    # ============
    # Specify an optional 'Parsers' configuration file
    Parsers_File parsers.conf
    Plugins_File plugins.conf

    # HTTP Server
    # ===========
    # Enable/Disable the built-in HTTP Server for metrics
    HTTP_Server  Off
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020

[INPUT]
    name    tail
    path    /config/home-assistant.log
    db      test.db
    skip_long_lines on

[OUTPUT]
    Name cloudwatch_logs
    Match   *
    region us-east-1
    log_group_name fluent-bit-cloudwatch
    log_stream_prefix from-fluent-bit-
    log_retention_days 7
    auto_create_group true
  • Environment name and version (e.g. Kubernetes? What version?): docker with balenaCloud
    I am using docker-compose as my installation method (unrelated services not shown here) (docker-compose version is required by balenaCloud):
version: '2.1'
volumes:
  config:
  store:
services:
  logger:
    build: ./logger
    restart: on-failure
    volumes:
      - 'config:/config'
      - "store:/usr/src/app/store"

Dockerfile

FROM fluent/fluent-bit:1.8.3
COPY . /fluent-bit/etc/.
  • Server type and version:
  • Operating System and version: balenaOS 2.80.3+rev1
  • Filters and plugins: cloudwatch_logs

Additional context

I'd like to move to a more robust logging solution than I currently have, using fluent-bit to send logs to AWS CloudWatch (or alternatively S3). I also tested the S3 plugin and got a similar message: [aws_client] connection initialization error.

Log details:

17.08.21 10:42:36 (-0400)  logger  Fluent Bit v1.8.3
17.08.21 10:42:36 (-0400)  logger  * Copyright (C) 2019-2021 The Fluent Bit Authors
17.08.21 10:42:36 (-0400)  logger  * Copyright (C) 2015-2018 Treasure Data
17.08.21 10:42:36 (-0400)  logger  * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
17.08.21 10:42:36 (-0400)  logger  * https://fluentbit.io
17.08.21 10:42:36 (-0400)  logger  
17.08.21 10:42:36 (-0400)  logger  [2021/08/17 14:42:36] [ info] [engine] started (pid=1)
17.08.21 10:42:36 (-0400)  logger  [2021/08/17 14:42:36] [ info] [storage] version=1.1.1, initializing...
17.08.21 10:42:36 (-0400)  logger  [2021/08/17 14:42:36] [ info] [storage] in-memory
17.08.21 10:42:36 (-0400)  logger  [2021/08/17 14:42:36] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
17.08.21 10:42:36 (-0400)  logger  [2021/08/17 14:42:36] [ info] [cmetrics] version=0.1.6
17.08.21 10:42:36 (-0400)  logger  [2021/08/17 14:42:36] [ info] [sp] stream processor started
17.08.21 10:42:36 (-0400)  logger  [2021/08/17 14:42:36] [ info] [input:tail:tail.0] inotify_fs_add(): inode=5248 watch_fd=1 name=/config/home-assistant.log
17.08.21 10:42:41 (-0400)  logger  [2021/08/17 14:42:41] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group fluent-bit-cloudwatch
17.08.21 10:42:41 (-0400)  logger  [2021/08/17 14:42:41] [error] [aws_client] connection initialization error
17.08.21 10:42:41 (-0400)  logger  [2021/08/17 14:42:41] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Failed to create log group
17.08.21 10:42:41 (-0400)  logger  [2021/08/17 14:42:41] [ warn] [engine] failed to flush chunk '1-1629211356.799374817.flb', retry in 7 seconds: task_id=0, input=tail.0 > output=cloudwatch_logs.0 (out_id=0)
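
The "connection initialization error" line does not say which step of the connection failed. A minimal sketch of a probe that separates DNS lookup, TCP connect, and TLS handshake is shown here. It assumes Python is available in some debug container on the same network (which may not be true of the fluent-bit image itself), and that the regional endpoint is `logs.us-east-1.amazonaws.com`. The function name `probe_endpoint` is hypothetical, and the probe does not reproduce whatever fluent-bit does internally.

```python
import socket
import ssl


def probe_endpoint(host, port=443, timeout=5.0):
    """Return (stage, detail): stage is 'ok' or the first step that failed.

    Hypothetical helper for illustration only; it is not part of fluent-bit.
    """
    # Step 1: name resolution.
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except OSError as exc:
        return "dns", str(exc)

    # Step 2: plain TCP connection.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return "tcp", str(exc)

    # Step 3: TLS handshake with certificate and hostname verification.
    try:
        ctx = ssl.create_default_context()
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return "ok", tls.version()
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        sock.close()
        return "tls", str(exc)
```

Calling `probe_endpoint("logs.us-east-1.amazonaws.com")` from inside the device's container network would return `("ok", ...)` if all three steps succeed, or the name of the failing step otherwise. A failure at the `tls` step could point at missing CA certificates or a wrong system clock, but that is only a guess to be checked, not a diagnosis.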
@KLForsythe
Author

Update: Based on issue #2895, I tried versions 1.6.3 and 1.6.10 and found that, with those two versions, fluent-bit was able to connect to AWS.

Sending logs to CloudWatch still failed, though, with a different error:
logger [2021/08/17 20:38:47] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Could not find sequence token in response: {"rejectedLogEventsInfo":{"tooOldLogEventEndIndex":1}}

Sending data to S3 succeeded, however, and I can see that the logs are in the following format in S3:

{"date":"2021-08-17T20:45:39.077267Z","log":"2021-08-17 20:45:39 DEBUG (Thread-3) [paho.mqtt.client] Sending PINGREQ"}

My current assumption is that I need to adjust the date format (probably to epoch) for cloudwatch to realize the logs are current. If I recall correctly, there was good documentation on how to do that, so I don't expect it to be an issue.
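
As a rough illustration of that assumption, the sketch below converts the ISO-style `date` value from the S3 record into epoch milliseconds (the unit the CloudWatch Logs API uses for event timestamps) and checks it against the acceptance window. The window limits used here (no more than 2 hours in the future, no older than 14 days or the log group's retention period) are taken from the PutLogEvents API reference as I recall it and should be verified there. Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_ms(stamp):
    """Convert e.g. '2021-08-17T20:45:39.077267Z' to integer epoch milliseconds."""
    dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    dt = dt.replace(tzinfo=timezone.utc)
    # Integer floor division of timedeltas avoids float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def within_accepted_window(event_ms, now_ms, retention_days=None):
    """Check an event timestamp against the assumed CloudWatch limits."""
    # Assumption: limit is 14 days or the retention period, whichever is shorter.
    max_age_days = 14 if retention_days is None else min(14, retention_days)
    oldest = now_ms - max_age_days * 86_400_000
    newest = now_ms + 2 * 3_600_000  # assumption: 2 hours into the future
    return oldest <= event_ms <= newest
```

With the config above (`log_retention_days 7`), a record stamped a few seconds ago passes the check, while a timestamp that was mangled into a value near zero would be rejected as too old regardless of its textual format.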

However, I would like to be able to use the most recent version of fluent-bit. I tested version 1.7, and the connection failure reappeared in that version.

@KLForsythe KLForsythe changed the title cloudwatch_logs connection initialization error cloudwatch_logs connection initialization error in versions 1.7 and 1.8 (1.6.10 works fine) Aug 18, 2021
@KLForsythe
Author

Update:

On further investigation, the {"rejectedLogEventsInfo":{"tooOldLogEventEndIndex":1}} error response from AWS CloudWatch is related to #3640. A fix exists, but only in later versions.

So using an older version of fluent-bit will not solve the problem: to send logs successfully to CloudWatch, I need a fix for the connection initialization error in the later versions.
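
For reference, a small sketch of how that response body can be read. It assumes the field semantics described in the CloudWatch Logs API reference as I understand them: `tooOldLogEventEndIndex` is an exclusive end index, so events before it were rejected as too old, and `tooNewLogEventStartIndex` is an inclusive start index. The function name `summarize_rejections` is hypothetical.

```python
import json


def summarize_rejections(response_body):
    """Map a PutLogEvents response body to human-readable rejection notes."""
    info = json.loads(response_body).get("rejectedLogEventsInfo") or {}
    notes = []
    if "tooOldLogEventEndIndex" in info:
        end = info["tooOldLogEventEndIndex"]
        # Assumption: exclusive end index, so indices 0..end-1 were too old.
        notes.append(f"{end} event(s) at the start of the batch were too old")
    if "tooNewLogEventStartIndex" in info:
        start = info["tooNewLogEventStartIndex"]
        # Assumption: inclusive start index.
        notes.append(f"events from index {start} onward were too new")
    if "expiredLogEventEndIndex" in info:
        notes.append("some events were past the log group's retention period")
    return notes
```

Applied to the body from the error above, this reports that the first event of the batch was rejected as too old, which is consistent with a timestamp problem rather than a connectivity problem.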

However, it is interesting that @nbertram (who reported and fixed the issue in #3640) did not encounter the same connection initialization error - we are both using balenaOS.

I'm continuing to troubleshoot this issue, but any ideas or help would be very much appreciated.

@nbertram
Contributor

You're probably using a 64 bit build from docker hub, while I'm using a custom 32 bit build because I'm targeting the raspberrypi3 (non-64) target. That may well have something to do with it. Though of course 64 bit should've been working prior to my patch, and my patch probably didn't change the behaviour on 64, only made 32 behave the same.

Have you tried tracing the traffic to Cloudwatch to see what's happening? I used mitmproxy to capture the data and see that the timestamps being sent were different from what stdout got. I imagine mitmproxy might be a little more difficult to use with the docker fluent-bit, because you'd need to put a custom CA into the image.

@KLForsythe
Author

@nbertram I am also targeting raspberrypi (3B+), which is a 32-bit build. The docker builds provided by fluent/fluent-bit include a build specifically for arm32v7, which is among those I've tested and still encountered the "connection initialization error".

Before building your own custom 32 bit build, did you try the fluent-bit docker builds for arm32v7? It sounds like a custom build for the rpi may be the needed solution.

@nbertram
Contributor

@KLForsythe we're using the "essentials" deployment type, so I couldn't run a sidecar fluent-bit container. The off-the-shelf binary builds for Raspbian didn't run for me on the Balena Debian image (just bombed with a bus error, probably minorly different shared libs), so I was forced to do a build (which I did off master) and copy the binary in.

I'll have a look at the upstream docker build though, to see if it's much different from my build. I can at least try the minimum test from my original issue to see if it works on that build.

There are currently no unit tests around message formatting in the cloudwatch driver, which makes it a bit hard to see if anything has regressed on a particular build. When I ran the test suite on my ARMv7 build, a lot of the tests were already failing.

@KLForsythe
Author

@nbertram Thanks for taking a look. I'm working with balena's microservices deployment, so separate containers are definitely the preferred solution.

I have not tried building from master and copying the binary in as you did. I may try that. At the moment, I'm using the old version of fluent-bit (1.6.10), which can connect to AWS, and uploading to S3 instead of CloudWatch (since that version doesn't have your fix). This is obviously not ideal.

I'll see if building from the latest branch and copying the binary works. Definitely a preferable solution.

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Sep 24, 2021
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.
