Unrelated data injected into s3 key via $TAG[] #5938

Closed

dshackith opened this issue Aug 25, 2022 · 10 comments

dshackith commented Aug 25, 2022

Bug Report

Describe the bug
When populating a tag from Kubernetes metadata and then using parts of that tag to construct the s3_key_format, I am getting errors from S3, and the key it attempts to use contains additional, unrelated characters. This presents in a couple of different ways and error messages. In particular, the extra data seems to appear only immediately after the record["kubernetes"]["container_name"] value, and not for every record. I am currently using a Lua script to populate keys and then using those keys in a rewrite_tag filter. I have also used the keys directly via the record accessor with the same results, leading me to believe the problem is some sort of overflow in the s3 plugin.
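For reference, with the configuration shown under Your Environment below, the rewritten tag and the resulting S3 key should look roughly like this (the environment and cluster values here are inferred from the log output below, so treat them as illustrative; the time and UUID are placeholders):

Tag: s3.test.proving-ground.fluentd.fluentd-0-fluentd
Key: /k8s/testing/test/proving-ground/fluentd/2022/08/25/fluentd-0-fluentd_<HHMMSS>_<UUID>.log

Instead, extra bytes show up in the key immediately after the $TAG[4] (pod_container_name) portion.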

The most typical error:

[2022/08/25 19:25:49] [error] [output:s3:s3.0] Raw PutObject response: HTTP/1.1 403 Forbidden
...
/kafka-logs/k8s/testing/test/proving-ground/fluentd/2022/08/25/fluentd-0-fluentd%C3%82%C2%A0%C3%83%C2%97%40%C3%83%C2%B5%C3%82%C2%B3%7F_192432_4tL5gSpL.log

Another example, this one not a failure, where the key contains Kubernetes metadata for the fluent-bit pod emitting the message, even though the log is for a fluentd pod running on the same node:

[2022/08/25 17:42:57] [ info] [output:s3:s3.0] Successfully uploaded object /k8s/testing/test/proving-ground/fluentd-1-fluentd0474f9c3ac5b75573d79c1eb28072c1266b","namespace_name":"sequoia-system","annotations":{"kubernetes_174224_UqvxP8H4.log

And another that seems to contain part of a URL:

[2022/08/25 19:37:26] [ info] [output:s3:s3.0] Successfully uploaded object /k8s/testing/test/proving-ground/fluentd/2022/08/25/fluentd-0-fluentdhttp://crl_193646_2o23tLJx.log

And another from a pod that is not fluentd, where `uent-bit` appears after the container_name:

[2022/08/25 19:30:55] [ info] [output:s3:s3.0] Successfully uploaded object /k8s/testing/test/proving-ground/efs-csi-node/2022/08/25/efs-csi-node-kvmdt-liveness-probeuent-bit_192853_PnS7no80.log

This entry has the chunk path included in the s3 key:

[2022/08/26 14:39:06] [ info] [output:s3:s3.0] Successfully uploaded object /k8s/testing/test/proving-ground/prometheus-adaptor/2022/08/26/prometheus-adapter-74744b94c6-qjhtt/prometheus-adapter/tmp/fluent-bit/s3/kafka-logs/2022-08-26T14:09:05/11972352166397090799-6237271235102498668_143609_qT3Nw5Y4.log

Expected behavior
I expect the documented process for using tag parts to produce a reliable and consistent path based on the keys populated into the tag.
I do not expect random data to end up in the s3 key.

Your Environment

  • Version used: 1.9.4 & 1.9.7
  • Configuration: Running in Kubernetes 1.21 as a daemonset with the following config:
[INPUT]
        Name tail
        Path /var/log/containers/*.log
        Parser docker
        Tag kube.*
        Refresh_Interval 5
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On
        Docker_Mode On
[FILTER]
        Name kubernetes
        Match kube.*
        K8S-Logging.Parser On
        K8S-Logging.Exclude On
        Buffer_Size 1024K
        Use_Kubelet true
        Kubelet_Port 10250
[FILTER]
        name lua
        alias set_std_keys
        match kube.*
        script filters.lua
        call set_std_keys 
[FILTER]
        name rewrite_tag
        match kube.*
        rule $log ^.*$ s3.${ENVIRONMENT}.${EKS_CLUSTER_NAME}.$app_name.$pod_container_name true
        emitter_name tag_for_s3
[OUTPUT]
        name s3
        match s3.*
        region us-east-1
        bucket org-logs
        total_file_size 10M
        upload_timeout 2m
        use_put_object On
        content_type application/json
        compression gzip
        preserve_data_ordering On
        s3_key_format /k8s/testing/$TAG[1]/$TAG[2]/$TAG[3]/%Y/%m/%d/$TAG[4]_%H%M%S_$UUID.log

filters.lua:
function set_std_keys(tag, timestamp, record)
        -- Pull up pod_name
        if record["kubernetes"]["pod_name"] then
          record["pod_name"] = record["kubernetes"]["pod_name"]
        end
        -- Pull up container_name
        if record["kubernetes"]["container_name"] then
          record["container_name"] = record["kubernetes"]["container_name"]
        end
        -- Set pod_container_name
        if record["pod_name"] and record["container_name"] then
          record["pod_container_name"] = record["pod_name"] .. "-" .. record["container_name"]
        end
        -- Set app_name based on potential labels, and fall back to pod_name
        local app
        if record["kubernetes"]["labels"]["app"] then
          app = record["kubernetes"]["labels"]["app"]
        elseif record["kubernetes"]["labels"]["k8s-app"] then
          app = record["kubernetes"]["labels"]["k8s-app"]
        else
          app = record["kubernetes"]["pod_name"]
        end
        record["app_name"] = app
        return 2, timestamp, record
end
  • Environment name and version: Kubernetes 1.21

Additional context
I have attempted to do some string manipulation on the container_name key, but it makes no difference in the problem, leading me to believe there is something in the s3 plugin that is "leaking".
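A hypothetical sketch of that kind of cleanup (not the exact code used, shown only to illustrate the idea) would be something like this inside set_std_keys:

        -- hypothetical cleanup: keep only characters expected in a container name
        if record["container_name"] then
          record["container_name"] = string.gsub(record["container_name"], "[^%w%-%._]", "")
        end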

The larger context for this work is moving log processing into fluent-bit from an existing Fluentd aggregator. This is the very first piece, and it is shaking my confidence in the maturity of fluent-bit, because this seems like one of the simplest use cases. We have several other operations queued for transition, but we would like to see this working well before moving on to them.

@dshackith (Author)

It looks like this might be specific to using preserve_data_ordering, as this problem goes away when that is set to Off.
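A minimal sketch of the workaround, assuming nothing else in the [OUTPUT] section above changes, is that one line:

        preserve_data_ordering Off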

@nallenscott

Can confirm this is happening in the aws-for-fluent-bit distro as well. As @dshackith pointed out, disabling preserve_data_ordering seems to "fix" the problem.

github-actions bot

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Nov 26, 2022

github-actions bot commented Dec 1, 2022

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as not planned Dec 1, 2022
@cameronattard

Just wasted hours troubleshooting this. I think this should be reopened.

@PettitWesley (Contributor)

@dshackith Can you please open an issue on the AWS repo for this? https://github.com/aws/aws-for-fluent-bit

The preserve_data_ordering feature doesn't impact the S3 code path IIRC, so the behavior shouldn't change due to that.

@dshackith (Author)

@PettitWesley My understanding is that the s3 functionality is in the core fluent-bit codebase and not specific to the aws-for-fluent-bit container or the plugins bundled there. I am not using the aws-for-fluent-bit image.

@dshackith (Author)

I note that there are multiple places in the out_s3 plugin code that interact with preserve_data_ordering.

@PettitWesley (Contributor)

@dshackith Having an issue in the AWS distro just makes it easier for me and my team to track and prioritize it. The AWS distro is just a packaging of this same upstream repo code.

@dshackith (Author)

This comment indicates that this might have been tracked down and resolved: aws/aws-for-fluent-bit#541 (comment)
Posting here to make sure anyone coming across this finds that thread.
