
Log group name key #49

Merged: 5 commits merged into aws:log_group_name_key on Jun 30, 2020

Conversation

@kwizzn commented Jun 9, 2020

Issue #46

Add support for dynamic log group names via a log_group_name_key option.
This picks up the work from @owlwalks and replaces PR #24.
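
(For illustration: with log_group_name_key=container_id, each record is routed to a log group named after the value of its container_id field, created on demand when auto_create_group is enabled.)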

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sonofachamp (Contributor) commented Jun 10, 2020

Thank you for this contribution!

I did some manual testing with your code changes and am running into an issue.

Here's the verbose output:

→ ./bin/fluent-bit -e ../../../go/amazon-cloudwatch-logs-for-fluent-bit/bin/cloudwatch.so -i forward -o cloudwatch -p "region=us-west-2" -p "log_stream_name=testing" -p "auto_create_group=true" -p "log_group_name_key=container_id" -v
Fluent Bit v1.5.0
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2020/06/09 17:08:50] [ info] Configuration:
[2020/06/09 17:08:50] [ info]  flush time     | 5.000000 seconds
[2020/06/09 17:08:50] [ info]  grace          | 5 seconds
[2020/06/09 17:08:50] [ info]  daemon         | 0
[2020/06/09 17:08:50] [ info] ___________
[2020/06/09 17:08:50] [ info]  inputs:
[2020/06/09 17:08:50] [ info]      forward
[2020/06/09 17:08:50] [ info] ___________
[2020/06/09 17:08:50] [ info]  filters:
[2020/06/09 17:08:50] [ info] ___________
[2020/06/09 17:08:50] [ info]  outputs:
[2020/06/09 17:08:50] [ info]      cloudwatch.0
[2020/06/09 17:08:50] [ info] ___________
[2020/06/09 17:08:50] [ info]  collectors:
[2020/06/09 17:08:50] [debug] [storage] [cio stream] new stream registered: forward.0
[2020/06/09 17:08:50] [ info] [storage] version=1.0.4, initializing...
[2020/06/09 17:08:50] [ info] [storage] in-memory
[2020/06/09 17:08:50] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/06/09 17:08:50] [ info] [engine] started (pid=6574)
[2020/06/09 17:08:50] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2020/06/09 17:08:50] [debug] [in_fw] Listen='0.0.0.0' TCP_Port=24224
[2020/06/09 17:08:50] [ info] [input:forward:forward.0] listening on 0.0.0.0:24224
INFO[0000] [cloudwatch 0] plugin parameter log_group = '' 
INFO[0000] [cloudwatch 0] plugin parameter log_group_name_key = 'container_id' 
INFO[0000] [cloudwatch 0] plugin parameter log_stream_prefix = '' 
INFO[0000] [cloudwatch 0] plugin parameter log_stream = 'testing' 
INFO[0000] [cloudwatch 0] plugin parameter region = 'us-west-2' 
INFO[0000] [cloudwatch 0] plugin parameter log_key = '' 
INFO[0000] [cloudwatch 0] plugin parameter role_arn = '' 
INFO[0000] [cloudwatch 0] plugin parameter auto_create_group = 'true' 
INFO[0000] [cloudwatch 0] plugin parameter endpoint = '' 
INFO[0000] [cloudwatch 0] plugin parameter credentials_endpoint =  
INFO[0000] [cloudwatch 0] plugin parameter log_format = '' 
[2020/06/09 17:08:50] [debug] [router] default match rule forward.0:cloudwatch.0
[2020/06/09 17:08:50] [ info] [sp] stream processor started
[2020/06/09 17:09:00] [debug] [task] created task=0x5612c53ce140 id=0 OK
INFO[0009] [cloudwatch 0] Created log group ff432030af5dd64fb39872ca8e3ecb4d09c315ef3d81471e1d2c4e136f487c3b 
[2020/06/09 17:09:00] [debug] [task] destroy task=0x5612c53ce140 (task_id=0)
[2020/06/09 17:09:05] [debug] [task] created task=0x5612c53d0ab0 id=0 OK
[2020/06/09 17:09:05] [debug] [task] destroy task=0x5612c53d0ab0 (task_id=0)
[2020/06/09 17:09:15] [debug] [task] created task=0x5612c53d0280 id=0 OK
ResourceNotFoundException: The specified log group does not exist.
	status code: 400, request id: 07c68462-9415-4b44-8d43-a41618203f8d
[2020/06/09 17:09:15] [debug] [retry] new retry created for task_id=0 attemps=1
[2020/06/09 17:09:15] [ warn] [engine] failed to flush chunk '6574-1591747753.133464367.flb', retry in 9 seconds: task_id=0, input=forward.0 > output=cloudwatch.0
^C[engine] caught signal (SIGINT)
[2020/06/09 17:09:17] [debug] [task] destroy task=0x5612c53d0280 (task_id=0)
[2020/06/09 17:09:17] [debug] [retry] task retry=0x5612c53c19b0, invalidated from the scheduler
[2020/06/09 17:09:17] [debug] [GO] running exit callback

I may be doing something wrong, but I'll expand on what I'm doing to produce this result.

I have a dummy app that spits out log messages at a specific interval. If I run Fluent Bit with the configuration outlined in the invocation above everything works great. The log group gets created from the container_id and logs are shipped, no problem.

If I leave Fluent Bit running but modify my dummy app and restart it (the key here is a new container_id), the plugin fails to create the log group for the new container_id. I'm not immediately sure why, since it looks like the logic should handle creating the new log stream, but it seems like something that needs to be worked out to support dynamic log group names. If I'm missing something, please let me know. I'll keep digging to pinpoint exactly what's going on within the plugin.

@kwizzn (Author) commented Jun 10, 2020

Weird, I just tried to reproduce what you described, but it works for me. Whatever log comes in after fluent-bit starts gets its log group created properly and starts shipping. I'm using a fairly common config:

    [PARSER]
      Name        docker
      Format      json
      Time_Key    time
      Time_Format %Y-%m-%dT%H:%M:%S.%L
      Time_Keep   On

    [INPUT]
      Name              tail
      Tag               kube.*
      Path              /var/log/containers/*.log
      Parser            docker
      DB                /var/log/flb_kube.db
      Mem_Buf_Limit     512MB
      Skip_Long_Lines   On
      Refresh_Interval  10

    [FILTER]
      Name                kubernetes
      Match               kube.*
      Kube_URL            https://kubernetes.default.svc:443
      Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
      Merge_Log           On
      Merge_Log_Key       log_processed
      K8S-Logging.Parser  On
      K8S-Logging.Exclude Off

    [FILTER]
      Name lua
      Match *
      script kubernetes.lua
      call append_log_group_name

    [OUTPUT]
      Name cloudwatch
      Match *
      region eu-west-1
      auto_create_group true
      log_group_name_key log_group_name
      log_stream_prefix /
      log_key log

The Lua function adds log_group_name:

    -- Builds the log group name from the Kubernetes metadata added by the
    -- kubernetes filter above; assumes record["kubernetes"] is present.
    function append_log_group_name(tag, timestamp, record)
      record["log_group_name"] = "/kube/" .. record["kubernetes"]["namespace_name"] .. "/" .. record["kubernetes"]["container_name"]
      return 1, timestamp, record
    end

@sonofachamp (Contributor) commented

I found what's causing the issue I ran into. I specified -p "log_stream_name=testing", which matches a stream cached for another log group with the same stream name, so when the plugin tries to push records, the new log group doesn't exist.

func (output *OutputPlugin) getLogStream(tag, groupName string) (*logStream, error) {
	// find log stream by tag; note that the cache key is the stream name
	// alone, so a stream cached for a different log group can match here
	name := output.getStreamName(tag)
	stream, ok := output.streams[name]
	if ok {
		return stream, nil
	}

https://github.com/kwizzn/amazon-cloudwatch-logs-for-fluent-bit/blob/3dc6b6598e566429d10c172173a9a2ce42d907d8/cloudwatch/cloudwatch.go#L275-L281

Removing the log_stream_name and adding log_stream_prefix works great. Essentially the log_group_name_key and log_stream_name configuration options don't play nicely together. This seems like something we should handle.
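
For reference, a working invocation along these lines would be the command above with the fixed stream name swapped for a prefix (the prefix value here is assumed for illustration):

    ./bin/fluent-bit -e ../../../go/amazon-cloudwatch-logs-for-fluent-bit/bin/cloudwatch.so -i forward -o cloudwatch -p "region=us-west-2" -p "log_stream_prefix=testing-" -p "auto_create_group=true" -p "log_group_name_key=container_id" -v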

@kwizzn (Author) commented Jun 10, 2020

Would you agree that this is an edge case that can be addressed after this is merged? If it doesn't break the status quo, let's be pragmatic and merge it so it can start delivering value.
My fear is that, given the limited time I can dedicate, this PR will diverge again and is unlikely to land anytime soon. What do you think?

@sonofachamp (Contributor) commented Jun 11, 2020

> My fear is that, given the limited time I can dedicate, this PR will diverge again and is unlikely to land anytime soon. What do you think?

We understand and really appreciate your contribution. Would you be open to merging your changes into an upstream branch other than master, so that we can pick it up from there and resolve the log_group_name_key and log_stream_name configuration conflict? We don't want to merge this into the master branch and do a release until that's resolved.

I did a bit more digging, and I think the simplest approach might be to store and retrieve the logStreams in the map with a key of groupName + streamName, to avoid duplicate streamName values causing conflicts.
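
A minimal sketch of that approach (helper names like createStream are assumed for illustration, not taken from the actual plugin):

func (output *OutputPlugin) getLogStream(tag, groupName string) (*logStream, error) {
	streamName := output.getStreamName(tag)
	key := groupName + "/" + streamName // composite key: group + stream
	if stream, ok := output.streams[key]; ok {
		return stream, nil
	}
	// cache miss: create (or look up) the stream within this group, then cache it
	stream, err := output.createStream(groupName, streamName) // hypothetical helper
	if err != nil {
		return nil, err
	}
	output.streams[key] = stream
	return stream, nil
}

With a composite key, two records that share a stream name but resolve to different log groups get distinct cache entries, avoiding the ResourceNotFoundException seen above.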

@kwizzn (Author) commented Jun 12, 2020

I am totally open to merging this to another upstream branch. Just let me know where and I'll update the PR.

@sonofachamp (Contributor) commented

Awesome, I created a log_group_name_key branch on this repository.

@kwizzn changed the base branch from master to log_group_name_key on June 12, 2020 18:20
@kwizzn (Author) commented Jun 12, 2020

Cool, done.

@kwizzn (Author) commented Jun 29, 2020

@hossain-rayhan, @hencrice, can we move forward or is there anything that I can do to unblock this?

@hencrice (Contributor) commented

Sorry, I wasn't aware I was assigned to this until now. Will take a look tomorrow.

@hencrice (Contributor) commented on the diff:

if ok {
	return stream, nil
}
// extra check, empty groupName indicates flush phase, but still land here meaning unable to find stream

nit: seems like this is no longer the case, could you remove this comment?

@hencrice merged commit 8cf837f into aws:log_group_name_key on Jun 30, 2020
@kwizzn deleted the log_group_name_key branch on July 2, 2020 05:15
@davidnewhall (Contributor) commented

Looks like this was merged into #24. Should #24 be re-opened and considered now? Does it need more work?

@kwizzn (Author) commented Aug 17, 2020

I'm also confused about how this ended up on a dead track. The implementation works (we use it in production) and is, imho, ready for a merge into master. What's the blocker?

@PettitWesley (Contributor) commented

@davidnewhall @kwizzn The comments that @sonofachamp raised need to be fixed before it can be merged to master. Someone on our team will eventually take this up, but it's not our highest priority item, so possibly not for a little while.

@PettitWesley (Contributor) commented Aug 17, 2020

Before this change there was only one log group to track; now there can be multiple. That means the plugin needs to store a map of log group name => log stream names. That needs to be done before it can be release-ready, IMO.
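
A rough sketch of such a structure (a sketch only, with simplified types; not the plugin's actual code):

// streamRegistry tracks streams per log group: groupName -> streamName -> stream state.
type streamRegistry map[string]map[string]*logStream

func (r streamRegistry) get(groupName, streamName string) (*logStream, bool) {
	streams, ok := r[groupName]
	if !ok {
		return nil, false
	}
	stream, ok := streams[streamName]
	return stream, ok
}

func (r streamRegistry) put(groupName, streamName string, stream *logStream) {
	if r[groupName] == nil {
		r[groupName] = make(map[string]*logStream)
	}
	r[groupName][streamName] = stream
}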

@davidnewhall (Contributor) commented

Thanks for the explanation. I'll try to dig around in this branch this week. Maybe I can figure it out, but I don't want to make any promises. I think the approach for this changes slightly after #78, and may even be easier. We shall see.
