[Bugfix] pass mode to resolve amazon-cloudwatch-agent-ctl restart panic #1168

Merged 2 commits on May 9, 2024

@lisguo lisguo commented May 8, 2024

Description of the issue

Some customers who do not have a TOML config in the config directory see a panic when running /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a cond-restart:

[ec2-user@ip-172-31-22-51 ~]$ sudo rm /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
[ec2-user@ip-172-31-22-51 ~]$ sudo touch /opt/aws/amazon-cloudwatch-agent/etc/restart
[ec2-user@ip-172-31-22-51 ~]$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a cond-restart
amazon-cloudwatch-agent is not configured. Applying amazon-cloudwatch-agent default configuration.
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default.tmp
Start configuration validation...
2024/05/08 17:43:55 Invalid mode --config. Valid mode values are ec2, onPrem, onPremise, and withIRSA.
panic: Invalid mode --config. Valid mode values are ec2, onPrem, onPremise, and withIRSA.

goroutine 1 [running]:
log.Panicf({0x40fbc0e?, 0x8?}, {0xc000bbfc60?, 0xc000bbfc70?, 0xc000b52ba0?})
        log/log.go:439 +0x65
github.com/aws/amazon-cloudwatch-agent/translator/context.(*Context).SetMode(0x7ffff69bd79d?, {0x7ffff69bd79d?, 0xc?})
        github.com/aws/amazon-cloudwatch-agent/translator/context/context.go:136 +0x271
main.initFlags()
        github.com/aws/amazon-cloudwatch-agent/cmd/config-translator/translator.go:66 +0x945
main.main()
        github.com/aws/amazon-cloudwatch-agent/cmd/config-translator/translator.go:80 +0x31

Description of changes

Pass ${mode} as the second argument to agent_start(), so that the config translator receives a valid mode instead of --config.
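The panic message ("Invalid mode --config") suggests that an empty mode let the next flag slide into its place. The sketch below reproduces that effect in isolation. It is not the actual ctl script: parse_mode and this simplified agent_start are hypothetical stand-ins, and the unquoted ${mode} is an assumption about how the value reaches the translator.

```shell
#!/bin/sh
# Hypothetical stand-in for a flag parser: a value-taking flag consumes
# the next word, whatever that word is.
parse_mode() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --mode) printf '%s\n' "$2"; return 0 ;;
        esac
        shift
    done
    return 1
}

# Hypothetical simplification of agent_start: $1 is a config path,
# $2 is the mode.
agent_start() {
    config="$1"
    mode="$2"
    # ${mode} is left unquoted on purpose to mirror the failure: when it
    # is empty it expands to nothing and --config takes its position.
    parse_mode --mode ${mode} --config "${config}"
}

agent_start /tmp/common-config.toml        # prints: --config
agent_start /tmp/common-config.toml ec2    # prints: ec2
```

With the second argument supplied, the parser sees the intended value, which matches the behavior reported under Tests.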

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Ran the command again and it succeeded:

[ec2-user@ip-172-31-22-51 ~]$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a cond-restart

amazon-cloudwatch-agent is not configured. Applying amazon-cloudwatch-agent default configuration.
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default.tmp
Start configuration validation...
2024/05/08 17:46:06 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default.tmp ...
2024/05/08 17:46:06 I! Valid Json input schema.
2024/05/08 17:46:06 D! ec2tagger processor required because append_dimensions is set
2024/05/08 17:46:06 D! pipeline hostDeltaMetrics has no receivers
2024/05/08 17:46:06 Configuration validation first phase succeeded
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
I! imds retry client will retry 1 times
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded

Requirements

Before committing the code, please complete the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@lisguo lisguo requested a review from sethAmazon May 8, 2024 19:19
@lisguo lisguo requested a review from a team as a code owner May 8, 2024 19:19

lisguo commented May 8, 2024

> Would it be better to add an empty check here? https://github.com/aws/amazon-cloudwatch-agent/blob/main/translator/util/sdkutil.go#L36-L38

Not sure if that would solve the problem. The issue is with the panic located in SetMode:

log.Panicf("Invalid mode %s. Valid mode values are %s, %s, %s, and %s.", mode, config.ModeEC2, config.ModeOnPrem, config.ModeOnPremise, config.ModeWithIRSA)

Either way, the ctl code looks error-prone: agent_start assumes that a second argument is always present, and this is the only call to agent_start I can find that omits it.
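One way to make that assumption explicit is to validate the mode before it is used. This is only a sketch and not part of the merged change: validate_mode is a hypothetical helper, and the accepted values are taken from the panic message above.

```shell
#!/bin/sh
# Hypothetical guard: accept only the mode values listed in the panic
# message, and fail with a readable error for anything else.
validate_mode() {
    case "$1" in
        ec2|onPrem|onPremise|withIRSA)
            return 0
            ;;
        *)
            printf "missing or invalid mode '%s'\n" "$1" >&2
            return 1
            ;;
    esac
}

validate_mode ec2 && printf '%s\n' "ec2 accepted"
validate_mode "" || printf '%s\n' "empty mode rejected"
validate_mode --config || printf '%s\n' "--config rejected"
```

A caller could run such a check at the top of agent_start and return early, so that a missing argument surfaces as a clear error rather than as a panic further down.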

sethAmazon commented

That part of the code runs after the choice check. Would you mind getting on a Slack call about this?

@lisguo lisguo merged commit 7c9e0c2 into main May 9, 2024
6 checks passed
@lisguo lisguo deleted the cond-restart branch May 9, 2024 21:17