Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetch-config should error if the input file is not valid JSON #404

Closed
ecerulm opened this issue Mar 14, 2022 · 8 comments
Closed

fetch-config should error if the input file is not valid JSON #404

ecerulm opened this issue Mar 14, 2022 · 8 comments

Comments

@ecerulm
Copy link
Contributor

ecerulm commented Mar 14, 2022

Currently the following command:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/var/cwagent-config.json

will overwrite /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml even if the input JSON file is not valid JSON.

It will print out No json config files found, use the default one

2022/03/14 10:10:41 unable to scan config dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d with error: unable to parse json, error: invalid character '}' after array element
No json config files found, use the default one

But the exit value is still 0 (indicating no error). My proposal is that if the input file is invalid it should error (non zero exit code) and refuse to overwrite the TOML file.. Otherwise it's very easy to miss that the configuration was incorrectly updated.

Full output:

****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:/opt/aws/amazon-cloudwatch-agent/var/cwagent-config.json --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_cwagent-config.json.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2022/03/14 10:07:14 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_cwagent-config.json.tmp ...
2022/03/14 10:07:14 unable to scan config dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d with error: unable to parse json, error: invalid character '}' after array element
No json config files found, use the default one
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
amazon-cloudwatch-agent has already been stopped
@SaxyPandaBear
Copy link
Contributor

If you run

sudo /opt/aws/amazon-cloudwatch-agent/bin/config-translator -input /opt/aws/amazon-cloudwatch-agent/var/cwagent-config.json -output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
echo $?

What does it output for the exit code?


I'm curious if the config-translator executable returns a non-zero exit code when the input JSON is invalid. Just trying to see how much needs to get untangled for this. I already saw that in the control script, it does something like:

${command} || return

which I think kicks out of the function if there's a non-zero exit code there.

@SaxyPandaBear
Copy link
Contributor

Actually just confirmed it for myself:

3c22fbe51e1b:~ andhuy$ sudo /opt/aws/amazon-cloudwatch-agent/bin/config-translator -input /Users/andhuy/Desktop/foo.json -output /Users/andhuy/Desktop/something.toml
Password:
2022/03/15 14:07:24 Reading json config file path: /Users/andhuy/Desktop/foo.json ...
2022/03/15 14:07:24 E! Failed to generate merged json config: unable to get old json config file with error: unable to parse json, error: invalid character 'N' looking for beginning of object key string
2022/03/15 14:07:24 Configuration validation first phase failed. Agent version: 1.0. Verify the JSON input is only using features supported by this version.
3c22fbe51e1b:~ andhuy$ echo $?
1

I think that the change would be straightforward, however I think that the existing fail-through behavior complicates things for existing users because it exposes any invalid configuration that they are currently using by failing the agent potentially - although if their configuration is invalid, then they wouldn't see any of the metrics/logs that they want and they'd ideally fix that themselves anyway

@ecerulm
Copy link
Contributor Author

ecerulm commented Mar 15, 2022

amazon-cloudwatch-agent-ctl -a fetch-config calls config-translator with --multi-config default and when that is present config-translator exits with 0 . On the other hand, if that option is not present the exit code is 99:

Here is the same config-translator command with and without --multi-config default, in both cases there is a file at /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_invalid.json :

sudo /opt/aws/amazon-cloudwatch-agent/bin/config-translator -input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json  -input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d -output /home/vagrant/amazon-cloudwatch-agent.toml  --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --mode onPrem --multi-config default ;echo $?
No json config files found, use the default one
Valid Json input schema.
I! Detecting run_as_user...
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region:  eu-west-1
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
0

sudo /opt/aws/amazon-cloudwatch-agent/bin/config-translator -input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json  -input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d -output /home/vagrant/amazon-cloudwatch-agent.toml  --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --mode onPrem ;echo $?
No json config files found, use the default one
Valid Json input schema.
I! Detecting run_as_user...
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region:  eu-west-1
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
99

@SaxyPandaBear
Copy link
Contributor

Ahh I see. I missed that config - and also didn't think it would have such a big change to the exit code or behavior.

@ecerulm
Copy link
Contributor Author

ecerulm commented Mar 15, 2022

I think that the existing fail-through behavior complicates things for existing users because it exposes any invalid configuration that they are currently using by failing the agent potentially

I think at least the amazon-cloudwatch-agent-ctl -a fetch-config should produce an extra ERRORs detected in configuration as the last line of output. It's really easy to edit the config file , introduce a syntax error and go unnoticed (at least that's what happened to me, and that's why I wrote this issue).

I understand that "fixing" this issue with a non-zero exit code may be a non backward compatible change for those with invalid configs (hopefully those would be very few), but it will help most people to get correct configurations working. Another alternative would be to add some option to amazon-cloudwatch-agent-ctl like --validate or --strict that actually fails on errors.

@SaxyPandaBear
Copy link
Contributor

I think that adding an optional flag to denote that the control script should fail on invalid configurations, as well as including a more prominent error on failure to translate are a good compromise

@ecerulm
Copy link
Contributor Author

ecerulm commented May 18, 2022

@SaxyPandaBear , @khanhntd . It's my understanding from #412 that the plan to fix this issue with a better log statements rather than an actual error, so I guess this should be closed too as "WON'T DO", or?

@SaxyPandaBear
Copy link
Contributor

@ecerulm yeah will be closing out this issue. I think I misjudged just how tightly coupled everything is

@SaxyPandaBear SaxyPandaBear closed this as not planned Won't fix, can't repro, duplicate, stale May 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants