Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent handling of invalid user data #1346

Closed
bcressey opened this issue Feb 23, 2021 · 6 comments · Fixed by #1387
Closed

inconsistent handling of invalid user data #1346

bcressey opened this issue Feb 23, 2021 · 6 comments · Fixed by #1387
Labels
type/bug Something isn't working
Milestone

Comments

@bcressey
Copy link
Contributor

Image I'm using:
aws-k8s-1.18 v1.05

What I expected to happen:
I expected the same failure mode for "invalid TOML" as for "unrecognized settings" from early-boot-config - in both cases, the desired settings are not usable in the provided form, so the system should respond in the same way.

What actually happened:
Invalid TOML triggers a cascading series of service startup failures:

   11.842376] early-boot-config[2778]: Provider error: Error parsing TOML user data: expected a table key, found a right bracket at line 1 column 1
[FAILED] Failed to start Bottlerocket userdata configuration system.
See 'systemctl status early-boot-config.service' for details.
[DEPEND] Dependency failed for Bottlerocket �nd dynamic configuration complete.
[DEPEND] Dependency failed for A versatile i�tion of the Network Time Protocol.
[FAILED] Failed to start containerd container runtime.
See 'systemctl status containerd.service' for details.
[DEPEND] Dependency failed for Kubelet.
[DEPEND] Dependency failed for Call signpost�ter all required targets are met..
[FAILED] Failed to start containerd runtime for host containers.
See 'systemctl status host-containerd.service' for details.
         Starting User-specified setting generators...
[   11.932268] sundog[2853]: Setting generator 'schnauzer settings.host-containers.control.source' failed with exit code 1 - stderr: Failed to render setting 'settings.host-containers.control.source' from template '{{ ecr-prefix settings.aws.region }}/bottlerocket-control:v0.4.1': Error rendering "Unnamed template" line 1, col 1: TemplateHelperError: Expected an AWS region, got 'null' in template dynamic template
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
         Starting Applies settings to create config files...
[   12.001676] thar-be-settings[2914]: Restart command failed - '/usr/bin/host-containers': 18:52:08 [ERROR] Failed to handle host container 'admin': Host containers 'admin' missing field 'source'
[   12.004636] thar-be-settings[2914]: 18:52:08 [ERROR] Failed to handle host container 'control': Host containers 'control' missing field 'source'
[FAILED] Failed to start Applies settings to create config files.

An unrecognized setting triggers a more compact set of failures that allows the default host containers to start:

[  OK  ] Stopped containerd runtime for host containers.
[   34.815484] early-boot-config[2854]: Error PATCHing '/settings?tx=bottlerocket-launch': Status 400 when PATCHing /settings?tx=bottlerocket-launch: Json deserialize error: unknown field `standalone-mode`, expected one of `cluster-name`, `cluster-certificate`, `api-server`, `node-labels`, `node-taints`, `max-pods`, `cluster-dns-ip`, `cluster-domain`, `node-ip`, `pod-infra-container-image` at line 1 column 162
[FAILED] Failed to start Bottlerocket userdata configuration system.
See 'systemctl status early-boot-config.service' for details.
[DEPEND] Dependency failed for Bottlerocket �nd dynamic configuration complete.
         Starting containerd container runtime...
         Starting containerd runtime for host containers...
[  OK  ] Finished Applies settings to create config files.
[  OK  ] Started containerd runtime for host containers.
         Starting Host container: control...
[  OK  ] Started Host container: control.
[  OK  ] Started containerd container runtime.
[  OK  ] Reached target Multi-User System.
[  OK  ] Reached target Graphical Interface.

The system is easier to troubleshoot in this state since the admin container can be enabled to examine logs in more detail.

How to reproduce the problem:
Launch an instance with invalid TOML in the userdata.

@bcressey bcressey added type/bug Something isn't working priority/p1 labels Feb 23, 2021
@tjkirch
Copy link
Contributor

tjkirch commented Feb 23, 2021

Should dependencies still try to start with default values? What if your user data was trying to set custom host containers, because you don't want to run the default AWS-based ones? I don't think it's safe to assume. To debug, you're going to be looking at journal output, which you can see on the console in any case, without having to access the node. In the error you pasted it's pretty clear what happened, I think; "unknown field foo" or "error parsing TOML" with some more details. If the real problem is that it's hard / unclear how to check console output, we can improve / document that...

@bcressey bcressey changed the title invalid user data prevents host containers from running inconsistent handling of invalid user data Feb 24, 2021
@bcressey
Copy link
Contributor Author

I've updated the title with what I consider the real issue, which is that the behavior is inconsistent. The failure mode should be the same.

@bcressey
Copy link
Contributor Author

Should dependencies still try to start with default values? What if your user data was trying to set custom host containers, because you don't want to run the default AWS-based ones? I don't think it's safe to assume.

I agree. It's definitely surprising that the default host containers are used when custom host containers are specified along with an invalid setting. It's not a huge leap from "surprising" to a serious misconfiguration with security or operational implications.

The "invalid TOML" path is what we intend.

@zmrow
Copy link
Contributor

zmrow commented Mar 1, 2021

I think there is a misunderstanding.

For AWS instances, early-boot-config fetches the IID (for region) and user-data from IMDS, and if fetching and parsing both is successful, sends them to the API.

@bcressey In this specific case, settings.aws.region doesn't get set because early-boot-config bombs out early because of invalid TOML. The system attempts to start the host containers but can't because the host container "source" template includes settings.aws.region. In this case, your system was hosed regardless. A few lines down (pardon the formatting):

...sundog[2853]...Failed to render setting 'settings.host-containers.control.source' 
from template '{{ ecr-prefix settings.aws.region}}'

AWS is special currently because it has multiple "sources" for data (even though they're both IMDS). Perhaps we should support PATCH-ing data to the API immediately after we fetch it successfully? In this case, your host containers would have started successfully, but the system still would have been crippled.

@bcressey
Copy link
Contributor Author

bcressey commented Mar 2, 2021

I expect the same behavior from these two cases. I'm aware that the behavior is different, but that's not a misunderstanding, it's a bug.

@gregdek gregdek added this to the techdebt milestone Apr 1, 2021
@arnaldo2792 arnaldo2792 linked a pull request Apr 7, 2021 that will close this issue
@arnaldo2792
Copy link
Contributor

We fixed this in #1387

@jhaynes jhaynes modified the milestones: techdebt, 1.0.8 Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants