sc2: fix containerd toml population #120

Merged
csegarragonz merged 29 commits into main from containerd-config-fix on Dec 19, 2024

@csegarragonz (Collaborator) commented Dec 10, 2024

It looks like the CoCo operator modifies the containerd config file (`/etc/containerd/config.toml`) after all runtime classes have been created.

This creates a race condition on the config file: we only wait for the runtime classes to be created before installing our local container registry, which also modifies the config file.
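One way to narrow a race like this is to wait until the config file has stopped changing before touching it. The sketch below is a minimal illustration, not necessarily what this PR implements: the helper names `file_digest` and `wait_for_file_stable`, the quiet period, and the timeout are all assumptions chosen for the example.

```python
import hashlib
import time


def file_digest(path):
    """Return a SHA-256 digest of the file, or None if it does not exist."""
    try:
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    except FileNotFoundError:
        return None


def wait_for_file_stable(path, quiet_secs=5.0, timeout_secs=120.0, poll_secs=0.5):
    """
    Block until `path` has kept the same contents for `quiet_secs`.

    Returns True once the file is stable, or False if `timeout_secs`
    elapses first. Hypothetical helper, not part of the repository.
    """
    deadline = time.monotonic() + timeout_secs
    last_digest = file_digest(path)
    last_change = time.monotonic()

    while time.monotonic() < deadline:
        time.sleep(poll_secs)
        digest = file_digest(path)
        now = time.monotonic()
        if digest != last_digest:
            # Another writer touched the file: restart the quiet period
            last_digest = digest
            last_change = now
        elif now - last_change >= quiet_secs:
            return True
    return False
```

A caller would run something like `wait_for_file_stable("/etc/containerd/config.toml")` after the runtime classes appear and before installing the local registry. A quiet period is only a heuristic: it reduces the window for the race but cannot rule out a late write by another process.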

This bug was very hard to find. Tracking it down involved running `inotifywait -m -r /etc/containerd/` alongside the deployment script with the `--debug` flag. For some reason, it only appeared consistently on the TDX server, not on the SNP one.
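For reference, the same kind of observation can be approximated without `inotifywait` by polling the directory and printing what changed. This is a rough stand-in under stated assumptions: the functions `snapshot`, `diff_snapshots`, and `watch` are hypothetical names, and polling can miss changes that happen and revert between two polls, which inotify would report.

```python
import os
import time


def snapshot(root):
    """Map every file under `root` to its (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # file vanished between walk and stat
            state[full] = (st.st_mtime_ns, st.st_size)
    return state


def diff_snapshots(old, new):
    """Return a sorted list of (event, path) pairs between two snapshots."""
    events = []
    for path in new.keys() - old.keys():
        events.append(("CREATE", path))
    for path in old.keys() - new.keys():
        events.append(("DELETE", path))
    for path in new.keys() & old.keys():
        if new[path] != old[path]:
            events.append(("MODIFY", path))
    return sorted(events)


def watch(root, poll_secs=0.2):
    """Print a timestamped line for each change under `root`, forever."""
    previous = snapshot(root)
    while True:
        time.sleep(poll_secs)
        current = snapshot(root)
        for event, path in diff_snapshots(previous, current):
            print(f"{time.strftime('%H:%M:%S')} {event} {path}")
        previous = current
```

Running `watch("/etc/containerd/")` in one terminal while the deployment runs in another gives a timeline of writes that can be lined up against the deployment's debug output.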

In the process, I also made other parts of the code more robust, since they could lead to race conditions or errors as well, particularly after crashed deployments.

Fixes #119
Fixes #122

@csegarragonz csegarragonz marked this pull request as ready for review December 10, 2024 14:59
@csegarragonz csegarragonz force-pushed the containerd-config-fix branch 2 times, most recently from 0baabe4 to 0ccf4fb Compare December 10, 2024 17:29
@csegarragonz csegarragonz marked this pull request as draft December 18, 2024 19:23
@csegarragonz csegarragonz marked this pull request as ready for review December 19, 2024 12:09
@csegarragonz csegarragonz merged commit e9ac038 into main Dec 19, 2024
5 checks passed
@csegarragonz csegarragonz deleted the containerd-config-fix branch December 19, 2024 12:48
Development

Successfully merging this pull request may close these issues.

Copy artifacts from container images without running them
Containerd config not updating for TDX
1 participant