feat: modify container runtime data dir #3072
@@ -282,6 +282,9 @@ installContainerd() {
ensureContainerd() {
wait_for_file 1200 1 /etc/systemd/system/containerd.service.d/exec_start.conf || exit {{GetCSEErrorCode "ERR_FILE_WATCH_TIMEOUT"}}
wait_for_file 1200 1 /etc/containerd/config.toml || exit {{GetCSEErrorCode "ERR_FILE_WATCH_TIMEOUT"}}
{{- if HasContainerDataDir}}
echo -e "root = \"{{GetContainerDataDir}}\"\n$(cat /etc/containerd/config.toml)" > /etc/containerd/config.toml
So, this is ugly. An alternative is doing it in cloud-init for non-VHD installs and doing it in the VHD otherwise. Thoughts?
So, we don't want to just get the whole generic key/val implementation over with first of all? See the way we use the GetKubeletConfigKeyVals Go template layer func in cloud-init (parts/k8s/cloud-init/masternodecustomdata.yml and parts/k8s/cloud-init/nodecustomdata.yml) to accommodate generic key/val injection at runtime.
The part I wasn't sure about was how much to generalize this for docker vs. containerd, because the config files are totally different.
Are you thinking we could do a func like the kubelet one, one each for docker and containerd, and have those functions accept the raw map[string]string and spit out the raw CRI-specific JSON/TOML? Then we can avoid any conditionals in the bash.
Exactly 💯
Not trying to introduce scope creep, but I think we'll thank ourselves later for doing it the generic way. Thoughts?
I'd really like to avoid "containerdConfig" and "dockerConfig", but we're kind of hiding it no matter what
i'm on board with that, will update
I think we can stick with a single containerRuntimeConfig, and then have per-runtime Go template helper funcs. As long as we can assume that key/val strings are common across all runtimes (I think we can), then we can use a single interface.
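A minimal sketch of the single-map approach, assuming one shared map[string]string and one template helper per runtime. The "dataDir" key, the helper bodies, and the hand-rolled TOML output (only the top-level root key) are illustrative assumptions rather than the PR's code; a real implementation would more likely use a TOML library.

```go
package main

import (
	"encoding/json"
	"os"
	"strconv"
	"text/template"
)

// dataDirKey is a hypothetical key in the shared containerRuntimeConfig map.
const dataDirKey = "dataDir"

// dockerConfigJSON turns the shared key/val map into docker daemon.json content.
func dockerConfigJSON(opts map[string]string) (string, error) {
	cfg := map[string]interface{}{"live-restore": true}
	if dir, ok := opts[dataDirKey]; ok {
		cfg["data-root"] = dir
	}
	b, err := json.Marshal(cfg) // map keys are emitted in sorted order
	return string(b), err
}

// containerdConfigTOML turns the same map into containerd config.toml content.
// Only the top-level "root" key is handled; strconv.Quote yields a valid TOML
// basic string for ordinary filesystem paths.
func containerdConfigTOML(opts map[string]string) string {
	out := "version = 2\n"
	if dir, ok := opts[dataDirKey]; ok {
		out += "root = " + strconv.Quote(dir) + "\n"
	}
	return out
}

func main() {
	opts := map[string]string{dataDirKey: "/mnt/containers"}
	// One helper per runtime, both fed from the same map.
	funcs := template.FuncMap{
		"GetDockerConfig":     dockerConfigJSON,
		"GetContainerdConfig": containerdConfigTOML,
	}
	// Each runtime's cloud-init section would call only its own helper,
	// so no runtime-specific conditionals are needed in the bash.
	tmpl := template.Must(template.New("cloud-init").Funcs(funcs).Parse(
		"{{GetDockerConfig .}}\n{{GetContainerdConfig .}}"))
	if err := tmpl.Execute(os.Stdout, opts); err != nil {
		panic(err)
	}
}
```

Keeping the map runtime-agnostic means a new runtime only needs a new helper, not a new API field.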
Is there a good way to handle errors in the template funcs? I want to do something like this, but I'm struggling to find a good place to put the Marshal call so I can catch the error. I could do raw string manipulation, but that feels error-prone (pun intended), and I wouldn't be guaranteed to construct valid JSON anyway.
// GetDockerConfig returns the full docker daemon configuration including user overrides.
// Requires: import "encoding/json"
func (k *KubernetesConfig) GetDockerConfig() (string, error) {
	config := map[string]interface{}{
		"live-restore": true,
		"log-driver":   "json-file",
		// Inside a map[string]interface{} literal, nested values need an explicit type.
		"log-opts": map[string]string{
			"max-size": "50m",
			"max-file": "5",
		},
	}
	// Set data-root only when a container data dir was supplied.
	if dataDir, ok := k.ContainerRuntimeConfig[ContainerDataDirKey]; ok {
		config["data-root"] = dataDir
	}
	b, err := json.MarshalIndent(config, "", " ")
	return string(b), err
}
and containerd uses toml, even more fun
Codecov Report
@@            Coverage Diff             @@
##           master    #3072      +/-   ##
==========================================
+ Coverage   71.03%   71.05%   +0.02%
==========================================
  Files         147      147
  Lines       25553    25671     +118
==========================================
+ Hits        18152    18241      +89
- Misses       6271     6294      +23
- Partials     1130     1136       +6
Thank you @alexeldeib!
@jackfrancis how do you feel about this? I can add more test coverage if it's agreeable. Curious what you think about adding types like this for the containerd/docker config and stashing them internally in common.
if k.ContainerRuntime == Containerd {
	_, err := common.GetContainerdConfig(k.ContainerRuntimeConfig, nil)
	if err != nil {
		return err
I realized that, the way I'm doing this now, these errors are basically guaranteed never to be hit. They're reducing coverage in the diff, but I think keeping them is correct for future sanity.
yay for functional options
} | ||
|
||
// GetContainerdConfig transforms the default containerd config with overrides. Overrides may be nil. | ||
func GetContainerdConfig(opts map[string]string, overrides []func(*ContainerdConfig) error) (string, error) { |
I came up with the functional setters to work around the fact that the docker/containerd config normally contains templated values, but we don't want that logic in the helper functions (for testing purposes and to avoid circular imports).
I realize now that having both map[string]string and []func(*ContainerdConfig) error is a bit redundant. For example, we could have a separate helper with signature func help(map[string]string) []func(*ContainerdConfig) error for containerd and docker each, and use only functional setters to construct the final object.
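A sketch of the "functional setters only" variant described above, assuming a trimmed-down ContainerdConfig and a hypothetical "dataDir" key. overridesFromMap plays the role of the help function from the comment: the user map is converted to setters once, and a single builder applies every setter to a copy of the defaults.

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerdConfig is a trimmed-down, hypothetical stand-in for the PR's type.
type ContainerdConfig struct {
	OomScore int
	Root     string
}

// Override mutates a config and may reject invalid input.
type Override func(*ContainerdConfig) error

// dataDirKey is a hypothetical key in the user's key/val map.
const dataDirKey = "dataDir"

// overridesFromMap converts the user's map[string]string into functional setters.
func overridesFromMap(opts map[string]string) []Override {
	var out []Override
	if dir, ok := opts[dataDirKey]; ok {
		out = append(out, func(c *ContainerdConfig) error {
			if dir == "" {
				return errors.New("container data dir must not be empty")
			}
			c.Root = dir
			return nil
		})
	}
	return out
}

// buildConfig copies the defaults and applies each setter in order,
// stopping at the first error.
func buildConfig(defaults ContainerdConfig, overrides ...Override) (ContainerdConfig, error) {
	cfg := defaults
	for _, o := range overrides {
		if err := o(&cfg); err != nil {
			return ContainerdConfig{}, err
		}
	}
	return cfg, nil
}

func main() {
	opts := map[string]string{dataDirKey: "/mnt/containerd"}
	cfg, err := buildConfig(ContainerdConfig{}, overridesFromMap(opts)...)
	fmt.Println(cfg.Root, err)
}
```

Templated values (such as the sandbox image) can be appended as extra setters by the caller, so the builder itself never needs to import the template layer.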
pkg/api/common/const.go
DefaultContainerdConfig = ContainerdConfig{
	Version: 2,
	// should this be true? seems maybe not https://github.com/containerd/containerd/blob/master/docs/ops.md#base-configuration
	Subreaper: false,
I thought the recommended docs say to set this to true? Is false a kube thing?
@cpuguy83 should be able to answer
Let's remove the setting. It is not part of the config anymore, and I don't think it ever really made sense.
The config was changed to no_subreaper in containerd/containerd#1822 and later removed altogether as far as I can tell (I can't find where it was removed, but the config doesn't exist).
@@ -282,6 +282,9 @@ installContainerd() {
ensureContainerd() {
wait_for_file 1200 1 /etc/systemd/system/containerd.service.d/exec_start.conf || exit {{GetCSEErrorCode "ERR_FILE_WATCH_TIMEOUT"}}
wait_for_file 1200 1 /etc/containerd/config.toml || exit {{GetCSEErrorCode "ERR_FILE_WATCH_TIMEOUT"}}
{{- if HasValidContainerdConfig}}
I think we should do this in the cloud-init file spec directly, rather than as a post-cloud-init shell command on top of the cloud-init-paved file.
Do you happen to know if apt will avoid overwriting the old daemon config if it already exists? That was my only concern, other than some confusion about VHD vs non-VHD.
Also do you think it's reasonable to remove the validity check and write whatever we have once we've passed validation, or should we attempt to provide a sane default if we know it's invalid but still got here somehow?
of course doing that dropped my PR coverage again :P
} | ||
|
||
// ContainerdSandboxImageOverrider produces a function to transform containerd config by setting the SandboxImage. | ||
func ContainerdSandboxImageOverrider(image string) func(*ContainerdConfig) error { |
This is an Overrider rather than just an Override func so we can use "curried" funcs with an image arg baked in elsewhere?
That was my thinking. The semantics are already fairly clear, so I'm happy to keep the naming consistent (Override) if you think it makes sense.
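To illustrate the "curried" reading of the Overrider name: the factory captures the image now and returns the actual override to run later. The signature matches the diff above, but the body and the flat SandboxImage field are assumptions; the real containerd config nests sandbox_image under the CRI plugin section.

```go
package main

import (
	"errors"
	"fmt"
)

// ContainerdConfig is a minimal, hypothetical stand-in for the PR's type.
type ContainerdConfig struct {
	SandboxImage string
}

// ContainerdSandboxImageOverrider returns an override with the image baked in.
// The returned closure is what actually mutates the config.
func ContainerdSandboxImageOverrider(image string) func(*ContainerdConfig) error {
	return func(c *ContainerdConfig) error {
		if image == "" {
			return errors.New("sandbox image must not be empty")
		}
		c.SandboxImage = image
		return nil
	}
}

func main() {
	// The template layer knows the image; the common package only sees the closure.
	override := ContainerdSandboxImageOverrider("registry.example.com/pause:3.1")
	var cfg ContainerdConfig
	if err := override(&cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.SandboxImage)
}
```

The image reference above is a placeholder, not the image aks-engine actually uses.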
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.untrusted]
{{/* note: runc really should not be used for untrusted workloads... should we remove this? This is here because it was here before */}}
runtime_type = "io.containerd.runc.v2"
#EOF
We need to make sure we keep these #EOF sentinel chars at the end of the data stream, which the CSE scripts use to determine that cloud-init has paved the entire file.
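Because dropping the sentinel would break the CSE scripts' check that cloud-init has paved the whole file, a template-generation unit test could assert that it survives config changes. A minimal sketch of such a check; endsWithSentinel is a hypothetical helper, and it assumes the sentinel must be the last non-blank line of the rendered file.

```go
package main

import (
	"fmt"
	"strings"
)

// eofSentinel marks the end of a cloud-init-paved file.
const eofSentinel = "#EOF"

// endsWithSentinel reports whether the last non-blank line of rendered is the sentinel.
func endsWithSentinel(rendered string) bool {
	trimmed := strings.TrimRight(rendered, " \t\r\n")
	lines := strings.Split(trimmed, "\n") // never empty: Split("") returns [""]
	return strings.TrimSpace(lines[len(lines)-1]) == eofSentinel
}

func main() {
	good := "runtime_type = \"io.containerd.runc.v2\"\n#EOF\n"
	bad := "version = 2\n"
	fmt.Println(endsWithSentinel(good), endsWithSentinel(bad))
}
```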
@jackfrancis @mboersma friendly ping for another review; I think this is good to go, and it's squashed.
@alexeldeib, will do. Could you add example usage of this new configuration in one of the agent pools, and the masterProfile, to: Thanks again for this!
lgtm pending E2E test config integration of this foo + successful run
pkg/api/common/types.go
type ContainerdConfig struct {
	OomScore  int    `toml:"oom_score,omitempty"`
	Root      string `toml:"root,omitempty"`
	Subreaper bool   `toml:"subreaper"`
Please remove this; it's not a real config. I removed it from the containerd docs: containerd/containerd#4194
Same for the actual generated toml.
gah, thought I pulled this out. will remove
@jackfrancis this would be cluster-wide since it's in KubernetesConfig like ContainerRuntime (although it definitely could be per agent pool). Would you like me to add a new file/test case with this, or update the everything cluster to use this by default?
@alexeldeib Right, sorry, that's obviously evident in the changeset, thanks for the clarification. :) So yeah, let's put it into
pkg/engine/template_generator.go
"GetDockerConfig": func() string {
	var overrides []func(*common.DockerConfig) error

	if cs.Properties.HasNSeriesSKU() {
I already squashed, but L704-709 were not present before this, which caused N-series VMs with GPUs to fail.
} | ||
}{{end}} | ||
} | ||
{{IndentString (GetDockerConfig (IsNSeriesSKU .VMSize)) 4}} |
You're teaching me new things about go template syntax
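For readers new to the syntax in {{IndentString (GetDockerConfig (IsNSeriesSKU .VMSize)) 4}}: each parenthesized group is evaluated first, innermost first, and its result becomes an argument to the enclosing function. A self-contained sketch with hypothetical stand-ins for the three helpers; the N-series prefix check and the nvidia default-runtime setting are simplifications, not aks-engine's actual logic.

```go
package main

import (
	"encoding/json"
	"os"
	"strings"
	"text/template"
)

// isNSeriesSKU is a simplified, hypothetical check for GPU VM sizes.
func isNSeriesSKU(vmSize string) bool {
	return strings.HasPrefix(strings.ToLower(vmSize), "standard_n")
}

// getDockerConfig is a hypothetical stand-in: GPU nodes get nvidia as the default runtime.
func getDockerConfig(isNSeries bool) (string, error) {
	cfg := map[string]interface{}{"live-restore": true}
	if isNSeries {
		cfg["default-runtime"] = "nvidia"
	}
	b, err := json.MarshalIndent(cfg, "", "  ")
	return string(b), err
}

// indentString prefixes every line of s with n spaces, e.g. to nest JSON in YAML.
func indentString(s string, n int) string {
	pad := strings.Repeat(" ", n)
	return pad + strings.ReplaceAll(s, "\n", "\n"+pad)
}

var funcs = template.FuncMap{
	"IsNSeriesSKU":    isNSeriesSKU,
	"GetDockerConfig": getDockerConfig,
	"IndentString":    indentString,
}

// The bool from IsNSeriesSKU feeds GetDockerConfig, whose string feeds IndentString.
const snippet = "content: |\n{{IndentString (GetDockerConfig (IsNSeriesSKU .VMSize)) 4}}\n"

func render(vmSize string) (string, error) {
	tmpl, err := template.New("custom-data").Funcs(funcs).Parse(snippet)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, struct{ VMSize string }{vmSize})
	return sb.String(), err
}

func main() {
	out, err := render("Standard_NC6")
	if err != nil {
		panic(err)
	}
	os.Stdout.WriteString(out)
}
```

A nested call to a (string, error) helper behaves like a top-level one: a non-nil error aborts execution, otherwise only the string is passed on.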
/lgtm
…ir (Azure#3072) (Azure#3179)"""" This reverts commit 5af21fe.
Reason for Change:
Allow customization of the container runtime data directory.
The location of this directory will be subject to heavy disk IO, and offloading it from the main OS disk can significantly stabilize the system. Solutions already exist to hack this onto nodes, but we'd like to enable it at provision time. This PR does that.
Issue Fixed:
n/a
Requirements:
Notes:
I haven't made API changes to AKS Engine in a while, so I wasn't sure which tests I should add. Do we want an E2E test covering this? What's the best way to test the full template generation, cmd/generate_test?