libnet/ipams/default: introduce a linear allocator #47768
LGTM!
}

if p.Addr().Is4() {
	v4 = append(v4, p)
if n.Base.Addr().Is4() {
This hasn't changed, but `Is4` is false for an IPv4-mapped IPv6 address. It might be worth checking for that and storing the unmapped prefix in the IPv4 list, or just bailing out?
(We don't deal with mapped addresses in command line options either, but it might be less obvious here: the address pool just won't do anything useful. I'm not quite sure why someone would want to write IPv4 addresses as IPv6, but there's an issue somewhere asking us to allow it.)
@corhere Please make sure this won't break Swarm in any way (e.g. when a new leader is elected, etc.).
Review WIP; I haven't even finished with address_space.go. I'll be back tomorrow.
var last *ipamutils.NetworkToSplit
var discarded int
for i, imax := 0, len(predefined); i < imax; i++ {
	p := predefined[i-discarded]
	if last != nil && last.Overlaps(p.Base) {
		predefined = slices.Delete(predefined, i-discarded, i-discarded+1)
		discarded++
		continue
	}
	last = p
}
Since the `slices` package is already being used, may as well take full advantage.
predefined = slices.CompactFunc(predefined, func(last, p *ipamutils.NetworkToSplit) bool {
	return last.Overlaps(p.Base)
})
`slices.CompactFunc` works a bit differently. It expects strict equality: it doesn't compare each element to the last non-duplicate kept, but to the element immediately before it ('current-1'). If you have the following subnets:
10.0.0.0/8
10.0.0.0/16
10.10.0.0/16
It compares s1 with s2, and then s2 with s3. Since s2 and s3 don't overlap, s3 is kept even though it overlaps s1. That's not what we want.
> `slices.CompactFunc` works a bit differently.

That's too bad; it would have been such an elegant solution!
I don't like that `slices.Delete` is being called in a loop:
- `Delete` is O(len(s)-i).
- Therefore the worst-case time complexity is O(N^2).
Yeah, at first I thought `predefined` wouldn't be filled with enough entries for it to really matter. But better safe than sorry, I guess. That's now fixed.
for i, allocated := range aSpace.allocated {
	if nw.Addr().Compare(allocated.Addr()) < 0 {
`aSpace.allocated` is a sorted slice, which means binary searching is possible. Turn that O(n) search into O(log n) time complexity!
func (aSpace *addrSpace) allocatePool(nw netip.Prefix) error {
	// The comparator receives (element, target) and must return a negative
	// number when the element sorts before the target.
	n, _ := slices.BinarySearchFunc(aSpace.allocated, nw, func(allocated, nw netip.Prefix) int { return allocated.Addr().Compare(nw.Addr()) })
	aSpace.allocated = slices.Insert(aSpace.allocated, n, nw)
	aSpace.subnets[nw] = newPoolData(nw)
	return nil
}
Also, are duplicate allocations allowed? It would be trivial to detect this situation and return an error instead of inserting the duplicate entry into the slice.
The new allocator should work fine with Swarm. The CNM network allocator replays allocations as static assignments if there is an existing allocation in the Swarm state.
moby/libnetwork/cnmallocator/networkallocator.go
Lines 866 to 873 in 4554d87
	// If there is non-nil IPAM state always prefer those subnet
	// configs over Spec configs.
	if n.IPAM != nil {
		ipamConfigs = n.IPAM.Configs
	} else if n.Spec.IPAM != nil {
		ipamConfigs = make([]*api.IPAMConfig, len(n.Spec.IPAM.Configs))
		copy(ipamConfigs, n.Spec.IPAM.Configs)
	}
The previous allocator was subnetting address pools eagerly when the daemon started, and would then just iterate over that list whenever RequestPool was called. This was leading to high memory usage whenever IPv6 pools were configured with a target subnet size too different from the pool's prefix size.

For instance: pool = fd00::/8, target size = /64 -- 2 ^ (64-8) subnets would be generated upfront. This would take approx. 9 * 10^18 bits -- way too much for any human computer in 2024.

Another noteworthy issue: the previous implementation was allocating a subnet, and then in another layer was checking whether the allocation was conflicting with some 'reserved networks'. If so, the allocation would be retried, etc... To make it worse, 'reserved networks' would be recomputed on every iteration. This is totally ineffective as there could be 'reserved networks' that fully overlap a given address pool (or many!).

To fix this issue, a new field `Exclude` is added to `RequestPool`. It's up to each driver to take it into account. Since we don't know whether this retry loop is useful for some remote IPAM driver, it's reimplemented bug-for-bug directly in the remote driver.

The new allocator uses a linear-search algorithm. It takes advantage of all lists (predefined pools, allocated subnets and reserved networks) being sorted and logically combines 'allocated' and 'reserved' through a 'double cursor' to iterate on both lists at the same time while preserving the total order. At the same time, it iterates over 'predefined' pools and looks for the first empty space that would be a good fit.

Currently, the size of the allocated subnet is still dictated by each 'predefined' pool. We should consider hardcoding that size instead, and let users specify what subnet size they want. This wasn't possible before as the subnets were generated upfront. This new allocator should be able to deal with this easily.

The method used for static allocation has been updated to make sure the ascending order of 'allocated' is preserved. It's bug-for-bug compatible with the previous implementation.

One consequence of this new algorithm is that we don't keep track of where the last allocation happened, we just allocate the first free subnet we find.

Before:

- Allocate: 10.0.1.0/24, 10.0.2.0/24 ; Deallocate: 10.0.1.0/24 ; Allocate 10.0.3.0/24.

Now, the 3rd allocation would yield 10.0.1.0/24 once again.

As it doesn't change the semantics of the allocator, there's no reason to worry about that.

Finally, about 'reserved networks'. The heuristics we use are now properly documented. It was discovered that we don't check routes for IPv6 allocations -- this can't be changed because there's no such thing as on-link routes for IPv6.

(Kudos to Rob Murray for coming up with the linear-search idea.)

Signed-off-by: Albin Kerouanton <[email protected]>
This ensures such address pools are part of the IPv4 address space. Signed-off-by: Albin Kerouanton <[email protected]>
Nothing was validating whether an address pool's `base` prefix was larger than the target subnet `size` it's associated with. As such invalid address pools would yield no subnet, the error could go unnoticed. Signed-off-by: Albin Kerouanton <[email protected]>
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [docker/docker](https://togithub.com/docker/docker) | major | `26.1.4` -> `27.0.3` | --- ### Release Notes <details> <summary>docker/docker (docker/docker)</summary> ### [`v27.0.3`](https://togithub.com/moby/moby/releases/tag/v27.0.3) [Compare Source](https://togithub.com/docker/docker/compare/v27.0.2...v27.0.3) #### 27.0.3 For a full list of pull requests and changes in this release, refer to the relevant GitHub milestones: - [docker/cli, 27.0.3 milestone](https://togithub.com/docker/cli/issues?q=is%3Aclosed+milestone%3A27.0.3) - [moby/moby, 27.0.3 milestone](https://togithub.com/moby/moby/issues?q=is%3Aclosed+milestone%3A27.0.3) - Deprecated and removed features, see [Deprecated Features](https://togithub.com/docker/cli/blob/v27.0.3/docs/deprecated.md). - Changes to the Engine API, see [API version history](https://togithub.com/moby/moby/blob/v27.0.3/docs/api/version-history.md). ##### Bug fixes and enhancements - Fix a regression that incorrectly reported a port mapping from a host IPv6 address to an IPv4-only container as an error. [moby/moby#48090](https://togithub.com/moby/moby/pull/48090) - Fix a regression that caused duplicate subnet allocations when creating networks. [moby/moby#48089](https://togithub.com/moby/moby/pull/48089) - Fix a regression resulting in "fail to register layer: failed to Lchown" errors when trying to pull an image with rootless enabled on a system that supports native overlay with user-namespaces. 
[moby/moby#48086](https://togithub.com/moby/moby/pull/48086) ### [`v27.0.2`](https://togithub.com/moby/moby/releases/tag/v27.0.2) [Compare Source](https://togithub.com/docker/docker/compare/v27.0.1-rc.1...v27.0.2) #### 27.0.2 For a full list of pull requests and changes in this release, refer to the relevant GitHub milestones: - [docker/cli, 27.0.2 milestone](https://togithub.com/docker/cli/issues?q=is%3Aclosed+milestone%3A27.0.2) - [moby/moby, 27.0.2 milestone](https://togithub.com/moby/moby/issues?q=is%3Aclosed+milestone%3A27.0.2) - Deprecated and removed features, see [Deprecated Features](https://togithub.com/docker/cli/blob/v27.0.2/docs/deprecated.md). - Changes to the Engine API, see [API version history](https://togithub.com/moby/moby/blob/v27.0.2/docs/api/version-history.md). ##### Bug fixes and enhancements - Fix a regression that caused port numbers to be ignored when parsing a Docker registry URL. [docker/cli#5197](https://togithub.com/docker/cli/pull/5197), [docker/cli#5198](https://togithub.com/docker/cli/pull/5198) ##### Removed - api/types: deprecate `ContainerJSONBase.Node` field and `ContainerNode` type. These definitions were used by the standalone ("classic") Swarm API, but never implemented in the Docker Engine itself. [moby/moby#48055](https://togithub.com/moby/moby/pull/48055) ### [`v27.0.1`](https://togithub.com/moby/moby/releases/tag/v27.0.1) [Compare Source](https://togithub.com/docker/docker/compare/v26.1.4...v27.0.1-rc.1) #### 27.0.1 For a full list of pull requests and changes in this release, refer to the relevant GitHub milestones: - [docker/cli, 27.0.0 milestone](https://togithub.com/docker/cli/issues?q=is%3Aclosed+milestone%3A27.0.0) - [moby/moby, 27.0.0 milestone](https://togithub.com/moby/moby/issues?q=is%3Aclosed+milestone%3A27.0.0) - Deprecated and removed features, see [Deprecated Features](https://togithub.com/docker/cli/blob/v27.0.1/docs/deprecated.md). 
- Changes to the Engine API, see [API version history](https://togithub.com/moby/moby/blob/v27.0.1/docs/api/version-history.md). ##### New - containerd image store: Add `--platform` flag to `docker image push` and improve the default behavior when not all platforms of the multi-platform image are available locally. [docker/cli#4984](https://togithub.com/docker/cli/pull/4984), [moby/moby#47679](https://togithub.com/moby/moby/pull/47679) - Add support to `docker stack deploy` for `driver_opts` in a service's networks. [docker/cli#5125](https://togithub.com/docker/cli/pull/5125) - Consider additional `/usr/local/libexec` and `/usr/libexec` paths when looking up the userland proxy binaries by a name with a `docker-` prefix. [moby/moby#47804](https://togithub.com/moby/moby/pull/47804) ##### Bug fixes and enhancements - `*client.Client` instances are now always safe for concurrent use by multiple goroutines. Previously, this could lead to data races when the `WithAPIVersionNegotiation()` option is used. [moby/moby#47961](https://togithub.com/moby/moby/pull/47961) - Fix a bug causing the Docker CLI to leak Unix sockets in `$TMPDIR` in some cases. [docker/cli#5146](https://togithub.com/docker/cli/pull/5146) - Don't ignore a custom seccomp profile when used in conjunction with `--privileged`. [moby/moby#47500](https://togithub.com/moby/moby/pull/47500) - rootless: overlay2: support native overlay diff when using rootless-mode with Linux kernel version 5.11 and later. [moby/moby#47605](https://togithub.com/moby/moby/pull/47605) - Fix the `StartInterval` default value of healthcheck to reflect the documented value of 5s. [moby/moby#47799](https://togithub.com/moby/moby/pull/47799) - Fix `docker save` and `docker load` not ending on the daemon side when the operation was cancelled by the user, for example with <kbd>Ctrl+C</kbd>. 
[moby/moby#47629](https://togithub.com/moby/moby/pull/47629) - The `StartedAt` property of containers is now recorded before container startup, guaranteeing that the `StartedAt` is always before `FinishedAt`. [moby/moby#47003](https://togithub.com/moby/moby/pull/47003) - The internal DNS resolver used by Windows containers on Windows now forwards requests to external DNS servers by default. This enables `nslookup` to resolve external hostnames. This behaviour can be disabled via `daemon.json`, using `"features": { "windows-dns-proxy": false }`. The configuration option will be removed in a future release. [moby/moby#47826](https://togithub.com/moby/moby/pull/47826) - Print a warning when the CLI does not have permissions to read the configuration file. [docker/cli#5077](https://togithub.com/docker/cli/pull/5077) - Fix a goroutine and file-descriptor leak on container attach. [moby/moby#45052](https://togithub.com/moby/moby/pull/45052) - Clear the networking state of all stopped or dead containers during daemon start-up. [moby/moby#47984](https://togithub.com/moby/moby/pull/47984) - Write volume options JSON atomically to avoid "invalid JSON" errors after system crash. [moby/moby#48034](https://togithub.com/moby/moby/pull/48034) - Allow multiple macvlan networks with the same parent. [moby/moby#47318](https://togithub.com/moby/moby/pull/47318) - Allow BuildKit to be used on Windows daemons that advertise it. [docker/cli#5178](https://togithub.com/docker/cli/pull/5178) ##### Networking - Allow sysctls to be set per-interface during container creation and network connection. [moby/moby#47686](https://togithub.com/moby/moby/pull/47686) - In a future release, this will be the only way to set per-interface sysctl options. For example, on the command line in a `docker run` command,`--network mynet --sysctl net.ipv4.conf.eth0.log_martians=1` will be rejected. 
Instead, you must use `--network name=mynet,driver-opt=com.docker.network.endpoint.sysctls=net.ipv4.conf.IFNAME.log_martians=1`. ##### IPv6 - `ip6tables` is no longer experimental. You may remove the `experimental` configuration option and continue to use IPv6, if it is not required by any other features. - `ip6tables` is now enabled for Linux bridge networks by default. [moby/moby#47747](https://togithub.com/moby/moby/pull/47747) - This makes IPv4 and IPv6 behaviors consistent with each other, and reduces the risk that IPv6-enabled containers are inadvertently exposed to the network. - There is no impact if you are running Docker Engine with `ip6tables` enabled (new default). - If you are using an IPv6-enabled bridge network without `ip6tables`, this is likely a breaking change. Only published container ports (`-p` or `--publish`) are accessible from outside the Docker bridge network, and outgoing connections masquerade as the host. - To restore the behavior of earlier releases, no `ip6tables` at all, set `"ip6tables": false` in `daemon.json`, or use the CLI option `--ip6tables=false`. Alternatively, leave `ip6tables` enabled, publish ports, and enable direct routing. - With `ip6tables` enabled, if `ip6tables` is not functional on your host, Docker Engine will start but it will not be possible to create an IPv6-enabled network. ##### IPv6 network configuration improvements - A Unique Local Address (ULA) base prefix is automatically added to `default-address-pools` if this parameter wasn't manually configured, or if it contains no IPv6 prefixes. [moby/moby#47853](https://togithub.com/moby/moby/pull/47853) - Prior to this release, to create an IPv6-enabled network it was necessary to use the `--subnet` option to specify an IPv6 subnet, or add IPv6 ranges to `default-address-pools` in `daemon.json`. 
- Starting in this release, when a bridge network is created with `--ipv6` and no IPv6 subnet is defined by those options, an IPv6 Unique Local Address (ULA) base prefix is used. - The ULA prefix is derived from the Engine host ID such that it's unique across hosts and over time. - IPv6 address pools of any size can now be added to `default-address-pools`. [moby/moby#47768](https://togithub.com/moby/moby/pull/47768) - IPv6 can now be enabled by default on all custom bridge networks using `"default-network-opts": { "bridge": {"com.docker.network.enable_ipv6": "true"}}` in `daemon.json`, or `dockerd --default-network-opt=bridge=com.docker.network.enable_ipv6=true` on the command line. [moby/moby#47867](https://togithub.com/moby/moby/pull/47867) - Direct routing for IPv6 networks, with `ip6tables` enabled. [moby/moby#47871](https://togithub.com/moby/moby/pull/47871) - Added bridge driver option `com.docker.network.bridge.gateway_mode_ipv6=<nat|routed>`. - The default behavior, `nat`, is unchanged from previous releases running with `ip6tables` enabled. NAT and masquerading rules are set up for each published container port. - When set to `routed`, no NAT or masquerading rules are configured for published ports. This enables direct IPv6 access to the container, if the host's network can route packets for the container's address to the host. Published ports will be opened in the container's firewall. - When a port mapping only applies to `routed` mode, only addresses `0.0.0.0` or `::` are allowed and a host port must not be given. - Note that published container ports, in `nat` or `routed` mode, are accessible from any remote address if routing is set up in the network, unless the Docker host's firewall has additional restrictions. For example: `docker network create --ipv6 -o com.docker.network.bridge.gateway_mode_ipv6=routed mynet`. - The option `com.docker.network.bridge.gateway_mode_ipv4=<nat|routed>` is also available, with the same behavior but for IPv4. 
- If firewalld is running on the host, Docker creates policy `docker-forwarding` to allow forwarding from any zone to the `docker` zone. This makes it possible to configure a bridge network with a routable IPv6 address, and no NAT or masquerading. [moby/moby#47745](https://togithub.com/moby/moby/pull/47745) - When a port is published with no host port specified, or a host port range is given, the same port will be allocated for IPv4 and IPv6. [moby/moby#47871](https://togithub.com/moby/moby/pull/47871) - For example `-p 80` will result in the same ephemeral port being allocated for `0.0.0.0` and `::`, and `-p 8080-8083:80` will pick the same port from the range for both address families. - Similarly, ports published to specific addresses will be allocated the same port. For example, `-p 127.0.0.1::80 -p '[::1]::80'`. - If no port is available on all required addresses, container creation will fail. - Environment variable `DOCKER_ALLOW_IPV6_ON_IPV4_INTERFACE`, introduced in release 26.1.1, no longer has any effect. [moby/moby#47963](https://togithub.com/moby/moby/pull/47963) - If IPv6 could not be disabled on an interface because of a read-only `/proc/sys/net`, the environment variable allowed the container to start anyway. - In this release, if IPv6 cannot be disabled for an interface, IPv6 can be explicitly enabled for the network simply by using `--ipv6` when creating it. Other workarounds are to configure the OS to disable IPv6 by default on new interfaces, mount `/proc/sys/net` read-write, or use a kernel with no IPv6 support. - For IPv6-enabled bridge networks, do not attempt to replace the bridge's kernel-assigned link local address with `fe80::1`. [moby/moby#47787](https://togithub.com/moby/moby/pull/47787) ##### Removed - Deprecate experimental GraphDriver plugins. [moby/moby#48050](https://togithub.com/moby/moby/pull/48050), [docker/cli#5172](https://togithub.com/docker/cli/pull/5172) - pkg/archive: deprecate `NewTempArchive` and `TempArchive`. 
These types were only used in tests and will be removed in the next release. [moby/moby#48002](https://togithub.com/moby/moby/pull/48002) - pkg/archive: deprecate `CanonicalTarNameForPath` [moby/moby#48001](https://togithub.com/moby/moby/pull/48001) - Deprecate pkg/dmesg. This package was no longer used, and will be removed in the next release. [moby/moby#47999](https://togithub.com/moby/moby/pull/47999) - Deprecate `pkg/stringid.ValidateID` and `pkg/stringid.IsShortID` [moby/moby#47995](https://togithub.com/moby/moby/pull/47995) - runconfig: deprecate `SetDefaultNetModeIfBlank` and move `ContainerConfigWrapper` to `api/types/container` [moby/moby#48007](https://togithub.com/moby/moby/pull/48007) - runconfig: deprecate `DefaultDaemonNetworkMode` and move to `daemon/network` [moby/moby#48008](https://togithub.com/moby/moby/pull/48008) - runconfig: deprecate `opts.ConvertKVStringsToMap`. This utility is no longer used, and will be removed in the next release. [moby/moby#48016](https://togithub.com/moby/moby/pull/48016) - runconfig: deprecate `IsPreDefinedNetwork`. [moby/moby#48011](https://togithub.com/moby/moby/pull/48011) ##### API - containerd image store: `POST /images/{name}/push` now supports a `platform` parameter (JSON encoded OCI Platform type) that allows selecting a specific platform-manifest from the multi-platform image. This is experimental and may change in future API versions. [moby/moby#47679](https://togithub.com/moby/moby/pull/47679) - `POST /services/create` and `POST /services/{id}/update` now support `OomScoreAdj`. [moby/moby#47950](https://togithub.com/moby/moby/pull/47950) - `ContainerList` api returns container annotations. [moby/moby#47866](https://togithub.com/moby/moby/pull/47866) - `POST /containers/create` and `POST /services/create` now take `Options` as part of `HostConfig.Mounts.TmpfsOptions` allowing to set options for tmpfs mounts. 
[moby/moby#46809](https://togithub.com/moby/moby/pull/46809) - The `Healthcheck.StartInterval` property is now correctly ignored when updating a Swarm service using API versions less than v1.44. [moby/moby#47991](https://togithub.com/moby/moby/pull/47991) - `GET /events` now supports image `create` event that is emitted when a new image is built regardless if it was tagged or not. [moby/moby#47929](https://togithub.com/moby/moby/pull/47929) - `GET /info` now includes a `Containerd` field containing information about the location of the containerd API socket and containerd namespaces used by the daemon to run containers and plugins. [moby/moby#47239](https://togithub.com/moby/moby/pull/47239) - Deprecate non-standard (config) fields in image inspect output. The `Config` field returned by this endpoint (used for `docker image inspect`) returned additional fields that are not part of the image's configuration and not part of the [Docker Image Spec] and the [OCI Image Spec]. These fields are never set (and always return the default value for the type), but are not omitted in the response when left empty. As these fields were not intended to be part of the image configuration response, they are deprecated, and will be removed in the future API versions. - Deprecate the daemon flag `--api-cors-header` and the corresponding `daemon.json` configuration option. These will be removed in the next major release. 
[moby/moby#45313](https://togithub.com/moby/moby/pull/45313) The following deprecated fields are currently included in the API response, but are not part of the underlying image's `Config`: [moby/moby#47941](https://togithub.com/moby/moby/pull/47941) - `Hostname` - `Domainname` - `AttachStdin` - `AttachStdout` - `AttachStderr` - `Tty` - `OpenStdin` - `StdinOnce` - `Image` - `NetworkDisabled` (already omitted unless set) - `MacAddress` (already omitted unless set) - `StopTimeout` (already omitted unless set) ##### Go SDK changes - Client API callback for the following functions now require a context parameter. [moby/moby#47536](https://togithub.com/moby/moby/pull/47536) - `client.RequestPrivilegeFunc` - `client.ImageSearchOptions.AcceptPermissionsFunc` - `image.ImportOptions.PrivilegeFunc` - Remove deprecated aliases for Image types. [moby/moby#47900](https://togithub.com/moby/moby/pull/47900) - `ImageImportOptions` - `ImageCreateOptions` - `ImagePullOptions` - `ImagePushOptions` - `ImageListOptions` - `ImageRemoveOptions` - Introduce `Ulimit` type alias for `github.com/docker/go-units.Ulimit`. The `Ulimit` type as used in the API is defined in a Go module that will transition to a new location in future. A type alias is added to reduce the friction that comes with moving the type to a new location. The alias makes sure that existing code continues to work, but its definition may change in future. Users are recommended to use this alias instead of the `units.Ulimit` directly. [moby/moby#48023](https://togithub.com/moby/moby/pull/48023) - Move and rename types, changing their import paths and exported names. 
[moby/moby#47936](https://togithub.com/moby/moby/pull/47936), [moby/moby#47873](https://togithub.com/moby/moby/pull/47873), [moby/moby#47887](https://togithub.com/moby/moby/pull/47887), [moby/moby#47882](https://togithub.com/moby/moby/pull/47882), [moby/moby#47921](https://togithub.com/moby/moby/pull/47921), [moby/moby#48040](https://togithub.com/moby/moby/pull/48040): - Move the following types to `api/types/container`: - `BlkioStatEntry` - `BlkioStats` - `CPUStats` - `CPUUsage` - `ContainerExecInspect` - `ContainerPathStat` - `ContainerStats` - `ContainersPruneReport` - `CopyToContainerOptions` - `ExecConfig` - `ExecStartCheck` - `MemoryStats` - `NetworkStats` - `PidsStats` - `StatsJSON` - `Stats` - `StorageStats` - `ThrottlingData` - Move the following types to `api/types/image`: - `ImagesPruneReport` - `ImageImportSource` - `ImageLoadResponse` - Move the `ExecStartOptions` type to `api/types/backend`. - Move the `VolumesPruneReport` type to `api/types/volume`. - Move the `EventsOptions` type to `api/types/events`. - Move the `ImageSearchOptions` type to `api/types/registry`. - Drop `Network` prefix and move the following types to `api/types/network`: - `NetworkCreateResponse` - `NetworkConnect` - `NetworkDisconnect` - `NetworkInspectOptions` - `EndpointResource` - `NetworkListOptions` - `NetworkCreateOptions` - `NetworkCreateRequest` - `NetworksPruneReport` - Move `NetworkResource` to `api/types/network`. ##### Packaging updates - Update Buildx to [v0.15.1](https://togithub.com/docker/buildx/releases/tag/v0.15.1). [docker/docker-ce-packaging#1029](https://togithub.com/docker/docker-ce-packaging/pull/1029) - Update BuildKit to [v0.14.1](https://togithub.com/moby/buildkit/releases/tag/v0.14.1). 
[moby/moby#48028](https://togithub.com/moby/moby/pull/48028) - Update runc to [v1.1.13](https://togithub.com/opencontainers/runc/releases/tag/v1.1.13) [moby/moby#47976](https://togithub.com/moby/moby/pull/47976) - Update Compose to [v2.28.1](https://togithub.com/docker/compose/releases/tag/v2.28.1). [moby/docker-ce-packaging#1032](https://togithub.com/docker/docker-ce-packaging/pull/1032) [Docker image spec]: https://togithub.com/moby/docker-image-spec/blob/v1.3.1/specs-go/v1/image.go#L19-L32 [OCI Image Spec]: https://togithub.com/opencontainers/image-spec/blob/v1.1.0/specs-go/v1/config.go#L24-L62 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "after 6am on monday" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/earthly/dind). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy40MjEuMCIsInVwZGF0ZWRJblZlciI6IjM3LjQyMS4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZSJdfQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Instead, you must use `--network name=mynet,driver-opt=com.docker.network.endpoint.sysctls=net.ipv4.conf.IFNAME.log_martians=1`. ##### IPv6 - `ip6tables` is no longer experimental. You may remove the `experimental` configuration option and continue to use IPv6, if it is not required by any other features. - `ip6tables` is now enabled for Linux bridge networks by default. [moby/moby#47747](https://togithub.com/moby/moby/pull/47747) - This makes IPv4 and IPv6 behaviors consistent with each other, and reduces the risk that IPv6-enabled containers are inadvertently exposed to the network. - There is no impact if you are running Docker Engine with `ip6tables` enabled (new default). - If you are using an IPv6-enabled bridge network without `ip6tables`, this is likely a breaking change. Only published container ports (`-p` or `--publish`) are accessible from outside the Docker bridge network, and outgoing connections masquerade as the host. - To restore the behavior of earlier releases, no `ip6tables` at all, set `"ip6tables": false` in `daemon.json`, or use the CLI option `--ip6tables=false`. Alternatively, leave `ip6tables` enabled, publish ports, and enable direct routing. - With `ip6tables` enabled, if `ip6tables` is not functional on your host, Docker Engine will start but it will not be possible to create an IPv6-enabled network. ##### IPv6 network configuration improvements - A Unique Local Address (ULA) base prefix is automatically added to `default-address-pools` if this parameter wasn't manually configured, or if it contains no IPv6 prefixes. [moby/moby#47853](https://togithub.com/moby/moby/pull/47853) - Prior to this release, to create an IPv6-enabled network it was necessary to use the `--subnet` option to specify an IPv6 subnet, or add IPv6 ranges to `default-address-pools` in `daemon.json`. 
- Starting in this release, when a bridge network is created with `--ipv6` and no IPv6 subnet is defined by those options, an IPv6 Unique Local Address (ULA) base prefix is used. - The ULA prefix is derived from the Engine host ID such that it's unique across hosts and over time. - IPv6 address pools of any size can now be added to `default-address-pools`. [moby/moby#47768](https://togithub.com/moby/moby/pull/47768) - IPv6 can now be enabled by default on all custom bridge networks using `"default-network-opts": { "bridge": {"com.docker.network.enable_ipv6": "true"}}` in `daemon.json`, or `dockerd --default-network-opt=bridge=com.docker.network.enable_ipv6=true` on the command line. [moby/moby#47867](https://togithub.com/moby/moby/pull/47867) - Direct routing for IPv6 networks, with `ip6tables` enabled. [moby/moby#47871](https://togithub.com/moby/moby/pull/47871) - Added bridge driver option `com.docker.network.bridge.gateway_mode_ipv6=<nat|routed>`. - The default behavior, `nat`, is unchanged from previous releases running with `ip6tables` enabled. NAT and masquerading rules are set up for each published container port. - When set to `routed`, no NAT or masquerading rules are configured for published ports. This enables direct IPv6 access to the container, if the host's network can route packets for the container's address to the host. Published ports will be opened in the container's firewall. - When a port mapping only applies to `routed` mode, only addresses `0.0.0.0` or `::` are allowed and a host port must not be given. - Note that published container ports, in `nat` or `routed` mode, are accessible from any remote address if routing is set up in the network, unless the Docker host's firewall has additional restrictions. For example: `docker network create --ipv6 -o com.docker.network.bridge.gateway_mode_ipv6=routed mynet`. - The option `com.docker.network.bridge.gateway_mode_ipv4=<nat|routed>` is also available, with the same behavior but for IPv4.
- If firewalld is running on the host, Docker creates policy `docker-forwarding` to allow forwarding from any zone to the `docker` zone. This makes it possible to configure a bridge network with a routable IPv6 address, and no NAT or masquerading. [moby/moby#47745](https://togithub.com/moby/moby/pull/47745) - When a port is published with no host port specified, or a host port range is given, the same port will be allocated for IPv4 and IPv6. [moby/moby#47871](https://togithub.com/moby/moby/pull/47871) - For example `-p 80` will result in the same ephemeral port being allocated for `0.0.0.0` and `::`, and `-p 8080-8083:80` will pick the same port from the range for both address families. - Similarly, ports published to specific addresses will be allocated the same port. For example, `-p 127.0.0.1::80 -p '[::1]::80'`. - If no port is available on all required addresses, container creation will fail. - Environment variable `DOCKER_ALLOW_IPV6_ON_IPV4_INTERFACE`, introduced in release 26.1.1, no longer has any effect. [moby/moby#47963](https://togithub.com/moby/moby/pull/47963) - If IPv6 could not be disabled on an interface because of a read-only `/proc/sys/net`, the environment variable allowed the container to start anyway. - In this release, if IPv6 cannot be disabled for an interface, IPv6 can be explicitly enabled for the network simply by using `--ipv6` when creating it. Other workarounds are to configure the OS to disable IPv6 by default on new interfaces, mount `/proc/sys/net` read-write, or use a kernel with no IPv6 support. - For IPv6-enabled bridge networks, do not attempt to replace the bridge's kernel-assigned link local address with `fe80::1`. [moby/moby#47787](https://togithub.com/moby/moby/pull/47787) ##### Removed - Deprecate experimental GraphDriver plugins. [moby/moby#48050](https://togithub.com/moby/moby/pull/48050), [docker/cli#5172](https://togithub.com/docker/cli/pull/5172) - pkg/archive: deprecate `NewTempArchive` and `TempArchive`. 
These types were only used in tests and will be removed in the next release. [moby/moby#48002](https://togithub.com/moby/moby/pull/48002) - pkg/archive: deprecate `CanonicalTarNameForPath` [moby/moby#48001](https://togithub.com/moby/moby/pull/48001) - Deprecate pkg/dmesg. This package was no longer used, and will be removed in the next release. [moby/moby#47999](https://togithub.com/moby/moby/pull/47999) - Deprecate `pkg/stringid.ValidateID` and `pkg/stringid.IsShortID` [moby/moby#47995](https://togithub.com/moby/moby/pull/47995) - runconfig: deprecate `SetDefaultNetModeIfBlank` and move `ContainerConfigWrapper` to `api/types/container` [moby/moby#48007](https://togithub.com/moby/moby/pull/48007) - runconfig: deprecate `DefaultDaemonNetworkMode` and move to `daemon/network` [moby/moby#48008](https://togithub.com/moby/moby/pull/48008) - runconfig: deprecate `opts.ConvertKVStringsToMap`. This utility is no longer used, and will be removed in the next release. [moby/moby#48016](https://togithub.com/moby/moby/pull/48016) - runconfig: deprecate `IsPreDefinedNetwork`. [moby/moby#48011](https://togithub.com/moby/moby/pull/48011) ##### API - containerd image store: `POST /images/{name}/push` now supports a `platform` parameter (JSON encoded OCI Platform type) that allows selecting a specific platform-manifest from the multi-platform image. This is experimental and may change in future API versions. [moby/moby#47679](https://togithub.com/moby/moby/pull/47679) - `POST /services/create` and `POST /services/{id}/update` now support `OomScoreAdj`. [moby/moby#47950](https://togithub.com/moby/moby/pull/47950) - `ContainerList` api returns container annotations. [moby/moby#47866](https://togithub.com/moby/moby/pull/47866) - `POST /containers/create` and `POST /services/create` now take `Options` as part of `HostConfig.Mounts.TmpfsOptions` allowing to set options for tmpfs mounts. 
[moby/moby#46809](https://togithub.com/moby/moby/pull/46809) - The `Healthcheck.StartInterval` property is now correctly ignored when updating a Swarm service using API versions less than v1.44. [moby/moby#47991](https://togithub.com/moby/moby/pull/47991) - `GET /events` now supports image `create` event that is emitted when a new image is built regardless if it was tagged or not. [moby/moby#47929](https://togithub.com/moby/moby/pull/47929) - `GET /info` now includes a `Containerd` field containing information about the location of the containerd API socket and containerd namespaces used by the daemon to run containers and plugins. [moby/moby#47239](https://togithub.com/moby/moby/pull/47239) - Deprecate non-standard (config) fields in image inspect output. The `Config` field returned by this endpoint (used for `docker image inspect`) returned additional fields that are not part of the image's configuration and not part of the [Docker Image Spec] and the [OCI Image Spec]. These fields are never set (and always return the default value for the type), but are not omitted in the response when left empty. As these fields were not intended to be part of the image configuration response, they are deprecated, and will be removed in the future API versions. - Deprecate the daemon flag `--api-cors-header` and the corresponding `daemon.json` configuration option. These will be removed in the next major release. 
[moby/moby#45313](https://togithub.com/moby/moby/pull/45313) The following deprecated fields are currently included in the API response, but are not part of the underlying image's `Config`: [moby/moby#47941](https://togithub.com/moby/moby/pull/47941) - `Hostname` - `Domainname` - `AttachStdin` - `AttachStdout` - `AttachStderr` - `Tty` - `OpenStdin` - `StdinOnce` - `Image` - `NetworkDisabled` (already omitted unless set) - `MacAddress` (already omitted unless set) - `StopTimeout` (already omitted unless set) ##### Go SDK changes - Client API callback for the following functions now require a context parameter. [moby/moby#47536](https://togithub.com/moby/moby/pull/47536) - `client.RequestPrivilegeFunc` - `client.ImageSearchOptions.AcceptPermissionsFunc` - `image.ImportOptions.PrivilegeFunc` - Remove deprecated aliases for Image types. [moby/moby#47900](https://togithub.com/moby/moby/pull/47900) - `ImageImportOptions` - `ImageCreateOptions` - `ImagePullOptions` - `ImagePushOptions` - `ImageListOptions` - `ImageRemoveOptions` - Introduce `Ulimit` type alias for `github.com/docker/go-units.Ulimit`. The `Ulimit` type as used in the API is defined in a Go module that will transition to a new location in future. A type alias is added to reduce the friction that comes with moving the type to a new location. The alias makes sure that existing code continues to work, but its definition may change in future. Users are recommended to use this alias instead of the `units.Ulimit` directly. [moby/moby#48023](https://togithub.com/moby/moby/pull/48023) - Move and rename types, changing their import paths and exported names. 
[moby/moby#47936](https://togithub.com/moby/moby/pull/47936), [moby/moby#47873](https://togithub.com/moby/moby/pull/47873), [moby/moby#47887](https://togithub.com/moby/moby/pull/47887), [moby/moby#47882](https://togithub.com/moby/moby/pull/47882), [moby/moby#47921](https://togithub.com/moby/moby/pull/47921), [moby/moby#48040](https://togithub.com/moby/moby/pull/48040): - Move the following types to `api/types/container`: - `BlkioStatEntry` - `BlkioStats` - `CPUStats` - `CPUUsage` - `ContainerExecInspect` - `ContainerPathStat` - `ContainerStats` - `ContainersPruneReport` - `CopyToContainerOptions` - `ExecConfig` - `ExecStartCheck` - `MemoryStats` - `NetworkStats` - `PidsStats` - `StatsJSON` - `Stats` - `StorageStats` - `ThrottlingData` - Move the following types to `api/types/image`: - `ImagesPruneReport` - `ImageImportSource` - `ImageLoadResponse` - Move the `ExecStartOptions` type to `api/types/backend`. - Move the `VolumesPruneReport` type to `api/types/volume`. - Move the `EventsOptions` type to `api/types/events`. - Move the `ImageSearchOptions` type to `api/types/registry`. - Drop `Network` prefix and move the following types to `api/types/network`: - `NetworkCreateResponse` - `NetworkConnect` - `NetworkDisconnect` - `NetworkInspectOptions` - `EndpointResource` - `NetworkListOptions` - `NetworkCreateOptions` - `NetworkCreateRequest` - `NetworksPruneReport` - Move `NetworkResource` to `api/types/network`. ##### Packaging updates - Update Buildx to [v0.15.1](https://togithub.com/docker/buildx/releases/tag/v0.15.1). [docker/docker-ce-packaging#1029](https://togithub.com/docker/docker-ce-packaging/pull/1029) - Update BuildKit to [v0.14.1](https://togithub.com/moby/buildkit/releases/tag/v0.14.1). 
[moby/moby#48028](https://togithub.com/moby/moby/pull/48028) - Update runc to [v1.1.13](https://togithub.com/opencontainers/runc/releases/tag/v1.1.13) [moby/moby#47976](https://togithub.com/moby/moby/pull/47976) - Update Compose to [v2.28.1](https://togithub.com/docker/compose/releases/tag/v2.28.1). [moby/docker-ce-packaging#1032](https://togithub.com/docker/docker-ce-packaging/pull/1032) [Docker image spec]: https://togithub.com/moby/docker-image-spec/blob/v1.3.1/specs-go/v1/image.go#L19-L32 [OCI Image Spec]: https://togithub.com/opencontainers/image-spec/blob/v1.1.0/specs-go/v1/config.go#L24-L62 </details> --- ### Configuration 📅 **Schedule**: Branch creation - "after 6am on monday" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/earthly/dind). Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
- What I did
The previous allocator subnetted address pools eagerly when the daemon started, and would then just iterate over that list whenever RequestPool was called. This led to high memory usage whenever IPv6 pools were configured with a target subnet size that differed too much from the pool's prefix size.
For instance: pool = fd00::/8, target size = /64 -- 2 ^ (64-8) subnets would be generated upfront. This would take approx. 9 * 10^18 bits -- way too much for any human computer in 2024.
Another noteworthy issue: the previous implementation was allocating a subnet, and then in another layer was checking whether the allocation conflicted with some 'reserved networks'. If so, the allocation would be retried, and so on. To make it worse, 'reserved networks' would be recomputed on every iteration. This is very inefficient, as there could be 'reserved networks' that fully overlap a given address pool (or many!).
To fix this issue, a new field `Exclude` is added to `RequestPool`. It's up to each driver to take it into account. Since we don't know whether this retry loop is useful for some remote IPAM driver, it's reimplemented bug-for-bug directly in the remote driver.
The new allocator uses a linear-search algorithm. It takes advantage of all lists (predefined pools, allocated subnets and reserved networks) being sorted and logically combines 'allocated' and 'reserved' through a 'double cursor' to iterate on both lists at the same time while preserving the total order. At the same time, it iterates over 'predefined' pools and looks for the first empty space that would be a good fit.
Currently, the size of the allocated subnet is still dictated by each 'predefined' pool. We should consider hardcoding that size instead, and letting users specify what subnet size they want. This wasn't possible before as the subnets were generated upfront. This new allocator should be able to deal with this easily.
The method used for static allocation has been updated to make sure the ascending order of 'allocated' is preserved. It's bug-for-bug compatible with the previous implementation.
One consequence of this new algorithm is that we don't keep track of where the last allocation happened; we just allocate the first free subnet we find.
Before:
Now, the 3rd allocation would yield 10.0.1.0/24 once again.
As it doesn't change the semantics of the allocator, there's no reason to worry about that.
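The first-free behaviour can be reproduced with a toy model. The bitmap below is only an illustrative stand-in for the real data structures: `firstFit` and `subnet` are hypothetical, the pool is assumed to be 10.0.0.0/16 split into /24 subnets, and 10.0.0.0/24 is assumed to be already in use.

```go
package main

import (
	"fmt"
	"net/netip"
)

// firstFit is a toy allocator over 10.0.0.0/16 handing out /24 subnets.
// Index n stands for 10.0.n.0/24.
type firstFit struct {
	used [256]bool
}

func subnet(n int) netip.Prefix {
	return netip.PrefixFrom(netip.AddrFrom4([4]byte{10, 0, byte(n), 0}), 24)
}

// allocate always scans from the start of the pool; no "last position"
// is remembered between calls.
func (f *firstFit) allocate() (netip.Prefix, bool) {
	for n := range f.used {
		if !f.used[n] {
			f.used[n] = true
			return subnet(n), true
		}
	}
	return netip.Prefix{}, false
}

// release marks a previously allocated subnet as free again.
func (f *firstFit) release(p netip.Prefix) {
	f.used[p.Addr().As4()[2]] = false
}

func main() {
	f := &firstFit{}
	f.used[0] = true // assumption: 10.0.0.0/24 is already taken

	a, _ := f.allocate() // 10.0.1.0/24
	b, _ := f.allocate() // 10.0.2.0/24
	f.release(a)
	c, _ := f.allocate() // 10.0.1.0/24 once again

	fmt.Println(a, b, c)
}
```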
Finally, about 'reserved networks'. The heuristics we use are now properly documented. It was discovered that we don't check routes for IPv6 allocations -- this can't be changed because there's no such thing as on-link routes for IPv6.
(Kudos to Rob Murray for coming up with the linear-search idea.)
- How to verify it
CI -- a bunch of tests have been added, some have been rewritten.
Or manually by creating, deleting and re-creating networks.
- Description for the changelog
- Introduce a new subnet allocator that can deal with IPv6 address pools of any size
- A picture of a cute animal (not mandatory but encouraged)