Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for shared-size parameter for images queries #42531

Merged
merged 2 commits into from
Jul 19, 2021

Conversation

rvolosatovs
Copy link
Contributor

@rvolosatovs rvolosatovs commented Jun 16, 2021

- What I did

Guard shared size computations with shared-size boolean URL parameter.
The reasoning for this change is to be able to query image shared size without having to rely on the more heavyweight /system/df endpoint.

- How I did it

Add an additional boolean parameter to ImageService.Images and add a URL parameter to the API endpoint mapping to it's value.
On a fast search over the codebase I did not see another URL parameter consisting of 2 words, so not sure about the naming convention, if there is one - went for kebab case.

- How to verify it

root@28f161c787d2:/go/src/github.com/docker/docker# docker build bundles -f bundles/ubuntu.Dockerfile 
Sending build context to Docker daemon  203.5MB
Step 1/2 : FROM ubuntu
 ---> 9873176a8ff5
Step 2/2 : RUN echo test > testfile
 ---> Running in 99dad2de2d7f
Removing intermediate container 99dad2de2d7f
 ---> 3036176f9a83
Successfully built 3036176f9a83
root@28f161c787d2:/go/src/github.com/docker/docker# curl --unix-socket /var/run/docker.sock http://localhost/images/json?shared-size=1 | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   998  100   998    0     0   974k      0 --:--:-- --:--:-- --:--:--  974k
[
  {
    "Containers": -1,
    "Created": 1624443646,
    "Id": "sha256:3036176f9a83226548a7b19a2d2bcc922945c5f43b3d591eec719e9dcb22e602",
    "Labels": null,
    "ParentId": "sha256:9873176a8ff5ac192ce4d7df8a403787558b9f3981a4c4d74afb3edceeda451c",
    "RepoDigests": [
      "<none>@<none>"
    ],
    "RepoTags": [
      "<none>:<none>"
    ],
    "SharedSize": 72742757,
    "Size": 72742762,
    "VirtualSize": 72742762
  },
  {
    "Containers": -1,
    "Created": 1623972689,
    "Id": "sha256:9873176a8ff5ac192ce4d7df8a403787558b9f3981a4c4d74afb3edceeda451c",
    "Labels": null,
    "ParentId": "",
    "RepoDigests": [
      "ubuntu@sha256:aba80b77e27148d99c034a987e7da3a287ed455390352663418c0f2ed40417fe"
    ],
    "RepoTags": [
      "ubuntu:latest"
    ],
    "SharedSize": 72742757,
    "Size": 72742757,
    "VirtualSize": 72742757
  }
] 

Also with filtering:

root@28f161c787d2:/go/src/github.com/docker/docker# curl --unix-socket /var/run/docker.sock 'http://localhost/images/json?shared-size=1&filters=%7B%22reference%22%3A%7B%22ubuntu%22%3Atrue%7D%7D'  | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   335  100   335    0     0   327k      0 --:--:-- --:--:-- --:--:--  327k
[
  {
    "Containers": -1,
    "Created": 1623972689,
    "Id": "sha256:9873176a8ff5ac192ce4d7df8a403787558b9f3981a4c4d74afb3edceeda451c",
    "Labels": null,
    "ParentId": "",
    "RepoDigests": [
      "ubuntu@sha256:aba80b77e27148d99c034a987e7da3a287ed455390352663418c0f2ed40417fe"
    ],
    "RepoTags": [
      "ubuntu:latest"
    ],
    "SharedSize": 72742757,
    "Size": 72742757,
    "VirtualSize": 72742757
  }
]

- Description for the changelog

/images endpoint now supports boolean shared-size URL parameter, which enables shared size computation when set

@thaJeztah
Copy link
Member

Changing that boolean was my first idea, but (I think I left some notes on that on the internal ticket, but may have forgotten); we're mostly interested in the SharedSize; Setting the withExtraAttrs option will also collect the number of containers, which (at least at this point), I don't think we need. (Getting the list of containers may (potentially) have other implications, so if we don't need it (currently), we should probably avoid that)

Having a quick look, I think we should modify the logic in

if withExtraAttrs {
// lazily init variables
if imagesMap == nil {
allContainers = i.containers.List()
allLayers = i.layerStore.Map()
imagesMap = make(map[*image.Image]*types.ImageSummary)
layerRefs = make(map[layer.ChainID]int)
}
// Get container count
newImage.Containers = 0
for _, c := range allContainers {
if c.ImageID == id {
newImage.Containers++
}
}
// count layer references
rootFS := *img.RootFS
rootFS.DiffIDs = nil
for _, id := range img.RootFS.DiffIDs {
rootFS.Append(id)
chid := rootFS.ChainID()
layerRefs[chid]++
if _, ok := allLayers[chid]; !ok {
return nil, fmt.Errorf("layer %v was not found (corruption?)", chid)
}
}
imagesMap[img] = newImage
}
to split the "size" part from the "container count" part.

Looking at that code, I think there's an issue; the code was written for docker system df, which will call the function without any filters set;

func (i *ImageService) Images(imageFilters filters.Args, all bool, withExtraAttrs bool) ([]*types.ImageSummary, error) {

However, the SharedSize calculation is performed after filtering (which works in the docker system df case, as it doesn't do any filtering), but it will calculate the wrong shared sizes if the list is filtered (it would only take other images in the result into account).

Even if we would put the calculation at the start of the loop;

for id, img := range allImages {

It looks like allImages is not always "all images";

if danglingOnly {
allImages = i.imageStore.Heads()
} else {
allImages = i.imageStore.Map()
}

@rvolosatovs rvolosatovs changed the title api: request extra attrs on images query Compute image SharedSize by default Jun 22, 2021
@rvolosatovs
Copy link
Contributor Author

rvolosatovs commented Jun 22, 2021

Updated the OP, added 200fab4 , which is based on top of #42550

@thaJeztah I suppose the safest way to handle this would be to make this feature opt-in, do you agree? I think we could add a shared-size URL parameter to control this, what do you think?

@rvolosatovs rvolosatovs changed the title Compute image SharedSize by default Add support for shared-size parameter for images queries Jun 23, 2021
@rvolosatovs
Copy link
Contributor Author

rvolosatovs commented Jun 23, 2021

@thaJeztah I suppose the safest way to handle this would be to make this feature opt-in, do you agree? I think we could add a shared-size URL parameter to control this, what do you think?

Done in 0c51846

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to also update

moby/api/swagger.yaml

Lines 7185 to 7192 in 2bd46ed

parameters:
- name: "all"
in: "query"
description: "Show all images. Only images from a final layer (no children) are shown by default."
type: "boolean"
default: false
- name: "filters"
in: "query"

And to add a mention of the new query-arg in https://github.com/moby/moby/blob/2bd46ed7e5a016f21b9259cec269340420fa7a63/docs/api/version-history.md

api/server/router/image/image_routes.go Show resolved Hide resolved
@rvolosatovs rvolosatovs force-pushed the image_shared_size branch 2 times, most recently from cdaf727 to c99a946 Compare June 28, 2021 15:38
@rvolosatovs rvolosatovs requested a review from thaJeztah June 28, 2021 15:38
@tianon
Copy link
Member

tianon commented Jun 28, 2021

I was having trouble understanding the motivation for this change (and feeling like I should already understand what's going on here) and discussed with @thaJeztah, and he provided the following explanation which helped me understand the intent much better:

the system df command reuses the existing backend, but passes the "with extra attributes" argument, to calculate the shared size of images (and number of containers that use an image)

The PR is to expose the shared size option on the /images/json (docker images / docker image ls) endpoint, so that the information can also be retrieved outside of docker system df

Can we add something simple like that to explain the motivation to the PR description/commit? 🙏 😇

@rvolosatovs
Copy link
Contributor Author

I was having trouble understanding the motivation for this change (and feeling like I should already understand what's going on here) and discussed with @thaJeztah, and he provided the following explanation which helped me understand the intent much better:

the system df command reuses the existing backend, but passes the "with extra attributes" argument, to calculate the shared size of images (and number of containers that use an image)
The PR is to expose the shared size option on the /images/json (docker images / docker image ls) endpoint, so that the information can also be retrieved outside of docker system df

Can we add something simple like that to explain the motivation to the PR description/commit? pray innocent

Added

The reasoning for this change is to be able to query image shared size without having to rely on the more heavyweight /system/df endpoint.

to the OP, please let me know if you think this is not enough and whether we should record this in git history as well, for example.

@thaJeztah
Copy link
Member

thaJeztah commented Jun 28, 2021 via email

@rvolosatovs rvolosatovs force-pushed the image_shared_size branch 2 times, most recently from 4f0ed6d to 7a53506 Compare June 29, 2021 08:22
@rvolosatovs rvolosatovs requested a review from thaJeztah June 29, 2021 08:22
@rvolosatovs rvolosatovs force-pushed the image_shared_size branch 2 times, most recently from 0313854 to 185ac93 Compare June 30, 2021 09:20
@thaJeztah
Copy link
Member

Note: this should also update the default API version; we can merge #42063 before this once review progresses (I'll rebase that one, so that you can possibly include that in this PR; we can merge it separately or as part of this one (I somewhat prefer separately to make it easier to find back, but not either way would be fine)

@@ -26,7 +26,11 @@ func (daemon *Daemon) SystemDiskUsage(ctx context.Context) (*types.DiskUsage, er
}

// Get all top images with extra attributes
allImages, err := daemon.imageService.Images(filters.NewArgs(), false, true)
allImages, err := daemon.imageService.Images(ctx, types.ImageListOptions{
Filters: filters.NewArgs(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but wondering if we should change Filters to be a pointer, so that it can be left nil if no filtering is used

api/swagger.yaml Show resolved Hide resolved
api/server/router/image/image_routes.go Show resolved Hide resolved
api/server/router/image/image_routes.go Show resolved Hide resolved
@thaJeztah
Copy link
Member

Weird; Swagger validation is for some reason failing on a non-related change;

[2021-06-30T09:28:44.409Z] The swagger spec at "api/swagger.yaml" is invalid against swagger specification 2.0. see errors :
[2021-06-30T09:28:44.409Z] - definitions.NetworkSettings.properties.SecondaryIPAddresses.description in body must be of type string: "null"
[2021-06-30T09:28:44.409Z] - definitions.NetworkSettings.properties.SecondaryIPv6Addresses.description in body must be of type string: "null"

Looks like it complains about the empty description, but no clue why it fails on this PR (and not on other ones);

moby/api/swagger.yaml

Lines 1304 to 1318 in 7b9275c

# TODO is SecondaryIPAddresses actually used?
SecondaryIPAddresses:
description: ""
type: "array"
items:
$ref: "#/definitions/Address"
x-nullable: true
# TODO is SecondaryIPv6Addresses actually used?
SecondaryIPv6Addresses:
description: ""
type: "array"
items:
$ref: "#/definitions/Address"
x-nullable: true

@thaJeztah
Copy link
Member

opened #42583 to update the swagger files 🤷‍♂️

@rvolosatovs
Copy link
Contributor Author

rvolosatovs commented Jun 30, 2021

Weird; Swagger validation is for some reason failing on a non-related change;

[2021-06-30T09:28:44.409Z] The swagger spec at "api/swagger.yaml" is invalid against swagger specification 2.0. see errors :
[2021-06-30T09:28:44.409Z] - definitions.NetworkSettings.properties.SecondaryIPAddresses.description in body must be of type string: "null"
[2021-06-30T09:28:44.409Z] - definitions.NetworkSettings.properties.SecondaryIPv6Addresses.description in body must be of type string: "null"

Looks like it complains about the empty description, but no clue why it fails on this PR (and not on other ones);

moby/api/swagger.yaml

Lines 1304 to 1318 in 7b9275c

# TODO is SecondaryIPAddresses actually used?
SecondaryIPAddresses:
description: ""
type: "array"
items:
$ref: "#/definitions/Address"
x-nullable: true
# TODO is SecondaryIPv6Addresses actually used?
SecondaryIPv6Addresses:
description: ""
type: "array"
items:
$ref: "#/definitions/Address"
x-nullable: true

Yes, saw that as well, but assumed that is a known issue! Thanks for the fix!

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good (left some thoughts/feedback inline, but no strong "blockers").

If we go ahead with this PR, I suggest we merge #42063 first (separately), to make the API version update more easily discoverable (then rebase this PR)

api/types/client.go Outdated Show resolved Hide resolved
api/server/router/image/image_routes.go Outdated Show resolved Hide resolved
api/swagger.yaml Outdated Show resolved Hide resolved
@thaJeztah
Copy link
Member

TestClientWithRequestTimeout failure is unrelated; flaky test will hopefully be fixed by #42580

=== Failed
=== FAIL: pkg/plugins TestClientWithRequestTimeout (0.00s)
    client_test.go:254: assertion failed: expected an error, got nil: expected error

Failure on Windows is TestNetworkDBIslands - also a flaky test

Let me kick CI

@rvolosatovs rvolosatovs force-pushed the image_shared_size branch 2 times, most recently from 180445d to 9e4cbf3 Compare July 6, 2021 16:18
@rvolosatovs
Copy link
Contributor Author

Blocked by #42063 (will rebase and rerequest review once it is merged)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

@tianon @tonistiigi PTAL

Roman Volosatovs added 2 commits July 13, 2021 13:45
This makes it easier to add more options to the backend without having to change
the signature.

While we're changing the signature, also adding a context.Context, which is not
currently used, but probably should be at some point.

Signed-off-by: Roman Volosatovs <[email protected]>
The reasoning for this change is to be able to query image shared size without having to rely on the more heavyweight `/system/df` endpoint.

Signed-off-by: Roman Volosatovs <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants