Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix JFROG Artifactory Compatibility #383

Merged
merged 6 commits into from
Mar 15, 2023

Conversation

Danielku15
Copy link
Contributor

Resolves #382

This PR enables compatibility with JFROG Artifactory docker registries to push images.

There are two compat issues:

  • Artifactory parses the history of the layer config. This parsing seems buggy which leads to the manifest to be rejected. A new option allows omitting of the history in the config. Auto detecting would require a workflow change that we contact the registries before we start building the image.
  • The Accept HTTP header is validated on certain API calls which leads to failed pushes. I changed to only provide the headers on pull/load related API calls.

I tested the changes against our company internal Artifactory instance by adapting the EndToEndTests.ApiEndToEndWithRegistryPushAndPull and I can successfully push against our registry:

image

@baronfel baronfel requested a review from a team March 8, 2023 13:57
Copy link
Member

@baronfel baronfel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thank you for the investigation and fix. Is there any way we can detect from the error response during the push that

a) the registry is JFrog (and so we might suggest setting the new flag in the user's project file/props)
b) the error is informative enough to suggest doing the same irrespective of registry?

We may need to have a new docs section for specific guidance for registries.

I would like to add an explicit test for jfrog in baronfel/sdk-container-demo, but it doesn't look like there's a image or a public free service that I can use. Do you have any pointers there?

@Danielku15
Copy link
Contributor Author

a) the registry is JFrog (and so we might suggest setting the new flag in the user's project file/props)

The error is rather cryptic but yes it would be possible to detect JFrog Artifactory. They have a special HTTP Header in their API responses we could use:
image

If the manifest upload fails with such a "manifest invalid" error, and we detect a JFrog/Artifactory header, we could report an additional info.

b) the error is informative enough to suggest doing the same irrespective of registry?

Unfortunately not really. It is partially a bug and/or a very strict validation of layers against history items. The error is always a "manifest invalid" which could also have other causes. The specific thing about history was tricky to find out.
An interesting analysis was done here.

I would like to add an explicit test for jfrog in baronfel/sdk-container-demo, but it doesn't look like there's a image or a public free service that I can use. Do you have any pointers there?

There is a JFrog Artifactory OSS Docker image you could use.

docker pull docker.bintray.io/jfrog/artifactory-oss:latest (https://www.jfrog.com/confluence/display/RTF6X/Installing+with+Docker)

@baronfel
Copy link
Member

baronfel commented Mar 8, 2023

So an alternative here would be to try and 'fix' the history in our generated images, creating a history entry for any empty layers that don't have one?

@Danielku15
Copy link
Contributor Author

So an alternative here would be to try and 'fix' the history in our generated images, creating a history entry for any empty layers that don't have one?

In theory it could be an approach/attempt to create a history which satisfies the JFrog Artifactory checks. But there is not really a formal specification behind this it seems. So we would need to rely on "decompiling" and finding out the current implementation and hope it will not change. 😓

@baronfel
Copy link
Member

baronfel commented Mar 9, 2023

@Danielku15 there is some precedent here - here's how jib handles this mismatch for example(relevant issue discussion) - I'd really like to try that before adding a new flag that users have to remember.

@Danielku15
Copy link
Contributor Author

@baronfel I will give it a try and see how it goes. I think handling too many history entries and removing some shouldn't be needed. Then already the push of the base layer should fail and that cannot be solved by sdk-container-builds anyhow. What do you think?

@Danielku15
Copy link
Contributor Author

Made the change and tested it against our Artifactory instance. I can confirm that matching the number of non empty_layer history entries and layers must match to make Artifactory accept it.

@Danielku15 Danielku15 marked this pull request as draft March 10, 2023 10:23
@Danielku15
Copy link
Contributor Author

Danielku15 commented Mar 10, 2023

I had to convert this PR back into a draft. I updated our internal build of sdk-container-builds to the latest state of this branch. While the whole build and push works, pulling back the image currently fails with a:

tag: Pulling from docker-dev/test-image
3f9582a2cbe7: Already exists
80b88e0117ce: Already exists
7324147f661f: Already exists
662491a71d9f: Already exists
c8dbc1086d50: Already exists
97f661bbb5bf: Verifying Checksum
filesystem layer verification failed for digest sha256:97f661bbb5bfc9c57c3c3975de5114e3807c1512aa67bd90ef2885f553246dac

I have to dig into this topic first. I am not sure at this point if this problem still relates to Artifactory or is a general problem with the layer creation.

**Edit: ** It seems that Artifactory handles the Content-Range header wrong on chunked uploads in the way we send it. In the patch response we always receive Range: 0-65535 and also on the server I can see that the blob has only the size of 1 chunk instead of being concatenated. Uploading the blobs as whole single request works fine. I would assume that normally a docker push also sends things chunked and we have normal images which are handled fine by Artifactory.

@Danielku15
Copy link
Contributor Author

Danielku15 commented Mar 13, 2023

@baronfel It seems chunking is not supported by Artifactory correctly. Most tools are not trying to upload things in smaller chunks but it might be rather a recovery mechanism. My proposal would be to maybe fully remove the chunked upload and upload only a single full chunk in all cases. GCR already has problems hence this code exists.

This might also be a root cause for #315 Upload in many small chunks can cost quite some time. Just uploading it as a whole is likely a lot faster (increasing buffer size might be boost further).

Here my analysis to justify that the removal of chunked upload would be a good choice:

GoogleContainerTools/jib (12.6k stars)
They are not using chunked uploads.
https://github.com/GoogleContainerTools/jib/blob/3043460ad34cd3ecefbef2e72fd264f6bc56fa3c/jib-core/src/main/java/com/google/cloud/tools/jib/registry/RegistryClient.java#L542-L566
https://github.com/GoogleContainerTools/jib/blob/3043460ad34cd3ecefbef2e72fd264f6bc56fa3c/jib-core/src/main/java/com/google/cloud/tools/jib/http/BlobHttpContent.java#L26

https://github.com/containers/image (733 stars)
They are not using chunked uploads but there is a comment for maybe extending it.
https://github.com/containers/image/blob/3d74c68115363294edbc4a644abceeb94dd1cc77/docker/docker_image_dest.go#L163

https://github.com/devcontainers/cli/ (635 stars)
They are not using chunked uploads.
https://github.com/devcontainers/cli/blob/946841a3503921a0528280a9ac4f9c5b7190f7b6/src/spec-configuration/containerCollectionsOCIPush.ts#L257

https://github.com/openSUSE/openSUSE-release-tools (50 stars)
They are not using chunked uploads.
https://github.com/openSUSE/openSUSE-release-tools/blob/7eca6b9c0362731f560684a78e3c92f32b5cdf8a/docker_registry.py#L207

Original Docker CLI
The blob upload is also not chunked. The log I captured through fiddler (cleaned from sensitive information) follows. Notice the patch call which does a simple upload without any Content-Range set.

POST /v2/docker-dev/apiendtoendwithlocalload/blobs/uploads/ HTTP/1.1
Host: artifactory.mycompany.com
User-Agent: docker/20.10.22 go/go1.18.9 git-commit/42c8b31 kernel/5.15.49-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.22 \(windows\))
Content-Length: 0
Authorization: Bearer ###
Accept-Encoding: gzip
Connection: close

    HTTP/1.1 202 Accepted
    Content-Length: 0
    Date: Mon, 13 Mar 2023 10:07:01 GMT
    Docker-Distribution-Api-Version: registry/2.0
    Docker-Upload-Uuid: 87aff5ea-08ce-4909-92f0-70f7e3ce6b9d
    Location: https://artifactory.mycompany.com/v2/docker-dev/apiendtoendwithlocalload/blobs/uploads/87aff5ea-08ce-4909-92f0-70f7e3ce6b9d
    X-Artifactory-Id: ###
    X-Artifactory-Node-Id: ###
    X-Jfrog-Version: Artifactory/7.41.14 74114900
    Connection: close
    Set-Cookie: ###; path=/; Httponly; Secure
    Strict-Transport-Security: max-age=16070400; includeSubDomains


PATCH /v2/docker-dev/apiendtoendwithlocalload/blobs/uploads/87aff5ea-08ce-4909-92f0-70f7e3ce6b9d HTTP/1.1
Host: artifactory.mycompany.com
User-Agent: docker/20.10.22 go/go1.18.9 git-commit/42c8b31 kernel/5.15.49-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.22 \(windows\))
Transfer-Encoding: chunked
Authorization: Bearer ###
Accept-Encoding: gzip
Connection: close

    HTTP/1.1 202 Accepted
    Content-Length: 0
    Date: Mon, 13 Mar 2023 10:07:09 GMT
    Docker-Distribution-Api-Version: registry/2.0
    Docker-Upload-Uuid: 87aff5ea-08ce-4909-92f0-70f7e3ce6b9d
    Location: https://artifactory.mycompany.com/v2/docker-dev/apiendtoendwithlocalload/blobs/uploads/87aff5ea-08ce-4909-92f0-70f7e3ce6b9d
    Range: 0-32513513
    X-Artifactory-Id: ###
    X-Artifactory-Node-Id: ###
    X-Jfrog-Version: Artifactory/7.41.14 74114900
    Connection: close
    Set-Cookie: ###; path=/; Httponly; Secure
    Strict-Transport-Security: max-age=16070400; includeSubDomains


PUT /v2/docker-dev/apiendtoendwithlocalload/blobs/uploads/87aff5ea-08ce-4909-92f0-70f7e3ce6b9d?digest=sha256%3A21a01b1b748806a10b92e6ee8c1171a41075dfa010dfa15b5d11867c4d00c2ad HTTP/1.1
Host: artifactory.mycompany.com
User-Agent: docker/20.10.22 go/go1.18.9 git-commit/42c8b31 kernel/5.15.49-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.22 \(windows\))
Content-Length: 0
Authorization: Bearer ###
Accept-Encoding: gzip
Connection: close

    HTTP/1.1 201 Created
    Content-Length: 0
    Date: Mon, 13 Mar 2023 10:07:09 GMT
    Docker-Content-Digest: sha256:21a01b1b748806a10b92e6ee8c1171a41075dfa010dfa15b5d11867c4d00c2ad
    Docker-Distribution-Api-Version: registry/2.0
    Location: https://artifactory.mycompany.com/v2/docker-dev/apiendtoendwithlocalload/blobs/sha256:21a01b1b748806a10b92e6ee8c1171a41075dfa010dfa15b5d11867c4d00c2ad
    X-Artifactory-Id: ###
    X-Artifactory-Node-Id: ###
    X-Jfrog-Version: Artifactory/7.41.14 74114900
    Connection: close
    Set-Cookie: ###; path=/; Httponly; Secure
    Strict-Transport-Security: max-age=16070400; includeSubDomains

@baronfel
Copy link
Member

baronfel commented Mar 13, 2023

I talked with the team and we'd accept a change in default for chunking from enabled-by-default to disabled-by-default 👍 Once that change happens I can test it over on baronfel/sdk-container-demo to make sure we don't regress on any of the other providers.

@Danielku15
Copy link
Contributor Author

Made the change by adding a new setting hidden behind a environment variable to control the chunked upload.

@Danielku15 Danielku15 marked this pull request as ready for review March 13, 2023 16:53
@baronfel
Copy link
Member

Thanks for adding that flag! That seems to work for most registries (including a JFrog instance I stood up in the cloud on trial), but GitHub is failing on the 'finalize' call: https://github.com/baronfel/sdk-container-demo/actions/runs/4407492891/jobs/7721270965. I don't see what would cause this initially, though.

@Danielku15
Copy link
Contributor Author

No clue at this point what could cause this 🤔 . I will have to debug that registry locally to see what's going on there. The only thing I see from reading the code is that we maybe not dispose all IDisposables like all HttpResponseMessage and StreamContent. But I would expect the server to report an error on the blob upload (PATCH), not on the finish of the upload session (PUT). I have to double check the response if it reports something suspicious.

@Danielku15
Copy link
Contributor Author

@baronfel Can you please try to re-run the job? I tested some things locally and cannot directly reproduce this error locally and was able to push a image to GHCR. I could see a different error during my tests I had to workaround:

For some reason, on the blob upload call, ghcr.io does not respond with a 401 Unauthorized but just drops the TCP connection. I locally made a small workaround to get the auth info filled directly and the push succeeded fine. I will open a different issue for this authentication handler "problem".

@baronfel
Copy link
Member

Here's job re-run number 3 - still the same error. We have a GitHub request ID this time due to the logging I added so I've reached out to our contacts on GitHub Packages to see if they have any more detailed information.

@baronfel
Copy link
Member

@Danielku15 I added you and the other maintainers of this repo as collaborators on baronfel/sdk-container-demo, so you should be able to more easily test iterations of changes to this repo via the GitHub Actions using the secrets I've already provisioned.

@Danielku15
Copy link
Contributor Author

@baronfel Used the main branch package and can confirm that it is already broken on main: https://github.com/baronfel/sdk-container-demo/actions/runs/4424217344/jobs/7757795404 AWS reports some problem with the layer digests:

None of the provided layer digests match the calculated layer digest

I haven't checked yet what could cause it. Things are quite busy for me these days but I try to catch up on this topic soon.

@baronfel
Copy link
Member

I confirm your results (which is quite confusing). I'll merge this and we can continue to dig in.

@baronfel baronfel merged commit 6a6571f into dotnet:main Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upload incompatible with JFROG Artifactory
3 participants