Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Fix image push in Makefile for image-push jobs #1039

Conversation

tobiasgiese
Copy link
Member

@tobiasgiese tobiasgiese commented Nov 3, 2021

What this PR does / why we need it:

This PR should fix the failing post-cluster-api-provider-openstack-push-images job.
It seems that the first failing job was on September 2nd 11:02am, which was PR #981.

last successful job: Sep 01 10:35:07
first failing job: Sep 02 11:02:17

I found the following errors:

ERROR: (gcloud.builds.submit) build 4b63deb7-f8ff-45ca-8d00-977233c67a78 completed with status "FAILURE"

and

no such manifest: gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-arm:v20210902-v0.4.0-58-gbff2282b

If I run a make docker-build-arm docker-push-arm on the main branch:

$ make docker-build-arm docker-push-arm
make ARCH=arm docker-build
[...]
 => [builder 7/7] RUN --mount=type=cache,target=/root/.cache/go-build     --mount=type=cache,target=/go/pkg/mod     CGO_ENABLED=0 GOOS=linux GOARCH=arm     go build -ldflags "${ldflags} -extldflags '-static'"     -o manager .                                                                                                                                                      [...]
docker push gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-amd64:dev
The push refers to repository [gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-amd64]
35bc2872bd34: Preparing 
6d75f23be3dd: Preparing 
[...]

Note: it's defaulting the image tag to amd64

And the same command on this branch:

make docker-build-arm docker-push-arm
make ARCH=arm docker-build
[...]
 => CACHED [builder 8/8] RUN --mount=type=cache,target=/root/.cache/go-build     --mount=type=cache,target=/go/pkg/mod     CGO_ENABLED=0 GOOS=linux GOARCH=arm     go build -ldflags "${ldflags} -[...]
docker push gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-arm:dev
The push refers to repository [gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-arm]
35bc2872bd34: Preparing 
6d75f23be3dd: Preparing 
[...]

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

Tobias Giese [email protected], Daimler TSS GmbH, legal info/Impressum

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 3, 2021
@netlify
Copy link

netlify bot commented Nov 3, 2021

✔️ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

🔨 Explore the source changes: 2c6ebb6

🔍 Inspect the deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/618447807014300008895887

😎 Browse the preview: https://deploy-preview-1039--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 3, 2021
@tobiasgiese tobiasgiese force-pushed the tobiasgiese/fix-push-images-job branch from 3438b45 to 1a42a5b Compare November 3, 2021 21:23
@tobiasgiese tobiasgiese changed the title Re-add cache folder to fix push-images job Fix image push in Makefile for image-push jobs Nov 3, 2021
@tobiasgiese tobiasgiese changed the title Fix image push in Makefile for image-push jobs 🐛 Fix image push in Makefile for image-push jobs Nov 3, 2021
@tobiasgiese tobiasgiese force-pushed the tobiasgiese/fix-push-images-job branch from 1a42a5b to 2fcee3b Compare November 3, 2021 21:37
@tobiasgiese
Copy link
Member Author

/retest

@tobiasgiese tobiasgiese force-pushed the tobiasgiese/fix-push-images-job branch 2 times, most recently from f0b8e80 to 0c8d3df Compare November 4, 2021 07:47
@sbueringer
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2021
Copy link
Member

@chrischdi chrischdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@sbueringer
Copy link
Member

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 4, 2021
@tobiasgiese
Copy link
Member Author

We should wait for kubernetes/test-infra#24227 to be merged first

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 4, 2021
@mdbooth
Copy link
Contributor

mdbooth commented Nov 4, 2021

We can simplify this bug as follows:

ROOT_DIR_RELATIVE := .
include $(ROOT_DIR_RELATIVE)/common.mk

TAG ?= dev
ARCH ?= amd64

STAGING_REGISTRY := gcr.io/k8s-staging-capi-openstack
REGISTRY ?= $(STAGING_REGISTRY)

IMAGE_NAME ?= capi-openstack-controller
CONTROLLER_IMG ?= $(REGISTRY)/$(IMAGE_NAME)
CONTROLLER_IMG_TAG ?= $(CONTROLLER_IMG)-$(ARCH):$(TAG)

.PHONY: docker-build
docker-build: ## Build the docker image for controller-manager
        /usr/bin/echo $(ARCH) $(CONTROLLER_IMG_TAG)

docker-build-%:
        $(MAKE) ARCH=$* docker-build

If we run make docker-build-arm we get:

$ make docker-build-arm
make ARCH=arm docker-build
make[1]: Entering directory '/home/mbooth/tmp'
/usr/bin/echo arm gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-amd64:dev
arm gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-amd64:dev
make[1]: Leaving directory '/home/mbooth/tmp'

Note the output line:

arm gcr.io/k8s-staging-capi-openstack/capi-openstack-controller-amd64:dev

We can see that the value of ARCH is 'arm', as expected, but the value of CONTROLLER_IMG_TAG, which contains ARCH, substituted the wrong value.

Note that CONTROLLER_IMG_TAG is a recursively expanded variable, which is the default kind of make variable. This means that the variable is expanded at the point of use, not the point of definition. But we can clearly see in this case that the expanded value of ARCH in CONTROLLER_IMG_TAG is not the same as the value of ARCH used on the same line.

The problem is that common.mk includes the directive .EXPORT_ALL_VARIABLES, which exists for compatibility with legacy versions of Make. Unfortunately its effect is global, and it changes the expected behaviour of make in this case. Specifically the parent make expands the value of CONTROLLER_IMG_TAG prior to calling the sub-make, which then does not substitute it because it is defined with ?=. Both of the following changes cause it to work again in different ways:

  • Change the definition of CONTROLLER_IMG_TAG to use = instead of ?=. This works because the submake simply overrides the value that it was passed.
  • Remove .EXPORT_ALL_VARIABLES from common.mk. This works because CONTROLLER_IMG_TAG is no longer passed implicitly by value to the subshell and there defaults as expected in the submake.

Of these I prefer the latter solution because the global behaviour of .EXPORT_ALL_VARIABLES is extremely surprising, and took 2 of us an entire morning to debug.

We introduced common.mk in #782, and @sbueringer explicitly mentioned in the PR that this was to sync us up with AWS. Looking at AWS, it was introduced there in kubernetes-sigs/cluster-api-provider-aws#2096 by @randomvariable. The PR doesn't explain why it was added there and the directive doesn't include a comment, but presumably of the commands added requires an environment variable to be set which is defined in the Makefile.

My preference would be to remove the directive and then fix whatever it broke by exporting any environment variables explicitly to the shell or sub-make which requires it. @randomvariable I don't suppose you remember what it was?

@randomvariable
Copy link
Member

.EXPORT_ALL_VARIABLES I mostly set to make it more Bash like, even though I normally hate Bash and global scope is bad. It's up to you really - no preference one way or another.

@mdbooth
Copy link
Contributor

mdbooth commented Nov 4, 2021

.EXPORT_ALL_VARIABLES I mostly set to make it more Bash like, even though I normally hate Bash and global scope is bad. It's up to you really - no preference one way or another.

Looking at this Makefile some more we use a huge number of submakes, most of which seem to be unnecessary. For example:

.PHONY: generate
generate: ## Generate code
        $(MAKE) generate-go
        $(MAKE) generate-manifests

This could simply be:

.PHONY: generate
generate: generate-go generate-manifests

As currently written we either need .EXPORT_ALL_VARIABLES or we need to explicitly export (CONTROLLER_GEN, CONVERSION_GEN, DEFAULTER_GEN), CRD_ROOT, WEBHOOK_ROOT, and RBAC_ROOT (but not to the CONTROLLER_GEN, CONVERSION_GEN, and DEFAULTER_GEN because they are 'simply' expanded with no default) to the submake.

A slightly more complex example may actually relate to the reason this was introduced, as it invokes gojq.

.PHONY: $(RELEASE_DIR)/$(MANIFEST_FILE).yaml
$(RELEASE_DIR)/$(MANIFEST_FILE).yaml:
       $(MAKE) compiled-manifest \
               PROVIDER=$(MANIFEST_FILE) \
               OLD_IMG=$(CONTROLLER_ORIGINAL_IMG) \
               MANIFEST_IMG=$(CONTROLLER_IMG) \
               CONTROLLER_NAME=$(CONTROLLER_NAME) \
               PROVIDER_CONFIG_DIR=$(CONFIG_DIR) \
               NAMESPACE=$(NAMESPACE)

.PHONY: compiled-manifest
compiled-manifest: $(RELEASE_DIR) $(KUSTOMIZE)
      $(MAKE) image-patch-source-manifest
      $(MAKE) image-patch-pull-policy
      $(MAKE) image-patch-kustomization
      $(KUSTOMIZE) build $(IMAGE_PATCH_DIR)/$(PROVIDER) > $(RELEASE_DIR)/$(PROVIDER).yaml

Can be written as:

compiled_manifest=$(RELEASE_DIR)/$(MANIFEST_FILE).yaml

.PHONY: $(compiled_manifest)
$(compiled_manifest): PROVIDER=$(MANIFEST_FILE)
$(compiled_manifest): OLD_IMG=$(CONTROLLER_ORIGINAL_IMG)
$(compiled_manifest): MANIFEST_IMG=$(CONTROLLER_IMG)
$(compiled_manifest): PROVIDER_CONFIG_DIR=$(CONFIG_DIR)
$(compiled_manifest): $(KUSTOMIZE) image-patch-source-manifest image-patch-pull-policy image-patch-kustomization | $(RELEASE_DIR)
          $(KUSTOMIZE) build $(IMAGE_PATCH_DIR)/$(PROVIDER) > $(RELEASE_DIR)/$(PROVIDER).yaml

Anyway, I strongly suspect that most of these things are never going to be a real issue to anybody, so rather than a mass cleanup it might be worth just removing .EXPORT_ALL_VARIABLES and fix up other targets as we encounter the need.

@mdbooth
Copy link
Contributor

mdbooth commented Nov 4, 2021

TL;DR

  • We use submakes far more than necessary
  • Submakes require a lot of explicit variable passing unless you specify .EXPORT_ALL_VARIABLES
  • .EXPORT_ALL_VARIABLES makes the semantics of variable expansion across a submake boundary really weird and hard to debug

A thorough fix would be:

  • Remove .EXPORT_ALL_VARIABLES
  • Fix resulting issues as they arise by:
    • Removing use of submakes where possible
    • Explicit variable passing where not possible

Alternative:

  • A big comment at the top of the Makefile pointing out the current situation
  • fix the current issue by changing the definition of CONTROLLER_IMG_TAG from ?= to = (and a comment)
  • possibly do the 'thorough' fix over time

@chrischdi
Copy link
Member

Thanks for the detailed analysis.

Just removing .EXPORT_ALL_VARIABLES sounds fine for me :-)

@tobiasgiese tobiasgiese force-pushed the tobiasgiese/fix-push-images-job branch from c4ebf02 to 12329eb Compare November 4, 2021 18:53
@tobiasgiese
Copy link
Member Author

tobiasgiese commented Nov 4, 2021

From me also many thanks for the detailed analysis @mdbooth!

I just pushed the commit with the removal of .EXPORT_ALL_VARIABLES. Everything else (i.e., reduce the submakes and overall refactoring of the Makefile) should be a follow-up PR

@tobiasgiese
Copy link
Member Author

/retest

@mdbooth
Copy link
Contributor

mdbooth commented Nov 4, 2021

That failed because we're no longer passing the Make variable DOCKER_BUILDKIT=1 to docker as an environment variable.

@tobiasgiese tobiasgiese force-pushed the tobiasgiese/fix-push-images-job branch from 12329eb to 727519e Compare November 4, 2021 20:39
@tobiasgiese
Copy link
Member Author

tobiasgiese commented Nov 4, 2021

That failed because we're no longer passing the Make variable DOCKER_BUILDKIT=1 to docker as an environment variable.

Thanks, got it. Fixed it. I also looked for other environment variables, that should have been all of them.

@mdbooth
Copy link
Contributor

mdbooth commented Nov 4, 2021

I just threw up #1040 so I can try things, but it's not going to replace this PR. Feel free to look at anything I throw in there or not.

@tobiasgiese
Copy link
Member Author

tobiasgiese commented Nov 5, 2021

@mdbooth the e2e tests are green now. Exporting the Docker and PATH variable seems to fix it.
So I would suggest to remove the hold label and merge this PR. WDYT?

Copy link
Contributor

@mdbooth mdbooth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chrischdi, mdbooth, sbueringer, tobiasgiese

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tobiasgiese
Copy link
Member Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2021
@k8s-ci-robot k8s-ci-robot merged commit b1f3767 into kubernetes-sigs:main Nov 5, 2021
@tobiasgiese tobiasgiese deleted the tobiasgiese/fix-push-images-job branch November 5, 2021 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants