[BUG] BuildStrategy: Cannot Use Context Dir as Working Directory #1573

Open
1 task done
adambkaplan opened this issue Apr 11, 2024 · 8 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/documentation Categorizes issue or PR as related to documentation.

@adambkaplan
Member

Is there an existing issue for this?

  • I have searched the existing issues

Kubernetes Version

k8s: 1.28.7
Tekton Pipelines: 0.56.2

Shipwright Version

0.12.0

Current Behavior

Builds can fail when a build strategy step sets its workingDir to a sub-directory of the source root and the source is cloned from git. Git expects the target directory of a clone to be empty.

Setting workingDir to a sub-directory of the source root (ex: contextDir) results in errors like the following:

2024/04/11 16:39:28 /usr/bin/git -c safe.directory=/workspace/source clone -h
2024/04/11 16:39:28 /usr/bin/git -c safe.directory=/workspace/source submodule -h
2024/04/11 16:39:28 /usr/bin/git -c safe.directory=/workspace/source clone --quiet --no-tags --single-branch --depth 1 -- https://github.com/redhat-developer-demos/quinoa-wind-turbine.git /workspace/source
2024/04/11 16:39:28 fatal: destination path '/workspace/source' already exists and is not an empty directory. (exit code 128)
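
The git behavior itself can be reproduced without a cluster. The following is a minimal sketch, assuming git is installed. The function name and the directory layout (origin, source, source-build) are made up for illustration. It pre-creates a sub-directory inside the clone target, as happens when a step's workingDir points below the source root, and then attempts the clone.

```shell
#!/usr/bin/env bash
# Reproduce git's refusal to clone into a non-empty directory using only
# local paths (no network). All names below are illustrative assumptions.

# try_clone_into_nonempty WORKDIR
# Returns git clone's exit status; git's error output is stored in
# WORKDIR/clone.err. A status of 2 means the demo setup itself failed.
try_clone_into_nonempty() {
  local work="$1"
  local origin="$work/origin" target="$work/source"

  # Create a throwaway repository with a single empty commit.
  git init --quiet "$origin" || return 2
  git -C "$origin" -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "initial commit" || return 2

  # Simulate a pre-created workingDir below the clone target.
  mkdir -p "$target/source-build"

  # The clone target now exists and is not empty, so git should refuse.
  LC_ALL=C git clone --quiet "$origin" "$target" 2>"$work/clone.err"
}
```

Calling try_clone_into_nonempty "$(mktemp -d)" should return a non-zero status and leave the "already exists and is not an empty directory" message in clone.err.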

Expected Behavior

Ideally, build steps succeed when a step's workingDir is a sub-path of the source root. However, this may prove difficult due to the way Tekton, Kubernetes, and potentially the underlying container runtime operate (everything runs in a single TaskRun/Pod today).

Steps To Reproduce

  1. Create a build strategy that has a step workingDir set to a sub-path of $(params.shp-source-root)
  2. Run a Build with this strategy that clones source from git.

Anything else?

This is perhaps something that we document as a known issue - ex: in a guide for Build Strategy authors.

@adambkaplan adambkaplan added the kind/bug Categorizes issue or PR as related to a bug. label Apr 11, 2024
@adambkaplan adambkaplan added the kind/documentation Categorizes issue or PR as related to documentation. label Apr 18, 2024
@adambkaplan
Member Author

Refinement - this is likely a limitation of how Tekton creates containers in a TaskRun. A guide for build strategy authors should call this out.

@adambkaplan adambkaplan added this to the Backlog milestone Apr 18, 2024
@SaschaSchwarze0
Member

@adambkaplan I would consider this a bug in our Git step implementation. The step could check for existing sub-directories and, if there are any, clone into a temporary directory and then move the content. Or it could delete the sub-directories (would that break the parallel steps?) and, after the clone has finished, verify that they were recreated.
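
The first option, cloning into a temporary directory and then moving the content, could look roughly like the sketch below. This is not the Shipwright git step (which is not written in shell). merge_move is a hypothetical helper that only illustrates the "move content over" part, merging into directories that already exist in the target.

```shell
#!/usr/bin/env bash
# merge_move SRC DST  (hypothetical helper, for illustration only)
# Moves every entry of SRC, including dotfiles, into DST. When DST already
# holds a directory with the same name (for example a pre-created workingDir),
# the two directories are merged instead of one replacing the other.
merge_move() {
  local src="$1" dst="$2" entry name
  for entry in "$src"/* "$src"/.[!.]* "$src"/..?*; do
    # Skip glob patterns that matched nothing.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    name="$(basename "$entry")"
    if [ -d "$entry" ] && [ ! -L "$entry" ] && [ -d "$dst/$name" ]; then
      # Both sides are directories: move the children, then drop the
      # now-empty source directory.
      merge_move "$entry" "$dst/$name"
      rmdir "$entry"
    else
      mv -f "$entry" "$dst/$name"
    fi
  done
}
```

A caller would clone into a scratch directory first and then run something like merge_move /workspace/source/.tmp /workspace/source. The move into a pre-existing directory only succeeds if the calling user is allowed to write to that directory.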

@adambkaplan
Member Author

The step could check the existing sub-directories and if there are any, clone into a temporary directory and then move content.

This feels like a lot of extra work, and can be error-prone and risky. I also think this encourages the "bad" behavior of expecting additional content to exist alongside source code as part of a build process. Things like dependency caches IMO should be configurable and located outside of the source code directory. I'm personally fine keeping this as a known issue/limitation, as this really only impacts strategy authors/platform teams.

Or delete the sub-directories (would that break the parallel steps) and after clone finished verify that they were recreated.

I think that would break the other steps in the build. IIRC Tekton has an "entrypoint" mechanism that starts all TaskRun containers at the same time, then effectively pauses/sleeps them to execute in the right order.
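
As a rough illustration of that ordering mechanism, the sketch below starts steps concurrently and lets each one block until its predecessor has created a marker file. It is a simplified model and not Tekton's entrypoint code. run_step and the marker file paths are assumptions made for this example.

```shell
#!/usr/bin/env bash
# run_step WAIT_FILE POST_FILE CMD...  (simplified model, not Tekton's code)
# Blocks until WAIT_FILE exists (an empty WAIT_FILE means "start right away"),
# runs CMD, then creates POST_FILE to release the next step.
# This sketch creates POST_FILE even if CMD fails; it has no error handling.
run_step() {
  local wait_file="$1" post_file="$2"
  shift 2
  if [ -n "$wait_file" ]; then
    # All steps are already running; this loop is what "pauses" a step.
    until [ -e "$wait_file" ]; do sleep 1; done
  fi
  "$@"
  : > "$post_file"
}
```

Because every step's container already exists while the first step runs, deleting a later step's working directory during the clone would pull it out from under a container that has already started.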

@SaschaSchwarze0
Member

I'm personally fine keeping this as a known issue/limitation, as this really only impacts strategy authors/platform teams.

I would say we document it but still try to resolve it. It is an unpleasant limitation, and we have our own build strategies that work around it, like https://github.com/shipwright-io/build/blob/v0.13.0/samples/v1beta1/buildstrategy/ko/buildstrategy_ko_cr.yaml#L104. So I can understand that build strategy authors run into this issue.

What I would be interested in is your opinion on the Tekton behavior. Tekton could easily remove the workingDir from the container and start the step command in the workingDir from its entrypoint (or fail if the directory does not exist at that time).
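
That suggestion, changing into the working directory only when the step actually starts and failing if the directory is missing, could be sketched as follows. This only illustrates the idea in shell. run_in_dir is a hypothetical helper and not part of Tekton's entrypoint.

```shell
#!/usr/bin/env bash
# run_in_dir DIR CMD...  (hypothetical helper, for illustration only)
# Runs CMD with DIR as its working directory. The directory is resolved when
# the command starts, not when the container is created, so nothing has to
# pre-create it. Fails if DIR does not exist at that point.
run_in_dir() {
  local dir="$1"
  shift
  if [ ! -d "$dir" ]; then
    echo "working directory $dir does not exist" >&2
    return 1
  fi
  # The subshell keeps the caller's own working directory unchanged.
  ( cd "$dir" && "$@" )
}
```

With this approach, a step such as run_in_dir /workspace/source/source-build ls would find the directory created by the clone itself, owned by the user that ran the clone.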

@SaschaSchwarze0
Member

Interesting. I can locally reproduce the Git behavior, but in Shipwright, this sometimes works:

cat <<EOF | kubectl create -f -
apiVersion: shipwright.io/v1beta1
kind: BuildStrategy
metadata:
  name: source-context-working-dir
spec:
  steps:
    - name: noop
      image: registry.access.redhat.com/ubi9/ubi-minimal
      workingDir: $(params.shp-source-context)
      command:
        - ls
      args:
        - $(params.shp-source-context)
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
EOF

$ shp build create source-context --source-url https://github.com/shipwright-io/sample-go --source-context-dir source-build --output-image dummy
Created build "source-context"

$ shp build run source-context --follow
Pod "source-context-5f8m2-kvg8b-pod" is in state "Pending"...
Pod "source-context-5f8m2-kvg8b-pod" is in state "Pending"...
Pod "source-context-5f8m2-kvg8b-pod" is in state "Pending"...
Pod "source-context-5f8m2-kvg8b-pod" is in state "Pending"...
Pod "source-context-5f8m2-kvg8b-pod" is in state "Pending"...
Pod "source-context-5f8m2-kvg8b-pod" is in state "Pending"...
Pod "source-context-5f8m2-kvg8b-pod" is in state "Pending"...
succeeded event for pod "source-context-5f8m2-kvg8b-pod" arrived before or in place of running event so dumping logs now
*** Pod "source-context-5f8m2-kvg8b-pod", container "step-source-default": ***

2024/11/16 19:44:39 Info: ssh (/usr/bin/ssh): OpenSSH_8.7p1, OpenSSL 3.0.7 1 Nov 2022
2024/11/16 19:44:39 Info: git (/usr/bin/git): git version 2.43.5
2024/11/16 19:44:39 Info: git-lfs (/usr/bin/git-lfs): git-lfs/3.4.1 (GitHub; linux arm64; go 1.21.13 (Red Hat 1.21.13-3.el9_4) X:strictfipsruntime)
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source clone -h
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source submodule -h
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source clone --quiet --no-tags --single-branch --depth 1 -- https://github.com/shipwright-io/sample-go /workspace/source
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source -C /workspace/source submodule update --init --recursive --depth 1
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source -C /workspace/source rev-parse --abbrev-ref HEAD
2024/11/16 19:44:39 Successfully loaded https://github.com/shipwright-io/sample-go (main) into /workspace/source
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source -C /workspace/source rev-parse --verify HEAD
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source -C /workspace/source log -1 --pretty=format:%an
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source -C /workspace/source show --no-patch --format=%ct
2024/11/16 19:44:39 /usr/bin/git -c safe.directory=/workspace/source -C /workspace/source rev-parse --abbrev-ref HEAD

*** Pod "source-context-5f8m2-kvg8b-pod", container "step-noop": ***

go.mod
main.go

Pod "source-context-5f8m2-kvg8b-pod" has succeeded!

The Pod contains the working-dir-initializer from Tekton which creates source-build, so /workspace/source is not empty.


The next run then failed:

$ shp build run source-context --follow
Pod "source-context-xphz7-dr42s-pod" is in state "Pending"...
Pod "source-context-xphz7-dr42s-pod" is in state "Pending"...
Pod "source-context-xphz7-dr42s-pod" is in state "Pending"...
Pod "source-context-xphz7-dr42s-pod" is in state "Pending"...
Pod "source-context-xphz7-dr42s-pod" is in state "Pending"...
Pod "source-context-xphz7-dr42s-pod" is in state "Pending"...
Pod "source-context-xphz7-dr42s-pod" in "Running" state, starting up log tail
[prepare] 2024/11/16 19:55:14 Entrypoint initialization
[source-default] 2024/11/16 19:55:16 Info: ssh (/usr/bin/ssh): OpenSSH_8.7p1, OpenSSL 3.0.7 1 Nov 2022
[source-default] 2024/11/16 19:55:16 Info: git (/usr/bin/git): git version 2.43.5
[source-default] 2024/11/16 19:55:16 Info: git-lfs (/usr/bin/git-lfs): git-lfs/3.4.1 (GitHub; linux arm64; go 1.21.13 (Red Hat 1.21.13-3.el9_4) X:strictfipsruntime)
[source-default] 2024/11/16 19:55:16 /usr/bin/git -c safe.directory=/workspace/source clone -h
[source-default] 2024/11/16 19:55:16 /usr/bin/git -c safe.directory=/workspace/source submodule -h
[source-default] 2024/11/16 19:55:16 /usr/bin/git -c safe.directory=/workspace/source clone --quiet --no-tags --single-branch --depth 1 -- https://github.com/shipwright-io/sample-go /workspace/source
[source-default] 2024/11/16 19:55:17 error: unable to create file source-build/go.mod: Permission denied
[source-default] error: unable to create file source-build/main.go: Permission denied
[source-default] fatal: unable to checkout working tree
[source-default] warning: Clone succeeded, but checkout failed.
[source-default] You can inspect what was checked out with 'git status'
[source-default] and retry with 'git restore --source=HEAD :/' (exit code 128)
[noop] 2024/11/16 19:55:18 Skipping step because a previous step failed
BuildRun "source-context-xphz7" has failed at step "step-source-default" because of GitError: error:  unable to create file source-build/go.mod: Permission denied
error:  unable to create file source-build/main.go: Permission denied
fatal:  unable to checkout working tree
warning:  Clone succeeded, but checkout failed.
and retry with 'git restore --source=HEAD : /' (exit code 128)
Step details:
  {
    "name": "step-source-default",
    "state": {
      "terminated": {
        "exitCode": 128,
        "reason": "Error",
        "message": "[{\"key\":\"shp-error-message\",\"value\":\"error:  unable to create file source-build/go.mod: Permission denied\\nerror:  unable to create file source-build/main.go: Permission denied\\nfatal:  unable to checkout working tree\\nwarning:  Clone succeeded, but checkout failed.\\nand retry with 'git restore --source=HEAD : /' (exit code 128)\",\"type\":1},{\"key\":\"shp-error-reason\",\"value\":\"GitError\",\"type\":1},{\"key\":\"StartedAt\",\"value\":\"2024-11-16T19:55:16.560Z\",\"type\":3}]",
        "startedAt": "2024-11-16T19:55:16Z",
        "finishedAt": "2024-11-16T19:55:17Z",
        "containerID": "containerd://4d2c77b2795590817e361399142fa2b2bebc71b02ea2213a9c0d530a55f829af"
      }
    },
    "lastState": {},
    "ready": false,
    "restartCount": 0,
    "image": "sha256:21f50544d04a70b16869df715ee818441b80042a1ce880b1f4c42b6722e20f3d",
    "imageID": "registry.saschaschwarze.de/shipwright-io/git@sha256:89e2f3d3f0bcb2657c5c446ecf3fb54197d4640b83824ac7de000ed5db1cf59b",
    "containerID": "containerd://4d2c77b2795590817e361399142fa2b2bebc71b02ea2213a9c0d530a55f829af",
    "started": false
  }
ERROR: buildrun pod "source-context-xphz7-dr42s-pod" has failed

@SaschaSchwarze0
Member

There is a weird timing issue. Sometimes the directory created by the working dir initializer is there, and sometimes it is not (yet). I was able to reproduce the second behavior every time by sleeping for one second before the clone.

I then tried to implement my idea from #1573 (comment): clone to a different directory and then move the content over.

The code works perfectly, but Tekton is broken here. The working dir initializer creates directories as root and without write permissions for other users. That's just nonsense. Any Tekton workload that runs as non-root cannot work that way. Below is the output from a code base where I had commented out the "restore" for the case that the target entry already exists (source-build in my case). I also changed the build strategy to list the source root. One can see that all cloned content is present, but source-build has the wrong owner for the build to use it properly.

$ shp build run source-context --follow
Pod "source-context-svr5k-c8gt7-pod" is in state "Pending"...
Pod "source-context-svr5k-c8gt7-pod" is in state "Pending"...
Pod "source-context-svr5k-c8gt7-pod" is in state "Pending"...
Pod "source-context-svr5k-c8gt7-pod" is in state "Pending"...
Pod "source-context-svr5k-c8gt7-pod" is in state "Pending"...
Pod "source-context-svr5k-c8gt7-pod" in "Running" state, starting up log tail
[prepare] 2024/11/16 20:44:49 Entrypoint initialization
[source-default] 2024/11/16 20:44:51 Info: ssh (/usr/bin/ssh): OpenSSH_8.7p1, OpenSSL 3.2.2 4 Jun 2024
[source-default] 2024/11/16 20:44:51 Info: git (/usr/bin/git): git version 2.43.5
[source-default] 2024/11/16 20:44:51 Info: git-lfs (/usr/bin/git-lfs): git-lfs/3.4.1 (GitHub; linux arm64; go 1.21.13 (Red Hat 1.21.13-3.el9_4) X:strictfipsruntime)
[source-default] 2024/11/16 20:44:51 /usr/bin/git -c safe.directory=/workspace/source clone -h
[source-default] 2024/11/16 20:44:51 /usr/bin/git -c safe.directory=/workspace/source submodule -h
[source-default] 2024/11/16 20:44:52 /usr/bin/git -c safe.directory=/workspace/source/.tmp clone --quiet --no-tags --single-branch --depth 1 -- https://github.com/shipwright-io/sample-go /workspace/source/.tmp
[source-default] 2024/11/16 20:44:52 /usr/bin/git -c safe.directory=/workspace/source/.tmp -C /workspace/source/.tmp submodule update --init --recursive --depth 1
[source-default] 2024/11/16 20:44:53 /usr/bin/git -c safe.directory=/workspace/source/.tmp -C /workspace/source/.tmp rev-parse --abbrev-ref HEAD
[source-default] 2024/11/16 20:44:53 Successfully loaded https://github.com/shipwright-io/sample-go (main) into /workspace/source/.tmp
[source-default] 2024/11/16 20:44:53 /usr/bin/git -c safe.directory=/workspace/source/.tmp -C /workspace/source/.tmp rev-parse --verify HEAD
[source-default] 2024/11/16 20:44:53 /usr/bin/git -c safe.directory=/workspace/source/.tmp -C /workspace/source/.tmp log -1 --pretty=format:%an
[source-default] 2024/11/16 20:44:53 /usr/bin/git -c safe.directory=/workspace/source/.tmp -C /workspace/source/.tmp show --no-patch --format=%ct
[source-default] 2024/11/16 20:44:53 /usr/bin/git -c safe.directory=/workspace/source/.tmp -C /workspace/source/.tmp rev-parse --abbrev-ref HEAD
[noop] total 56
[noop] drwxrwxrwx 8 root root  4096 Nov 16 20:44 .
[noop] drwxrwxrwx 3 root root  4096 Nov 16 20:44 ..
[noop] drwxr-xr-x 8 1000 1000  4096 Nov 16 20:44 .git
[noop] drwxr-xr-x 3 1000 1000  4096 Nov 16 20:44 .github
[noop] -rw-r--r-- 1 1000 1000     5 Nov 16 20:44 .shpignore
[noop] -rw-r--r-- 1 1000 1000 11357 Nov 16 20:44 LICENSE
[noop] -rw-r--r-- 1 1000 1000   291 Nov 16 20:44 OWNERS
[noop] -rw-r--r-- 1 1000 1000  1377 Nov 16 20:44 README.md
[noop] drwxr-xr-x 2 1000 1000  4096 Nov 16 20:44 docker-build
[noop] drwxr-xr-x 2 1000 1000  4096 Nov 16 20:44 docker-build-with-args
[noop] drwxr-xr-x 2 root root  4096 Nov 16 20:44 source-build
[noop] drwxr-xr-x 3 1000 1000  4096 Nov 16 20:44 source-build-with-package

Let me try to find out if one can customize the working dir initializer; otherwise I'll open an issue there.

@SaschaSchwarze0
Member

Tekton creates the directory with 0755 permission.

Tekton runs workingdirinit as root in its default configuration. One can set the feature flag "set-security-context" to "true" to force it to run as non-root, but that is still not guaranteed to be the user that our build runs as.
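
To see why a 0755 directory owned by root blocks a step that runs as UID 1000, the classic owner/group/other permission check can be modelled as below. This is a simplified sketch: can_write is a made-up helper, and it ignores supplementary groups, ACLs, and Linux capabilities.

```shell
#!/usr/bin/env bash
# can_write DIR_UID DIR_GID MODE USER_UID USER_GID  (illustrative helper)
# Prints "yes" if the user may create entries in a directory with the given
# owner, group, and octal MODE, otherwise "no".
can_write() {
  local dir_uid="$1" dir_gid="$2" mode="$3" uid="$4" gid="$5" bits
  # Root normally bypasses the permission bits.
  if [ "$uid" -eq 0 ]; then echo yes; return; fi
  if [ "$uid" -eq "$dir_uid" ]; then
    bits=$(( (8#$mode >> 6) & 7 ))   # owner bits
  elif [ "$gid" -eq "$dir_gid" ]; then
    bits=$(( (8#$mode >> 3) & 7 ))   # group bits
  else
    bits=$(( 8#$mode & 7 ))          # other bits
  fi
  # Creating a file needs both write (2) and search (1) on the directory.
  if [ $(( bits & 3 )) -eq 3 ]; then echo yes; else echo no; fi
}
```

For the listing above, can_write 0 0 0755 1000 1000 prints "no" (source-build, drwxr-xr-x root root), while can_write 0 0 0777 1000 1000 prints "yes" (the source root, drwxrwxrwx).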

Related Tekton issue: tektoncd/pipeline#6842

Idea: if the build strategy has a global securityContext with runAs, then we could set this in the podTemplate security context?

@SaschaSchwarze0
Member

Idea: if the build strategy has a global securityContext with runAs, then we could set this in the podTemplate security context?

This does not work, and I do not know why. I used a Pod with securityContext.runAsUser set to 1000 and an initContainer whose securityContext comes from set-security-context=true, which has runAsNonRoot but no runAsUser. The directory is still created as root. Confused.
