windocker: script.sh.copy: No such file or directory #3714

Closed
sxa opened this issue Aug 14, 2024 · 10 comments · Fixed by adoptium/ci-jenkins-pipelines#1117
Labels: docker, os:windows, secure-dev (Issues specific to SSDF/SLSA compliance work)

Comments

@sxa
Member

sxa commented Aug 14, 2024

This is seen periodically in the windbld jobs - maybe just after error conditions on previous runs but that is not certain. It is often resolved by removing the C:\jw\workspace\build-scripts directory, although I have seen situations where I've done that, run another build which has failed, cleared it again and it works, so it's unclear if we're experiencing some delay somewhere in the cleanup having the desired effect. The root cause of this error is currently unknown.

Noting that to run a test multiple times without taking up a full build cycle you can set "JAVA_TO_BUILD": "jdkXXu", in the job which will start the job but abort with an error about the java version after the point at which this failure occurs.

@sxa
Member Author

sxa commented Aug 19, 2024

Noting that ls -l on the host shows the owner of the files under build-scripts\job\jdk21u\windbld@tmp\durable* (including script.sh.copy) as the user that jenkins is running under on the host. When the same ls is run in a container the owner shows as Unknown+User:Unknown+Group. Files created within the container (such as the workspace directory under windbld) show as ContainerUser:ContainerUser when viewed from inside the container. Confusingly, those also show as the same user that jenkins is running as when looked at on the host.

@sxa sxa self-assigned this Aug 19, 2024
@sxa
Member Author

sxa commented Aug 19, 2024

The attempt to use .gitconfig in C:\jw isn't working. If I issue a git config --global -l from within the workflow I get a failure:

12:22:29  + git config --global -l
12:22:29  fatal: unable to read config file '/cygdrive/c/jw/.gitconfig': No such file or directory

If I issue that immediately after adding a safe.directory parameter with git config then it shows the correct value so it's using a git configuration from elsewhere at that point.

If I move it out of the way then it fails earlier in the pipeline:

12:47:36  [CHECKOUT] Checking out User Pipelines https://github.com/sxa/ci-jenkins-pipelines.git : windows_docker_support
[Pipeline] checkout
12:47:36  The recommended git tool is: git
12:47:36  No credentials specified
12:47:36  Warning: JENKINS-30600: special launcher org.jenkinsci.plugins.docker.workflow.WithContainerStep$Decorator$1@1f132a55; decorates hudson.plugins.cygpath.CygpathLauncherDecorator$1@cef8bfd will be ignored (a typical symptom is the Git executable not being run inside a designated container)
12:47:36  Cloning the remote Git repository
12:47:36  ERROR: Error cloning remote repo 'origin'
12:47:36  hudson.plugins.git.GitException: Command "git fetch --tags --force --progress -- https://github.com/sxa/ci-jenkins-pipelines.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:

So for now it seems that this gitconfig file, and whatever it's using when I explicitly add in the safe.directory setting, are both required, so I'll leave both in place. Note that before I set the safe.directory options I have configured that the HOME variable is set to the jw directory via a sh -c set command.
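As a rough sketch of the combination described above (HOME pointed at the jw directory, then a safe.directory entry added and the resulting config listed), something like the following could be used inside a pipeline. The HOME value is inferred from the path in the error message, and using $WORKSPACE as the safe.directory value is an assumption for illustration; the real pipeline may set both differently.

```groovy
// Sketch only: intended to run inside an existing node/container block.
// HOME value is inferred from the '/cygdrive/c/jw/.gitconfig' path in the log above (assumption).
withEnv(['HOME=/cygdrive/c/jw']) {
    // Mark the workspace as safe; the choice of $WORKSPACE here is hypothetical.
    sh 'git config --global --add safe.directory "$WORKSPACE"'
    // List the global config to see which file git is actually reading.
    sh 'git config --global -l'
}
```

Listing the config immediately after the --add is what shows whether git is picking up the file in C:\jw or a configuration from somewhere else.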

@sxa
Member Author

sxa commented Aug 20, 2024

Ref: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/

Note that after a successful (ish) jdk8u build (169) I had two consecutive failures trying to kick off jdk21u (170,171), but then the third one (172) passed that step without requiring the workspace to be moved out of the way.

@sxa sxa added the secure-dev label Aug 22, 2024
@sxa
Member Author

sxa commented Aug 29, 2024

Based on some investigations in #3723 I tried changing the ownership of the @tmp directory so that it was definitely owned by ContainerUser but that didn't make a difference. The first time running after I explicitly removed the @tmp directory the job started to run through successfully. We will see if that is repeatable.

@sxa
Member Author

sxa commented Aug 30, 2024

Answer: No.
After jdk21u completed (subject to #3709) in windbld run 242, jobs 244 and 245 failed, but the following 246 passed - all were run after clearing out the @tmp and cyclonedx-lib directories.

Run 247 afterwards then went straight through without problems (again after removing those two directories).

So we still have inconsistencies. I'm thinking it would be nice to get a simple pipeline which starts a container and is able to demonstrate this, since our multi-thousand line monolith isn't ideal for problem reproduction/raising upstream.

@sxa
Member Author

sxa commented Sep 2, 2024

I've just tested this using a standalone jenkins pipeline:

pipeline {
    agent any
    stages {
        stage('Test Docker on Windows') {
            agent { docker { image 'notrhel_build_image' } }
            steps {
                println('Attempting to run commands in docker container')
                sh(script: 'cmd /c echo Hello')
                sh(script: 'hostname')
                sh(script: 'ls -l c:/')
                sh(script: 'ls -l c:/workspace')
                sh(script: 'ls -l c:/workspace/workspace')
                sh(script: 'ls -l c:/workspace/workspace/windtest')
            }
        }
    }
}

Running a sequence of jobs, the error occurred after a varying number of sh steps had completed: 5,1,6,0,0,1,1,0,0,0,0 (the run with 6 passed all of them!!)

@sxa
Member Author

sxa commented Sep 2, 2024

Running the same jobs with bat() instead of sh() appears to pass reliably. Intriguing ...
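For reference, a bat() version of the standalone test pipeline might look roughly like the following. This is a sketch rather than the exact job that was run: it assumes the same image name as above and that the ls -l calls are swapped for their cmd equivalent (dir), which the comment does not spell out.

```groovy
pipeline {
    agent any
    stages {
        stage('Test Docker on Windows') {
            agent { docker { image 'notrhel_build_image' } }
            steps {
                echo 'Attempting to run commands in docker container'
                // bat() runs the script through cmd.exe, so no sh wrapper script is involved
                bat(script: 'echo Hello')
                bat(script: 'hostname')
                // Backslashes are doubled because Groovy treats \ as an escape character
                bat(script: 'dir c:\\')
                bat(script: 'dir c:\\workspace')
                bat(script: 'dir c:\\workspace\\workspace')
            }
        }
    }
}
```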

@sxa
Member Author

sxa commented Sep 5, 2024

Noting also that having git bash in the path first (before the Cygwin one) makes no difference - the error still occurs.

@sxa
Member Author

sxa commented Sep 9, 2024

Memo to self: We have some functions executed in Windows pipelines that are run on either Windows or UNIX systems depending on the pipeline - specifically the writeMetadata function https://github.com/adoptium/ci-jenkins-pipelines/blob/4bfdbb67722dd7e96b256511ac6586e749650524/pipelines/build/common/openjdk_build_pipeline.groovy#L1280

  • Windows/jdk8u build with ENABLE_SIGNER=false - temperamental with sh (windbld#484) (eyecatcher
  • Windows/jdk21u: Gives "Batch scripts can only be run on Windows nodes" in listArchives at the same place in windbld#482 when allowBat=true in listArchives (after OUT OF DOCKER NODE, when it switches back to jenkins-worker)

I'm going to leave this with sh in these cases for now, and switch attention to another PR.

@sxa
Member Author

sxa commented Sep 10, 2024

Memo to self: We have some functions executed in Windows pipelines that are run on either Windows or UNIX systems depending on the pipeline

Now sorted that case - using an isUnix() test which actually tests for "Not Windows" as the machine it's running on which is much better than my previous check which tested whether we were doing a windows build in a docker pipeline. (I hadn't spotted that built-in function previously)
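A minimal sketch of that kind of check is shown below. The helper name runOnAgent and its parameters are hypothetical and are not taken from the actual pipeline code; only isUnix(), sh() and bat() are standard Jenkins pipeline steps.

```groovy
// Hypothetical helper: pick sh or bat based on the agent the step is actually running on,
// rather than on which platform the build is targeting.
def runOnAgent(String unixCmd, String windowsCmd) {
    if (isUnix()) {
        // isUnix() is true when the enclosing node is a Unix-like system
        return sh(script: unixCmd, returnStdout: true).trim()
    }
    // A leading '@' in the Windows command stops cmd echoing the command into the captured output
    return bat(script: windowsCmd, returnStdout: true).trim()
}
```

Called from inside a node block, for example runOnAgent('ls -1', '@dir /b'), this keeps a shared function such as listArchives working whether it ends up on the Windows container or back on a UNIX worker.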
