artifact test fails on first hydra command [error mounting "tmpfs" to rootfs at ".../.local": create mountpoint for .../.local mount: mkdirat .../.local: file exists: unknown.] #9645

Open
fruch opened this issue Jan 2, 2025 · 17 comments · Fixed by #9650
Labels: Bug (Something isn't working right), Infrastructure, P0 Critical

@fruch (Contributor) commented Jan 2, 2025

Recently we ran into the following failure multiple times (twice so far):

[2025-01-02T08:53:30.461Z] + echo 'Creating Argus test run ...'
[2025-01-02T08:53:30.461Z] Creating Argus test run ...
[2025-01-02T08:53:30.461Z] + [[ -n '' ]]
[2025-01-02T08:53:30.462Z] + export SCT_CLUSTER_BACKEND=gce
[2025-01-02T08:53:30.462Z] + SCT_CLUSTER_BACKEND=gce
[2025-01-02T08:53:30.462Z] + export SCT_CONFIG_FILES=test-cases/artifacts/debian11.yaml
[2025-01-02T08:53:30.462Z] + SCT_CONFIG_FILES=test-cases/artifacts/debian11.yaml
[2025-01-02T08:53:30.462Z] + ./docker/env/hydra.sh create-argus-test-run
[2025-01-02T08:53:30.462Z] Running on Build Server...
[2025-01-02T08:53:30.462Z] Pull version v1.83-pycodestyle-2.10.0 from Docker Hub...
[2025-01-02T08:53:30.462Z] v1.83-pycodestyle-2.10.0: Pulling from scylladb/hydra
[2025-01-02T08:53:30.462Z] 14726c8f7834: Pulling fs layer
[2025-01-02T08:53:30.462Z] 7d676dc8a994: Pulling fs layer
[2025-01-02T08:53:30.462Z] 03bdd165d0d2: Pulling fs layer
[2025-01-02T08:53:30.462Z] 428bad6fa242: Pulling fs layer
[2025-01-02T08:53:30.462Z] 63cd35141f3a: Pulling fs layer
[2025-01-02T08:53:30.462Z] 4e009f26260c: Pulling fs layer
[2025-01-02T08:53:30.462Z] df1671404e21: Pulling fs layer
[2025-01-02T08:53:30.462Z] b56fb9b9b894: Pulling fs layer
[2025-01-02T08:53:30.462Z] 9f5e78de2e2e: Pulling fs layer
[2025-01-02T08:53:30.462Z] 4b82f5a588e0: Pulling fs layer
[2025-01-02T08:53:30.462Z] b6a5cdc11fcc: Pulling fs layer
[2025-01-02T08:53:30.462Z] 9cafdf5e4bea: Pulling fs layer
[2025-01-02T08:53:30.462Z] 27ac63fc215a: Pulling fs layer
[2025-01-02T08:53:30.462Z] 428bad6fa242: Waiting
[2025-01-02T08:53:30.462Z] 63cd35141f3a: Waiting
[2025-01-02T08:53:30.462Z] 4e009f26260c: Waiting
[2025-01-02T08:53:30.462Z] df1671404e21: Waiting
[2025-01-02T08:53:30.462Z] b56fb9b9b894: Waiting
[2025-01-02T08:53:30.462Z] 9f5e78de2e2e: Waiting
[2025-01-02T08:53:30.462Z] 4b82f5a588e0: Waiting
[2025-01-02T08:53:30.462Z] 9cafdf5e4bea: Waiting
[2025-01-02T08:53:30.462Z] b6a5cdc11fcc: Waiting
[2025-01-02T08:53:30.462Z] 7d676dc8a994: Verifying Checksum
[2025-01-02T08:53:30.462Z] 7d676dc8a994: Download complete
[2025-01-02T08:53:30.462Z] 03bdd165d0d2: Verifying Checksum
[2025-01-02T08:53:30.462Z] 03bdd165d0d2: Download complete
[2025-01-02T08:53:31.661Z] 428bad6fa242: Verifying Checksum
[2025-01-02T08:53:31.662Z] 428bad6fa242: Download complete
[2025-01-02T08:53:31.662Z] 14726c8f7834: Verifying Checksum
[2025-01-02T08:53:31.662Z] 14726c8f7834: Download complete
[2025-01-02T08:53:31.662Z] 63cd35141f3a: Verifying Checksum
[2025-01-02T08:53:31.662Z] 63cd35141f3a: Download complete
[2025-01-02T08:53:31.662Z] 4e009f26260c: Verifying Checksum
[2025-01-02T08:53:31.662Z] 4e009f26260c: Download complete
[2025-01-02T08:53:31.662Z] df1671404e21: Verifying Checksum
[2025-01-02T08:53:31.662Z] df1671404e21: Download complete
[2025-01-02T08:53:31.662Z] b56fb9b9b894: Verifying Checksum
[2025-01-02T08:53:31.662Z] b56fb9b9b894: Download complete
[2025-01-02T08:53:32.618Z] 4b82f5a588e0: Verifying Checksum
[2025-01-02T08:53:32.618Z] 4b82f5a588e0: Download complete
[2025-01-02T08:53:32.618Z] b6a5cdc11fcc: Verifying Checksum
[2025-01-02T08:53:32.618Z] b6a5cdc11fcc: Download complete
[2025-01-02T08:53:33.489Z] 9cafdf5e4bea: Verifying Checksum
[2025-01-02T08:53:33.489Z] 9cafdf5e4bea: Download complete
[2025-01-02T08:53:35.370Z] 9f5e78de2e2e: Verifying Checksum
[2025-01-02T08:53:35.371Z] 9f5e78de2e2e: Download complete
[2025-01-02T08:53:35.371Z] 27ac63fc215a: Verifying Checksum
[2025-01-02T08:53:35.371Z] 27ac63fc215a: Download complete
[2025-01-02T08:53:36.345Z] 14726c8f7834: Pull complete
[2025-01-02T08:53:36.345Z] 7d676dc8a994: Pull complete
[2025-01-02T08:53:37.288Z] 03bdd165d0d2: Pull complete
[2025-01-02T08:53:37.288Z] 428bad6fa242: Pull complete
[2025-01-02T08:53:38.232Z] 63cd35141f3a: Pull complete
[2025-01-02T08:53:38.232Z] 4e009f26260c: Pull complete
[2025-01-02T08:53:38.232Z] df1671404e21: Pull complete
[2025-01-02T08:53:38.232Z] b56fb9b9b894: Pull complete
[2025-01-02T08:54:12.972Z] 9f5e78de2e2e: Pull complete
[2025-01-02T08:54:15.210Z] 4b82f5a588e0: Pull complete
[2025-01-02T08:54:16.297Z] b6a5cdc11fcc: Pull complete
[2025-01-02T08:54:17.242Z] 9cafdf5e4bea: Pull complete
[2025-01-02T08:54:41.653Z] 27ac63fc215a: Pull complete
[2025-01-02T08:54:41.654Z] Digest: sha256:6d8a84d2279cb8aed6326f115923984ba1fd8e031fbba020597998b2b578537d
[2025-01-02T08:54:41.654Z] Status: Downloaded newer image for scylladb/hydra:v1.83-pycodestyle-2.10.0
[2025-01-02T08:54:41.654Z] docker.io/scylladb/hydra:v1.83-pycodestyle-2.10.0
[2025-01-02T08:54:41.654Z] Going to run './sct.py  create-argus-test-run'...
[2025-01-02T08:54:45.515Z] docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "tmpfs" to rootfs at "/home/jenkins/.local": create mountpoint for /home/jenkins/.local mount: mkdirat /var/lib/docker/overlay2/cc834cabd7d7da1599f90777a4216fcc4aed218be3b821757c9829a5fb0a3cba/merged/home/jenkins/.local: file exists: unknown.

All the other pipeline phases ran OK and passed.

  • we updated the builder image (with a new Java version) 2 days ago
  • we added docker join to the pipeline not long ago
  • it seems the issue is related to the .local remount we introduced in 102e63d
fruch added and removed the Bug (Something isn't working right) label Jan 2, 2025
@fruch (Contributor, Author) commented Jan 2, 2025

@soyacz

What do you think about this one? I'm not sure how to investigate it further, or whether we should just revert 102e63d.

@soyacz (Contributor) commented Jan 2, 2025

It could be an issue with .local: we bind-mount -v "${HOME_DIR}:${HOME_DIR}" and then mount ${HOME_DIR}/.local on top of it as tmpfs. Maybe when a builder is reused, it cannot reuse such a mount.
I think we should revert.
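One way to probe the builder-reuse idea is to check what actually sits at `${HOME_DIR}/.local` on the builder host when the failure happens. Assuming, as the error message suggests, that runc creates the tmpfs mountpoint inside the bind-mounted home directory, an unexpected object at that path (for example a symlink) could be relevant. The helper below is a hypothetical diagnostic sketch, not part of SCT; its name and the labels it prints are made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of SCT or hydra.sh):
# print what kind of filesystem object is at the given path.
describe_path() {
    local path="$1"
    if [[ -L "$path" && ! -e "$path" ]]; then
        echo "dangling-symlink"      # symlink whose target does not exist
    elif [[ -L "$path" ]]; then
        echo "symlink"               # symlink to an existing target
    elif [[ -d "$path" ]]; then
        echo "directory"
    elif [[ -e "$path" ]]; then
        echo "other"                 # regular file, socket, device, ...
    else
        echo "missing"
    fi
}

# Example: run on the builder right before (or after) a failing hydra command.
# describe_path "${HOME}/.local"
```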

@vponomaryov (Contributor)

Only 2 artifact tests failed?
We run hydra commands everywhere...

What is special about the artifact tests? Do lots of them run simultaneously on a single builder?

Also, the internet says that a machine restart solves the problem...
So it sounds like a Docker problem that shows up after running for a long time.

@fruch (Contributor, Author) commented Jan 2, 2025

> Only 2 artifact tests failed?
> We run hydra commands everywhere...
>
> What is special about the artifact tests? Do lots of them run simultaneously on a single builder?
>
> Also, the internet says that a machine restart solves the problem...
> So it sounds like a Docker problem that shows up after running for a long time.

It's probably a Docker bug. The question is how to avoid it, because we won't be able to fix it directly (at most we can report it).

@vponomaryov (Contributor) commented Jan 3, 2025

We know the following:

So, I think we get such failures when we hit the RAM limit on the builders.

Possible solutions:

  • Use larger builder instance types with more RAM.
  • Increase the swap size (or add swap if absent) on the builder nodes.
  • Reduce the number of CI jobs per builder / increase the number of builders.
  • Do all of the above.

@fruch, @soyacz I really believe the above should help; let's avoid reverting the mentioned change.
Also, do we monitor builder resource usage?
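On the monitoring question, a quick spot check of the RAM/swap hypothesis on a Linux builder is to read `/proc/meminfo` around the failing step. This is a hypothetical sketch: the function names are made up and it is not tied to any existing SCT or Jenkins monitoring. The optional file argument only exists so it can be tried against a saved copy of `/proc/meminfo`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of SCT): read memory figures from /proc/meminfo.

# Print the value (in kB) of one /proc/meminfo field, e.g. MemAvailable.
# Returns non-zero if the field is not present.
meminfo_kb() {
    local field="$1" file="${2:-/proc/meminfo}"
    awk -v f="${field}:" '$1 == f { print $2; found = 1 } END { exit !found }' "$file"
}

# One-line summary of available RAM and configured swap, in MiB.
report_memory() {
    local file="${1:-/proc/meminfo}" avail swap
    avail="$(meminfo_kb MemAvailable "$file")" || return 1
    swap="$(meminfo_kb SwapTotal "$file")" || return 1
    echo "MemAvailable=$((avail / 1024))MiB SwapTotal=$((swap / 1024))MiB"
}

# Example: log this right before running ./docker/env/hydra.sh in the pipeline.
# report_memory
```

Logging this next to each failure would show whether the error really coincides with low available memory or missing swap.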

@fruch (Contributor, Author) commented Jan 3, 2025

> We know the following:
>
> So, I think we get such failures when we hit the RAM limit on the builders.
>
> Possible solutions:
>
>   • Use larger builder instance types with more RAM.
>   • Increase the swap size (or add swap if absent) on the builder nodes.
>   • Reduce the number of CI jobs per builder / increase the number of builders.
>   • Do all of the above.
>
> @fruch, @soyacz I really believe the above should help; let's avoid reverting the mentioned change.
> Also, do we monitor builder resource usage?

We can also limit the tmpfs size.

We need it to mask things; I don't know how much of it we actually use.
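Docker's `--tmpfs` flag accepts mount options after a colon (for example `size=...`), which is one way to cap the `.local` tmpfs. The sketch below is hypothetical: the helper name and the `64m` value are illustrative assumptions, not the limit chosen in the actual fix.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of hydra.sh): build a --tmpfs spec with a size cap.
# This helper only allows a plain number or a k/m/g suffix; anything else is rejected.
tmpfs_spec() {
    local target="$1" size="$2"
    if [[ ! "$size" =~ ^[0-9]+[kmgKMG]?$ ]]; then
        echo "invalid tmpfs size: ${size}" >&2
        return 1
    fi
    printf '%s:size=%s\n' "$target" "$size"
}

# Example (illustrative values): bind-mount the home directory and mask .local
# with a size-capped tmpfs.
# docker run --rm \
#     -v "${HOME_DIR}:${HOME_DIR}" \
#     --tmpfs "$(tmpfs_spec "${HOME_DIR}/.local" 64m)" \
#     scylladb/hydra:<tag> ./sct.py ...
```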

On Sunday I'll reduce the Jenkins executors to 1 per builder (it's 4 now).

@fruch (Contributor, Author) commented Jan 5, 2025

I've changed the builders to 1 executor for the time being, and I'll open a PR to limit the tmpfs size.

fruch added a commit to fruch/scylla-cluster-tests that referenced this issue Jan 5, 2025
Since we might be running multiple hydra commands on
the same Jenkins builder, and we are running into this kind
of error:
```
docker: Error response from daemon: failed to create task for container:
failed to create shim task: OCI runtime create failed: runc create failed:
unable to start container process: error during container init:
error mounting "tmpfs" to rootfs at "/home/jenkins/.local":
create mountpoint for /home/jenkins/.local mount:
mkdirat /var/lib/docker/.../.local: file exists: unknown.
```

We are trying to limit the tmpfs memory to avoid this issue.

Fixes: scylladb#9645
fruch added a commit to fruch/scylla-cluster-tests that referenced this issue Jan 6, 2025 (same commit message as above)
fruch added a commit that referenced this issue Jan 7, 2025 (same commit message as above)
mergify bot pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
mergify bot pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
mergify bot pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120; conflicts in docker/env/hydra.sh)
mergify bot pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
mergify bot pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
mergify bot pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120; conflicts in docker/env/hydra.sh)
vponomaryov pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
vponomaryov pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
vponomaryov pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
vponomaryov pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
vponomaryov pushed a commit that referenced this issue Jan 7, 2025 (cherry picked from commit 9a73120)
@tchaikov (Contributor) commented Jan 8, 2025

I am still seeing this failure:

13:50:19  Going to run './sct.py  create-argus-test-run'...
13:50:23  docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "tmpfs" to rootfs at "/home/jenkins/.local": create mountpoint for /home/jenkins/.local mount: mkdirat /var/lib/docker/overlay2/8f479ebcccb17344e8bd75f6b5ec0f8462c8318bfe915f659a81867406023f41/merged/home/jenkins/.local: file exists: unknown.

See https://jenkins.scylladb.com/job/releng-testing/job/artifacts-offline-install/job/artifacts-ubuntu2204-test/430/consoleFull

@fruch (Contributor, Author) commented Jan 8, 2025

Looks like I missed the number of executors in the GCP builder configuration;
I switched it to 1 as well.

fruch reopened this Jan 8, 2025
@fruch (Contributor, Author) commented Jan 8, 2025

That means the tmpfs size change didn't really help.

mykaul added the Bug (Something isn't working right), Infrastructure, and P0 Critical labels Jan 8, 2025
fruch added a commit that referenced this issue Jan 8, 2025 (cherry picked from commit 9a73120; same commit message as above)
@fruch (Contributor, Author) commented Jan 15, 2025

@vponomaryov

Any more ideas on how to tackle this one before we revert the tmpfs change and break UX a bit?

@vponomaryov (Contributor)

> @vponomaryov
>
> Any more ideas on how to tackle this one before we revert the tmpfs change and break UX a bit?

Where does it fail?

@fruch (Contributor, Author) commented Jan 15, 2025

> > @vponomaryov
> > Any more ideas on how to tackle this one before we revert the tmpfs change and break UX a bit?
>
> Where does it fail?

It was reported in multiple places (not all of them are reported here).

The last report I've seen was from yesterday:

> Yaniv Kaul, Yesterday 6:23 PM
> Artifact tests now fail on the same error as in PRs - 00:05:46.543  docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "tmpfs" to rootfs at "/home/jenkins/.local": create mountpoint for /home/jenkins/.local mount: mkdirat   - btw, we should add a timeout to those tests. They should take few minutes.

The report has no reference to the job, though.

@vponomaryov (Contributor) commented Jan 15, 2025

> > > @vponomaryov
> > > Any more ideas on how to tackle this one before we revert the tmpfs change and break UX a bit?
> >
> > Where does it fail?
>
> It was reported in multiple places (not all of them are reported here).
>
> The last report I've seen was from yesterday:
>
> > Yaniv Kaul, Yesterday 6:23 PM
> > Artifact tests now fail on the same error as in PRs - 00:05:46.543  docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "tmpfs" to rootfs at "/home/jenkins/.local": create mountpoint for /home/jenkins/.local mount: mkdirat   - btw, we should add a timeout to those tests. They should take few minutes.
>
> The report has no reference to the job, though.

Could it come from SCT branches we didn't port the size limiter to?
Which builders? AWS? GCE? Azure?
Before reverting it, we should understand which SCT branches still fail this way and in which clouds.

I looked at the following artifact tests:

And a bunch of others in the https://jenkins.scylladb.com/job/scylla-master/job/artifacts/ directory.

I don't see this kind of failure there.

@vponomaryov (Contributor)

@fruch I went through lots of artifact CI test runs that happened after the bug fix and don't see any occurrence of this bug.
We must make sure that no SCT branches without this fix are in use, e.g. staging CI jobs.

So far I have no evidence of this failure on branches that include the fix.
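To check which SCT branches actually contain the fix, a minimal sketch could test each branch for either the merged commit or a backport of it. The helper is hypothetical (not part of SCT), and it assumes the backports were made with `git cherry-pick -x`, which adds the "(cherry picked from commit ...)" line seen in the mergify commits above; 9a73120 is the commit those backports reference.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SCT): succeed if BRANCH contains COMMIT,
# either as an ancestor or as a `git cherry-pick -x` backport.
branch_has_fix() {
    local branch="$1" commit="$2"
    # Case 1: the fix itself is in the branch history.
    if git merge-base --is-ancestor "$commit" "$branch" 2>/dev/null; then
        return 0
    fi
    # Case 2: a backport whose message records the original commit
    # (git writes the full SHA, so an abbreviated one still matches as a prefix).
    git log --format=%B "$branch" | grep -q "cherry picked from commit ${commit}"
}

# Example: list remote branches that lack the fix.
# for b in $(git for-each-ref --format='%(refname:short)' refs/remotes/origin); do
#     branch_has_fix "$b" 9a73120 || echo "missing fix: $b"
# done
```

A plain ancestry check alone would report every backport branch as missing the fix, because cherry-picked commits get new hashes; that is why the sketch also searches commit messages.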

@tchaikov (Contributor) commented Jan 20, 2025

Again, see https://jenkins.scylladb.com/job/releng-testing/job/artifacts-offline-install/job/artifacts-rocky8-test/544/consoleFull

14:39:20  Cloning repository git@github.com:scylladb/scylla-cluster-tests.git
14:39:21   > /usr/bin/git init /tmp/workspace/releng-testing/artifacts-offline-install/artifacts-rocky8-test/scylla-cluster-tests # timeout=10
14:39:22  Fetching upstream changes from git@github.com:scylladb/scylla-cluster-tests.git
14:39:22   > /usr/bin/git --version # timeout=10
14:39:22   > git --version # 'git version 2.43.0'
14:39:22  using GIT_SSH to set credentials New RSA key for slaves
14:39:22   > /usr/bin/git fetch --tags --force --progress -- git@github.com:scylladb/scylla-cluster-tests.git +refs/heads/*:refs/remotes/origin/* # timeout=10
14:40:06   > /usr/bin/git config remote.origin.url git@github.com:scylladb/scylla-cluster-tests.git # timeout=10
14:40:06   > /usr/bin/git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
14:40:21  Avoid second fetch
14:40:21  Checking out Revision 0c7fa60af1b7d83424aa144f664e4647964d56f9 (origin/master)
14:40:21   > /usr/bin/git rev-parse origin/master^{commit} # timeout=10
14:40:22   > /usr/bin/git config core.sparsecheckout # timeout=10
14:40:22   > /usr/bin/git checkout -f 0c7fa60af1b7d83424aa144f664e4647964d56f9 # timeout=10
14:40:33  Commit message: "fix(test-case): update 5000 tables test case configuration"
...
14:41:51  Status: Downloaded newer image for scylladb/hydra:v1.86-scylla-driver-3.28.0
14:41:51  docker.io/scylladb/hydra:v1.86-scylla-driver-3.28.0
14:41:51  Going to run './sct.py  create-argus-test-run'...
14:41:52  docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "tmpfs" to rootfs at "/home/jenkins/.local": create mountpoint for /home/jenkins/.local mount: mkdirat /var/lib/docker/overlay2/3424a63fae01391d75afa08fb4708631800210057a314596a1ae1f212e603c3e/merged/home/jenkins/.local: file exists: unknown.

@fruch (Contributor, Author) commented Jan 20, 2025

It seems dockerd isn't completely ready when we start calling it.
We should add a retry at that initial stage and print the docker info output for debugging.
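The retry idea could look roughly like the sketch below. It is hypothetical and not the change that was merged: `retry` and `docker_debug` are made-up names, and the attempt count and delay are illustrative (the follow-up commit mentions 3 attempts).

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the actual hydra.sh change): run a command up to
# ATTEMPTS times, sleeping DELAY seconds between tries, and call DEBUG_HOOK
# (a function or command name, or "" for none) after each failed attempt.
retry() {
    local attempts="$1" delay="$2" debug_hook="$3"
    shift 3
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        echo "attempt ${i}/${attempts} failed: $*" >&2
        if [[ -n "$debug_hook" ]]; then
            "$debug_hook" >&2 || true   # debugging output must not abort the retry loop
        fi
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example debug hook: dump daemon state to help tell whether dockerd was ready.
docker_debug() {
    docker info
}

# Example usage (illustrative):
# retry 3 5 docker_debug docker run --rm scylladb/hydra:<tag> ./sct.py create-argus-test-run
```

Retrying the whole `docker run` also re-runs the command when the container itself fails; a stricter variant would retry only when the output contains the `error mounting "tmpfs"` message.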

fruch added a commit to fruch/scylla-cluster-tests that referenced this issue Jan 21, 2025
Since we are running into cases where docker fails
at this first stage, we retry it 3 times
in the hope that this is enough to avoid this type
of error.
Ref: scylladb#9645
fruch added a commit that referenced this issue Jan 21, 2025 (same commit message as above)