quarkus-integration-test-container-image-invoker tests do not pass on M1 mac/with a docker equivalent (podman, minikube, etc) #25230

Closed
holly-cummins opened this issue Apr 28, 2022 · 16 comments
Labels
area/testing kind/bug Something isn't working

Comments

@holly-cummins
Contributor

holly-cummins commented Apr 28, 2022

Describe the bug

The quarkus-integration-test-container-image-invoker tests fail in various ways when run with a docker substitute on mac, or on M1 hardware.
I don't have a full matrix of behaviour, but we've observed:

  • Mac M1, podman default: 15 failures, 1 error, 0 passed
    • Mac M1, podman patched to support multi-arch (see below): 9 failed, 7 passed
    • Mac M1, podman patched + ryuk disabled: 5-7 failed, 9-11 passed
    • Mac M1, podman patched + ryuk privileged (see below): 3 failed
  • Mac M1, docker: Frozen at Building container-build-jib-with-mssql 0.1-SNAPSHOT, further failures after workaround
  • Mac x86, podman: Failed in container-build-jib-with-db2
  • Mac x86, minikube: 12 failures, 4 passed
  • Mac x86, docker: clean

This is the invocation:
./mvnw -Dquickly -DskipTests=false -Dstart-containers=true -f integration-tests/container-image/maven-invoker-way

(There was originally a guard which, on some maven versions, only ran the tests on linux. The guard was removed as part of #25231 since it had some issues.)

Ideally, the tests will pass on mac, and ideally without a hard requirement on docker as the container runtime. If we can't get them to pass, we should reintroduce a guard, but make the guard as focussed as possible, perhaps just on the tests which are failing, and for the exact conditions which aren't supported.
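To make "as focussed as possible" concrete, a guard could be a predicate over the JVM's os.name and os.arch system properties. This is only a sketch: PlatformGuard and shouldSkip are hypothetical names, not anything in the Quarkus build, and skipping only on Apple Silicon macs is just one possible policy.

```java
import java.util.Locale;

// Hypothetical helper, not part of Quarkus: decides whether a test
// should be skipped for a given OS name and CPU architecture.
final class PlatformGuard {

    private PlatformGuard() {
    }

    // True only for macOS on ARM (Apple Silicon); every other combination runs.
    static boolean shouldSkip(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean isMac = os.contains("mac");
        boolean isArm = arch.equals("aarch64") || arch.equals("arm64");
        return isMac && isArm;
    }

    public static void main(String[] args) {
        boolean skip = shouldSkip(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println("skip container-image tests: " + skip);
    }
}
```

A real guard would probably also need to look at the container runtime in use, which this sketch does not attempt.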

Causes of failures

I suspect there are several different issues.

Issues I was able to work around:
  • x86 arch on images. Docker can cope with this, and podman can't. I was able to update podman to cope by following the instructions on https://edofic.com/posts/2021-09-12-podman-m1-amd64/ (installing qemu inside the podman vm)
  • docker socket. Working through https://xphyr.net/post/podman_on_osx/ resolved this.
  • missing docker binary. IsDockerWorking looks for a docker binary or DOCKER_HOST. I didn't need a DOCKER_HOST because of the podman helper, so I duplicated the podman executable, called it docker, and put it on my path. I don't think setting quarkus.docker.executable-name would be enough, because the groovy scripts in this project do a straight exec of docker and so need it on the path.

Issues I haven't worked around/investigated:
  • expectations of localhost. AFAIK Docker Desktop forwards from localhost to the VM, whereas Minikube or similar exposes container ports on the VM. Tests that assume everything is on localhost, instead of asking TestContainers for the address and allowing for containers being on another host, can fail.
  • ....?
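On the localhost point: with Testcontainers, GenericContainer#getHost() and getMappedPort(int) return the address a test should connect to. As a dependency-free illustration of the same idea, here is a rough sketch that derives the host from a DOCKER_HOST value instead of hard-coding localhost. ContainerHost and resolve are hypothetical names, and the real Testcontainers logic is more involved than this.

```java
import java.net.URI;

// Hypothetical helper: works out which host container ports are reachable on,
// based on a DOCKER_HOST-style value.
final class ContainerHost {

    private ContainerHost() {
    }

    // Returns the host part of a tcp/http/https DOCKER_HOST, or "localhost"
    // when the value is unset or points at a local unix socket.
    static String resolve(String dockerHost) {
        if (dockerHost == null || dockerHost.isBlank()) {
            return "localhost";
        }
        URI uri = URI.create(dockerHost);
        String scheme = uri.getScheme();
        if ("tcp".equals(scheme) || "http".equals(scheme) || "https".equals(scheme)) {
            String host = uri.getHost();
            return host != null ? host : "localhost";
        }
        return "localhost";
    }

    public static void main(String[] args) {
        System.out.println(resolve(System.getenv("DOCKER_HOST")));
    }
}
```

A test would then build its connection URL from this host (plus whatever port the runtime mapped) rather than assuming localhost.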

Expected behavior

The tests should run cleanly.

Actual behavior

Here are some example failures, from the platforms above.

M1, with patched podman

TESTCONTAINERS_RYUK_DISABLED="true" ./mvnw -Dquickly -DskipTests=false -Dstart-containers=true -f integration-tests/container-image/maven-invoker-way

[INFO] -------------------------------------------------
[INFO] Build Summary:
[INFO]   Passed: 7, Failed: 9, Errors: 0, Skipped: 0
[INFO] -------------------------------------------------
[ERROR] The following builds failed:
[ERROR] *  container-build-docker/pom.xml
[ERROR] *  container-build-jib-with-mysql/pom.xml
[ERROR] *  container-build-jib-with-db2/pom.xml
[ERROR] *  container-build-jib-with-mongo/pom.xml
[ERROR] *  container-build-multiple-tags-jib/pom.xml
[ERROR] *  container-build-jib-with-mssql/pom.xml
[ERROR] *  container-build-multiple-tags-docker/pom.xml
[ERROR] *  container-build-jib-appcds/pom.xml
[ERROR] *  container-image-push/pom.xml
[INFO] -------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  11:21 min
[INFO] Finished at: 2022-04-27T17:45:09+01:00
[INFO] ------------------------------------------------------------------------

This one looks like a podman compatibility issue but podman should support the java API?!

[INFO] Caused by: org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.io/ibmcom/db2:11.5.7.0a, imagePullPolicy=DefaultPullPolicy(), imageNameSubstitutor=org.testcontainers.utility.ImageNameSubstitutor$LogWrappedImageNameSubstitutor@7b8f4190)
[INFO] Caused by: java.lang.NullPointerException: Cannot invoke "String.matches(String)" because the return value of "com.github.dockerjava.api.model.PullResponseItem.getStatus()" is null

Unsure of this one

[INFO] [ERROR] Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 'docker load' command failed with error: Error: unable to load image: payload does not match any of the supported image formats:
[INFO] [ERROR]  * oci: initializing source oci:/var/tmp/libpod-images-load.tar3361710228:: open /var/tmp/libpod-images-load.tar3361710228/index.json: not a directory
[INFO] [ERROR]  * oci-archive: creating temp directory: untarring file "/var/tmp/oci2061944008": unexpected EOF
[INFO] [ERROR]  * docker-archive: loading tar component manifest.json: unexpected EOF
[INFO] [ERROR]  * dir: open /var/tmp/libpod-images-load.tar3361710228/manifest.json: not a directory
[INFO] [ERROR] 
[INFO] [ERROR] 	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)

x86 mac with minikube

[INFO] -------------------------------------------------
[INFO] Build Summary:
[INFO]   Passed: 4, Failed: 12, Errors: 0, Skipped: 0
[INFO] -------------------------------------------------
[ERROR] The following builds failed:
[ERROR] *  container-build-with-keycloak/pom.xml
[ERROR] *  container-build-jib-with-mysql/pom.xml
[ERROR] *  container-build-jib-with-db2/pom.xml
[ERROR] *  container-build-jib-with-kafka/pom.xml
[ERROR] *  container-build-with-keycloak-default-realm/pom.xml
[ERROR] *  container-build-jib-with-mongo/pom.xml
[ERROR] *  container-build-jib-with-postgresql/pom.xml
[ERROR] *  container-build-jib-with-mssql/pom.xml
[ERROR] *  container-build-jib-with-mariadb/pom.xml
[ERROR] *  container-build-jib-appcds/pom.xml
[ERROR] *  container-image-jib-with-redis/pom.xml
[ERROR] *  container-image-push/pom.xml
[INFO] -------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  17:29 min
[INFO] Finished at: 2022-04-27T18:57:24+02:00
[INFO] ------------------------------------------------------------------------

M1 mac with Docker

[INFO] Caused by: org.rnorth.ducttape.RetryCountExceededException: Retry limit hit with exception
[INFO]  at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:88)
[INFO]  at org.testcontainers.containers.GenericContainer.doStart(GenericContainer.java:338)
[INFO]  ... 15 more
[INFO] Caused by: org.testcontainers.containers.ContainerLaunchException: Could not create/start container
[INFO]  at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:537)
[INFO]  at org.testcontainers.containers.GenericContainer.lambda$doStart$0(GenericContainer.java:340)
[INFO]  at org.rnorth.ducttape.unreliables.Unreliables.retryUntilSuccess(Unreliables.java:81)
[INFO]  ... 16 more
[INFO] Caused by: java.lang.IllegalStateException: Container exited with code 1
[INFO]  at org.testcontainers.containers.GenericContainer.tryStart(GenericContainer.java:509)
[INFO]  ... 18 more

And some Caused by: java.net.ConnectException: Connection refused too

The build froze at Building container-build-jib-with-mssql 0.1-SNAPSHOT, but passing -Dmssql.image=mcr.microsoft.com/azure-sql-edge seemed to work around that.
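If we wanted that workaround to apply without a command-line flag, the image choice could key off the CPU architecture. A minimal sketch, assuming the azure-sql-edge image is the right substitute on ARM (it only "seemed to" help above); MssqlImage and imageFor are hypothetical names, and the default image is passed in rather than guessed.

```java
// Hypothetical helper: picks the SQL Server image to use for a given architecture.
final class MssqlImage {

    // Image that worked around the freeze on M1 in the report above.
    static final String M1_WORKAROUND_IMAGE = "mcr.microsoft.com/azure-sql-edge";

    private MssqlImage() {
    }

    // Uses the workaround image on ARM, and the caller's default everywhere else.
    static String imageFor(String osArch, String defaultImage) {
        boolean isArm = "aarch64".equals(osArch) || "arm64".equals(osArch);
        return isArm ? M1_WORKAROUND_IMAGE : defaultImage;
    }

    public static void main(String[] args) {
        // The fallback below is a placeholder, not a real image name.
        String defaultImage = args.length > 0 ? args[0] : "example.invalid/mssql:placeholder";
        System.out.println(imageFor(System.getProperty("os.arch"), defaultImage));
    }
}
```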

How to Reproduce?

  1. ./mvnw -Dquickly -DskipTests=false -Dstart-containers=true -f integration-tests/container-image/maven-invoker-way

Output of uname -a or ver

We're seeing a range of issues on various mac systems.

Output of java -version

A range of levels.

GraalVM version (if different from Java)

No response

Quarkus version or git rev

HEAD

Build tool (ie. output of mvnw --version or gradlew --version)

Range of levels.

@holly-cummins holly-cummins added the kind/bug Something isn't working label Apr 28, 2022
@quarkus-bot

quarkus-bot bot commented Apr 28, 2022

/cc @geoand

@geoand
Contributor

geoand commented Apr 29, 2022

Thanks for the detailed report!

I'll try and have a look soon

@holly-cummins
Contributor Author

holly-cummins commented Apr 29, 2022

I've found an x86 mac under the sofa so I'm trying it on that and will update the matrix.
Edit: x86 mac with true-docker runs clean!

@holly-cummins
Contributor Author

Oh, and for another data point - or, alternatively, to muddy the waters even more - I noticed a failure in this test suite on linux in one of my other PRs. I'm pretty sure there's nothing in the PR that would affect this suite, so there may be some (mild) flakiness in the suite independent of the OS: #25170 (comment)

@holly-cummins holly-cummins changed the title quarkus-integration-test-container-image-invoker tests do not pass on mac/with a docker equivalent (podman, minikube, etc) quarkus-integration-test-container-image-invoker tests do not pass on M1 mac/with a docker equivalent (podman, minikube, etc) Apr 29, 2022
@geoand
Contributor

geoand commented May 6, 2022

Not sure how to proceed with this as I only have Linux x86 machines so I can't really test anything :(

@Sanne
Member

Sanne commented May 6, 2022

Not sure how to proceed with this as I only have Linux x86 machines so I can't really test anything :(

Need to look under the sofa as well ! :)

More seriously - I think for now we can simply merge any related PR based on trust, provided it doesn't introduce regressions for existing coverage.

In parallel, we could also figure out if we can introduce CI jobs running on M1 ? Assuming it can be done at low effort, or if we can find an actual sponsor interested enough.

@geoand
Contributor

geoand commented May 6, 2022

Need to look under the sofa as well ! :)

😆

More seriously - I think for now we can simply merge any related PR based on trust, provided it doesn't introduce regressions for existing coverage.

I am in total agreement with that, no question. My comment was basically about me not being able to do anything myself to fix this :)

@holly-cummins
Contributor Author

I'm planning to continue bashing my head against this one, so I can help too.

@holly-cummins
Contributor Author

Some of the techniques used in #25648 may be helpful for these tests, since they are also using databases.

@holly-cummins
Contributor Author

You'll never guess. It's DNS.

(work continues ...)

@holly-cummins
Contributor Author

The mongo failures are caused by podman dns being case-sensitive (containers/podman#14525). It would be fine except mongo converts the hostname we pass it to lower case before trying to connect to it. #26422 makes us less vulnerable to this kind of bug by lower-casing the hostnames dev services invents.
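The lower-casing idea is roughly the following. This is an illustrative sketch, not the actual change in #26422; HostnameNormalizer and normalize are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper: normalises a generated container hostname so that a
// case-sensitive DNS lookup and a client that lower-cases names still agree.
final class HostnameNormalizer {

    private HostnameNormalizer() {
    }

    // Locale.ROOT avoids locale-specific case rules (for example the Turkish dotless i).
    static String normalize(String generatedHostname) {
        return generatedHostname.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Mongo-AbC12"));
    }
}
```

If the name is already lower case when the container is created, it no longer matters whether mongo lower-cases it again before connecting.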

@holly-cummins
Contributor Author

Here's my podman recipe for maximum compatibility. https://edofic.com/posts/2021-09-12-podman-m1-amd64/ explains the architecture part, and containers/podman#14238 has discussion of what's going on with ryuk. If ryuk wasn't disabled, there were lots of socket permission exceptions. With ryuk disabled, things mostly worked but there were some failures because of name collisions, which I think happened because containers weren't being cleaned up.

Edit ~/.testcontainers.properties and add the following line

ryuk.container.privileged=true

Then run the following

brew install podman
podman machine init -v $HOME:$HOME
sudo /opt/homebrew/Cellar/podman/4.0.3/bin/podman-mac-helper install
podman machine set --rootful
podman machine start
podman machine ssh
sudo -i
rpm-ostree install qemu-user-static
systemctl reboot

Once the virtual machine restarts, you should be good to run builds or commands.

If you're on podman 4.1 or higher, you don't need the -v $HOME:$HOME volume mount.

@holly-cummins
Contributor Author

After sorting out #26501, I was still seeing failures in the MSSQL jib tests.

[INFO] [ERROR] 	[error]: Build step io.quarkus.container.image.jib.deployment.JibProcessor#buildFromJar threw an exception: java.lang.RuntimeException: Unable to create container image
[INFO] [ERROR] 	at io.quarkus.container.image.jib.deployment.JibProcessor.containerize(JibProcessor.java:240)
[INFO] [ERROR] 	at io.quarkus.container.image.jib.deployment.JibProcessor.buildFromJar(JibProcessor.java:166)
[INFO] [ERROR] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[INFO] [ERROR] 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[INFO] [ERROR] 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[INFO] [ERROR] 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
[INFO] [ERROR] 	at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:944)
[INFO] [ERROR] 	at io.quarkus.builder.BuildContext.run(BuildContext.java:277)
[INFO] [ERROR] 	at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
[INFO] [ERROR] 	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
[INFO] [ERROR] 	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
[INFO] [ERROR] 	at java.base/java.lang.Thread.run(Thread.java:833)
[INFO] [ERROR] 	at org.jboss.threads.JBossThread.run(JBossThread.java:501)
[INFO] [ERROR] Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 'docker load' command failed with error: Error: unable to load image: payload does not match any of the supported image formats:
[INFO] [ERROR]  * oci: initializing source oci:/var/tmp/libpod-images-load.tar4243803530:: open /var/tmp/libpod-images-load.tar4243803530/index.json: not a directory
[INFO] [ERROR]  * oci-archive: creating temp directory: untarring file "/var/tmp/oci3432272930": unexpected EOF
[INFO] [ERROR]  * docker-archive: loading tar component manifest.json: unexpected EOF
[INFO] [ERROR]  * dir: open /var/tmp/libpod-images-load.tar4243803530/manifest.json: not a directory
[INFO] [ERROR] 
[INFO] [ERROR] 	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)
[INFO] [ERROR] 	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:567)
[INFO] [ERROR] 	at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.builder.steps.StepsRunner.run(StepsRunner.java:219)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.api.Containerizer.run(Containerizer.java:390)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.api.JibContainerBuilder.containerize(JibContainerBuilder.java:597)
[INFO] [ERROR] 	at io.quarkus.container.image.jib.deployment.JibProcessor.containerize(JibProcessor.java:233)
[INFO] [ERROR] 	... 12 more
[INFO] [ERROR] Caused by: java.io.IOException: 'docker load' command failed with error: Error: unable to load image: payload does not match any of the supported image formats:
[INFO] [ERROR]  * oci: initializing source oci:/var/tmp/libpod-images-load.tar4243803530:: open /var/tmp/libpod-images-load.tar4243803530/index.json: not a directory
[INFO] [ERROR]  * oci-archive: creating temp directory: untarring file "/var/tmp/oci3432272930": unexpected EOF
[INFO] [ERROR]  * docker-archive: loading tar component manifest.json: unexpected EOF
[INFO] [ERROR]  * dir: open /var/tmp/libpod-images-load.tar4243803530/manifest.json: not a directory
[INFO] [ERROR] 
[INFO] [ERROR] 	at com.google.cloud.tools.jib.docker.DockerClient.load(DockerClient.java:211)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.builder.steps.LoadDockerStep.call(LoadDockerStep.java:74)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.builder.steps.StepsRunner.lambda$loadDocker$18(StepsRunner.java:618)
[INFO] [ERROR] 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
[INFO] [ERROR] 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
[INFO] [ERROR] 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
[INFO] [ERROR] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[INFO] [ERROR] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[INFO] [ERROR] 	at java.base/java.lang.Thread.run(Thread.java:833)
[INFO] [ERROR] Caused by: java.nio.file.NoSuchFileException: /var/folders/x7/z854l70x75x_c2c9xns3ls2c0000gn/T/jib-core-application-layers-cache/layers/84106084ab5209af8276f41dff7bfa676bdb612e655a62ab453ffd3f199b7dd2/431a8b6dfb450bbef88ced2708c515293671b843f85273cb602e5bb314d2ffbb
[INFO] [ERROR] 	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
[INFO] [ERROR] 	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
[INFO] [ERROR] 	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
[INFO] [ERROR] 	at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)
[INFO] [ERROR] 	at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
[INFO] [ERROR] 	at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
[INFO] [ERROR] 	at java.base/java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:422)
[INFO] [ERROR] 	at java.base/java.nio.file.Files.newInputStream(Files.java:160)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.blob.FileBlob.writeTo(FileBlob.java:38)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.tar.TarStreamBuilder.writeAsTarArchiveTo(TarStreamBuilder.java:53)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.image.ImageTarball.dockerWriteTo(ImageTarball.java:166)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.image.ImageTarball.writeTo(ImageTarball.java:84)
[INFO] [ERROR] 	at com.google.cloud.tools.jib.docker.DockerClient.load(DockerClient.java:197)
[INFO] [ERROR] 	... 8 more
[INFO] [ERROR] 	Suppressed: java.io.IOException: This archive contains unclosed entries.
[INFO] [ERROR] 		at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.finish(TarArchiveOutputStream.java:291)
[INFO] [ERROR] 		at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.close(TarArchiveOutputStream.java:309)
[INFO] [ERROR] 		at com.google.cloud.tools.jib.tar.TarStreamBuilder.writeAsTarArchiveTo(TarStreamBuilder.java:56)
[INFO] [ERROR] 		... 11 more
[INFO] [ERROR] -> [Help 1]

I was a bit stumped, but a podman prune and mvn clean (and perhaps some other cleaning along the way) seem to have sorted out the issue. Leaving the stack trace here for searchability, but I think MSSQL is resolved.

So now it's just DB2 to go, where it's a fight against https://stackoverflow.com/questions/70175677/ibmcom-db2-docker-image-fails-on-m1.

@geoand
Contributor

geoand commented Jul 1, 2022

💪

@spolti
Contributor

spolti commented Dec 1, 2022

Hi @holly-cummins, I was having this very same issue on a Mac M1 with podman-desktop while trying to do a basic build with Quarkus 2.14.1 and Jib. It seems to be fixed in 2.14.2.
Do you know what commit fixed this?

@holly-cummins
Contributor Author

@spolti, I don't! But thanks for spotting that it's fixed. I assumed we must have disabled the tests but I checked and we didn't - it just seems to have resolved itself sometime between April and August. So I'll close, and we have the record here if we need it again.

4 participants