Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use more robust Maven image selection process #93

Closed
wants to merge 2 commits into from
Closed

Use more robust Maven image selection process #93

wants to merge 2 commits into from

Conversation

Stephan202
Copy link
Contributor

This is a proposal based on @hboutemy's suggestion here.

@Stephan202 Stephan202 changed the title Use more robust Maven selection process Use more robust Maven image selection process Oct 22, 2022
@hboutemy
Copy link
Member

hboutemy commented Oct 23, 2022

good idea

I tried to evaluate the impact: i added https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/bin/update-statistics.sh script that generates https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/buildspec-stats.txt

I have an issue with current full rewrite: it takes ages to run (I even just hit my pull rate limit on Dockerhub :) ) and has a huge impact on values

I imagine that keeping the existing algorithm would be ok, with adding at the end a check that the image exists, and if it does not exist, using the new algorithm with less options (because we don't need old values for old JDKs)

we would have the best of all worlds, isn't it?

@Stephan202
Copy link
Contributor Author

Works for me! I added a commit with a new proposal. Note that I swapped out docker manifest inspect for docker pull -q, because the former doesn't seem faster for existing images, while the latter performs an operation required by the subsequent docker run.

@@ -122,6 +122,14 @@ mvnBuildDocker() {
*)
mvnImage=maven:${mvnVersion}-eclipse-temurin-${jdk}-alpine
esac
if ! docker pull -q ${mvnImage} > /dev/null 2>&1; then
for image in maven:{${mvnVersion},3}-eclipse-temurin-${jdk}-alpine; do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For recent JDKs and Maven versions these two variants seem to suffice.

Comment on lines +127 to +130
if docker pull -q ${image} > /dev/null 2>&1; then
mvnImage=${image}
break
fi
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that the previous revision could inspect more images than necessary. Are simple for-loop avoids that.

@@ -122,6 +122,14 @@ mvnBuildDocker() {
*)
mvnImage=maven:${mvnVersion}-eclipse-temurin-${jdk}-alpine
esac
if ! docker pull -q ${mvnImage} > /dev/null 2>&1; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stephan202 I tested this (with bin/update-statistics.sh) and found that docker pull -q fails when the image is already available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting; locally I can't reproduce this:

$ docker --version
Docker version 20.10.20, build 9fdeb9c
$ docker pull -q maven:3.8.5-eclipse-temurin-17-alpine 2>&1 2>/dev/null; echo $?
docker.io/library/maven:3.8.5-eclipse-temurin-17-alpine
0
$ docker pull -q maven:3.8.5-eclipse-temurin-17-alpine 2>&1 2>/dev/null; echo $?
docker.io/library/maven:3.8.5-eclipse-temurin-17-alpine
0
$ docker pull -q maven:this-one-does-not-exist 2>&1 2>/dev/null; echo $?
1

Perhaps it depends on the version of Docker used (which one do you have installed locally?).

I can have a closer look at bin/update-statistics.sh later; right now I'm commuting; don't want to pull new images while using mobile internet :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can have a closer look at bin/update-statistics.sh later.

Didn't happen yesterday, and today is full of meeting and events; I intend to circle back to this PR later this week. Sorry for the delay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries, we all have a day job, I have the same type of delays that you patiently waited for

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested (and improved a little bit the script): you're right, docker pull -q works, I probably was hit by my rate limit or anything stupid and transient

now, with the bin/update-statistics.sh script, we can easily evaluate the impact of any change in image tag selection and see if it breaks any buildspec tool x jdk selection

I let you review and update the statistics script while changing rebuild.sh so we can merge with safety

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally found a moment to get back to this :)

I saw that bin/update-statistics.sh uses docker inspect rather than docker manifest inspect. So this means there are three options. I wrote the following script to compare them:

#!/usr/bin/env bash

# The image against which to test.
image=maven:3.6.3-openjdk-17-slim

print_header() {
  local header="${1}"
  echo -e "\n\033[0;34m${header}\033[0m"
}

print_result() {
  test $? == 0 && echo -e '\033[0;32mSUCCESS\033[0m' || echo -e '\033[0;31mFAILURE\033[0m'
}

# Drop the image if it exists.
docker image rm -f "${image}"

print_header 'Running `docker inspect` while the image is locally absent.'
time docker inspect "${image}" > /dev/null
print_result

print_header 'Running `docker manifest inspect` while the image is locally absent.'
time docker manifest inspect "${image}" > /dev/null
print_result

print_header 'Running `docker pull -q` while the image is locally absent.'
time docker pull -q "${image}" > /dev/null
print_result

print_header 'Running `docker inspect` while the image is locally present.'
time docker inspect "${image}" > /dev/null
print_result

print_header 'Running `docker manifest inspect` while the image is locally present.'
time docker manifest inspect "${image}" > /dev/null
print_result

print_header 'Running `docker pull -q` while the image is locally present.'
time docker pull -q "${image}" > /dev/null
print_result

Based on the results, it seems to me that:

  1. docker inspect is by far the fastest, but only says whether the given image exists locally. It seems this command cannot be used in rebuild.sh, unless there's some kind of "pre-population" of images. (Which might be worth the effort of implementing; see below.)
  2. docker manifest inspect takes ~3 seconds if the image doesn't exist, and ~5 if it does, irrespective of how often the command is executed.
  3. docker pull -q takes ~2 seconds if the image doesn't exist, and ~1.5 if is does, unless the image doesn't exist locally yet, in which case it of course takes longer (but that's a cost that needs to be paid regardless).

Based on this I see two options:

  1. Stick with docker pull -q in rebuild.sh.
  2. Introduce a docker pull -q-based "bootstrap phase" and use docker inspect in rebuild.sh.

(The timings presented here may of course be specific to my setup; interested to hear whether you can reproduce these results.)

hboutemy added a commit that referenced this pull request Nov 6, 2022
Co-authored-by: Stephan Schroevers <[email protected]>
Co-authored-by: Hervé Boutemy <[email protected]>
@hboutemy
Copy link
Member

hboutemy commented Nov 6, 2022

yes, great: I merged and reworked the algorithm into 0b5d8be

thanks a lot

@hboutemy hboutemy closed this Nov 6, 2022
@Stephan202 Stephan202 deleted the improvement/more-robust-image-selection branch November 6, 2022 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants