[discussion] run mvnd in docker #496

Open
dcaillia opened this issue Oct 8, 2021 · 23 comments

Comments

@dcaillia

dcaillia commented Oct 8, 2021

I have a docker image that contains a regular maven installation, jdk and other build tools.

I use it to run mvn (volume-mounting only my project sources and .m2) both during local development and on the build-server.

I like this because it makes sure my local builds are built exactly like on the build-server (which uses the same "build image"). The Dockerfile of my build image is also important as it completely defines the build environment and tools.

I'm wondering how to get the benefits of mvnd while sticking to that "build image" concept (building in a docker container).

A couple of questions pop up:

  • i'd need to keep the build container alive across builds to enjoy the mvnd speedup
  • so to compile n different projects i'd have to keep n build containers running
  • unless i mount my whole data dir with all projects, which i'd rather avoid: by mounting only project x's sources, i'm sure the build of project x does not rely on resources outside of project x

So, with mvnd on the host all is awesome - assuming you build (call mvnd) on the host too (and not in some docker container).

I'm wondering if there's a way to have 1 single long-running mvnd container (running all the time), and then when i launch my own build docker container (mounting project x sources), it somehow connects to the long-running mvnd, such that my build container talks to that long-running mvnd to get the mvn job done faster.

@ppalaga
Contributor

ppalaga commented Oct 18, 2021

I'm wondering if there's a way to have 1 single long-running mvnd container (running all the time), and then when i launch my own build docker container (mounting project x sources), it somehow connects to the long-running mvnd, such that my build container talks to that long-running mvnd to get the mvn job done faster.

We currently have only a local, file-based daemon registry, so discovering a daemon running on a different host is not possible. Having some general service discovery mechanism sounds interesting. Have you thought of some specific protocol/implementation?

@gnodet
Contributor

gnodet commented Oct 20, 2021

Also, if the mvnd daemon is on the host, that would completely defeat the purpose of using docker images to make sure the "local builds are built exactly like on the build-server", because it would use the host environment and not the docker image one. Am I missing something ?

@dcaillia
Author

The daemon could be inside my build-container (that has other build-tooling i need as part of the mvn build). I just need to:

(1) make sure it never shuts down, because it's a daemon
(2) have a way to make my source code (of the project i need to build) visible into that long-running container when i want to call its mvn, maybe this is feasible with a softlink

@ppalaga
Contributor

ppalaga commented Oct 24, 2021

Also, if the mvnd daemon is on the host, that would completely defeat the purpose of using docker images to make sure the "local builds are built exactly like on the build-server", because it would use the host environment and not the docker image one.

I think @dcaillia could run both the daemon and the mvnd client inside the container. The daemon would be the container’s primary process (PID 1) and mvnd could be invoked via docker exec. In that way the consistency of the env across machines could be guaranteed.

@ppalaga
Contributor

ppalaga commented Oct 24, 2021

(1) make sure it never shuts down, because it's a daemon

There is currently no easy way to start the daemon process directly. But it is not that hard to put together the necessary command. You could e.g. start the daemon via mvnd -v and then find the command that the mvnd client used to start the daemon via ps -e -f | grep mvnd. On my machine, I see this:

/home/ppalaga/.sdkman/candidates/java/21.2.0.r16-grl/bin/java -classpath /home/ppalaga/.sdkman/candidates/mvnd/0.6.0/mvn/lib/ext/mvnd-common-0.6.0.jar:/home/ppalaga/.sdkman/candidates/mvnd/0.6.0/mvn/lib/ext/mvnd-agent-0.6.0.jar -javaagent:/home/ppalaga/.sdkman/candidates/mvnd/0.6.0/mvn/lib/ext/mvnd-agent-0.6.0.jar --add-opens jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.jvm=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED -Xms128M -Xmx6G -Dmvnd.home=/home/ppalaga/.sdkman/candidates/mvnd/0.6.0 -Dmvnd.java.home=/home/ppalaga/.sdkman/candidates/java/21.2.0.r16-grl -Dlogback.configurationFile=/home/ppalaga/.sdkman/candidates/mvnd/0.6.0/conf/logback.xml -Dmvnd.id=ffe0de8d -Dmvnd.daemonStorage=/home/ppalaga/.m2/mvnd/registry/0.6.0 -Dmvnd.registry=/home/ppalaga/.m2/mvnd/registry/0.6.0/registry.bin -Dmvnd.socketFamily=inet -Djdk.java.options= --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/sun.nio.fs=ALL-UNNAMED -Dmvnd.noDaemon=false -Dmvnd.debug=false -Dmvnd.idleTimeout=3h -Dmvnd.keepAlive=100ms -Dmvnd.extClasspath= -Dmvnd.coreExtensions= -Dmvnd.minHeapSize=128M -Dmvnd.maxHeapSize=6G -Dmvnd.jvmArgs=--add-opens jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.jvm=ALL-UNNAMED --add-opens 
jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED -Dmvnd.enableAssertions=false -Dmvnd.expirationCheckDelay=10s -Dmvnd.duplicateDaemonGracePeriod=10s -Dmvnd.socketFamily=inet org.mvndaemon.mvnd.common.MavenDaemon

You could then put a similar command into your Dockerfile.

(2) have a way to make my source code (of the project i need to build) visible into that long-running container when i want to call its mvn, maybe this is feasible with a softlink

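You could perhaps mount your whole home folder or whichever folder contains all your projects? Then you could do something like docker exec CONTAINER sh -c 'cd /mnt/rel/path/to/my/project && mvnd <goals>' (the sh -c wrapper is needed because cd is a shell builtin and a bare && would be interpreted by the host shell).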
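
As a rough sketch of that idea (not something taken from the mvnd docs): `docker exec` accepts `-w` to set the working directory of the exec'd process, which avoids needing a shell wrapper inside the container. The function name `mvnd_in_container`, the `DOCKER` override variable, the container name and the mount path are all made up for illustration.

```shell
#!/bin/sh
# Sketch: run mvnd for one project inside an already running build container.
# DOCKER is overridable (e.g. DOCKER=echo) so the resulting command can be
# inspected without a Docker daemon. All names here are hypothetical.
DOCKER="${DOCKER:-docker}"

# usage: mvnd_in_container CONTAINER PROJECT_DIR [mvnd args...]
mvnd_in_container() {
    if [ "$#" -lt 2 ]; then
        echo "usage: mvnd_in_container CONTAINER PROJECT_DIR [mvnd args...]" >&2
        return 2
    fi
    container=$1
    project_dir=$2
    shift 2
    # -w sets the working directory inside the container, so neither `cd`
    # nor `sh -c` is required.
    "$DOCKER" exec -w "$project_dir" "$container" mvnd "$@"
}
```

Assuming a container named `build-box` with the projects mounted under `/mnt/projects`, a call could look like `mvnd_in_container build-box /mnt/projects/x clean install`.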

@NoSugarCoffee

NoSugarCoffee commented Jan 7, 2022

So mvnd has less potential on CI because it can't benefit from the daemon, and is more suitable for local builds where you run the project again and again. Is that right?

@ppalaga
Contributor

ppalaga commented Jan 7, 2022

So mvnd has less potential on CI because it can't benefit from the daemon, and is more suitable for local builds where you run the project again and again. Is that right?

Indeed, mvnd was primarily designed as an interactive tool run by humans.
It may still make sense to experiment with other use cases.

@chicobento

On our CI pipelines, a single build launches mvn multiple times across Jenkins stages, e.g. lint, test, integration test, javadocs and so on, so I believe we would benefit from mvnd on CI, even though in that scenario the "long-running" daemon might not run for very long.
However, we have a problem similar to @dcaillia's: the mvn calls are made from containers launched by the Jenkins host, so it would be really nice if we had a way of redirecting the mvnd 'client' to reuse a daemon launched by the host (or another container).
My initial idea is to spawn a mvnd 'daemon' container at the beginning of the pipeline and then redirect the other mvnd 'client' container calls to reuse the daemon container.
Something like:

jenkins pipeline
  - container 0: launch mvnd daemon
  - container 1: mvnd clean
  - container 2: mvnd package install
  - container 3: mvnd test
  - container 4: mvnd dependency:analyze
  - ... and many other mvn plugins we run on our builds

Where container 1-4 calls would reuse the daemon launched by container 0.

Would that be possible? Is there another strategy that would let us benefit from mvnd given our 'restriction' of launching mvn plugins in separate containers?

@ppalaga
Contributor

ppalaga commented Jan 10, 2022

The existing network protocol between the mvnd client and the daemon would work without any change even if the daemon runs on a different host. What is missing is a discovery mechanism for the clients to find daemon(s) running on other hosts. Currently there is only a local file-based daemon registry implementation: the client looks into that file for existing daemons and the daemons write their busy/idle state and port there.

So we need some new mechanism.
Perhaps a command line option and/or an env. var for passing the daemon host:port to the client would be enough initially?
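
Purely to illustrate the shape of such an option: a client-side wrapper could split and validate a `host:port` value roughly like this. `MVND_DAEMON_ADDRESS` and `parse_daemon_address` are hypothetical names; mvnd does not read such a variable today.

```shell
#!/bin/sh
# Hypothetical helper: split a HOST:PORT value, as a future client option
# or env var (e.g. MVND_DAEMON_ADDRESS, which does NOT exist in mvnd) might
# be parsed. Prints "HOST PORT" on success, returns 1 on invalid input.
parse_daemon_address() {
    addr=$1
    case $addr in
        *:*) ;;
        *) echo "expected HOST:PORT, got '$addr'" >&2; return 1 ;;
    esac
    host=${addr%:*}     # everything before the last colon
    port=${addr##*:}    # everything after the last colon
    case $port in
        ''|*[!0-9]*) echo "invalid port '$port'" >&2; return 1 ;;
    esac
    if [ -z "$host" ] || [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "invalid address '$addr'" >&2
        return 1
    fi
    printf '%s %s\n' "$host" "$port"
}
```

A wrapper could then call `parse_daemon_address "$MVND_DAEMON_ADDRESS"` and fall back to the local file-based registry when the variable is unset.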

@chicobento

Sounds perfect. That was exactly what I was hoping for 👍🏻

@wallentx

wallentx commented Aug 1, 2024

This would be extremely useful for us in the context of running this in k8s.
Our monorepo CI has hit the limits. We are pulling down 1TB of dependencies and plugins per day, and are getting rate limited by public maven repos. If we switch to our private artifactory cloud as a pull-through cache, we're going to have to eat the transfer cost.

We use the github actions ARC controller, so our runners spin up as pods in our cluster.
If mvnd could be deployed as a persistent k8s daemonset that holds a centralized .m2 cache, and the client runner pods simply sent an "execute goal" payload to the daemon, that would solve just about every (self-induced) problem we are currently trying to overcome.

I suppose I'm just commenting here because it's nice to daydream. Back to trying to figure out why my builds won't use the .m2 cache on my build container 🙃

@cstamas
Member

cstamas commented Aug 1, 2024

Interesting ideas in this discussion...

@wallentx

wallentx commented Aug 2, 2024

@HomeOfTheWizard

HomeOfTheWizard commented Aug 6, 2024

This would open doors for a lot of new use cases.
Especially for plugins that help configure an application running on K8s/Docker.
For example, I have a plugin that fetches the credentials an application running on K8s needs from a secret manager.
https://homeofthewizard.github.io/vault-maven-plugin/

Right now I use the plugin only during the deployment of the application, to get the secrets and put them into K8s secrets.
But this works only if my credentials do not change often.
If I manage to run my plugin fast enough with mvnd in a container, I could use it as a sidecar (fast enough to not slow down the startup of the pod too much) and fetch the fresh/updated credentials every time a new pod starts.

it's nice to daydream 😄

@HomeOfTheWizard

HomeOfTheWizard commented Aug 6, 2024

@wallentx I think it is possible to use this cache extension with mvnd

mvnd.coreExtensions: internal option to specify the list of maven extension to register

For a full list of available properties please see [/dist/src/main/distro/conf/mvnd.properties](https://github.com/apache/maven-mvnd/blob/master/dist/src/main/distro/conf/mvnd.properties).

But it does not address all the requirements listed in the discussion here.
If I understood the documentation correctly, it is for incremental builds, meaning it helps to skip parts of the project that have already been built.

For instance, I suppose it does not cover the following specifics of mvnd:
caching the class loaders of the plugins executed by previous builds, or the native code generated by the JIT.
From the README:

This architecture brings the following advantages:

    The JVM for running the actual builds does not need to get started anew for each build.

    The class loaders holding classes of Maven plugins are cached over multiple builds. The plugin jars are thus read and parsed just once. SNAPSHOT versions of Maven plugins are not cached.

    The native code produced by the Just-In-Time (JIT) compiler inside the JVM is kept too. Compared to stock Maven, less time is spent by the JIT compilation. During the repeated builds the JIT-optimized code is available immediately. This applies not only to the code coming from Maven plugins and Maven Core, but also to all code coming from the JDK itself.

IMHO the way forward is to be able to use a remote, centralized, long-running daemon from multiple mvnd clients, or to externalize the cache of the daemon so it can be shared. The latter might be a bit more complex if the daemons sharing the same cache run different plugins/tasks.

@HomeOfTheWizard

HomeOfTheWizard commented Aug 8, 2024

(1) make sure it never shuts down, because it's a daemon

There is currently no easy way to start the daemon process directly. But it is not that hard to put together the necessary command. You could e.g. start the daemon via mvnd -v and then find the command that the mvnd client used to start the daemon via ps -e -f | grep mvnd. On my machine, I see this:

/home/ppalaga/.sdkman/candidates/java/21.2.0.r16-grl/bin/java -classpath /home/ppalaga/.sdkman/candidates/mvnd/0.6.0/mvn/lib/ext/mvnd-common-0.6.0.jar:/home/ppalaga/.sdkman/candidates/mvnd/0.6.0/mvn/lib/ext/mvnd-agent-0.6.0.jar -javaagent:/home/ppalaga/.sdkman/candidates/mvnd/0.6.0/mvn/lib/ext/mvnd-agent-0.6.0.jar --add-opens jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.jvm=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED -Xms128M -Xmx6G -Dmvnd.home=/home/ppalaga/.sdkman/candidates/mvnd/0.6.0 -Dmvnd.java.home=/home/ppalaga/.sdkman/candidates/java/21.2.0.r16-grl -Dlogback.configurationFile=/home/ppalaga/.sdkman/candidates/mvnd/0.6.0/conf/logback.xml -Dmvnd.id=ffe0de8d -Dmvnd.daemonStorage=/home/ppalaga/.m2/mvnd/registry/0.6.0 -Dmvnd.registry=/home/ppalaga/.m2/mvnd/registry/0.6.0/registry.bin -Dmvnd.socketFamily=inet -Djdk.java.options= --add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/sun.nio.fs=ALL-UNNAMED -Dmvnd.noDaemon=false -Dmvnd.debug=false -Dmvnd.idleTimeout=3h -Dmvnd.keepAlive=100ms -Dmvnd.extClasspath= -Dmvnd.coreExtensions= -Dmvnd.minHeapSize=128M -Dmvnd.maxHeapSize=6G -Dmvnd.jvmArgs=--add-opens jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.jvm=ALL-UNNAMED --add-opens 
jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED --add-opens jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED -Dmvnd.enableAssertions=false -Dmvnd.expirationCheckDelay=10s -Dmvnd.duplicateDaemonGracePeriod=10s -Dmvnd.socketFamily=inet org.mvndaemon.mvnd.common.MavenDaemon

You could then put a similar command into your Dockerfile.

(2) have a way to make my source code (of the project i need to build) visible into that long-running container when i want to call its mvn, maybe this is feasible with a softlink

You could perhaps mount your whole home folder or whichever folder contains all your projects? Then you could do something like docker exec CONTAINER sh -c 'cd /mnt/rel/path/to/my/project && mvnd <goals>'

I have tested this approach, but it seems the mvnd client does not necessarily use the already running daemon every time, even if you run exactly the same mvn commands 😞

I have a project here where I run a container with an mvnd daemon as the main process, and execute the mvnd client by connecting to it via docker exec <container> mvnd clean install.
https://github.com/HomeOfTheWizard/vault-mvnd-benchmark/blob/main/docker/launch-mvnd.sh

It never uses the main daemon already running in the container.
However, if I run docker exec <container> mvnd clean install twice fast enough, the second command uses the same daemon as the first one (not the one running as the main process in the container, but a new one created by the first "docker exec").

Maybe I am missing something, or we need to implement a reproducible way for clients to select specific daemons.
@cstamas I have found issue #955.
Is this something we can integrate into that topic?
If needed, I can happily help to code it.

@ppalaga
Contributor

ppalaga commented Aug 9, 2024

@HomeOfTheWizard what is docker exec <container> mvnd --status showing after you start the container? Is the main daemon up and running and is it in the IDLE state?

@HomeOfTheWizard

HomeOfTheWizard commented Aug 9, 2024

@ppalaga Thanks for pointing that out. It is in "Busy" status, which is probably why I can't reuse it.
I copy-pasted the command run by mvnd -v as you recommended.
I now see that the command has the following parameters:

java ... \
    -Dmvnd.idleTimeout=3h \
    -Dmvnd.keepAlive=100ms \
    -Dmvnd.expirationCheckDelay=10s \
    -Dmvnd.duplicateDaemonGracePeriod=10s \
    ...

Can those params play a role in the status of the daemon being "Busy"?

Here is the full command from my Dockerfile:

ENTRYPOINT /usr/java/openjdk-21/bin/java -classpath /usr/local/mvnd/mvn/boot/plexus-classworlds-2.8.0.jar \
    -javaagent:/usr/local/mvnd/mvn/lib/mvnd/mvnd-agent-1.0.1.jar \
    -Dmvnd.home=/usr/local/mvnd -Dmaven.home=/usr/local/mvnd/mvn \
    -Dmaven.conf=/usr/local/mvnd/mvn/conf -Dclassworlds.conf=/usr/local/mvnd/bin/mvnd-daemon.conf \
    -Dorg.slf4j.simpleLogger.logFile=/root/.m2/mvnd/registry/1.0.1/daemon-cd850559.log -Dmvnd.java.home=/usr/java/openjdk-21 \
    -Dmvnd.id=cd850559 -Dmvnd.daemonStorage=/root/.m2/mvnd/registry/1.0.1 -Dmvnd.registry=/root/.m2/mvnd/registry/1.0.1/registry.bin \
    -Dmvnd.socketFamily=inet -Dmvnd.home=/usr/local/mvnd \
    -Djdk.java.options="--add-opens java.base/java.io=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/sun.net.www.protocol.jar=ALL-UNNAMED --add-opens java.base/sun.nio.fs=ALL-UNNAMED" \
    -Dmvnd.noDaemon=false -Dmvnd.debug=false -Dmvnd.debug.address=8000 -Dmvnd.idleTimeout=3h -Dmvnd.keepAlive=100ms \
    -Dmvnd.extClasspath= -Dmvnd.coreExtensions= -Dmvnd.enableAssertions=false -Dmvnd.expirationCheckDelay=10s \
    -Dmvnd.duplicateDaemonGracePeriod=10s -Dmvnd.socketFamily=inet \
    org.codehaus.plexus.classworlds.launcher.Launcher

@ppalaga
Contributor

ppalaga commented Aug 9, 2024

Yeah, the daemon starts in busy state so that the client that initiates its start can grab it by daemonId.
I think currently, there is no way to instruct the client on the CLI to connect to a daemon with a specific daemonId.

Not sure what is the goal of your experiments?

You just want to invoke the client inside the running container so that it connects to the main daemon?
If so, you could e.g. try to override the daemon registry after the daemon has started with a static file in which the daemon with your ID has status IDLE:

# stop all daemons
mvnd --stop
# start one daemon
mvnd -v
# make sure there is one daemon started and it has gone idle
mvnd --status
      ID      PID                   Address   Status ...
1c0afaf5    58432     inet:/127.0.0.1:33411     Idle ...
# store this file for overriding the one in the container
ls ~/.m2/mvnd/registry/<version>/registry.bin

Note that when starting the daemon in the container, you must use the same -Dmvnd.id=... as shown by mvnd --status.
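
To check from a script that the daemon has gone idle before copying the registry file, the `mvnd --status` output could be filtered along these lines. This is only a sketch: it assumes the column order shown above (ID, PID, Address, Status), and the function names and the `MVND` override variable are hypothetical helpers, not part of mvnd.

```shell
#!/bin/sh
# MVND is overridable so the helpers can be exercised without mvnd installed.
MVND="${MVND:-mvnd}"

# Reads `mvnd --status` output on stdin and prints the IDs of idle daemons.
# Assumes the fourth whitespace-separated column is the status.
idle_daemon_ids() {
    awk '$4 == "Idle" { print $1 }'
}

# usage: wait_for_idle_daemon [TRIES]
# Polls once per second; prints the first idle daemon ID, or returns 1.
wait_for_idle_daemon() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        id=$("$MVND" --status 2>/dev/null | idle_daemon_ids | head -n 1)
        if [ -n "$id" ]; then
            echo "$id"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

The printed ID is the value that would then have to match the -Dmvnd.id=... used when starting the daemon in the container.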

If you want to run the client in a different process, then you may want to implement some new client options for connecting to a remote daemon.

@ppalaga
Contributor

ppalaga commented Aug 9, 2024

Actually, if your goal is just to invoke the client inside the running container, then you may start the daemon by simply running mvnd -v: it starts the daemon and leaves it in the idle state.

@HomeOfTheWizard

My use case is the same as @dcaillia's and @chicobento's.
I am running the same maven project with the same set of plugins over and over, from a new container every time.

The main goal of my experiment is to see if I can start a container with a daemon already running, with a JIT cache already warmed up and the plugin classloaders already loaded for the set of plugins I use in my project.
Ideally I would do that during docker build, so that when I create the container it already has the daemon, and I can simply run mvnd clean install as ENTRYPOINT and have my maven build start up fast.

But first I tried to see if I can connect to an already running daemon.
Thanks for your tips, I know how to do it now.

Now the question is:
Can I use the same registry file to override the JIT and plugin classloader cache of a running daemon?
If not, is there another way to prepare this cache during docker build?

@ppalaga
Contributor

ppalaga commented Aug 9, 2024

start a container with a daemon already running, with a JIT cache already warmed up and the plugin classloaders already loaded

JIT and classloader caching will only kick in when you build against the same daemon process repeatedly. A freshly started container will host a freshly started daemon with empty classloader caches and no code JIT-ed. Unless you want to experiment with CRaC or similar tech? I am not aware of anybody having tried that with Maven Daemon. It is a great use case and it would be very interesting to see how well it works.

@HomeOfTheWizard

My question was actually: "is there a way to export/store the JIT and plugin classloader caches, so we can override and reuse them in a new daemon, just like the registry file?"

But I just realized, reading the docs again, that the daemon is a normal JVM, so no AOT; the only GraalVM native process is the client. All those caches of the daemon are in memory, so there is no way to persist them for export and reuse.

I will play around with CRaC and let you know if that helps. Thanks again for your help.
