Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: CI failing because of logging dependencies missing at runtime #3775

Closed
slarse opened this issue Jan 29, 2021 · 32 comments
Closed

chore: CI failing because of logging dependencies missing at runtime #3775

slarse opened this issue Jan 29, 2021 · 32 comments

Comments

@slarse
Copy link
Collaborator

slarse commented Jan 29, 2021

We currently have a ton of failures over on Jenkins due to some sl4j stuff not being available at runtime in the maven plugin. I have not been able to replicate it locally; it all works fine for me.

See also the discussion in #3755

I'll be trying to figure this out, but if anyone has any ideas that'd be great :)

@monperrus
Copy link
Collaborator

Indeed the same as the problem reported by @seintur

@monperrus
Copy link
Collaborator

One fix maybe to change the scope of logback to compile. But that goes against my current understanding of slf4j which is meant to work only with the API.

@slarse
Copy link
Collaborator Author

slarse commented Jan 31, 2021

What's very odd is that I can't reproduce it. When I run the CI build script locally, everything works out fine. That makes me think that there isn't anything overtly wrong with the Maven config. I will try reproducing with a few different VMs this Monday.

@seintur
Copy link
Contributor

seintur commented Jan 31, 2021

ok don't have a solution yet, but I've done a couple of investigations on project Juliac. In this case, I'm using exec-maven-plugin for invoking Spoon. exec-maven-plugin:3.0.0 depends on slf4j-api 1.7.5, an older version than 1.7.30 that is used by Spoon. Whatever I'm trying to do, exec-maven-plugin constructs an URLClassLoader that contains his version of slf4j-api and not the one that comes with Spoon. And that's the source of the problem because the Event, and certainly others, classes that are used by Spoon and that come from 1.7.30, are absent from 1.7.5. Still have to figure out a way to tell exec-maven-plugin to construct a classpath with the newer version of slf4j-api...

@seintur
Copy link
Contributor

seintur commented Jan 31, 2021

follow up from my previous message. So it seems that adding slf4j-api:1.7.30 in the plugin dependencies (not the project) does the trick. The good news is that exec-maven-plugin is moving from 1.7.5 to 1.7.30 in their upcoming 3.0.1 release that is still in snapshot. So whenever 3.0.1 will be released, the trick will certainly no longer be needed.

@slarse
Copy link
Collaborator Author

slarse commented Feb 1, 2021

Great! As Juliac was suffering from the same error as pretty much all the other builds (java.lang.NoClassDefFoundError: org/slf4j/event/Level), this should be applicable to the other projects as well. I'm still confused as to why none of these errors occur on my own machine.

While investigating this, I also found a major error in the build.sh script: it's trying to pull snapshots from the old snapshot repository at http://maven.inria.fr/artifactory/spoon-public-snapshot/. This explains how #3755 could be merged on the 19th (a Tuesday), but we didn't start seeing build errors until the 24th (a Sunday). To be very clear, we're building against the latest beta version that's released every weekend, and not against the snapshot.

@slarse
Copy link
Collaborator Author

slarse commented Feb 1, 2021

spoon-maven-plugin has the same problem, using the old snapshot repository.

@slarse
Copy link
Collaborator Author

slarse commented Feb 9, 2021

I've been busy with other things but I'm getting back on this now. I was unable to spin up a VM because I couldn't find where to enable virtualization in BIOS, but now I finally found it. Well, some friendly person on a forum found it. It was placed under CPU Overclocking, because that's kind of the same thing as virtualization ...

So it seems that adding slf4j-api:1.7.30 in the plugin dependencies (not the project) does the trick.

@seintur Did you add that in the plugin dependencies of Juliac, or to spoon-maven-plugin?

@seintur
Copy link
Contributor

seintur commented Feb 9, 2021

In the plugin dependencies of exec-maven-plugin, just like:

<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<dependencies>
   <dependency>
   <groupId>org.slf4j</groupId>
   <artifactId>slf4j-api</artifactId>
   <version>1.7.30</version>
   </dependency>
</dependencies>
</plugin>

@slarse
Copy link
Collaborator Author

slarse commented Feb 9, 2021

In what context is the exec-maven-plugin used? As far as I can tell, none of the CI is using it.

I've now also failed to reproduce the error on Ubuntu 20 with JDK 8, the CI scripts just work fine for me locally. What was your exact configuration for triggering the error?

@seintur
Copy link
Contributor

seintur commented Feb 9, 2021

I'm using exec-maven-plugin to invoke the CLI of Juliac that uses Spoon as a lib.

@slarse
Copy link
Collaborator Author

slarse commented Feb 9, 2021

I'm using exec-maven-plugin to invoke the CLI of Juliac that uses Spoon as a lib.

Right, okay. Then that's a bit tailored to that specific invocation. I need to find something that works with the CI build script. Only I can't get the CI build script to fail on my local machine.

After reading up on how slf4j works, I believe that our problem is a version mismatch between the API version and the logging framework it tries to bind to (http://www.slf4j.org/manual.html#compatibility). Spoon only declares the API version in it's pom file, which (I think) results in it binding to the nearest defined logging framework in the Maven dependencies, which may not be a compatible version. Dependency resolution may be machine-specific, which could explain why I can't reproduce it locally.

Anyway, I think that this problem might be solved by declaring a dependency on a logging implementation in Spoon, e.g. like so:

<dependency> 
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.7.30</version>
</dependency>

But I can't really test it as I'm unable to reproduce the problem in the first place.

@slarse
Copy link
Collaborator Author

slarse commented Feb 10, 2021

I've found that slf4j indeed does not do any dynamic loading.

SLF4J does not rely on any special class loader machinery. In fact, each SLF4J binding is hardwired at compile time to use one and only one specific logging framework. For example, the slf4j-log4j12-1.7.28.jar binding is bound at compile time to use log4j.

That's why we get the NoClassDefFoundError when there's a version mismatch at runtime.

Here's what I think will be a solution:

  1. Merge chore: Use OW2 Nexus snapshot repository SpoonLabs/spoon-maven-plugin#75 and make a release. This is necessary for the CI build script to pull snapshots of Spoon from the correct repository (Jenkins is currently building the latest beta version).
  2. Add a binding for slf4j in Spoon's dependencies that matches the API version.

But I can't be sure as I've still not been able to actually reproduce the problem.

@slarse
Copy link
Collaborator Author

slarse commented Feb 10, 2021

Nope, I'm barking up the wrong tree here, it's not the binding that is the problem. Like @seintur noted long ago, it's definitely just the API version being picked up from some other library in the Jenkins CI.

We still need to build the snapshots instead of the beta version, though :)

@slarse
Copy link
Collaborator Author

slarse commented Feb 10, 2021

I am at a complete loss here. Even explicitly adding the dependency to the spoon-maven-plugin like so:

      <plugin>
        <groupId>fr.inria.gforge.spoon</groupId>
        <artifactId>spoon-maven-plugin</artifactId>
        <version>3.1</version>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>generate</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <processors>
            <processor>spoon.processing.SpoonTagger</processor>
          </processors>
        </configuration>
        <dependencies>
          <dependency>
            <groupId>fr.inria.gforge.spoon</groupId>
            <artifactId>spoon-core</artifactId>
            <version>[8.4.0-SNAPSHOT,)</version>
          </dependency>
          <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.30</version>
          </dependency>
        </dependencies>
      </plugin>

It still fails when executing the plugin with this error:

Failed to execute goal fr.inria.gforge.spoon:spoon-maven-plugin:3.1:generate (default) on project jimfs: Execution default of goal fr.inria.gforge.spoon:spoon-maven-plugin:3.1:generate failed: A required class was missing while executing fr.inria.gforge.spoon:spoon-maven-plugin:3.1:generate: org/slf4j/event/Level

@monperrus I don't know what to do here. Can you try purging all possible Maven caches on the Jenkins nodes? I don't have any other idea than that this is a caching issue. I.e. delete all .m2 directories.

@slarse
Copy link
Collaborator Author

slarse commented Feb 10, 2021

@monperrus I don't know what to do here. Can you try purging all possible Maven caches on the Jenkins nodes? I don't have any other idea than that this is a caching issue. I.e. delete all .m2 directories.

Scratch that for the moment, I might have figured it out.

@slarse
Copy link
Collaborator Author

slarse commented Feb 10, 2021

Scratch the scratch, I can't figure this out.

If someone else wants to try, feel free. I'm entirely out of ideas.

@slarse
Copy link
Collaborator Author

slarse commented Feb 11, 2021

It's not a caching issue, we tried removing the .m2 directories and it still fails.

https://ci.inria.fr/sos/job/Jimfs/1554/console

So something about Maven's dependency dependency resolution order is different on the Jenkins machines than on any of my own machines. I'll keep on debugging this whenever I can, but if someone else can try to run the build script on their local machine and see if they can replicate the crash, that'd be great.

Build script is here: https://spoon.gforge.inria.fr/jenkins/build.sh

The problem is clearly that we're pulling a lot of different versions of slf4j-api, and the goal now is to work around that. It's probably just a matter of excluding one or more of those slf4j-api dependencies.

@slarse
Copy link
Collaborator Author

slarse commented Feb 12, 2021

I believe that I finally have a solution!

I've reproduced the problem with the following system:

OS: Ubuntu 16.04.1
Apache Maven 3.3.9
Maven home: /usr/share/maven
Java version: 1.8.0_282, vendor: Private Build
Java home: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF8
OS name: "linux", version: "4.15.0-112-generic", arch: "amd64", family: "unix"

It's very similar to the Ubuntu 16 node on Jenkins, and the exact same compile failure occurs.

Now, I still don't know exactly what the problem is and why the correct version of slf4j-api is missing at runtime, but I've found that simply upgrading Maven to version 3.6.3 solves the problem. This has to be done manually, I'll put the exact commands I used to upgrade Maven in a new comment.

@slarse
Copy link
Collaborator Author

slarse commented Feb 12, 2021

It appears as if there were problems with setting the classpath in the version prior to 3.3.9: https://maven.apache.org/docs/3.3.9/release-notes.html

Perhaps 3.3.9 did not entirely fix the problem, and that's what we're experiencing here. In any case, 3.3.9 is more than 5 years old at this point, so using a newer version seems appropriate.

Upgrade steps for Maven on Ubuntu 16

Here's what I did to upgrade Maven

sudo apt remove maven
wget https://downloads.apache.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
tar -xzf apache-maven-3.6.3-bin.tar.gz
sudo mv apache-maven-3.6.3 /opt/maven

And then just update the M2_HOME and PATH environment variables.

export M2_HOME=/opt/maven
export PATH=$M2_HOME/bin:$PATH

After that, I think the builds should work!

@monperrus
Copy link
Collaborator

excellent analysis!

upgraded maven on sos-builder01

maven is now:

# /opt/maven/bin/mvn --version
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /opt/maven
Java version: 1.8.0_222, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF8
OS name: "linux", version: "4.4.0-164-generic", arch: "amd64", family: "unix"

And then just update the M2_HOME and PATH environment variables.

I assume this has to be done on the Jenkins side

@slarse
Copy link
Collaborator Author

slarse commented Feb 12, 2021

And then just update the M2_HOME and PATH environment variables.

I assume this has to be done on the Jenkins side

I think it's better to set it on the node itself, as it relates to system values that I don't think its customary for Jenkins to have to care about. I.e. it would make sense for a worker to be able to just assume that Maven is properly configured. But I'm not certain.

I temporarily set the correct values in the build script for jimfs, and it works! https://ci.inria.fr/sos/job/Jimfs/1558/console

@monperrus
Copy link
Collaborator

I temporarily set the correct values in the build script for jimfs, and it works!

Oh nice, what an achievement!!

I think it's better to set it on the node itself,

It depends, environment variables are user-specific anyway, so we would have to assume that Jenkins always uses the same Unix user. So I would suggest to put that in Jenkins.

We can assume that we would have /opt/maven/bin/mvn on all nodes.

@slarse
Copy link
Collaborator Author

slarse commented Feb 12, 2021

I've configured M2_HOME and PATH for the Ubuntu 16 node in Jenkins, and it appears to work (this build passed fine: https://ci.inria.fr/sos/job/Joda%20Time/1608/console). Let's wait for all of the builds to run tonight, and if they go green we can do the same upgrade for the Ubuntu 18 node and then set the environment variables globally instead of per-node.

@slarse
Copy link
Collaborator Author

slarse commented Feb 12, 2021

But, yeah, I think the problem is finally solved, although it slightly annoys me that I didn't manage to find the root cause of the problem :)

@monperrus
Copy link
Collaborator

Well, classpath + Maven dependency resolution + outdated versions + cache is a toxic cocktail for finding root causes.

@slarse
Copy link
Collaborator Author

slarse commented Feb 15, 2021

Seemingly the only casualty of the upgrade was the Spoon Snapshot Deployer. It appears as if the environment variables aren't respected when running Maven commands through the Jenkins Maven plugin (probably, it does not start a shell).

I fixed that by specifying a version of Maven in the plugin, which the plugin then automatically installs.

There is however still a NoClassDefFoundError in the casper project. This I can reproduce on my own machine (not a Ubuntu VM, but my actual machine). So there are actual problems here. I'll investigate.

@slarse
Copy link
Collaborator Author

slarse commented Feb 15, 2021

Okay, so I found the problem. Casper is pulling slf4j-api 1.7.5 from some other dependency, instead of 1.7.30 from Spoon. As org.slf4j.event.Level was introduced with 1.7.15, that's where the crash comes from.

The root cause of these issues is that we're using semi-internal SLF4J APIs (notably, org.slf4j.event.Level. As far as I can tell, the intended use of slf4j-api by clients (i.e. not logging implementations) is to only use the APIs included in org.slf4j. Anything else is not guaranteed to be backwards compatible, as we have painfully experienced over the past few weeks.

Also note that this is in fact a breaking change. Previously, we used the Level class from log4j, so there is breakage in our API. In particular, Environment.getLevel() now returns org.slf4j.event.Level, as opposed to org.apache.logging.log4j.Level ( see this diff).

I suggest that we make changes such that we only use org.slf4j. It's either that, or we must include a notice that libraries using Spoon must also declare a dependency on slf4j-api>=1.7.15, which doesn't seem viable to me. For our public API, we should define our own logging primitives (e.g. Level), such that we don't commit to a logging framework or version. No matter how we do it, dropping log4j is breaking in the parts of our API that used its Level enum, so in defining our own we avoid that happening again.

@monperrus wdyt?

@monperrus
Copy link
Collaborator

I suggest that we make changes such that we only use org.slf4j.

Yes, that's an excellent idea.

For our public API, we should define our own logging primitives (e.g. Level)

Indeed. Never expose others' private APIs in your own one :)

@slarse
Copy link
Collaborator Author

slarse commented Feb 22, 2021

So, even with #3796 merged today, the Casper build still isn't passing in CI. Turns out Casper uses the old snapshot repository, so it's not actually pulling snapshots anymore (thus using the latest beta instead). The build will go green on the next beta release (so next Monday).

@monperrus would you like me to PR the new snapshot repository into Casper, or do we leave it at this? We could also inject it into the pom in the same way that build.sh does.

@monperrus
Copy link
Collaborator

monperrus commented Feb 22, 2021 via email

@slarse
Copy link
Collaborator Author

slarse commented Feb 24, 2021

I think we can safely close this issue now as Casper is finally passing as well: https://ci.inria.fr/sos/job/casper/1556/

For future reference: the core problem we faced here was the use of internal parts of slf4j-api. In the future, we should only ever use stuff that is part of the slf4j package. Switching to a newer Maven version almost fixed the problem, but that was likely just a fluke related to different resolution of runtime dependencies. The real fix was provided by #3796.

@slarse slarse closed this as completed Feb 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants