Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix resource leak in Maven plugin #571

Merged
merged 11 commits into from
Jun 29, 2020

Conversation

lutovich
Copy link
Contributor

The resource leak increased the number of threads with every execution of Exclipse-based formatter step in a Maven module. It happened because the SpotlessCache wasn't properly utilized. Every cache key was based on randomized file names because Maven plugin relocates config files into the target directory with a random file name. This resulted in all cache accesses being cache misses.

This PR fixes the problem by:

  • making the Maven plugin use non-random file names for output files. FileLocator is changed to construct names based on a hash of the input path. Input path can be a URL, file in a JAR, or a regular local file

  • changing FileSignature (used for SpotlessCache keys) to use filenames instead of paths and file hashes instead of last modified timestamps

These two changes allow FileSignature to uniquely identify a set of files by their content and not take file paths into account. As a result, created keys for SpotlessCache are properly comparable which results in lots of cache hits and decreased number of created threads.

Changed FileCignature only accepts files, not directories. NPM formatters changed to have their signatures based on package.json file instead of the whole node_modules directory.

Fixes #559

Copy link
Member

@nedtwigg nedtwigg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It's great to see the consequences of the "content hash" choice, and there's a good chance that it will be the only possibility. Turns out there are some problems with sorting on filename, and even some problems with sorting on content, which I described in my review below. The problems aren't show-stoppers though, and it might be our best path.

What do you think about FileLocator first checking to see if it's a local config file, and only using copy-to-hashed-path for remote URLs? I pushed up a demo in #572. Whether we merge this (#571) or not, it seems like #572 speeds things up a bit, and I think it fixes #559 on its own.

In order to support the Gradle Build Cache, we might need to do the FileSignature change you've implemented here no matter what, so it's super useful to see it play out. But it hurts me to see how much extra IO it forces us to do. It's probably premature optimization to worry about it, but I think I have a way that we could keep the current FileSignature implementation, and only use content hashing when it's needed.

If we can solve #559 with #572 in the short-term, then I'd prefer to do that and buy some time to think about keeping FileSignature more IO-efficient.

@lutovich lutovich marked this pull request as draft May 11, 2020 19:36
@lutovich
Copy link
Contributor Author

Unfortunately, #559 can't be solved with #572 because settings files in openhab-addons project come from a plugin dependency. Those files are relocated from a JAR file by FileLocator. The plugin dependency is configured here. I need to think a bit more about the problems you highlighted in this PR.

@lutovich
Copy link
Contributor Author

Hey @nedtwigg, I'm observing the same issue when building spotless as described in #328 (comment):

$ gradle clean --stacktrace
executing gradlew instead of gradle

FAILURE: Build failed with an exception.

* Where:
Build file '/Projects/spotless/ide/build.gradle' line: 11

* What went wrong:
A problem occurred evaluating project ':ide'.
> A problem occurred configuring project ':lib'.
   > java.lang.NullPointerException (no error message)

* Try:
Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Exception is:
org.gradle.api.GradleScriptException: A problem occurred evaluating project ':ide'.
[...]
Caused by: org.gradle.api.ProjectConfigurationException: A problem occurred configuring project ':lib'.
[...]
Caused by: java.lang.NullPointerException
        at org.eclipse.jgit.internal.storage.file.UnpackedObjectCache$Table.index(UnpackedObjectCache.java:115)
        at org.eclipse.jgit.internal.storage.file.UnpackedObjectCache$Table.contains(UnpackedObjectCache.java:76)
        at org.eclipse.jgit.internal.storage.file.UnpackedObjectCache.isUnpacked(UnpackedObjectCache.java:31)
        at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:398)
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
        at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:203)
        at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:908)
        at org.eclipse.jgit.revwalk.RevWalk.parseCommit(RevWalk.java:818)
        at com.diffplug.gradle.spotless.GitRatchet.rootTreeShaOf(GitRatchet.java:187)
        at com.diffplug.gradle.spotless.SpotlessTaskBase.setupRatchet(SpotlessTaskBase.java:91)
        at com.diffplug.gradle.spotless.FormatExtension.setupTask(FormatExtension.java:681)
        at com.diffplug.gradle.spotless.JavaExtension.setupTask(JavaExtension.java:196)
        at com.diffplug.gradle.spotless.SpotlessExtension.lambda$createFormatTasks$0(SpotlessExtension.java:71)
[...]

I usually have two git remotes:

  • upstream - points to the original repo diffplug/spotless
  • origin - points to my fork lutovich/spotless

and I clone the original repo first, then rename upstream from origin to upstream.

This rename seems to be what's causing the NPE. Steps to reproduce:

$ git clone [email protected]:diffplug/spotless.git
$ cd spotless
$ gradle clean # works fine
$ git remote rename origin upstream
$ gradle clean # fails with NPE

Not sure if it's a bug or a problem with my local setup. That's why I describe it here instead of creating an issue.

The workaround for me is to not rename the remote :) I'll proceed with fixing this PR.

@nedtwigg
Copy link
Member

nedtwigg commented Jun 25, 2020

Thanks for reporting! I bet part of this is the master -> main migration, which is a minor problem for the in-progress PRs. We've also improved the git-related error messages a lot in the newer versions of Spotless, not sure if the latest spotless is in our dogfooded settings.gradle version. I bet if you merge main into your feature branch, it will resolve.

@lutovich
Copy link
Contributor Author

The steps to reproduce use a clean clone of diffplug/spotless and do not use a stale feature branch like this one. That's why I thought it might be a bug.

@lutovich lutovich force-pushed the fix-mvn-resource-usage branch 5 times, most recently from 553cffe to 52a7a6c Compare June 27, 2020 21:30
@lutovich lutovich marked this pull request as ready for review June 27, 2020 21:31
@lutovich
Copy link
Contributor Author

@nedtwigg review comments should now be addressed. I fixed them and squashed everything into the same commit

@nedtwigg
Copy link
Member

Awesome, thanks very much! I'm gonna break this commit into two (one for the npm part, another for the rest) and then add a few things. Do you mind if I force-push to your branch, to maintain this discussion history?

lutovich added 2 commits June 27, 2020 15:34
… file instead of the

whole node_modules directory.
The resource leak increased the number of threads with every execution of
Exclipse-based formatter step in a Maven module. It happened because the
`SpotlessCache` wasn't properly utilized. Every cache key was based on
randomized file names because Maven plugin relocates config files into
the target directory with a random file name. This resulted in all cache
accesses being cache misses.

This commit fixes the problem by:

 * making the Maven plugin use non-random file names for output files.
   `FileLocator` is changed to construct names based on a hash of the
   input path. Input path can be a URL, file in a JAR, or a regular
   local file

 * changing `FileSignature` (used for `SpotlessCache` keys) to use filenames
   instead of paths and file hashes instead of last modified timestamps

These two changes allow `FileSignature` to uniquely identify a set of files
by their content and not take file paths into account. As a result, created
keys for `SpotlessCache` are properly comparable which results in lots of
cache hits and decreased number of created threads.

Changed `FileSignature` only accepts files, not directories.
@lutovich
Copy link
Contributor Author

Sure, no problem. Let me know if I can help with anything on this PR

@nedtwigg nedtwigg force-pushed the fix-mvn-resource-usage branch from 52a7a6c to 823c2c3 Compare June 27, 2020 23:08
…performance based on lastModified that we had prior to this PR.
@nedtwigg nedtwigg force-pushed the fix-mvn-resource-usage branch from 0c96ee3 to e4db705 Compare June 27, 2020 23:11
nedtwigg added 3 commits June 27, 2020 16:17
…le, once a file has changed, we can discard the old one.
where a FileSignature is generated for multiple files with the same filename.
@nedtwigg
Copy link
Member

@lutovich I made two substantive changes. One is that I added a descriptive warning message that I don't expect anyone to ever see, but if they do, they'll be glad to see it rather than a silent failure (aaddff8). And the second thing I did was add a cache, so that when we're doing JarState, we aren't hashing the same jars over and over and over. If this LGTY, then it LGTM.

@simschla can you take a peek at b695622? I think it's a good change. We're changing FileSignature so that it only handles files (not folders). The upside is that caching will work better (remote build cache for Gradle, fixes other caching issues for Maven).

@simschla
Copy link
Contributor

can you take a peek at b695622? I think it's a good change. We're changing FileSignature so that it only handles files (not folders). The upside is that caching will work better (remote build cache for Gradle, fixes other caching issues for Maven).

This looks good to me. Thank you for the change.

@nedtwigg
Copy link
Member

nedtwigg commented Jun 28, 2020

No rush @lutovich, but I'm waiting for a positive confirmation from you on the changes I made here before merging this, which allows other merges to go through. I'm only echoing because the GitHub UI doesn't allow me to ask for a code review from you (since it is your PR), so wanted to make sure it didn't get lost :)

Copy link
Contributor Author

@lutovich lutovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, made two minor comments.

File file = new File(p);
// calculate the size and content hash of the file
long size = 0;
byte[] buf = new byte[1024];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why do you prefer to read the file this way instead of Files#readAllBytes()?
Also, maybe the code could be simplified a bit by using DigestInputStream.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the eclipse jars push into the tens of MB, and SHA-256 only works on 512 bytes at a time. It's a micro-optimization to worry about that memory pressure, very possible that it's slower than Files.readAllBytes, and it wasn't my motivation. The real reason is that I wanted to protect against the race condition where you read a file, someone else modifies it, and then you check the timestamp (or the opposite order) and end up with a lying cache. By opening it to read, and checking the timestamp while it's open, you can be (more) sure that the content you're reading is the content for that timestamp. I think it's still not guaranteed on unix systems, so it's not perfect, but that was the motivation.

lib/src/main/java/com/diffplug/spotless/FileSignature.java Outdated Show resolved Hide resolved
@nedtwigg nedtwigg merged commit 07d5ab0 into diffplug:master Jun 29, 2020
@lutovich lutovich deleted the fix-mvn-resource-usage branch June 29, 2020 21:34
@nedtwigg
Copy link
Member

nedtwigg commented Jul 2, 2020

Released in plugin-maven 2.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource leak when using Maven plugin
3 participants