Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyze assembly error - Reproduction of #528 and #2650 #2655

Closed
wants to merge 3 commits into from
Closed

Conversation

@lefou
Copy link
Member Author

lefou commented Jul 12, 2023

It looks to me, the main class can't be found if the jar reaches a specific size and the main class is located at the end of the jar.

I made some tests with my alpha-grade mill-spring-boot plugin, which also provided support to executable jars. The difference is, that not all jars are unpacked but included as is and a dedicated main is used. I added this failing test case and it can be started. My assumption is, that the specific spring-boot main class is located at the start of the JAR vs. in our case we append the main class at the end.

@lihaoyi
Copy link
Member

lihaoyi commented Aug 2, 2023

One thing I've found is that even jar tf on the problematic prefixed assembly jar fails

java.util.zip.ZipException: invalid CEN header (bad signature)
	at java.base/java.util.zip.ZipFile$Source.zerror(ZipFile.java:1598)
	at java.base/java.util.zip.ZipFile$Source.checkAndAddEntry(ZipFile.java:1183)
	at java.base/java.util.zip.ZipFile$Source.initCEN(ZipFile.java:1537)
	at java.base/java.util.zip.ZipFile$Source.<init>(ZipFile.java:1315)
	at java.base/java.util.zip.ZipFile$Source.get(ZipFile.java:1277)
	at java.base/java.util.zip.ZipFile$CleanableResource.<init>(ZipFile.java:709)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:243)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:172)
	at java.base/java.util.zip.ZipFile.<init>(ZipFile.java:143)
	at jdk.jartool/sun.tools.jar.Main.list(Main.java:1502)
	at jdk.jartool/sun.tools.jar.Main.run(Main.java:361)
	at jdk.jartool/sun.tools.jar.Main.main(Main.java:1683)

but if I manually cut out the prependShellScript of the problematic assembly jar and create a new jar (as shown below) then jar tf works

@ val exeBytes = os.read.bytes(
      os.Path("/Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/large/assembly.dest/out.jar")
    )

@ os.write(os.pwd / "newExe.jar", exeBytes.drop(209))

Also, unzip -l and unzip both work on both the problematic and non-problematic assembly jars. zipinfo works as well

Could it be some kind of limitation in the Java zip reading implementation? It seems unlikely, but given that unzip and zipinfo work, it seems the problematic behavior is Java-specific and not an inherent problem with the zip files involved

@lihaoyi
Copy link
Member

lihaoyi commented Aug 2, 2023

@lihaoyi
Copy link
Member

lihaoyi commented Aug 3, 2023

Another data point: taking the problematic jar, unziping it and re-ziping it generates a non-problematic zip file, without a prepended shell script, that can be read without issue via jar tf, But cating it to append to it a launcher shell script again creates a final jar file that cannot be read by jar tf (same error as above) but can be read by zip.

This somewhat confirms that it isn't our implementation of create-a-zip-file that is problematic: a native unix zip-created archive, prepended with a shell script, fails to read via jar tf (which uses java.util.zip.ZipFile) but can be successfully read via unzip -l.

That means the problem has to be the interaction between the JVM zip reading code and the the prepended shell script, since the problem remains even after swapping out the zip writing code, but the problem goes away if we remove the prepended shell script or we use a non-JVM program to read the prepended jar file

@lefou
Copy link
Member Author

lefou commented Aug 3, 2023

I made almost the same tests a while ago. I think it worth to note that it's not the sheer file size, that makes the JRE zip implementation fail, as an assembly jar built with mill-spring-boot and the exact same dependencies can be started. The difference is, that the spring boot assembly contains a lot less zip entries, as it is not unpacking all the dependencies, but rather just embeds the original jars. So, it's either the entry count (=catalog size) or the position (index) of the main class in that catalog. An interesting test would be to add the main class first (don't append to the upstream assembly jar). Though, I don't think it will fix it, from looking at the JDK code. Maybe, we can count the zip entries ourself and print a warning if we detect, that it will be too large to be executable. What I also wonder, we can't be the first who fell into that trap.

@lihaoyi
Copy link
Member

lihaoyi commented Aug 7, 2023

Another data point, I tried this with SBT-1.8.2 and SBT-Assembly 2.2.1, and it has the same failure mode: jar tf fails if the assembly jar was created with assemblyPrependShellScript := Some(defaultUniversalScript(shebang = true)), and jar tf succeeds if the assembly jar was created without that config. So Mill is not the only tool where this mis-behavior appears

@lihaoyi
Copy link
Member

lihaoyi commented Aug 10, 2023

Another data point is that zip (note: not unzip!) provides different errors when trying to load the two prepended jars, but does not give any errors trying to process the non-prepended jars

lihaoyi mill$ zip /Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/large/assembly.dest/out.jar
	zip warning: Zip64 EOCDR not found where expected - compensating
	zip warning: (try -A to adjust offsets)
	zip warning: expected 75516 entries but found 75512

zip error: Zip file structure invalid (/Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/large/assembly.dest/out.jar)


lihaoyi mill$ zip /Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/small/assembly.dest/out.jar
	zip warning: unexpected signature on disk 0 at 10221122

	zip warning: archive not in correct format: /Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/small/assembly.dest/out.jar
	zip warning: (try -F to attempt recovery)

zip error: Zip file structure invalid (/Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/small/assembly.dest/out.jar)


lihaoyi mill$ zip /Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/noExe/large/assembly.dest/out.jar

zip error: Nothing to do! (/Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/noExe/large/assembly.dest/out.jar)


lihaoyi mill$ zip /Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/noExe/small/assembly.dest/out.jar

zip error: Nothing to do! (/Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/noExe/small/assembly.dest/out.jar)

@lihaoyi
Copy link
Member

lihaoyi commented Aug 10, 2023

Final bit of debugging: it appears that the misbehavior of jar tf on a prepended jar starts when the number of entries in the jar file hits exactly 65535 zip entries.

A prepended jar with 65534 entries can be jar tfed without issue, while one with 65535 entries fails with the java.util.zip.ZipException: invalid CEN header (bad signature) error. The size of the entries in the jar file doesn't seem to matter. Zeroing them out does not change the number of entries necessary to hit the problem.

This was tested by applying the following diff

$ git diff
diff --git a/integration/feature/assembly/repo/build.sc b/integration/feature/assembly/repo/build.sc
index 43d09cd415..62bf22aff7 100644
--- a/integration/feature/assembly/repo/build.sc
+++ b/integration/feature/assembly/repo/build.sc
@@ -30,5 +30,7 @@ object noExe extends Module {

 object exe extends Module {
   object small extends Setup
-  object large extends Setup with ExtraDeps
+  object large extends Setup with ExtraDeps{
+    override def mainClass = Some("foo.bar.Main")
+  }
 }
diff --git a/scalalib/src/mill/scalalib/Assembly.scala b/scalalib/src/mill/scalalib/Assembly.scala
index cd25d18152..92c1a81448 100644
--- a/scalalib/src/mill/scalalib/Assembly.scala
+++ b/scalalib/src/mill/scalalib/Assembly.scala
@@ -223,9 +223,12 @@ object Assembly {
       )
       manifest.build.write(manifestOut)
       manifestOut.close()
-
+      var writeCount = 0
       val (mappings, resourceCleaner) = Assembly.loadShadedClasspath(inputPaths, assemblyRules)
       try {
+        // Good: 32000 48000 56000 60000 62000 62750 62762 62763 62764 62766
+        // Bad:  62767 62770  62775 63000 63016 63032 63064 63125 63250 63500 64000
+        val max =  62767
         Assembly.groupAssemblyEntries(mappings, assemblyRules).foreach {
           case (mapping, entry) =>
             val path = zipFs.getPath(mapping).toAbsolutePath
@@ -238,8 +241,16 @@ object Assembly {
                 val cleaned = if (Files.exists(path)) separated else separated.drop(1)
                 val concatenated =
                   new SequenceInputStream(Collections.enumeration(cleaned.asJava))
-                writeEntry(path, concatenated, append = true)
-              case entry: WriteOnceEntry => writeEntry(path, entry.inputStream(), append = false)
+
+                if (writeCount < max){
+                  writeCount += 1
+                  writeEntry(path, concatenated, append = true)
+                }
+              case entry: WriteOnceEntry =>
+                if (writeCount < max) {
+                  writeCount += 1
+                  writeEntry(path, entry.inputStream(), append = false)
+                }
             }
         }
       } finally {

And running the following commands to test the success of jar tf and count the number of entries

rm -rf integration/feature/assembly/repo/out && ./mill -i dev.run integration/feature/assembly/repo -i show exe.large.assembly && jar tf /Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/large/assembly.dest/out.jar

ls -lh /Users/lihaoyi/Github/mill/integration/feature/assembly/repo/out/exe/large/assembly.dest/out.jar

It seems that there must be some change in the handling of zip files when the number of entries reaches 65535.

@lefou
Copy link
Member Author

lefou commented Aug 14, 2023

Thanks for this great analysis, @lihaoyi. It would be nice, if we could evaluate the entry count and warn the user, if it is too large. We could continue, or fail or produce a jar without prepend script in case of too many entries. Don't know which option is the best though. All are unpredictable. Or we make the behavior configurable and default to fail, which feels like the best option to me.

@lefou
Copy link
Member Author

lefou commented May 3, 2024

I opened PR #3140, which detects faulty assembly JARs, bases on Haoyi's findings and fails the build with an actionable error message.

@lefou
Copy link
Member Author

lefou commented May 6, 2024

Merged as part of #3140.

@lefou lefou closed this May 6, 2024
@lefou lefou added this to the 0.11.8 milestone May 6, 2024
@lefou lefou deleted the repro-528 branch May 16, 2024 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants