Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to relocate/shade in assembly #947

Merged
merged 2 commits into from
Aug 27, 2020
Merged

Conversation

joan38
Copy link
Collaborator

@joan38 joan38 commented Aug 13, 2020

Resolves #355

This gives the ability to relocate/shade in assembly on a global level.
This is a good first step but we might want later to provide the ability to shade only for a given lib. This is currently not possible because at the time we get the class path we lost the information of the dependency tree.

Example of usage:

  override def assemblyRules =
    Assembly.defaultRules ++ Seq(Assembly.Rule.Relocate("shapeless.**", "com.netflix.data.playback.shade.shapless.@1"))

@joan38 joan38 force-pushed the shade branch 2 times, most recently from 09270d7 to e8edff1 Compare August 13, 2020 02:59
@alexarchambault
Copy link
Contributor

@joan38 Did you look at jarjar-abrams, instead of jar-relocator? It allows to shade / relocate classes, but it also adjusts the scala annotations accordingly, so that the shaded classes can be used from scala code too (if library A is shaded in library B, and scala users depend on B, they'll be able to reference the shaded classes of A thanks to that. This doesn't work with other shading / relocating libraries, because of the discrepancy after shading between the - shaded - class names and the - non-shaded - scala annotations).

@joan38
Copy link
Collaborator Author

joan38 commented Aug 13, 2020

Hi @alexarchambault,

I saw it while trying to implement the same as sbt-assembly and it's usage here and here and got stuck on how to use the lib (missing documentation).
But after another look at it actually I could use this one. I will need to unzip jars first. And we could reuse the same shading rules in a different Target.

I'm a little busy with other stuff at work right now but I can give it a try later. If I need some help can I ask you?

@joan38
Copy link
Collaborator Author

joan38 commented Aug 14, 2020

Sound like https://github.com/pantsbuild/jarjar is archived and has no activity since 2018. Is it safe to use that?

@alexarchambault
Copy link
Contributor

Sound like https://github.com/pantsbuild/jarjar is archived and has no activity since 2018. Is it safe to use that?

Someone will probably fork it at some point… It's originally a fork itself too.

@alexarchambault
Copy link
Contributor

If I need some help can I ask you?

Sure!

@alexarchambault
Copy link
Contributor

As another example, https://github.com/coursier/sbt-shading uses pantsbuild/jarjar. I need to migrate it to jarjar-abrams.

@joan38 joan38 force-pushed the shade branch 2 times, most recently from 529060b to efd7894 Compare August 16, 2020 05:48
@joan38
Copy link
Collaborator Author

joan38 commented Aug 16, 2020

@alexarchambault I have done the switch to jarjar-abrams but I didn't want to unzip, write jars, modify/shade jars, recopy jars. It's too many operations on disk.

So I copy and modified the Shader object to add support for InputStream shading, what do you guys think, cc @eed3si9n? If that sounds good to you I can make a PR in jarjar-abrams and then we can remove the Shaderr object from here.

Also if it's easier for you, I'd be happy to schedule a 30min call with you guys, let me know.

@eed3si9n
Copy link

@joan38 Feel free to send a PR to jarjar-abrams. I've split it out of sbt-assembly so everyone can use it.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 18, 2020

Alright folks, we are on jarjar-abrams 0.2.0 that supports shading over InputStreams.
@lefou @lihaoyi Could you review?
My team is in need of this shading capability as we are using Spark with shapeless and there is a clash of versions.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 18, 2020

Following from the Gitter chat with @lihaoyi, this "looks reasonable at a glance" but would like someone else to also review.
Can I ask a review from you @alexarchambault ?

@alexarchambault
Copy link
Contributor

Can I ask a review from you @alexarchambault ?

I never really reviewed mill code, but I can have a look. It would have been nice to minimize the diff somehow… It seems there's quite a few added / removed empty lines, renamed variables, re-formatting, that hinder reviewing.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 19, 2020

@alexarchambault I just pushed the change in 2 separate commits for easier review.
I added few comments to help.
I'm also happy to jump on a call anytime if you want.

.entries()
.asScala
.filterNot(_.isDirectory)
.map(entry => entry.getName -> jarFile.getInputStream(entry))
.map(entry => jarFile.getInputStream(entry) -> entry.getName)
Shader.shadeInputStreams(shadeRules, mappings.toSeq, verbose = false)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shading jar entries

.walk(path)
.filter(os.isFile)
.map(subPath => subPath.relativeTo(path).toString -> os.read.inputStream(subPath))
.map(subPath => os.read.inputStream(subPath) -> subPath.relativeTo(path).toString)
Shader.shadeInputStreams(shadeRules, pathsWithMappings, verbose = false)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shading class files (that are not jars)

private def classpathIterator(inputPaths: Agg[os.Path]): Agg[(String, InputStream)] =
private def classpathIterator(inputPaths: Agg[os.Path], assemblyRules: Seq[Assembly.Rule]): Agg[(String, InputStream)] = {
val shadeRules = assemblyRules.collect {
case Rule.Relocate(from, to) => ShadePattern.Rename(List(from -> to)).inAll
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer exposing our own API (Rule.Relocate) and mapping it to jarjar-abram's ShadePattern.Rename in case we are swapping the shading lib.

@@ -29,6 +30,8 @@ object Assembly {

case class Exclude(path: String) extends Rule

case class Relocate(from: String, to: String) extends Rule
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later we can add more config on this to apply the shading only to specific libs.

@joan38 joan38 force-pushed the shade branch 2 times, most recently from 55514ce to 4e60faf Compare August 19, 2020 05:30
Copy link
Contributor

@alexarchambault alexarchambault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. It would have been nice to have a test for that too, although I'm not too familiar with how things are tested in this repo. I'll let @lefou or @lihaoyi decide whether some tests should be added.

@@ -77,52 +70,39 @@ object Assembly {
}
}

private def classpathIterator(inputPaths: Agg[os.Path]): Generator[AssemblyEntry] = {
Generator.from(inputPaths)
private def classpathIterator(inputPaths: Agg[os.Path]): Agg[(String, InputStream)] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change that Generator to an Agg? Generator seemed safer, in case the os.walk returns many files.

Copy link
Collaborator Author

@joan38 joan38 Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the first level inputPaths back to Generator.from(inputPaths) but the os.walk is not possible unless jarjar-abram supports Generator. It looks for a Seq maybe we should be more open and require a Iterable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the signature in my comment below, we shouldn't need to have jarjar-abrams be aware of the collection here.

.entries()
.asScala
.filterNot(_.isDirectory)
.map(entry => entry.getName -> jarFile.getInputStream(entry))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling of the InputStream, via a () => InputStream seemed safer before this. I think they're then opened one-by-one down the line, rather than all at once with these changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jarjar-abram is looking for a Seq[(String, InputStream)].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Maybe then, Shader.shadeInputStreams in jarjar-abrams could be changed to something like

def processByteCode(
    rules: Seq[ShadeRule],
    verbose: Boolean
): (String, Array[Byte]) => Option[(String, Array[Byte])]

This method could be used to process the InputStreams one by one.

Copy link
Collaborator Author

@joan38 joan38 Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done (String, InputStream) => Option[(String, InputStream)] in eed3si9n/jarjar-abrams#6.
I monkey patched in this PR for the time being to see how it plays. What's your thought on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed again with Array[Byte] let me know what you think.

Copy link
Collaborator Author

@joan38 joan38 Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about this, I feel like I prefer the InputStream over Array[Byte] because at least with an InputStream we are not loading all the jars in memory if there is no rules. And also in the future we can think of jarjar-abram or another one that could process the shading on the fly inside an InputStream. Maybe?

I added this commit to see how it looks with InputStream. @alexarchambault let me know what you think.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 24, 2020

@lefou this is good to go.

@lefou
Copy link
Member

lefou commented Aug 24, 2020

Looks good to me. As we remove the whole append logic I'm wondering when and where it was used before. I just want to ensure we don't remove something that is still used. Also we should add some documentation about shading packages including an example.

@joan38 We have failing test cases which look related to the removed append logic. E.g.

X mill.scalalib.HelloWorldTests.assembly.assemblyRules.appendPatternMultiModule 9298ms 

  utest.AssertionError: #2: referenceContent.contains("Model Reference Config File"),

  referenceContent: String = ##############################

  # Core Reference Config File #

  ##############################

  bar.baz=hello

    utest.asserts.Asserts$.assertImpl(Asserts.scala:30)

    mill.scalalib.HelloWorldTests$.$anonfun$tests$168(HelloWorldTests.scala:718)

    mill.scalalib.HelloWorldTests$.$anonfun$tests$168$adapted(HelloWorldTests.scala:709)

    mill.scalalib.HelloWorldTests$.workspaceTest(HelloWorldTests.scala:314)

    mill.scalalib.HelloWorldTests$.checkAppendMulti$1(HelloWorldTests.scala:320)

    mill.scalalib.HelloWorldTests$.$anonfun$tests$182(HelloWorldTests.scala:741)

@joan38
Copy link
Collaborator Author

joan38 commented Aug 24, 2020

Ah I didn't see it. Now it's fixed.

@lefou
Copy link
Member

lefou commented Aug 25, 2020

@joan38 I find it really hard reviewing your PRs, as you constantly rebase exising commit(s). Adding new commits would better allow tracking what changed and would speed up review quite a bit. In most cases we squash when merging anyways, so adding commits is not an issue.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 25, 2020

Thanks for the feedback @lefou, I'll push again separating and avoid amend commits in the futur.

@joan38 joan38 force-pushed the shade branch 2 times, most recently from 2004f7b to fda8193 Compare August 25, 2020 16:41
@@ -267,19 +262,12 @@ object Jvm {
def createAssembly(inputPaths: Agg[os.Path],
manifest: JarManifest = JarManifest.Default,
prependShellScript: String = "",
base: Option[os.Path] = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to remove this because we need to process both the project's classes and dependencies with the same shader instance.

@@ -293,30 +281,18 @@ object Jvm {
manifest.build.write(manifestOut)
manifestOut.close()

Assembly.groupAssemblyEntries(inputPaths, assemblyRules).view
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for .view because we execute only a foreach right after

Comment on lines -300 to -307
val separated =
if (entries.isEmpty) Nil
else
entries.head +: entries.tail.flatMap { e =>
List(JarFileEntry(e.mapping, () => new ByteArrayInputStream(separator.getBytes)), e)
}
val concatenated = new SequenceInputStream(
Collections.enumeration(separated.map(_.inputStream).asJava))
Copy link
Collaborator Author

@joan38 joan38 Aug 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concatenation is handled in groupAssemblyEntries now.

@@ -335,16 +311,14 @@ object Jvm {
PathRef(output)
}

private def writeEntry(p: java.nio.file.Path, is: InputStream, append: Boolean): Unit = {
Copy link
Collaborator Author

@joan38 joan38 Aug 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need append now since we are doing the InputStreams concat and we are not merging with the upstream jar.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 25, 2020

I pushed again and added comments.
But again if you want more details I'm happy to jump on a call to discuss if it's easier.

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for your extra comments.

We now no longer use the upstreamAssembly target, but keep it. Give me some time to think about it. Maybe we can deprecate and remove that target in the future. If I understood it correctly, it's purpose was optimization, which we've lost now. Otherwise, I'm going to merge.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 25, 2020

Alright after writing a reply to your comment @lefou I realized the importance of upstreamAssembly. The goal is to save on rebuilding the biggest part of the jar (which is the dependencies) over and over every time we change some code in our project. I benchmarked it quickly and it is actually significant on my project.

I actually managed to revert back this benefit keeping the shading. Have a look.

@joan38 joan38 force-pushed the shade branch 2 times, most recently from 62ff286 to 81b0023 Compare August 26, 2020 05:53
@joan38
Copy link
Collaborator Author

joan38 commented Aug 26, 2020

Build is green now @lefou if you want to take a look.

@lefou
Copy link
Member

lefou commented Aug 27, 2020

I actually managed to revert back this benefit keeping the shading. Have a look.

Not sure how to read that statement.

Just to clarify: In its current state this PR makes building assemblies significantly slower, right? For the sake of class relocation support.

@joan38
Copy link
Collaborator Author

joan38 commented Aug 27, 2020

It used to be slower in the case of running assembly then modifying your code and rerunning assembly. Because we would reprocess all the dependencies again.

Now it's back to what it was before, and we have shading. So no compromise done.

@lefou
Copy link
Member

lefou commented Aug 27, 2020

Ok, great! Thank you very much!

@lefou lefou merged commit 1a410f0 into com-lihaoyi:master Aug 27, 2020
@lefou lefou added this to the after 0.8.0 milestone Aug 27, 2020
@joan38
Copy link
Collaborator Author

joan38 commented Aug 27, 2020

Thanks @lefou, @alexarchambault and @eed3si9n !

@joan38 joan38 deleted the shade branch August 31, 2020 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shade / Relocation of classes support
5 participants