Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document cases where ijar + scalac is suboptimal #125

Open
johnynek opened this issue Jan 18, 2017 · 43 comments
Open

document cases where ijar + scalac is suboptimal #125

johnynek opened this issue Jan 18, 2017 · 43 comments

Comments

@johnynek
Copy link
Member

ijar https://github.com/bazelbuild/bazel/blob/master/third_party/ijar/README.txt is what bazel uses to create "interface jars" from compiled jars. By hashing those interfaces we see if we need to recompile.

The challenge with scala is that scalac adds a ScalaSignature annotation to the class that includes extra information from the scala type system. Any change to that, will change the hash. It may be that not all of them are actually essential to compile potentially for private method, for instance, where other callers can't call the method, we may be able to strip the ScalaSignature.

The -Yskip-inline-info-attribute scalac option greatly improves things since without that almost every change will cause a recomputation. It seems that is a tradeoff that is worth it, but we have not benchmarked to see if that costs performance.

/cc @ianoc

@johnynek
Copy link
Member Author

This could be useful as an example of parsing the scala annotation on each class: https://github.com/avityuk/scala-pickled-visualizer

It is not clear to me if the scala annotation is already minimal or if we could remove data about private methods and prevent changes. Might be interesting to make some test cases, run ijar and the above tool and see if things change when private[this] defs and vals change.

@softprops
Copy link
Contributor

Hi @johnynek just learning about ijars which I think will be essential for avoiding invalidating scalac dependency chains. Running some experiements with scala_libary rules. I can see ijars getting created but it didn't look like they were actually used when I was testing adding/removing private methods to down stream dependencies which shouldn't have an affect on needing to recompile upstream users. Then I found this thread. I tried adding the -Yskip-inline-info-attribute suggestion and it didn't seem to help.

Here's of my experiement

scala_library(
    name = "foo",
    srcs = glob(
      [
        "foo/src/main/scala/**/*.scala",
        "foo/src/main/java/**/*.java"
      ]
    ),
    scalacopts = ["-Yskip-inline-info-attribute"]
)

scala_library(
    name = "bar",
    srcs = glob(
      [
        "bar/src/main/scala/**/*.scala",
        "bar/src/main/java/**/*.java"
      ]
    ),
    deps = [":foo"],
    scalacopts = ["-Yskip-inline-info-attribute"]
)

I'm testing adding and removing private methods in package foo then bazel building bar. I expected that with the ijar machinery in place bar wouldn't require a rebuild since it doesn't/cant reference foo's private parts.

Do you have an examples you can point to help demonstrate how to make this work?

@softprops
Copy link
Contributor

I also tried the private[this] on vals, which i understand forces scalac not to synthesize a method interface for a private field, to no avail.

@johnynek
Copy link
Member Author

johnynek commented Feb 11, 2017 via email

@johnynek
Copy link
Member Author

johnynek commented Feb 11, 2017 via email

@softprops
Copy link
Contributor

ok cool I'll watch this issue and post back if I find updates. Just a final note because Im still newish to bazel and what happens when I run bazel build ...

I confirmed that I got different md5 checksums on my bazel-out/local-fastbuild/bin/foo_ijar.jar did change when I added/removed new private members and so forth. So I don't think its an issue with scala_rules in general, it does the right thing when ijars change. Maybe just a recipe for dealing with scalasignatures we need to find.

@johnynek
Copy link
Member Author

johnynek commented Feb 11, 2017 via email

@softprops
Copy link
Contributor

checksum of the ijar stays consistent when I modify the contents of the private methods but that still triggers some action which looks like a recompile (not positive) because it takes about 6 seconds . Still new to bazel so I'm not sure what that could be.

@johnynek
Copy link
Member Author

johnynek commented Feb 11, 2017 via email

@ianoc
Copy link
Contributor

ianoc commented Feb 11, 2017 via email

@softprops
Copy link
Contributor

Six seconds seems like a long time, if it's not private code is it
possible for you to put a public code/ repo up somewhere as a test case ?

Testing with an private company best-practices repo to show our team some examples of how to build their mixed scala/java projects with bazel. I'll try to put up a simple public example sometime this weekend.

@johnynek
Copy link
Member Author

johnynek commented Feb 11, 2017

Okay, I did some digging.

In the ijar, the private methods don't appear (neither private nor private[this]) in the .class bytecode, but the scala signature does change relative to not having the method at all. Scalap can show this:

st-oscar1:compare oscar$ scalap -private scala.test.ScalaLibResources
package scala.test
object ScalaLibResources extends scala.AnyRef {
  def this() = { /* compiled code */ }
  private def foo: scala.Int = { /* compiled code */ }
  def getGreetings(): scala.List[scala.Predef.String] = { /* compiled code */ }
  def baz: scala.Int = { /* compiled code */ }
  def getFarewells: scala.List[scala.Predef.String] = { /* compiled code */ }
  def getData: scala.List[scala.Predef.String] = { /* compiled code */ }
  private def get(s: scala.Predef.String): scala.List[scala.Predef.String] = { /* compiled code */ }
}

st-oscar1:compare oscar$ scalap -private scala_no_private.test.ScalaLibResources
package scala.test
object ScalaLibResources extends scala.AnyRef {
  def this() = { /* compiled code */ }
  def getGreetings(): scala.List[scala.Predef.String] = { /* compiled code */ }
  def baz: scala.Int = { /* compiled code */ }
  def getFarewells: scala.List[scala.Predef.String] = { /* compiled code */ }
  def getData: scala.List[scala.Predef.String] = { /* compiled code */ }
  private def get(s: scala.Predef.String): scala.List[scala.Predef.String] = { /* compiled code */ }
}

So, scala embeds information about private methods into the scala signature.

That means to improve on matters, we need a scala-specific ijar (or probably better, extension to the bazel ijar that parses the scalasignature, removes any info about private methods, and puts it back). Here is a page that shows some demos of parsing it:
http://blog.vityuk.com/2010/12/how-scala-compiler-stores-meta.html

@gkossakowski do you have any input you can share with us here? Should this be hard or easy? It seems like is should be straightforward enough.

@johnynek
Copy link
Member Author

cc @adriaanm any suggestions on how to produce an interface jar for scala (a jar of bytecode that only has the non-private interfaces, and none of the code). Using bazel's ijar produces an interface, but the scalasig is left untouched, so the hash changes, so adding a private method causes a recompilation.

@johnynek
Copy link
Member Author

@softprops are you using the --strategy=Scalac=worker option? I doubt it if it is taking 6 seconds on a trivial change. You can add that to your .bazelrc like:

build --strategy=Scalac=worker

@johnynek
Copy link
Member Author

I wonder if it could be as easy just removing any item with the private flag set:
https://github.com/avityuk/scala-pickled-visualizer/blob/master/src/PickledVisualizer.scala#L377

@ianoc
Copy link
Contributor

ianoc commented Feb 12, 2017 via email

@johnynek
Copy link
Member Author

I imagine, the compiler can't inline across package boundaries if we take this approach (so you have to rely on the JIT in such cases). That may be a tradeoff worth making if the JIT usually does well enough and maybe a lot of inlining is from the std-lib or from within the package.

@ianoc
Copy link
Contributor

ianoc commented Feb 12, 2017 via email

@johnynek
Copy link
Member Author

johnynek commented Feb 12, 2017

We don't ijar the standard library or compiler (or anything with macros), and of course, within a target, you are not using an ijar. But other than that, yeah, we are probably killing the ability for scalac to inline.

@ianoc
Copy link
Contributor

ianoc commented Feb 12, 2017 via email

@johnynek
Copy link
Member Author

I think you can't compile scala without it. For instance, the implicit labels are probably the biggest things, but other aspects of the type-system are in there. Dropping everything, I think, would be pretty bad news.

@ianoc
Copy link
Contributor

ianoc commented Feb 12, 2017 via email

@softprops
Copy link
Contributor

Wasn't using any nondefault flags. I wonder if there's and knowledge that can be minded from sbts analyzing compiler. It seems a bit more knowledgeable about how to detect scala public api changes

@johnynek
Copy link
Member Author

johnynek commented Feb 12, 2017 via email

@ianoc
Copy link
Contributor

ianoc commented Feb 12, 2017 via email

@johnynek
Copy link
Member Author

johnynek commented Feb 12, 2017 via email

@softprops
Copy link
Contributor

Do you think that would be probable those changes would be landable in the 2.11.x line of scala? I'm targeting bazel as a viable alternative to sbt for a monorepo for the potential improved performance and correctness without changing existing scala 2.11 runtime dependencies. My goal is the prove this out by dark building in ci and drawing comparisons to our existing sbt build.

@johnynek
Copy link
Member Author

johnynek commented Feb 12, 2017 via email

@softprops
Copy link
Contributor

Ok I talked to my team and there wasn't really anything private in here so I just flipped the GH private repo flag. Here's the repo I'm testing with https://github.com/meetup/blt-best-bazel. There are two packages foo and bar. Bar doesn't do anything interesting. It just exists so I can test bazel's behavior when I build it after changing fiddling with non public members in the foo package.

I'm fiddling with those here

here's how the dance currently goes ( on osx )

$ bazel build //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 3.436s, Critical Path: 3.34s

# no change (its fast of course!)
$ bazel build //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 0.079s, Critical Path: 0.00s

# fiddle with private members of foo package
$ bazel build //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 6.038s, Critical Path: 5.96s

I reread some of the comments above last night and an getting a better sense of the scala-specifc problem. The public interfaces indeed doesn't change so ijar is doing what it advertises. By scala encodes enriched type interface ( including private interfaces ) metadata as bytearray in a @ScalaSig annotation attached to the public interfaces which invalidates the cache.

Some options on the table are to some how edit that bytearray, maybe with PickBuffers write methods and deserialize those definitions and store that in the ijar. This would be very cool. ScalaSig tampering. If there were a lib for that I'd call it forgery!

The other option is to get scalac to do all the work up front and not use ijar and use a scalaspecific form that uses public interfaces scalac may emit for us.

Let me know if my understanding is on the right path. This is a very interesting problem but one I think more scala shops will appreciate a good solution for when adopting bazel

@johnynek
Copy link
Member Author

Yes. This seems like a good summary.

One pattern we use is to define a repo-local set of the scala rules (that call the standard one) where we can change some default options (like compiler plugins and compiler flags):
in local_tools/scala/scala.bzl

load("@io_bazel_rules_scala//scala:scala.bzl",
     uppstream_lib = "scala_library",
     uppstream_macro = "scala_macro_library",
     uppstream_bin = "scala_binary",
     uppstream_test = "scala_test")

_default_scalac_opts = [
    "-Yskip-inline-info-attribute",  # without this minor changes almost always trigger recomputations
    "-Ywarn-dead-code",
    "-Ywarn-unused-import",  # macros can cause this to kill the compiler, see: https://github.com/scala/pickling/issues/370
    "-Ywarn-value-discard",
    "-Xmax-classfile-name", "128",  # Linux laptops don't like long file names
    "-Xlint",
    "-Xfuture",
    "-Xfatal-warnings",  # sometimes disabled due to https://issues.scala-lang.org/browse/SI-9673 on stubs
    "-deprecation",
    "-feature",
    "-unchecked",
    "-Yno-stub-warning",  # This is a custom flag we added (see WORKSPACE)
]

# We use the linter: https://github.com/HairyFotr/linter
# This is disabled by default since it has a large performance overhead.
_plugins = []  # "@org_psywerx_hairyfotr_linter_2_11//jar:file"]


def scala_library(name, srcs = [], deps = [], runtime_deps = [], data = [], resources = [],
                  scalacopts = _default_scalac_opts, jvm_flags = [], main_class = "", exports = [], visibility = None):
    uppstream_lib(name = name, srcs = srcs, deps = deps, runtime_deps = runtime_deps,
                  plugins = _plugins,
                  resources = resources, scalacopts = scalacopts,
                  jvm_flags = jvm_flags, main_class = main_class, exports = exports, visibility = visibility,
                  print_compile_time=True)

etc....

This way, we can put skip inline attribute in one place.

Also, I can't stress enough what a big performance difference the worker strategy makes.

@softprops
Copy link
Contributor

Oh nice!

@johnynek
Copy link
Member Author

I wonder if this is as easy as right here:
https://github.com/scala/scala/blob/2.11.x/src/compiler/scala/tools/nsc/symtab/classfile/Pickler.scala#L45

adding a check that the tree.symbol.hasFlag(PRIVATE) and if it does, skip it. I might try that tomorrow and see if that can solve the issue.

@ianoc
Copy link
Contributor

ianoc commented Feb 12, 2017 via email

@softprops
Copy link
Contributor

I might try that tomorrow and see if that can solve the issue.

Awesome. You folks are seriously the best.

@softprops
Copy link
Contributor

Last note for the weekend. --strategy=Scalac=worker indeed makes a huge difference! Thanks @johnynek

$ bazel build  --strategy=Scalac=worker //bar
WARNING: Sandboxed execution is not supported on your system and thus hermeticity of actions cannot be guaranteed. See http://bazel.build/docs/bazel-user-manual.html#sandboxing for more information. You can turn off this warning via --ignore_unsupported_sandboxing.
INFO: Found 1 target...
INFO: From scala //foo:foo:
Compiler runtime: 250ms.
INFO: From scala //bar:bar:
Compiler runtime: 162ms.
Target //bar:bar up-to-date:
  bazel-bin/bar/bar.jar
INFO: Elapsed time: 1.215s, Critical Path: 1.11s

@gkk-stripe
Copy link
Contributor

Hi guys,
Chiming in on your discussions of ScalaSig and private. Sometimes you need a private member to be part of your interface. E.g. private var in a trait has to be kept in the interface, see for details:
sbt/sbt#2155
(the discussion is about sbt but exactly the same issue would pop up if bazel's ijar excluded private var defined in a trait from the interface)
I think, except for private var in a trait, you're safe to skip all private members in interface.

Scalac produces a number of synthetic members (e.g. accessors for nested classes, holders for default arguments) that are public but should not be included in the interface.

The issues you're wresting are the same sbt/zinc needed to solve. I encourage for you to borrow ideas and code from:
https://github.com/sbt/zinc/blob/1.0/internal/compiler-bridge/src/main/scala/xsbt/ExtractAPI.scala

Zinc solves two problems for incremental compilation:

  1. Dependency graph extraction
  2. Interface extraction

Bazel has a different approach to 1. but 2. is shared between bazel and zinc as problem to tackle. Trying to solve 2. from scratch by surgery to ScalaSig is doomed to be an epic amount of work.

Lastly, there have discussion about integrating ExtractAPI back to scalac because that code is very tied to internals of scalac and we had a feeling that other tools than just zinc could make use of the info produced by ExtractAPI. This discussion confirms that impression. I think it's realistic to integrate ExctractAPI into scalac in 2.12.x as an experimental feature and have it stable in 2.13. Scala 2.11 is long in maintenance mode so nobody from Scala team wants to see any new development there.

@gkk-stripe
Copy link
Contributor

As a bonus, I'm throwing a fun issue that you would have to solve yourself if you go down of manually tweaking scalaSig: sbt/sbt#823
There're more such subtle issues that I can't recall off the top of head.

@johnynek
Copy link
Member Author

Thanks for replying in detail @gkk-stripe.

I saw this comment:
sbt/sbt#2441 (comment)

Is that still your understanding? I wonder if the conservative approach might be to just not write private defs into the scalasig in the first place. That seems like it will cover most of the cases that matter.

@gkk-stripe
Copy link
Contributor

Scalasig is used for recreating faithfully the state of a symbol across compiler runs (so it supports separate compilation). Consider an example like this:

// A.scala
class A

// B.scala
class B extends A

Scalasig pickles all information about A so you get the same results from either running:

scalac A.scala B.scala

or

scalac A.scala && scalac B.scala

Skipping even private defs would break that property. Scalasig serves many clients that would break in subtle ways:

  • scala reflection: you wouldn't see private members of a class/trait
  • implicit search gets affected because private implicit defs are still considered to be a member but only later get excluded for not being accessible. In other words: membership and accessibility are orthogonal concerns in Scala
  • Scala inliner used to rely on looking up private members for inlining (I'm not sure if it still does it in Scala 2.12, though)

In all these cases you would get different results from either a clean build or an incremental build. This would break the main selling point of bazel that you don't need ever to clean manually.

@johnynek
Copy link
Member Author

So, @gkk-stripe and I spoke offline.

The TL;DR is:

  1. currently, bazel is correct, and fairly fast, but invalidates the cache when there is ANY scala signature change (adding/removing any method, changing any type, etc...).
  2. as Grzegorz says, private in scala is pretty subtle (for instance, in 2.11 at least, he says a private can shadow a non-private, making a name inaccessible, so removing a private does not give the same compilation).
  3. a way forward would be to see about adding a scalac option to emit interface files. The contract of the interface file is that should be the minimal file against which we can compile and get the same output as compiling against a class/jar representation of the same dependency. The interface should be suitable for hashing as a key into a cache (i.e. it should be deterministic: if two interfaces are the same, they should be bit-for-bit identical). It would be nice for these interface files to be independent of platform (i.e. in principle work for scaja-js, scala-native).

We are thinking of a SIP proposal for item 2 above. We will update this issue with links to any discussions.

@softprops
Copy link
Contributor

Sounds grrrreat

@SethTisue
Copy link

SethTisue commented Feb 14, 2017

a way forward would be to see about adding a scalac option to emit interface files.

see also http://docs.scala-lang.org/sips/pending/binary-compatibility.html. this was just discussed at the SIP meeting today and seems likely to move forward in some form. cc @DarkDimius @jvican

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants