Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order of 'mc' synthetic methods is not deterministic #17330

Closed
raboof opened this issue Apr 24, 2023 · 23 comments · Fixed by #18210
Closed

Order of 'mc' synthetic methods is not deterministic #17330

raboof opened this issue Apr 24, 2023 · 23 comments · Fixed by #18210

Comments

@raboof
Copy link
Contributor

raboof commented Apr 24, 2023

Compiler version

Tested with 3.2.2 and 3.3.1-RC1-bin-20230422-49879ac-NIGHTLY

Minimized code

import scala.collection.immutable

case object EmptyImmutableSeq extends immutable.Seq[Nothing] {
  override final def iterator = Iterator.empty
  override final def apply(idx: Int): Nothing = throw new java.lang.IndexOutOfBoundsException(idx.toString)
  override final def length: Int = 0
}

Output

Compiling this object several time and looking at the compiled output, the order of long apply$mcJI$sp(int), long apply$mcJJ$sp(long), long apply$mcJF$sp(float) etc is not deterministic.

Expectation

I would expect the order to be the same each time.

Making the compiler output deterministic helps quickly verifying two independently-built executables are equivalent, which can be a useful security property. It also may help with cache performance and for making it easier to find the 'meaningful' differences when comparing two class files that have more differences. See also Reproducible Builds and #7661 .

@raboof raboof added itype:bug stat:needs triage Every issue needs to have an "area" and "itype" label labels Apr 24, 2023
@jchyb
Copy link
Contributor

jchyb commented Apr 24, 2023

Hi @raboof, I'm afraid I am not able to reproduce this issue. Running javap on EmptyImmutableSeq$.class I always get those in the same order:

  public int apply$mcII$sp(int);
  public int apply$mcIJ$sp(long);
  public int apply$mcIF$sp(float);
  public int apply$mcID$sp(double);
  public double apply$mcDI$sp(int);
  public double apply$mcDJ$sp(long);
  public double apply$mcDF$sp(float);
  public double apply$mcDD$sp(double);
  public long apply$mcJI$sp(int);
  public long apply$mcJJ$sp(long);
  public long apply$mcJF$sp(float);
  public long apply$mcJD$sp(double);
  public void apply$mcVI$sp(int);
  public void apply$mcVJ$sp(long);
  public void apply$mcVF$sp(float);
  public void apply$mcVD$sp(double);
  public boolean apply$mcZI$sp(int);
  public boolean apply$mcZJ$sp(long);
  public boolean apply$mcZF$sp(float);
  public boolean apply$mcZD$sp(double);
  public float apply$mcFI$sp(int);
  public float apply$mcFJ$sp(long);
  public float apply$mcFF$sp(float);
  public float apply$mcFD$sp(double);

Same with EmptyImmutableSeq.class, where I always get:

  public static double apply$mcDD$sp(double);
  public static double apply$mcDF$sp(float);
  public static double apply$mcDI$sp(int);
  public static double apply$mcDJ$sp(long);
  public static float apply$mcFD$sp(double);
  public static float apply$mcFF$sp(float);
  public static float apply$mcFI$sp(int);
  public static float apply$mcFJ$sp(long);
  public static int apply$mcID$sp(double);
  public static int apply$mcIF$sp(float);
  public static int apply$mcII$sp(int);
  public static int apply$mcIJ$sp(long);
  public static long apply$mcJD$sp(double);
  public static long apply$mcJF$sp(float);
  public static long apply$mcJI$sp(int);
  public static long apply$mcJJ$sp(long);
  public static void apply$mcVD$sp(double);
  public static void apply$mcVF$sp(float);
  public static void apply$mcVI$sp(int);
  public static void apply$mcVJ$sp(long);
  public static boolean apply$mcZD$sp(double);
  public static boolean apply$mcZF$sp(float);
  public static boolean apply$mcZI$sp(int);
  public static boolean apply$mcZJ$sp(long);

For the record, I also tried comparing those via a diff between .class files from multiple different builds, all cleaned between recompilations. If there is something I'm missing here, please tell me and I'll try again

@raboof
Copy link
Contributor Author

raboof commented Apr 24, 2023

Interesting. I definitely get different ones, for example:

  public long apply$mcJI$sp(int);
  public long apply$mcJJ$sp(long);
  public long apply$mcJF$sp(float);
  public long apply$mcJD$sp(double);
  public boolean apply$mcZI$sp(int);
  public boolean apply$mcZJ$sp(long);
  public boolean apply$mcZF$sp(float);
  public boolean apply$mcZD$sp(double);
  public int apply$mcII$sp(int);
  public int apply$mcIJ$sp(long);
  public int apply$mcIF$sp(float);
  public int apply$mcID$sp(double);
  public float apply$mcFI$sp(int);
  public float apply$mcFJ$sp(long);
  public float apply$mcFF$sp(float);
  public float apply$mcFD$sp(double);
  public void apply$mcVI$sp(int);
  public void apply$mcVJ$sp(long);
  public void apply$mcVF$sp(float);
  public void apply$mcVD$sp(double);
  public double apply$mcDI$sp(int);
  public double apply$mcDJ$sp(long);
  public double apply$mcDF$sp(float);
  public double apply$mcDD$sp(double);

vs

  public float apply$mcFI$sp(int);
  public float apply$mcFJ$sp(long);
  public float apply$mcFF$sp(float);
  public float apply$mcFD$sp(double);
  public int apply$mcII$sp(int);
  public int apply$mcIJ$sp(long);
  public int apply$mcIF$sp(float);
  public int apply$mcID$sp(double);
  public boolean apply$mcZI$sp(int);
  public boolean apply$mcZJ$sp(long);
  public boolean apply$mcZF$sp(float);
  public boolean apply$mcZD$sp(double);
  public void apply$mcVI$sp(int);
  public void apply$mcVJ$sp(long);
  public void apply$mcVF$sp(float);
  public void apply$mcVD$sp(double);
  public long apply$mcJI$sp(int);
  public long apply$mcJJ$sp(long);
  public long apply$mcJF$sp(float);
  public long apply$mcJD$sp(double);
  public double apply$mcDI$sp(int);
  public double apply$mcDJ$sp(long);
  public double apply$mcDF$sp(float);
  public double apply$mcDD$sp(double);

I'm building with SBT, I haven't tried with the standalone Scala compiler yet. I've shared the project (and 2 class files from separate compilations) at https://codeberg.org/raboof/scala-3-reproducible-apply-mc-forwarders

@raboof
Copy link
Contributor Author

raboof commented Apr 24, 2023

Invoking scalac (testing with 3.2.2) directly I did see it once:

$ scalac EmptyImmutableSeq.scala Other.scala 
$ mkdir 1
$ mv *.class 1
$ scalac Other.scala EmptyImmutableSeq.scala 
$ diff EmptyImmutableSeq\$.class 1/EmptyImmutableSeq\$.class 
Binary files EmptyImmutableSeq$.class and 1/EmptyImmutableSeq$.class differ

... but not consistently: it's giving the same order (a different one from the ones posted above...) each time now. (I added that Order.scala to see if that'd trigger the problem, but that's not it)

@dwijnand
Copy link
Member

Changing (or adding or removing) files could affect this, but does that spec expect that not to affect the output?

We store the filenames in a global names map (for reasons unknown to me), which impacts the hash order of later names, which can impact method iteration order. Some derivative operations handle this, but not all.

@jchyb jchyb added area:backend area:classpath and removed stat:cannot reproduce stat:needs triage Every issue needs to have an "area" and "itype" label labels Apr 26, 2023
@raboof
Copy link
Contributor Author

raboof commented May 3, 2023

Changing (or adding or removing) files could affect this, but does that spec expect that not to affect the output?

I expect the spec doesn't dictate any particular ordering, but it would still be good if the compiler produced a stable ordering.

It's interesting to note that the static forwarders do explicitly get a stable ordering (https://github.com/lampepfl/dotty/blob/main/compiler/src/dotty/tools/backend/jvm/BCodeHelpers.scala#L489). I'm not sure where the non-forwarder method iteration order gets determined, but it's interesting that so far it seems it's only the apply$mcXX$sp methods that are unstable: I don't think I've seen any changes in ordering of other methods.

@mdedetrich
Copy link

mdedetrich commented May 30, 2023

I would like to provide some additional context here. This feature is particularly useful for projects that are part of the ASF i.e. Apache Software Foundation which in our case is Pekko but this would also apply to any Scala project in the ASF.

Currently in order to make a release in the ASF we have to do it manually, that is someone on a local machine (technically speaking owner controlled hardware) needs to produce and publish the artifacts (typically +publishSigned). While for trivial projects this is usually not an issue (at most you might need to mandate a specific JDK version) in Pekko's case specifically the build for our project is non trivial because we have to have multiple JDK's installed and as you can see from the discussion at https://lists.apache.org/thread/zz77bc6tjhdp68cojj6vksrmlqqvn7ln this is not exactly an ideal situation, wrt allowing all the possible OS's being able to build Pekko.

Ideally we would love to be able to just use github actions CI to create release artifacts which is what is typical with almost all Scala OS projects (i.e. push a git tag which triggers a build that pushes to Maven). This has already been setup for snapshots in Pekko however due to ASF processes we cannot do this for actual releases. There are many reasons why (i.e. one is git tags being mutable but this is now hopefully resolved with git tag protection rules however the other main blocker is reproducible builds. More specifically when RC's are published by the CI, other release managers need to confirm that the build can be reproduced by matching the hashes to a build that they create locally to confirm there hasn't been any tampering/mistakes in the artifacts generated by the CI. While other Scala OS projects may not have as strict requirements, they would nonetheless benefit from a such a feature if they desire.

Onto the technicals, while I understand there may be arguments against doing this for Scala 3 (i.e. not part of the specification, potential performance implications, etc etc) I believe this can be reasonably resolved by putting it behind flag (i.e. -reproducible-build?), then its quite simply to provide this flag in CI. In fact you can make you can argue that its non controversial to say that language compilers in general should provide reproducible builds for the reasons stated earlier (Scala or otherwise).

@sjrd
Copy link
Member

sjrd commented May 30, 2023

In my book, for a compiler, stability of the output is a standard feature. I don't think it's controversial that we should fix this.

@mdedetrich
Copy link

In my book, for a compiler, stability of the output is a standard feature. I don't think it's controversial that we should fix this.

Perfect, it would also be fantastic if this would be back-ported to the Scala 3.3 LTS series since that is what most of the Scala OS libraries will target (Pekko inclusive) and this would mean we can update the Scala version without breaking users.

@sjrd
Copy link
Member

sjrd commented May 30, 2023

Bug fixes that are forward and backward source and binary compatible will be backported to LTS. That's the definition of LTS. Things that don't follow that policy won't.

@mdedetrich
Copy link

mdedetrich commented May 30, 2023

Bug fixes that are forward and backward source and binary compatible will be backported to LTS. That's the definition of LTS. Things that don't follow that policy won't.

LTS stands for Long Term Support which can mean different things to different projects, i.e. in context of Scala 3 it could have been looser in regards to which forwards/backwards compatible bug fixes would be backported but its good to know this is the definition!

@sjrd
Copy link
Member

sjrd commented May 30, 2023

I meant that it is our (Scala 3's) definition of LTS. See https://virtuslab.com/blog/the-scala-3-compatibility-story/ for the most recently published details.

@mdedetrich
Copy link

I meant that it is our (Scala 3's) definition of LTS. See https://virtuslab.com/blog/the-scala-3-compatibility-story/ for the most recently published details.

Yup, it's all clear. Thanks for clarifying.

@jrudolph
Copy link
Contributor

jrudolph commented Jul 3, 2023

The slight confusion here is that the problem is not the forwarders (= static methods added to the mirror class without the $ and not the module class). These are generated just fine and in order just as the code shows.

The problem is the ordering of the specialized methods in the module itself.

@raboof raboof changed the title Order of 'mc' synthetic forwarder methods is not deterministic Order of 'mc' synthetic methods is not deterministic Jul 3, 2023
@dwijnand
Copy link
Member

dwijnand commented Jul 3, 2023

Changing (or adding or removing) files could affect this, but does that spec expect that not to affect the output?

I expect the spec doesn't dictate any particular ordering, but it would still be good if the compiler produced a stable ordering.

Sorry, I didn't mean the Scala spec, I meant the Reproducible Builds spec: does it not expect that changing the order of inputs changes the output? Run scalac EmptyImmutableSeq.scala Other.scala over and over, and you'll get the same output - same with scalac Other.scala EmptyImmutableSeq.scala. But those two outputs might be different.

@mdedetrich
Copy link

mdedetrich commented Jul 3, 2023

Sorry, I didn't mean the Scala spec, I meant the Reproducible Builds spec: does it not expect that changing the order of inputs changes the output? Run scalac EmptyImmutableSeq.scala Other.scala over and over, and you'll get the same output - same with scalac Other.scala EmptyImmutableSeq.scala. But those two outputs might be different.

If the ordering of the inputs to scalac is not important to the correctness of the build it appears that the simplest solution to this is to just order the cli args before feeding it into scalac?

@dwijnand
Copy link
Member

dwijnand commented Jul 3, 2023

Correct. So it might be the case that this is about how the build tool discovers and passes those files to the compiler - in a way that is cross-OS compatible.

I'm mostly being dotc's lawyer here - but I'm not against just sorting them like we do for the static forwarders. I'm even more of a fan of someone looking at whether we can stop adding the filenames in a global names and avoid that impact.

@raboof
Copy link
Contributor Author

raboof commented Jul 3, 2023

Sorry, I didn't mean the Scala spec, I meant the Reproducible Builds spec: does it not expect that changing the order of inputs changes the output?

Aah, right: indeed the Reproducible Builds spec just says "building the same source should produce the same output". Indeed it would be totally acceptable for dotc to only produce the same output when invoked with the same parameters in the same order. However, then for the whole sbt build to be reproducible it would shift the responsibility to sbt to always invoke the compoiler with the same parameters in the same order.

Run scalac EmptyImmutableSeq.scala Other.scala over and over, and you'll get the same output - same with scalac Other.scala EmptyImmutableSeq.scala. But those two outputs might be different.

That is actually not the case, though: I just ran scalac EmptyImmutableSeq.scala twice and got different orders.

@jrudolph
Copy link
Contributor

jrudolph commented Jul 3, 2023

The problem is the ordering of the specialized methods in the module itself.

I.e. the mixin forwarders are created in a different order.

Here is a simpler reproducer:

class Test extends (Int => Long) {
  override def apply(v1: Int): Long = 5L
}

@jrudolph
Copy link
Contributor

jrudolph commented Jul 3, 2023

The undefined order comes from SpecializeApplyMethods which uses Definitions.Function1SpecializedReturnTypes etc which are sets.

The non-determinism comes from Definitions.IntType and the other primitive types to use identityHashCode.

@jrudolph
Copy link
Contributor

jrudolph commented Jul 3, 2023

So, one solution would be to sort in SpecializeApplyMethods when generating those methods.

@raboof
Copy link
Contributor Author

raboof commented Oct 19, 2023

Bug fixes that are forward and backward source and binary compatible will be backported to LTS. That's the definition of LTS. Things that don't follow that policy won't.

Is there anything I can/should do to make that happen or should it be 'automatic'?

@sjrd
Copy link
Member

sjrd commented Oct 19, 2023

It's been added to https://github.com/orgs/lampepfl/projects/6, so it's going to be "automatic".

@raboof
Copy link
Contributor Author

raboof commented Oct 19, 2023

❤️ thanks! (that project appears to be private btw - but I believe you ;) )

Kordyjan pushed a commit that referenced this issue Dec 1, 2023
raboof added a commit to raboof/incubator-pekko-http that referenced this issue Jan 31, 2024
Follow-up on apache#377

Should get us the fix for scala/scala3#17330
raboof added a commit to raboof/incubator-pekko-grpc that referenced this issue Jan 31, 2024
Follow-up on apache#195

Should get us the fix for scala/scala3#17330
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants