Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking implicit compiler dependencies #1459

Merged
merged 37 commits into from
Feb 13, 2023

Conversation

liucijus
Copy link
Collaborator

@liucijus liucijus commented Dec 19, 2022

Ready for review

Opening this PR for visibility and feedback.

Warning: this is WIP and very experimental and probably contains bugs. Review comments are very welcome.

Why?

Currently dep tracking in Rules Scala is implemented as compiler plugins and do not get full information which dependencies compiler has used to resolve types. All symbols which are not detected by the dependency tracking plugin are reported as unused and can break compilation if corresponding targets will be removed.
Dependencies can be grouped into three categories:

  • Strict (S) - Deps which are detected by the plugin (used in the AST or source)
  • Compiler only (I) - Deps which are not mentioned in AST/source, but needed by compiler to resolve other types
  • Unused (U) - Deps are not used in AST/source and not needed by the compiler

Currently reported unused deps are I+U, which makes dependency tracking mostly unusable, and requires complex use of unused_dependency_checker_ignored_targets.

How?

Patch Scala compiler to report which jars it has opened. Aggregate and analyze compiler and plugin reported information in custom reporter implementation.

Outputs:

  • sdeps - similar to jdeps, but with more information, eg. jar labels. Why not jdeps? Because they break some tools like Intellij and lack precise label information. If there's a need for jdeps to be produced, it's easy to add toggle in the toolchain.
  • logs buildozer commands in WARN, ERROR or off for each group. Can be configured per group.

Testing:

I've done some testing on large codebase and it seems to be working without issues. It's annoying that there are so many I deps reported, but I guess that's expected. The pros are that I'm able to clean target deps in isolation without a need for external indexes or fiddling with unused_dependency_checker_ignored_targets.
Only tested with 2.12 - patch may fail on 2.13.
sdeps can be used to build more sophisticated tools, eg. explain what are all these deps in my target.
I'm settling on S/error, I/warn, U/warn, which gives some room for changes not to break builds while large codebase is changing in multiple places.

Key design decisions

  • Patch scala/tools/nsc/symtab/SymbolLoaders.scala and prepend on the compiler classpath to get loaded jars reported.
  • Patch must be applied only if dependency tracking is enabled (scala_config(enable_compiler_dependency_tracking = True))
  • Produce sdeps (named after similar jdeps) with label, jar and usage info to be consumed by tools. Why not jdeps, because it breaks current Intellij support and does not contain label information. jdeps can be easily added later if there's a need.
  • Aggregation of compiler and analyzer plugins information and reporting happens in custom reporter

Things left out of the initial implementation

  • Silence log messages, but produce sdeps
  • Exclude/include patterns for compiler (I) deps
  • A tool to read sdeps for a target

@liucijus
Copy link
Collaborator Author

cc @ianoc

@ianoc
Copy link
Contributor

ianoc commented Jan 2, 2023

This is pretty amazing/awesome @liucijus , thanks for sharing it. I'll try get some time soon to see how it works in our code base. Having reliable unused deps tracking unlocks a bunch of fun new possibilities!

target = indirectJarToTarget.get(jar);
}

if (target.startsWith("Unknown")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up with target being null here on scala 2.11

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to happen from jars from the jdk for me --
external/remote_jdk8_macos_aarch64/zulu-8.jdk/Contents/Home/jre/lib/rt.jar is triggering this, since its not a direct or indirect target

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i did go back now and just try confirm unused deps in ast mode works, its too buggy on 2.11 for us to use. But it does run. [this i believe is a known thing that its not great on 2.11 going back many years.... some day we'll get this codebase on 2.12....])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the null check just needs to be moved up a few lines, looks like you have one one right after this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ianoc, this is a good catch! I think I need to ignore jars which are not associated with labels as it is done in the analyzer plugin.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope 27f96d4 fixes the problem, but I haven't tested it yet

@ianoc
Copy link
Contributor

ianoc commented Jan 9, 2023

I've been going through this a bit trying to use it, the ast code for 2.11 (really we just need to upgrade, so i think ignore this), isn't so reliable so i did a bit of surgery locally to try ensure i could focus purely on the results of the compiler plugin.

With that the results generally that i saw seemed correct, vs our BUILD file generation tooling i ran into some stuff like:

Foo.fieldWithAList.map(_.methodB)

Our ast stuff can't see the type of _ which is really a direct dependency. so that shows up a bunch in the delta here.

I did just spend a bunch of time tracking down (i think it might be worth expanding the error messages in here, like:

  private String reportDep(Dep dep, boolean add) {
    if(add) {
      String message = "error: Target '" + dep.target + "' (via jar: ' " + dep.jar +" ') is being used by " + ops.currentTarget + " but is is not specified as a dependency, please add it to the deps.\nYou can use the following buildozer command:\n";
      String command = "buildozer 'add deps " + dep.target + "' " + ops.currentTarget;
      return message + command + "\n";
    } else {
      String message = "error: Target '" + dep.target + "' (via jar: ' " + dep.jar +" ')  is specified as a dependency to " + ops.currentTarget + " but isn't used, please remove it from the deps.\nYou can use the following buildozer command:\n";
      String command = "buildozer 'remove deps " + dep.target + "' " + ops.currentTarget;
      return message + command + "\n";
    }
  }

I had wanted to make them look like the javac ones so some of our existing error correction tooling would just work -- but i've ran into really wanting that jar information. We are currently using 3rdparty being brought in as exports on the fields. So the target mapping information we wind up building here is kinda off.

e..g.

A - B - C - D
            \
               - D
             

With the meaning A depends on B
B depends on C and D
C also depends on D.


B depends on C via field `foo`, and d via field `bar`

In effect when accessing B.foo in A i was picking up a dependency on C. Though the dependency is actually on D. This looks to be happening because C is in the transitive dependencies of B, and it itself is exporting D.

So i think some of this is a reflection more of our our dependencies are structured too here that makes it tricky. The _ case above also would pose a challenge for us in generating build files too, though it is right

@liucijus
Copy link
Collaborator Author

  private String reportDep(Dep dep, boolean add) {
    if(add) {
      String message = "error: Target '" + dep.target + "' (via jar: ' " + dep.jar +" ') is being used by " + ops.currentTarget + " but is is not specified as a dependency, please add it to the deps.\nYou can use the following buildozer command:\n";
      String command = "buildozer 'add deps " + dep.target + "' " + ops.currentTarget;
      return message + command + "\n";
    } else {
      String message = "error: Target '" + dep.target + "' (via jar: ' " + dep.jar +" ')  is specified as a dependency to " + ops.currentTarget + " but isn't used, please remove it from the deps.\nYou can use the following buildozer command:\n";
      String command = "buildozer 'remove deps " + dep.target + "' " + ops.currentTarget;
      return message + command + "\n";
    }
  }

I had wanted to make them look like the javac ones so some of our existing error correction tooling would just work -- but i've ran into really wanting that jar information. We are currently using 3rdparty being brought in as exports on the fields. So the target mapping information we wind up building here is kinda off.

Updated to your version

@liucijus
Copy link
Collaborator Author

In effect when accessing B.foo in A i was picking up a dependency on C. Though the dependency is actually on D. This looks to be happening because C is in the transitive dependencies of B, and it itself is exporting D.

Does the same behavior reproduce with both compiler deps tracking and regular AST plugin? If they act differently in regards of exported deps, then it's a regression I have introduced. I've seen similar problems in our codebase, but I'm not sure if that's exactly the same dependency situation. If it reproduces with AST plugin, a separate issue, and if possible, a test or repro would be very appreciated.

@liucijus
Copy link
Collaborator Author

Status update. I'm testing this change in production with a large codebase and so far it works flawlessly.

My next step is to rebase this PR for review and hopefully get it merged.

@liucijus liucijus force-pushed the compiler-deps-tracking branch from 2ed3a13 to 1c88ea6 Compare January 25, 2023 08:17
@liucijus liucijus marked this pull request as ready for review January 25, 2023 08:30
@liucijus liucijus requested a review from simuons as a code owner January 25, 2023 08:30
@liucijus liucijus requested a review from ianoc January 27, 2023 14:38
dependency_tracking_method):
is_strict_deps_on = strict_deps_mode != "off"
is_unused_deps_on = unused_deps_mode != "off"

need_direct_jars = is_strict_deps_on or is_unused_deps_on
need_direct_targets = is_unused_deps_on
need_direct_targets = is_unused_deps_on or is_strict_deps_on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like both need_direct_jars and need_direct_targets will always have same value. Maybe these needs to be squashed into single flag if it's needed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for catching - simplified the code

patch = "@io_bazel_rules_scala//dt_patches:dt_compiler_%s.patch" % SCALA_MAJOR_VERSION

if SCALA_MAJOR_VERSION == "2.12":
minor_version = int(SCALA_VERSION[SCALA_VERSION.find(".", 2) + 1:])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe SCALA_VERSION.split(".")[2] would be cleaner? And maybe this can be moved to @io_bazel_rules_scala_config//:config.bzl to keep version parsing in single place.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@simuons simuons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. Some minor comments, commented code and clarifications I would like to get.

scala/private/macros/scala_repositories.bzl Show resolved Hide resolved
@@ -55,6 +56,11 @@ def _phase_dependency(
else:
strict_deps_mode = get_strict_deps_mode(ctx)

if strict_deps_always_off or not included_in_strict_deps_analysis:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify. There is a dependency between strict_deps_mode and compiler_deps_mode
If strict_deps_mode is "off" then no matter what is specified for compiler_deps_mode it will be turned "off"
Maybe it would be better to express this dependency like compiler_deps_mode == get_compiler_deps_mode(ctx) if strict_deps_mode != "off" else "off" or something similar instead of having duplicate if condition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it this way because I think we will need a separate flag to control compiler deps mode in the future. Because it's a breaking change and I'm not very sure about the need yet, I left it this way.

I think I'll squash strict and compiler modes under one control statement. Eg.:

    if strict_deps_always_off or not included_in_strict_deps_analysis:
        strict_deps_mode = "off"
        compiler_deps_mode = "off"
    else:
        strict_deps_mode = get_strict_deps_mode(ctx)
        compiler_deps_mode = get_compiler_deps_mode(ctx)

scala/private/rule_impls.bzl Show resolved Hide resolved
scala/scala_toolchain.bzl Show resolved Hide resolved
$runner scala_test_test_filters
$runner test_multi_service_manifest
$runner test_override_javabin
#$runner test_disappearing_class
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these tests irrelevant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed - were commented by mistake

}
}

private String reportDep(Dep dep, boolean add) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just split this method into two instead of boolean flag. It would be easier to read code where it's used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rearranged it a bit + fixed warning/error part of the message

try {
jarToTarget.put(ops.directJars[i], ops.directTargets[i]);
} catch (ArrayIndexOutOfBoundsException e) {
throw new RuntimeException(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe throw IllegalArgumentException from else clause instead of catching ArrayIndexOutOfBoundsException

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks for catching

}
}

if (ops.indirectTargets.length == ops.indirectJars.length) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be guarded from situation when sizes doesn't match?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

protoReporter.writeTo(Paths.get(diagnosticsFile));
}

private static class Dep {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this class really needed. It looks like it maps one to one to Dependencies from proto. You could cut some conversion code. But I myself not convinced this would be proper approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with Dependencies

@liucijus liucijus requested review from simuons and removed request for ianoc February 7, 2023 09:35
@liucijus liucijus merged commit 56bfe4f into bazelbuild:master Feb 13, 2023
@liucijus liucijus deleted the compiler-deps-tracking branch February 13, 2023 07:15
@johnynek
Copy link
Member

Thanks for your work on this folks!

Side note: it would be really nice to get this upstreamed into the scala compiler this is a standard feature that other build tools could also use.

@liucijus
Copy link
Collaborator Author

Side note: it would be really nice to get this upstreamed into the scala compiler this is a standard feature that other build tools could also use.

On my todo list

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants