Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GraalVM: explore flipping the default of --allow-incomplete-classpath #43409

Open
Sanne opened this issue Sep 20, 2024 · 7 comments
Open

GraalVM: explore flipping the default of --allow-incomplete-classpath #43409

Sanne opened this issue Sep 20, 2024 · 7 comments
Labels

Comments

@Sanne
Copy link
Member

Sanne commented Sep 20, 2024

Description

The GraalVM native-image tool can take an option --allow-incomplete-classpath which changes how it deals with unresolved symbols found during reachability analisys.
Normally, when running an application on a regular JVM, code can refer to optional dependencies and the application can still boot; if the code is triggered at runtime and refers to a class which can't be loaded, a CNFE is thrown - but in native-image, when this flag is not set, resolution is validated at build time.

In the past we did like the native-image approach, for several reasons:

  • it helps validate the closed world application is fully valid and defined
  • setting --allow-incomplete-classpath was having some additional odd side-effects, such as making reachability reports more confusing (this was most likely a bug).

In hotspot it wouldn't have been reasonable to have a similar validation, as it needs to support non-closed-world use cases. Essentially I (personally) considered it a "nice feature" of native-image, however it comes with problems: some libraries actually do rely on optional dependencies and don't use a clean decoupling mechanism like ServiceLoader. I have been hoping that the native-image approach would encourage library maintainers to use the ServiceLoader mechanism, but it's been some years and we still have problems with at least two notable dependencies:

  • the Oracle JDBC driver
  • the MSSQL JDBC driver

Both of these libraries have several optional dependencies; in the case of the MSSQL driver we patch this with subsitutions but this causes problems for people who actually want to use the optional capabilities. In the case of the Oracle driver we can't do much as it's closed source - currently the use of this driver triggers the use of --allow-incomplete-classpath but this has global impact, so it raises quality concerns as this mode is less so tested in combination with other extensions.

I've discussed this pending problem with @christianwimmer @jerboaa @maxandersen today and we came to the conclusion that perhaps native-image should default to the same semantics as hotspot: to have the --allow-incomplete-classpath set by default. The "odd side effects" that we had observed in early days are very likely no longer applicable, as it's been years since that time.

The proposal here is to set this flag by default in Quarkus, to verify that it doesn't break any other extension and has no other surprises; based on our feedback, native-image might also flip its default.

This would resolve the immediate issues with those JDBC drivers: we wouldn't need substitutions in MSSQL, the Oracle driver wouldn't be setting an untested global flag, and possibly other libraries would benefit as well as it brings us semantics closer to hotspot.

We would still be able to revert the behaviour, which has been useful to verify consistency at build time.

Implementation ideas

  1. Set --allow-incomplete-classpath by default in when Quarkus runs a native compilation
  2. Test it all, provide feedback to the GraalVM team
  3. Document suggestions about how to potentially revert the behaviour (might not be trivial as we'd need to remove a flag we set)
@Sanne Sanne added kind/enhancement New feature or request area/native-image labels Sep 20, 2024
@yrodiere
Copy link
Member

Makes sense to me, though I wasn't present for these "long-ago problems". In any case +1 for consistency between native and JVM.

We would still be able to revert the behaviour, which has been useful to verify consistency at build time.

So essentially, flip the default in general, but override it in some integration tests that are known to not need it, just to be sure? Seems nice.

@Sanne
Copy link
Member Author

Sanne commented Sep 20, 2024

So essentially, flip the default in general, but override it in some integration tests that are known to not need it, just to be sure?

Exactly: some integration tests or when doing manual investigations.
GraalVM would also consider to make the switch eventually.

@zakkak
Copy link
Contributor

zakkak commented Sep 23, 2024

currently the use of this driver triggers the use of --allow-incomplete-classpath but this has global impact,

Related upstream issue oracle/graal#3929 asking for more granular control.

I've discussed this pending problem with @christianwimmer @jerboaa @maxandersen today and we came to the conclusion that perhaps native-image should default to the same semantics as hotspot: to have the --allow-incomplete-classpath set by default.

FWIW, --allow-incomplete-classpath is already the default in GraalVM since GraalVM/Mandrel 22.1 (~2 years ago) and we countered it by passing --link-at-build-time see #24213

That said, in order to

Set --allow-incomplete-classpath by default in when Quarkus runs a native compilation

You essentially need to avoid passing --link-at-build-time, which can be achieved through the NativeImageAllowIncompleteClasspathBuildItem.

This would resolve the immediate issues with those JDBC drivers: we wouldn't need substitutions in MSSQL,

Why are these substitutions bad? They help GraalVM eliminate unused code, while still providing a nice error message to the users. I understand there is maintenance overhead, but I don't think they are bad.

the Oracle driver wouldn't be setting an untested global flag

The Oracle driver should already not be setting that flag :)

and possibly other libraries would benefit as well as it brings us semantics closer to hotspot.

That's not necessarily a good thing.

I would suggest we drop the --link-at-build-time requirement for user code only (similar to build-time-initialization as discussed in #42682). Keeping the requirement for our extensions is a good thing IMHO.

Being notified that a class won't be reachable at run-time allows us to detect potential targets of improvement, e.g. we can most probably substitute code to avoid the broken dependency while improving dead code elimination.

Update: Another related issue #2101

@jerboaa
Copy link
Contributor

jerboaa commented Sep 23, 2024

This would resolve the immediate issues with those JDBC drivers: we wouldn't need substitutions in MSSQL,

Why are these substitutions bad? They help GraalVM eliminate unused code, while still providing a nice error message to the users. I understand there is maintenance overhead, but I don't think they are bad.

Certain substitutions are problematic since those optional dependencies solve problems for users who actually do need them. For example some feature of a DB driver that has been cut out (with substitutions). In the case of allowing incomplete classpath, the feature would be available if the optional dependencies are there at build time. On the other hand, the feature wouldn't be there even if the optional deps existed at build time (status quo).

@zakkak
Copy link
Contributor

zakkak commented Sep 23, 2024

On the other hand, the feature wouldn't be there even if the optional deps existed at build time (status quo).

That's a valid point. We could make such substitutions optional (only when the dependencies are absent) using onlyWith. Of course this will only make sense if we observe some notable difference that justifies the substitutions in the first place.

@geoand
Copy link
Contributor

geoand commented Oct 30, 2024

Related upstream issue oracle/graal#3929 asking for more granular control.

More granular control was the first thing that came to my mind when I read this issue

I would suggest we drop the --link-at-build-time requirement for user code only (similar to build-time-initialization as discussed in #42682). Keeping the requirement for our extensions is a good thing IMHO.

+1

@geoand
Copy link
Contributor

geoand commented Oct 30, 2024

On the other hand, the feature wouldn't be there even if the optional deps existed at build time (status quo).

That's a valid point. We could make such substitutions optional (only when the dependencies are absent) using onlyWith. Of course this will only make sense if we observe some notable difference that justifies the substitutions in the first place.

There is an even more advanced way of handling cases like this: We can generate a substitution class at Quarkus build time :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants