-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Defer initialization of AlluxioMetastoreModule #12541
Defer initialization of AlluxioMetastoreModule #12541
Conversation
Is it possible to test it? |
I tested it manually, I'm not sure if it can be automated. Maybe with some convoluted product test? |
@findepi any comments? |
I'd like to know if there are other places where we could/should do something like this |
cc @willmostly |
bindMetastoreModule("thrift", () -> new ThriftMetastoreModule()); | ||
bindMetastoreModule("file", () -> new FileMetastoreModule()); | ||
bindMetastoreModule("glue", () -> new GlueMetastoreModule()); | ||
bindMetastoreModule("alluxio", () -> new AlluxioMetastoreModule()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why not AlluxioMetastoreModule::new
?
the code doesn't convey the intention, so it will be soon refactored to get rid of redundant Supplier
It's unclear why this would be sufficient and whether this is sufficient always, or depends on Java version
i think a more explicit approach would be to use reflection to load class by name.
But then, we don't want to hard-code class names in the code.
AlluxioMetastoreModule.class.getName()
doesn't initialize the class... but this isn't (much) more explicit than current code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason why not
AlluxioMetastoreModule::new
?
I wasn't convinced that a method reference won't try to load the class. So this is kind of defensive programming.
the code doesn't convey the intention, so it will be soon refactored to get rid of redundant Supplier
I suppose at least a comment will be necessary :)
It's unclear why this would be sufficient and whether this is sufficient always, or depends on Java version i think a more explicit approach would be to use reflection to load class by name. But then, we don't want to hard-code class names in the code.
AlluxioMetastoreModule.class.getName()
doesn't initialize the class... but this isn't (much) more explicit than current code.
I'm aware that there is no guarantee that it will serve its purpose as there are not that many things in the JVM spec we can rely on. It is kind of a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't convinced that a method reference won't try to load the class. So this is kind of defensive programming.
we don't want to add unneeded warning suppressions like @SuppressWarnings("Convert2MethodRef")
so ... this defensive style won't hold unless it's "the way"
I'm aware that there is no guarantee that it will serve its purpose as there are not that many things in the JVM spec we can rely on. It is kind of a hack.
Class loading is not well defined, but it's not undefined either.
Classes are guaranteed not to be loaded if they are not needed.
So for example, of the only reference is Class.forName(.....)
and that line isn't executed, the class is guaranteed not to be loaded.
Now, since it's really only about Alluxio module, we can do reflection-based loading for that class only, and then using Class.forName
+ plain class name as a string will be totally fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Class.forName
for AlluxioMetastoreModule + good comment.
restore the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it's really only about Alluxio module
Can we make sure it is only about AlluxioMetastoreModule
. E.g. add a test which scans the class path and checks if only references to Aluxio library are coming from AlluxioMetastoreModule
(possibly transitively)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We thought about a test like this, but I'm not convinced that it deserves this much effort put into it.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastoreModule.java
Outdated
Show resolved
Hide resolved
Why would we want to merge this if this is not solving any problem? |
The use case is somewhere between silly and paranoid, depending on your point of view: there are some users why physically delete JARs they don't use to avoid vulnerabilities therein. And then they complain that Trino doesn't start :) So that's my attempt at placating them somehow. "I tried, by the community rejected it" is one possible outcome, I guess ;) |
Is it possible to update alluxio? |
Or maybe we could make alluxio pluggable (requiring to place the jar in installation) and disabled by default. Just thinking at loud. IIRC it is not the first time we have issue like this here. |
If this PR is supposed to make peace with vulnerability scanners, it won't do that. I would assume vulnerability scanners scan all the classes, without analysis which one will get loaded. Is this PR supposed to allow deleting alluxio jar from the classpath? BTW the alluxio integration is unmaintained. We should remove it. But that's a separate story. |
They do, that's correct. So it's not about appeasing them per se.
Confirmed. The goal is to avoid loading these classes if not necessary.
Oh, I'd love to remove it :) From what I heard users also hate it. But it's still being used, so. |
bindMetastoreModule("thrift", () -> new ThriftMetastoreModule()); | ||
bindMetastoreModule("file", () -> new FileMetastoreModule()); | ||
bindMetastoreModule("glue", () -> new GlueMetastoreModule()); | ||
bindMetastoreModule("alluxio", () -> new AlluxioMetastoreModule()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Class.forName
for AlluxioMetastoreModule + good comment.
restore the rest.
0d9cac7
to
ea59d42
Compare
Addressed comments. Hopefully it's easier to swallow now :) |
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastoreModule.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastoreModule.java
Show resolved
Hide resolved
ea59d42
to
c32ec30
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastoreModule.java
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastoreModule.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastoreModule.java
Outdated
Show resolved
Hide resolved
Some of these modules are depending on third-party libraries (e.g. Alluxio), which may have some security vulnerabilities; they should be benign if the particular module is not enabled, but instantiation of the module class may trigger loading and initialization of some code from these libraries anyway. For this particular module we want to make some preemptive action, because it is unmaintained (though still used).
This test will only make sure that the metastore can be instantiated, like similar tests for other metastore implementations.
9216842
to
fcbdf7d
Compare
Description
Some of these modules are depending on third-party libraries (e.g. Alluxio), which may have some security vulnerabilities; they should be benign if the particular module is not enabled, but instantiation of the module class may trigger loading and initialization of some code from these libraries anyway. For this particular module we want to make some preemptive action, because it is unmaintained (though still used).
Something between a fix and improvement...
Plugins (Hive Hadoop2).
Makes it harder to exploit vulnerabilities in unused libraries.
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: