Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer initialization of AlluxioMetastoreModule #12541

Merged
merged 2 commits into from
May 26, 2022

Conversation

ksobolew
Copy link
Contributor

@ksobolew ksobolew commented May 25, 2022

Description

Some of these modules are depending on third-party libraries (e.g. Alluxio), which may have some security vulnerabilities; they should be benign if the particular module is not enabled, but instantiation of the module class may trigger loading and initialization of some code from these libraries anyway. For this particular module we want to make some preemptive action, because it is unmaintained (though still used).

Is this change a fix, improvement, new feature, refactoring, or other?

Something between a fix and improvement...

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Plugins (Hive Hadoop2).

How would you describe this change to a non-technical end user or system administrator?

Makes it harder to exploit vulnerabilities in unused libraries.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label May 25, 2022
@ksobolew ksobolew requested a review from kokosing May 25, 2022 12:43
@kokosing
Copy link
Member

Is it possible to test it?

@ksobolew
Copy link
Contributor Author

Is it possible to test it?

I tested it manually, I'm not sure if it can be automated. Maybe with some convoluted product test?

kokosing
kokosing previously approved these changes May 25, 2022
@kokosing
Copy link
Member

@findepi any comments?

@ksobolew
Copy link
Contributor Author

I'd like to know if there are other places where we could/should do something like this

@findepi
Copy link
Member

findepi commented May 25, 2022

cc @willmostly

bindMetastoreModule("thrift", () -> new ThriftMetastoreModule());
bindMetastoreModule("file", () -> new FileMetastoreModule());
bindMetastoreModule("glue", () -> new GlueMetastoreModule());
bindMetastoreModule("alluxio", () -> new AlluxioMetastoreModule());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why not AlluxioMetastoreModule::new?

the code doesn't convey the intention, so it will be soon refactored to get rid of redundant Supplier

It's unclear why this would be sufficient and whether this is sufficient always, or depends on Java version
i think a more explicit approach would be to use reflection to load class by name.
But then, we don't want to hard-code class names in the code.
AlluxioMetastoreModule.class.getName() doesn't initialize the class... but this isn't (much) more explicit than current code.

cc @electrum @losipiuk @dain

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why not AlluxioMetastoreModule::new?

I wasn't convinced that a method reference won't try to load the class. So this is kind of defensive programming.

the code doesn't convey the intention, so it will be soon refactored to get rid of redundant Supplier

I suppose at least a comment will be necessary :)

It's unclear why this would be sufficient and whether this is sufficient always, or depends on Java version i think a more explicit approach would be to use reflection to load class by name. But then, we don't want to hard-code class names in the code. AlluxioMetastoreModule.class.getName() doesn't initialize the class... but this isn't (much) more explicit than current code.

I'm aware that there is no guarantee that it will serve its purpose as there are not that many things in the JVM spec we can rely on. It is kind of a hack.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't convinced that a method reference won't try to load the class. So this is kind of defensive programming.

we don't want to add unneeded warning suppressions like @SuppressWarnings("Convert2MethodRef")
so ... this defensive style won't hold unless it's "the way"

I'm aware that there is no guarantee that it will serve its purpose as there are not that many things in the JVM spec we can rely on. It is kind of a hack.

Class loading is not well defined, but it's not undefined either.
Classes are guaranteed not to be loaded if they are not needed.
So for example, of the only reference is Class.forName(.....) and that line isn't executed, the class is guaranteed not to be loaded.

Now, since it's really only about Alluxio module, we can do reflection-based loading for that class only, and then using Class.forName + plain class name as a string will be totally fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Class.forName for AlluxioMetastoreModule + good comment.
restore the rest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it's really only about Alluxio module

Can we make sure it is only about AlluxioMetastoreModule. E.g. add a test which scans the class path and checks if only references to Aluxio library are coming from AlluxioMetastoreModule (possibly transitively)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We thought about a test like this, but I'm not convinced that it deserves this much effort put into it.

@findepi
Copy link
Member

findepi commented May 25, 2022

(x) No release notes entries required.

Why would we want to merge this if this is not solving any problem?

@ksobolew
Copy link
Contributor Author

(x) No release notes entries required.

Why would we want to merge this if this is not solving any problem?

The use case is somewhere between silly and paranoid, depending on your point of view: there are some users why physically delete JARs they don't use to avoid vulnerabilities therein. And then they complain that Trino doesn't start :) So that's my attempt at placating them somehow. "I tried, by the community rejected it" is one possible outcome, I guess ;)

@kokosing
Copy link
Member

Is it possible to update alluxio?

@kokosing
Copy link
Member

kokosing commented May 25, 2022

Or maybe we could make alluxio pluggable (requiring to place the jar in installation) and disabled by default. Just thinking at loud. IIRC it is not the first time we have issue like this here.

@kokosing kokosing dismissed their stale review May 25, 2022 14:38

Due Findepi comments

@findepi
Copy link
Member

findepi commented May 25, 2022

The use case is somewhere between silly and paranoid, depending on your point of view: there are some users why physically delete JARs they don't use to avoid vulnerabilities therein

If this PR is supposed to make peace with vulnerability scanners, it won't do that. I would assume vulnerability scanners scan all the classes, without analysis which one will get loaded.

Is this PR supposed to allow deleting alluxio jar from the classpath?
If so, I am fine with the change and yes, release notes aren't warranted. Please confirm.

BTW the alluxio integration is unmaintained. We should remove it. But that's a separate story.

@ksobolew
Copy link
Contributor Author

If this PR is supposed to make peace with vulnerability scanners, it won't do that. I would assume vulnerability scanners scan all the classes, without analysis which one will get loaded.

They do, that's correct. So it's not about appeasing them per se.

Is this PR supposed to allow deleting alluxio jar from the classpath? If so, I am fine with the change and yes, release notes aren't warranted. Please confirm.

Confirmed. The goal is to avoid loading these classes if not necessary.

BTW the alluxio integration is unmaintained. We should remove it. But that's a separate story.

Oh, I'd love to remove it :) From what I heard users also hate it. But it's still being used, so.

bindMetastoreModule("thrift", () -> new ThriftMetastoreModule());
bindMetastoreModule("file", () -> new FileMetastoreModule());
bindMetastoreModule("glue", () -> new GlueMetastoreModule());
bindMetastoreModule("alluxio", () -> new AlluxioMetastoreModule());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Class.forName for AlluxioMetastoreModule + good comment.
restore the rest.

@ksobolew ksobolew force-pushed the kudi/deferred-metastore-creation branch 2 times, most recently from 0d9cac7 to ea59d42 Compare May 26, 2022 09:59
@ksobolew
Copy link
Contributor Author

Addressed comments. Hopefully it's easier to swallow now :)

@ksobolew ksobolew force-pushed the kudi/deferred-metastore-creation branch from ea59d42 to c32ec30 Compare May 26, 2022 10:18
ksobolew added 2 commits May 26, 2022 16:20
Some of these modules are depending on third-party libraries (e.g.
Alluxio), which may have some security vulnerabilities; they should be
benign if the particular module is not enabled, but instantiation of the
module class may trigger loading and initialization of some code from
these libraries anyway. For this particular module we want to make some
preemptive action, because it is unmaintained (though still used).
This test will only make sure that the metastore can be instantiated,
like similar tests for other metastore implementations.
@ksobolew ksobolew force-pushed the kudi/deferred-metastore-creation branch from 9216842 to fcbdf7d Compare May 26, 2022 14:21
@kokosing kokosing merged commit 34c9763 into trinodb:master May 26, 2022
@github-actions github-actions bot added this to the 383 milestone May 26, 2022
@ksobolew ksobolew deleted the kudi/deferred-metastore-creation branch May 27, 2022 08:28
@ksobolew ksobolew changed the title Defer initialization of Hive metastore modules Defer initialization of AlluxioMetastoreModule May 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants