Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Hive metastore impersonation #1441

Merged
merged 1 commit into from
Sep 23, 2019

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Sep 4, 2019

Fix #43

@cla-bot cla-bot bot added the cla-signed label Sep 4, 2019
@ebyhr ebyhr force-pushed the hms-impersonation branch 2 times, most recently from a937203 to 1299349 Compare September 5, 2019 13:46
@electrum
Copy link
Member

electrum commented Sep 6, 2019

Would it be sufficient to call set_ugi() after connecting to the metastore? The delegation token support says it is there for map/reduce tasks, where the tasks would run on a different machine (and thus wouldn't have access to the original credentials).

@ebyhr ebyhr force-pushed the hms-impersonation branch from 1299349 to 72a6bdc Compare September 6, 2019 16:11
@ebyhr
Copy link
Member Author

ebyhr commented Sep 6, 2019

Thank you for the advise. I think it's sufficient too and replaced the delegation token logic with set_ugi().

@ebyhr ebyhr marked this pull request as ready for review September 7, 2019 03:53
@ebyhr ebyhr force-pushed the hms-impersonation branch 4 times, most recently from 75aa6e9 to d6623de Compare September 8, 2019 11:10
@ebyhr ebyhr requested a review from electrum September 9, 2019 06:30
@AnupGS
Copy link

AnupGS commented Sep 13, 2019

Hi, impersonation support is missed out for get table operation. is there any plan to add support for that as well?

@electrum
Copy link
Member

@AnupGS What is the use case for impersonation for read calls? We could certainly add those, but I'm not aware of any reason to do so. When using SQL standard authorization, Presto performs all the security checks using the security information provided by the metastore.

@findepi
Copy link
Member

findepi commented Sep 16, 2019

@electrum this is required because HMS may impose additional checks on get_table call.
It may be, for example, that Presto user is allowed to list tables only, but it's not allowed to read from any of them.
With storage-based auth enabled in HMS, Presto user won't be able to do get_table on any table.

@ebyhr @electrum I will share with you exact impersonation rules we have implemented.

@ebyhr thanks for working on this! And apologies for not opening our impersonation impl earlier, as I promised.
It was dependent on several other changes we made.

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, otherwise looks good. Apologies for the delay in reviewing. I'd like to merge this for the next release.

@findepi or @kokosing do you want to look at the product tests? They seem fine to me, but I'm not an expert on them.

@electrum
Copy link
Member

@findepi Thanks for explaining. @ebyhr It sounds like we need impersonation for the read calls. If you have time to update this PR to include those, great, otherwise we can merge this as-is and follow up with those later.

@ebyhr
Copy link
Member Author

ebyhr commented Sep 16, 2019

@findepi @electrum Thanks for your reviews and comments. I'm just adding impersonation for get_table and the dependent methods. I'll push it later.

@AnupGS
Copy link

AnupGS commented Sep 16, 2019

Hi @electrum, @findepi has explained it perfectly. presto user is unable to get table if storage based auth is enabled.

@findepi findepi self-requested a review September 16, 2019 20:53
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments. Well done!

set -xeuo pipefail

presto-product-tests/bin/run_on_docker.sh \
singlenode-hive-impersonation \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://github.com/prestosql/presto/blob/c808deea818325b300cb3d185ef1af0f7a056cd8/presto-hive/src/main/java/io/prestosql/plugin/hive/metastore/thrift/ThriftHiveMetastore.java#L144

this.impersonationEnabled = thriftConfig.isImpersonationEnabled() && hiveMetastoreAuthentication instanceof KerberosHiveMetastoreAuthentication;

the impersonation is supported only when kerberos is enabled
i think with setUGI, we should support impersonation with and without kerberos

-- but this also signals we don't have a direct test for an impersonation, something that would fail when impersonation does not work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be visible if you add authorization here. (It should be included here anyway, it contains TestRoles, this is important to run here)

@findepi
Copy link
Member

findepi commented Sep 16, 2019

When applying changes, please separate them from rebase (or delay rebase), so that the changes themselves can be viewed on GH ui.

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, adding impersonation for get calls was more complicated than I thought. Thanks for doing that.

set -xeuo pipefail

presto-product-tests/bin/run_on_docker.sh \
singlenode-hive-impersonation \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be visible if you add authorization here. (It should be included here anyway, it contains TestRoles, this is important to run here)


presto-product-tests/bin/run_on_docker.sh \
singlenode-kerberos-hive-impersonation \
-g storage_formats,cli,hdfs_impersonation,authorization,hive_file_header
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to include cli and hive_file_header tests?
(@kokosing do you know why do we run cli tests so often?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason to include those tests. I followed existing hdfs-impersonation tests.

@ebyhr
Copy link
Member Author

ebyhr commented Sep 17, 2019

Let me push updated code once. Below comments are not yet applied.
#1441 (comment) & #1441 (comment)

@ebyhr
Copy link
Member Author

ebyhr commented Sep 19, 2019

Pushed so that we can share the current status.

@@ -86,6 +87,7 @@
implements HiveMetastore
{
protected final HiveMetastore delegate;
private final boolean impersonationEnabled;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this field if we add isImpersonationEnabled() to HiveMetastore. Then updateIdentity() can simply call delegate.isImpersonationEnabled().

Copy link
Member Author

@ebyhr ebyhr Sep 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we always return false in FileHiveMetastore and throw an exception in GlueHiveMetastore? At least, we can't inject ThriftHiveMetastoreConfig into FileHiveMetastore. We can do it in GlueHiveMetastore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, always false. we don't support impersonation there

@electrum
Copy link
Member

I had a few comments, mostly around the usage of ThriftHiveMetastoreConfig, which is actually the cause of the Travis test failures. This should be easy to fix. After that, this PR should be ready to go. Thanks for all your hard work on this. It's much more involved than I imagined.

@ebyhr ebyhr force-pushed the hms-impersonation branch 2 times, most recently from 9736171 to ed32e58 Compare September 22, 2019 02:01
Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.
Thanks @ebyhr


private HiveIdentity()
{
this(new ConnectorIdentity("dummy_identity", Optional.empty(), Optional.empty()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply:

Suggested change
this(new ConnectorIdentity("dummy_identity", Optional.empty(), Optional.empty()));
this.username = "dummy_identity";

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, I'd prefer the username be Optional<String> in this class.
There is only one place where getUsername() is called, so it's not a big deal and improves clarify.

private HiveIdentity updateIdentity(HiveIdentity identity)
{
// remove identity if not doing impersonation
return delegate.isImpersonationEnabled() ? identity : HiveIdentity.none();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HiveIdentity.none() is going to be used as a cache key many times at the same time, so using a singleton would be nice.

@electrum
Copy link
Member

One more thing: we can remove AVRO_SCHEMA_URL from TestGroups in product-tests

@electrum electrum merged commit b5a6a8f into trinodb:master Sep 23, 2019
@martint martint added this to the 320 milestone Oct 6, 2019
@ebyhr ebyhr deleted the hms-impersonation branch April 8, 2020 02:48
@BlueStalker
Copy link
Contributor

BlueStalker commented Apr 9, 2020

Actually I wonder how this is working, the set_ugi implementation in hive metastore server-side is like adding the user information in the session, and there is no additional impersonation or authentication implemented. In fact, the earlier version of implementation (delegation token based) looks pretty promising and follow what hive did internally. Have you guys tested this in the real secure hive metastore environment, for example, if you create a table, is it owned by the authenticated user (mostly presto) or the real impersonation user (end-user)? @ebyhr @findepi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Support Impersonation in Metastore communication
6 participants