Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More avoidance for loadAuthorizedIndices #81237

Merged
merged 19 commits into from
Jan 31, 2022

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Dec 2, 2021

This PR replaces the concrete list of authorizeIndices with a lazily load class
so that loading is not triggered if not necessary.

For example, a search targeting a concret name, GET index/_search no longer
triggers the loading. But names with wildcard will still trigger the loading.

However if we organize the indices with data streams and alias, it is possible
to achieve similar wildcard effect while still avoid the loading. That is,
searches like GET alias/_search or GET data_stream/_search will not
trigger the loading even they target multiple indices behind the single name.

@ywangd ywangd added :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v8.1.0 labels Dec 2, 2021
@ywangd
Copy link
Member Author

ywangd commented Dec 8, 2021

@elasticmachine update branch

@albertzaharovits
Copy link
Contributor

I think delaying evaluating the authorizedIndices will turn out to be a great optimization in practice because many requests do not contain wildcards.
I'm not sure about the switch to a predicate though (#81237 (comment)) . I need to think it through a bit more, but maybe you can clarify your thoughts about this particular change.

@albertzaharovits
Copy link
Contributor

TBC I'm generally in favor of this change.

@ywangd
Copy link
Member Author

ywangd commented Jan 24, 2022

@albertzaharovits I made a rather big change based on the discussion. It is still along the agreement we reached last time and overall a simplication. Instead of changing the signatures of existing Classes (AuthorizationEngine, IndexAbstractionResolver etc), the PR now introduces a new Set implemetation which lazily builds the authorized indices when necessary, but otherwise uses predicate for contains test. So the idea does not change, but the changeset is much smaller. Here is a summary for how it should work for different scenarios:

  1. Concrete names, e.g. index1, or index1,index2,index3. Performance should be much better because the predicate testing for the individual names should much faster than loading all authorized indices. [1]
  2. Simple wildcard, e.g. *, or logs-*,metrics-*. No change to performance because the authorizedIndices set will just be built when there are genuine wildcard expansion.
  3. The more interesting case is a mix of concrete names and wildcards, e.g. index1,logs-*. The performance could be slightly worse because the code first runs the predicate for index1 and then still needs to load authorizedIndices for logs-*. So the performance lost is one additional predicate execution. Obviously, the performance can keep degrade with longer list of concrete index names. [2]

I think the change should have a postive impact on performance overall because above Scenario 1 is very common and the potential gain is significant (especially when the cluster is large in size). Scenario 3 does have certain performance penalty but it is mitigated by (1) We can advise users to list wildcard patterns first in the requested names; (2) It is unlikely for the list of concrete names to be very long which means the performance drop should not be anyway significant (compared to loading authorizedIndices).

[1] An edge case of this scenario is that all requested names are for backing indices and the user does not have privileges over data stream. In this case, the index pattern predicate can run twice for each give name (first for the data stream name, a second time for the backing index name). However, it still unlikely an issue because (1) it still saves time in loading authorized indices which is likely much more expensive than running the predicate couple more times; (2) We do not recommend granting privileges only over the backing indices. If the user does not have privileges over the backing indices either, the request will end up be a 403 failure and performance is not critical in the failure case.

[2] We can avoid the additional predicate executions by looping through the requested names earlier and find out whether it has any wildcard. However, this loop itself also has its cost. And if the names are all concrete, it ends up doing useless work. In short, it is still a tradeoff between different scenarios.

@ywangd ywangd marked this pull request as ready for review January 24, 2022 03:36
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Jan 24, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

for (IndexAbstraction indexAbstraction : lookup.values()) {
if (indexAbstraction.getType() != IndexAbstraction.Type.DATA_STREAM && predicate.test(indexAbstraction)) {
indicesAndAliases.add(indexAbstraction.getName());
final Set<String> authorizedIndices = Collections.unmodifiableSet(indicesAndAliases);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AuthorizedIndicesSet should protect against modification to this, so maybe avoid the wrapping in an unmodifiable set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is not strictly necessary. The Set is passed into timeChecker.accept in the next line. I was pretending that if we don't know anything about timeChecker and want to have maximum protection against modification, then we need to wrap it in an unmodifiableSet. I think it was just me being unnecessarily paranoid. So I dropped the wrapping.


@Override
public boolean containsAll(Collection<?> c) {
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should implement this in some way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching it. I meant to but somehow missed it. Fixed by just delegating to contains

if (authorizedIndices == null) {
authorizedIndices = supplier.get();
}
return authorizedIndices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would slightly prefer we make this check thread safe, like in the CachingAsyncSupplier.
We're not using it in a multi-thread fashion, and I don't think we would ever do, and even if we did we only risk performance problems with building the set multiple times (but no correctness problems).
What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Set is wrapped in a CachingAsyncSupplier, i.e. resolvedIndicesAsyncSupplier (not authorizedIndicesSupplier which I will remove, see below). The resolvedIndicesAsyncSupplier is what gets passed to other places so that this Set is never directly used by anything else. Since resolvedIndicesAsyncSupplier already guarantees thread safety, I think the Set itself can be kept simple. I added the following comment to the class to clarify its usage:

* NOTE that the lazy loading does not guarantee to run only once and is not meant to be used by multi-threads
* because loading multiple times can incur performance penalty (but not correctness).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this implementation can be considered thread safe. There's no memory barrier here, so there's no guarantee about the state of the authorizedIndices field when viewed from another thread.

The options are:

  • simply state that it is not thread-safe and if callers require thread safety they need to synchronize
  • mark authorizedIndices as volatile

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It is indeed not thread safe. I changed the documentation to state that and warned about changing how this class is used.

})
);
});
final AsyncSupplier<Set<String>> authorizedIndicesSupplier = new CachingAsyncSupplier<>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we still need the CachingAsyncSupplier wrapper magic?
I would ditch it, it's hard to follow and useless now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is redundant now and should be removed.

Copy link
Contributor

@albertzaharovits albertzaharovits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM this is looking cool!
Only had small points.

@elasticsearchmachine
Copy link
Collaborator

Hi @ywangd, I've created a changelog YAML for you.

public RBACEngine(
Settings settings,
CompositeRolesStore rolesStore,
LoadAuthorizedIndicesTimeChecker.Factory authzIndicesTimerFactory
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving TimeChecker here means it is no longer applied to custom AuthorizationEngine. I think it is OK. The intention was always to measure the performance of RBACEngine.

Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a lack of tests that this change actually achieves anything. The existing tests verify that you haven't broken any existing behaviour, but there's no test that would fail if someone changed the behaviour back so it was no longer lazy.

if (authorizedIndices == null) {
authorizedIndices = supplier.get();
}
return authorizedIndices;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this implementation can be considered thread safe. There's no memory barrier here, so there's no guarantee about the state of the authorizedIndices field when viewed from another thread.

The options are:

  • simply state that it is not thread-safe and if callers require thread safety they need to synchronize
  • mark authorizedIndices as volatile

@Override
public boolean contains(Object o) {
if (authorizedIndices == null) {
return predicate.test((String) o);
Copy link
Contributor

@tvernum tvernum Jan 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking this violates the guarantees on the Set interface. You should return false if o is not a String (rather than throw CCE).

Suggested change
return predicate.test((String) o);
if (o instanceof String s) {
return predicate.test(s);
} else {
return false;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it. But JDK doc says it can throw ClassCastException . So I didn't do anything. Also adding instanceof check can in theory hide subtle usage error. So I am not sure if it helps with anything.

@ywangd
Copy link
Member Author

ywangd commented Jan 31, 2022

There seems to be a lack of tests that this change actually achieves anything. The existing tests verify that you haven't broken any existing behaviour, but there's no test that would fail if someone changed the behaviour back so it was no longer lazy.

I added two assertions to existing tests to ensure that the returned Set from resolveAuthorizedIndicesFromRole is an instance of AuthorizedIndicesSet. Also added a new test to ensure the lazy behaviour of AuthorizedIndicesSet.

PS: Benchmark showed that the change helped a great deal for requests with a single concrete name.

@ywangd ywangd requested a review from tvernum January 31, 2022 01:21
…ecurity/authz/RBACEngineTests.java

Co-authored-by: Tim Vernum <[email protected]>
@ywangd ywangd merged commit 5265827 into elastic:master Jan 31, 2022
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 29, 2022
…#81237)

This PR replaces the concrete list of authorizeIndices with a lazily load class
so that loading is not triggered if not necessary.

For example, a search targeting a concret name, GET index/_search no longer
triggers the loading. But names with wildcard will still trigger the loading.

However if we organize the indices with data streams and alias, it is possible
to achieve similar wildcard effect while still avoid the loading. That is,
searches like GET alias/_search or GET data_stream/_search will not
trigger the loading even they target multiple indices behind the single name.
elasticsearchmachine pushed a commit that referenced this pull request Jul 14, 2022
…#88149)

* Avoid loadAuthorizedIndices for requests with concrete names (#81237)

This PR replaces the concrete list of authorizeIndices with a lazily load class
so that loading is not triggered if not necessary.

For example, a search targeting a concret name, GET index/_search no longer
triggers the loading. But names with wildcard will still trigger the loading.

However if we organize the indices with data streams and alias, it is possible
to achieve similar wildcard effect while still avoid the loading. That is,
searches like GET alias/_search or GET data_stream/_search will not
trigger the loading even they target multiple indices behind the single name.

* fix compilation

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Security Meta label for security team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants