Refactor Token Service #39808
Conversation
Pinging @elastic/es-security
@elasticmachine test this please
Don't use icons in PR description!
The failures were utterly alien and I think they were due to #39512 missing from this PR. I have merged master in.
7384e1d to 981fc99
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-2/9732/
@elasticmachine run elasticsearch-ci/2
Overall the changes look good and aren't that bad to review on the UI in split mode :)
};
final String refreshedNewTokenId;
try {
    refreshedNewTokenId = checkTokenDocForRefresh(source, clientAuth);
I much prefer the previous implementation with the optional. There is no need to throw and then catch to pass it on to a listener if we have a clean way of doing so.
There's more going on behind the original implementation; I'll try to briefly explain.
Firstly, note that returning non-null, returning null, and throwing an exception are three different outcomes! The proposed change uses all of them.
The parsed token document is a map of maps; we don't parse it into a fully fledged entity.
The original implementation did some integrity checks (verifying that entries exist and have the expected type) on the map entries, and then, at completely different places in the code, values were extracted out of the maps under the assumption that the null checks had already been performed. I don't like that. I prefer to keep the null check right where I navigate the maps and extract the values; navigating maps while trusting that null checks happened somewhere else feels like walking through a minefield. So, in short, the proposed implementation keeps value extraction paired with the null checks, without introducing a new entity.
Now, if we agree on this, the value extraction should be done once, grabbing everything in one pass; otherwise we would have to duplicate the null checks on each extraction, by the same philosophy. So the proposed implementation grabs everything in one pass and returns: null if the access token has not been refreshed before; a non-null value representing the document id if the access token has been refreshed recently; and it throws an exception if the token document is malformed or the refresh token is invalidated or expired. These are the three outcomes the proposed implementation uses, again without introducing another class/entity.
What do you say? Can we keep it in the proposed form? 😁
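For readers following along, here is a minimal sketch of the three-outcome contract described above; the class, field names, and exception types are illustrative only and do not reproduce the actual PR code.

```java
import java.util.Map;

/**
 * Minimal sketch of the three-outcome contract discussed in this thread.
 * Field names ("refresh_token", "invalidated", "refreshed", "superseded_by")
 * are illustrative, not the real token document schema.
 */
final class RefreshCheckSketch {

    /**
     * @return null if the token has not been refreshed before,
     *         the id of the superseding token document if it has been refreshed recently,
     *         or throws if the document is malformed or the refresh token is invalidated/expired.
     */
    static String checkTokenDocForRefresh(Map<String, Object> source, Object clientAuth) {
        Object refreshTokenSrc = source.get("refresh_token");
        if (refreshTokenSrc instanceof Map == false) {
            throw new IllegalStateException("token document is malformed");            // 500-style failure
        }
        Map<?, ?> refreshToken = (Map<?, ?>) refreshTokenSrc;
        if (Boolean.TRUE.equals(refreshToken.get("invalidated"))) {
            throw new IllegalArgumentException("refresh token has been invalidated");  // 400-style failure
        }
        if (Boolean.TRUE.equals(refreshToken.get("refreshed"))) {
            Object supersededBy = refreshToken.get("superseded_by");
            if (supersededBy instanceof String == false) {
                throw new IllegalStateException("token document is malformed");
            }
            return (String) supersededBy;   // non-null: already refreshed, points to the new document
        }
        return null;                        // null: not refreshed before, caller proceeds with the refresh
    }
}
```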
So, in short, the proposed implementation keeps value extraction paired with the null checks, without introducing a new entity.
What's the reasoning for not introducing a new entity? I've been thinking about this, and from a code readability/understandability perspective an entity can convey the meaning of the returned result much better and enables someone to dive into this code more easily. In the past I have definitely used this type of pattern, but in some of those cases we actually went back and changed the code to return a result entity; examples would be AuthenticationResult and RoleRetrievalResult.
What's the reasoning for not introducing a new entity?
The "read the document once" approach was closer to the existing code, as the code at this abstraction level deals with maps, and only a single field is required out of all the entity parsing anyway, so the parsing seemed like a stretch. Furthermore, the field is not even an entity field; it's a document id, which should probably be abstracted away by the entity (as a token id).
Do you wish to go down the entity parsing path in this instance?
I was thinking about something like:
class RefreshStatus {
boolean isRefreshed() { }
boolean isRefreshable() { }
boolean isValid() { }
Exception getCause() { }
String getRefreshedTokenId() { }
}
I don't have a very strong opinion on this. I slightly prefer the new entity approach from a readability perspective, but I could be happy with slightly longer javadoc and the methods kept as-is.
Also, why don't we like having all the integrity and null checks in one place? I find that preferable, compared to checking before use in many places.
I have pushed a7139a8 to assuage your concerns. The previous integrity checks would have returned 400 for what are actually 500 error conditions. Given the previous feedback, I have changed them in this last "refactor".
@jaymode this is not exactly your suggestion, because `isRefreshable` and `isValid` cannot coexist with `getCause`: `getCause` would call them internally, so the caller of `getCause` would have no reason to call them on its own. Moreover, I believe `getCause()` is not how things are currently done in the code base; when we encounter validation errors we throw exceptions if the function is synchronous, or call `onFailure` for asynchronous functions. Furthermore, what would be the value of `isRefreshed` if `getCause` is not `null`? We can come up with rules about which flag is true/false/null in relation to the others, but I doubt that would be an improvement over checking `refreshTokenDocumentId` for `null`, which is what this thread is arguing against.

why don't we like having all the integrity and null checks in one place?

I don't like having any checks far from the code that relies on them, because when the code changes in one place it has to change in the other, and I am more likely to forget to change it in two places than in one. Moreover, it is not clear to me what span a "previous" check covers. The guarantees of a previous check fade away as a mutable object is passed around through functions that could change it. For example, we make some guarantees in the document write function a few lines above, and then read the doc here and do the checks again. Why bother with the checks here, then, if they were already asserted "before", in the write method?

I find that preferable, compared to checking before use in many places

I disagree. In a world with `null` "values" and no immutable objects, I would be checking for `null` at every single step. That world is real, though: it's the world in which all objects are `Map`s of `Map`s. Fortunately, we can break free of that world by parsing the map into an entity object that has no `null` "values". That world is indeed not real, because the Java gods have blessed us with the `null` "value", so we decided to use `null`s in very "specific" circumstances, where "specific" is not defined (can be `null`).
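To make the trade-off in this thread concrete, here is a small sketch contrasting the two styles; all class and field names are illustrative, not the actual TokenService code.

```java
import java.util.Map;

/**
 * Sketch of the two styles debated above; names are illustrative.
 */
final class TokenDocStyles {

    // Style argued for in this thread: the null/type check sits right next to the extraction.
    static String readSupersededBy(Map<String, Object> source) {
        Object refreshToken = source.get("refresh_token");
        if (refreshToken instanceof Map == false) {
            throw new IllegalStateException("token document is missing the refresh_token object");
        }
        Object supersededBy = ((Map<?, ?>) refreshToken).get("superseded_by");
        return supersededBy instanceof String ? (String) supersededBy : null;
    }

    // The alternative suggested by the reviewers: parse the map once into an entity,
    // so later code never touches the raw maps again.
    static final class ParsedRefreshToken {
        final boolean invalidated;
        final boolean refreshed;
        final String supersededBy; // absence modeled explicitly instead of a bare map lookup elsewhere

        ParsedRefreshToken(Map<String, Object> refreshTokenSource) {
            this.invalidated = Boolean.TRUE.equals(refreshTokenSource.get("invalidated"));
            this.refreshed = Boolean.TRUE.equals(refreshTokenSource.get("refreshed"));
            Object superseded = refreshTokenSource.get("superseded_by");
            this.supersededBy = superseded instanceof String ? (String) superseded : null;
        }
    }
}
```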
…ecurity/authc/TokenService.java Co-Authored-By: albertzaharovits <[email protected]>
…ecurity/authc/TokenService.java Co-Authored-By: albertzaharovits <[email protected]>
…ecurity/authc/TokenService.java Co-Authored-By: albertzaharovits <[email protected]>
@elasticmachine run elasticsearch-ci/packaging-sample
LGTM, Thank you.
 * and return an access token and refresh token based on that. Otherwise this token document gets its refresh_token marked as refreshed,
 * while also storing the Instant when it was refreshed along with a pointer to the new token document that holds the refresh_token that
 * supersedes this one. The new document that contains the new access token and refresh token is created and finally the new access
 * token and refresh token are returned to the listener.
 */
private void innerRefresh(String tokenDocId, Map<String, Object> source, long seqNo, long primaryTerm, Authentication clientAuth,
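As an aside for readers, a rough sketch of the bookkeeping the javadoc above describes, assuming illustrative field names ("refreshed", "refresh_time", "superseded_by") that may differ from the real token document schema.

```java
import java.time.Clock;
import java.util.HashMap;
import java.util.Map;

/**
 * Rough sketch of marking a refresh_token as refreshed; field names are
 * illustrative and not necessarily those used by innerRefresh.
 */
final class RefreshMarkerSketch {

    static Map<String, Object> markRefreshed(Map<String, Object> refreshTokenSource, String newTokenDocId, Clock clock) {
        Map<String, Object> updated = new HashMap<>(refreshTokenSource);
        updated.put("refreshed", true);                                 // this refresh_token has been used
        updated.put("refresh_time", clock.instant().toEpochMilli());   // the Instant when it was refreshed
        updated.put("superseded_by", newTokenDocId);                    // pointer to the document holding the new tokens
        return updated;
    }
}
```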
I think we need to ping @kobelb here and maybe help Kibana get a list of potential messages for the `InvalidGrantException`s we throw (if these have changed), as this affects the way Kibana treats auth errors with respect to refreshing tokens.
In a follow-up, we can probably sit down and define a list of short error codes that we can use instead of descriptive messages that can change in the future.
Please! At the moment, we're looking at the `error_description` of the HTTP response body for the following two strings to determine whether the error represents an invalid refresh token:
- token has already been refreshed
- refresh token is expired
/cc @azasypkin
I think it is safe to assume that an HTTP 400 error means an invalid refresh token. If you find otherwise, I would suggest we treat it as a bug. Would that work for you, @kobelb?
There are many more messages to account for (not introduced/changed by this PR) if you insist on parsing the `error_description` in the exception object. Not even all failure paths return an `ElasticsearchException`, which is the type that carries the `error_description` field.
I think it is safe to assume that an HTTP 400 error means an invalid refresh token. If you find otherwise, I would suggest we treat it as a bug.
Currently Kibana automatically re-initiates the SAML handshake only if refreshing fails with one of the errors mentioned above; otherwise we just return the error. @albertzaharovits do you think we should re-initiate the SAML handshake in Kibana whenever we get a `400`, irrespective of the underlying reason? If a brand new SAML session is the only recovery mechanism for all `400` errors we get while trying to refresh, then it seems there is no need to inspect `error_description` on our side.
@azasypkin I think so, yes. We should strive to make 400 errors genuine client errors, i.e. not due to ES instability, meaning that a retry with the same contents will never be accepted. Whether Kibana then decides to re-initiate the SAML flow, or to present the error to the user and abort, is a different story IMO. That being said, we have not taken great care that 400 errors are honored as described; I will take it on myself to look over the code again and make sure 400s and 500s are consistent.
The one place where this is not happening is
https://github.com/albertzaharovits/elasticsearch/blob/0d4872665adefb8c4db2d4fedcd36dc446131d02/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/authc/TokenService.java#L383
where we return a 500 when we can't find the token document based on the Bearer token that was provided. This made sense when the Bearer token was the encrypted serialized version of the UserToken, but not anymore. I have this tracked in #38866 and plan to address it soon, now that this PR is merged.
I totally agree with Albert here: if you get a `400` while refreshing, it means that the refresh token can't be used, no matter what the reason is. I think we should re-initiate the authentication flow from Kibana when this happens, as I can't think of a case where a failure to refresh should cause a user to see an error in Kibana.
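To illustrate the policy being agreed on here, a minimal sketch of the client-side decision based on status codes alone (Kibana's actual implementation is in TypeScript; this is just the decision logic, with hypothetical names).

```java
/**
 * Sketch of the client-side policy discussed above: treat any 400 from the
 * refresh call as "this refresh token can never be used again" and start a
 * new authentication flow, instead of inspecting error_description strings.
 */
final class RefreshFailurePolicy {

    enum Action { RETRY_LATER, REAUTHENTICATE, SURFACE_ERROR }

    static Action onRefreshFailure(int httpStatus) {
        if (httpStatus == 400) {
            return Action.REAUTHENTICATE;   // refresh token invalid, expired, or already refreshed
        }
        if (httpStatus >= 500) {
            return Action.RETRY_LATER;      // server-side problem; the token may still be usable
        }
        return Action.SURFACE_ERROR;        // anything else is unexpected
    }
}
```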
Okay, sounds good, filed elastic/kibana#33646
…ecurity/authc/TokenService.java Co-Authored-By: albertzaharovits <[email protected]>
@elasticmachine elasticsearch-ci/default-distro
@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro
@jkakavas this is ready for another review round.
This is probably a failed backport:
As well as:
LGTM, thanks for the iterations, Albert!
@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/packaging-sample
This refactoring is in the context of the work related to moving security tokens to a new index. In that regard, the Token Service has to work with token documents stored in either of the two indices, albeit only as a transient situation. I reckoned the added complexity would be unmanageable, hence this refactoring. It is incomplete, as it does not yet address the goal of minimizing .security accesses, but I stopped here because otherwise it would have become a full-blown rewrite (if it isn't one already). I will follow up with more targeted PRs. In addition to being a true refactoring, some 400 errors moved to 500. Furthermore, more stringent validation of various return results has been implemented, notably of the token document creation result.
Be warned, this is one of those PRs that you have to pull down and inspect.
I have started to refactor the Token Service with the general goal of reducing complexity and, more specifically, minimizing .security index access.
This is in the context of the work related to moving security tokens to a new index. In that regard, the Token Service has to work with token documents stored in either of the two indices, albeit only as a transient situation. I reckoned the added complexity would be unmanageable, hence this refactoring.
This is incomplete, as it does not yet address the goal of minimizing .security accesses, but I stopped here because otherwise it would have become a full-blown rewrite (if it isn't one already). I will follow up with more targeted PRs.
I understand this is hard to review, so I have tried to keep individual commits standalone and reviewable on their own. Nevertheless, this might be a pain! One round of beers for each of the two reviewers (GAH is close)!