-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TraceableTcpTransportChannel classcast exception #3514
Fix TraceableTcpTransportChannel classcast exception #3514
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3514 +/- ##
============================================
+ Coverage 64.77% 64.92% +0.15%
- Complexity 3623 3634 +11
============================================
Files 281 282 +1
Lines 20586 20598 +12
Branches 3400 3385 -15
============================================
+ Hits 13334 13374 +40
+ Misses 5552 5542 -10
+ Partials 1700 1682 -18
|
@reta, @peternied, @cwperks Please take a look at it. We need to merge this fix in 2.11 |
@Gaganjuneja There is another issue with the Telemetry experimental feature and the security plugin. The way that the TraceableHttpChannel wraps the underlying channel here is problematic. Security needed to recently refactor where auth was performed from the last step of the netty pipeline to further up the pipeline and needs access to the netty channel. You can see an example here of how the security plugin would access the channel: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/filter/SecurityRestFilter.java#L135 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gaganjuneja Thanks for creating this PR - if this is breaking it seems like we are missing some tests for these scenarios.
final TcpChannel inner = ((TcpTransportChannel) channel).getChannel(); | ||
nettyChannel = (Netty4TcpChannel) ((BaseTcpTransportChannel) inner).getChannel(); | ||
} else if (channel instanceof BaseTcpTransportChannel) { | ||
final TcpChannel inner = ((BaseTcpTransportChannel) channel).getChannel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets add a test(s) that confirms this fix is resolved.
If you are looking for an example of how these tests could be written, take a look at https://github.com/opensearch-project/security/blob/main/src/integrationTest/java/org/opensearch/security/http/OnBehalfOfJwtAuthenticationTest.java that might provide some guidance on cluster configuration and a basic setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peternied, I have written a Integ test case and its failing for 2 more issues. Where RestRequest::getHttpChannel is being type casted to Netty4HttpChannel. At one place this is already been reported by @cwperks and in addition to that one check is here
|| !(underlyingRequest.getHttpChannel() instanceof Netty4HttpChannel)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gaganjuneja @peternied I see at least 2 issues with this code:
- it very coupled with Netty, that should not be the case
- the issue will come back when we would instrument
TaskTransportChannel
, that should not be the case
Would I would like to suggest: add get
(and possibly put
, have to look into it) methods to HttpChannel
and TransportChannel
:
public T get(String attribute, Class<T> clazz);
With that, we could remove || !(underlyingRequest.getHttpChannel() instanceof Netty4HttpChannel)) {
from OpenSearchRequest
since it will be null
value anyway. @Gaganjuneja do you want me to push this change to core?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @reta, Yes it would be more cleaner approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one more change where we are trying to get the SSLHandler from the inbound and outbound pipeline
SslHandler sslhandler = (SslHandler) httpChannel.getNettyChannel().pipeline().get("ssl_http"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One clarification, Please pardon me for the naive question. How will we access the channel attributes (NettyAttribute::popFrom) for authentication? Or the old implementation should be suffice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to be yet another case (attr vs handler, looking into that), thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One clarification, Please pardon me for the naive question. How will we access the channel attributes (NettyAttribute::popFrom) for authentication? Or the old implementation should be suffice.
@Gaganjuneja chatted with Craig (@cwperks) offline, it seems like we may try to pursue a different path for early termination of the the request, I would suggest we do nothing for channel attributes (NettyAttribute::*) now since it is likely to go away (sadly no chances to fix that 2.11
). It should not be an issue, tracing is an experimental feature that is expected to have hiccups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gaganjuneja I've updated the pull request with respect to core changes
@reta, This seems to be new changes which will be bypassed for TraceableHttpChannel always. I am thinking, if we can add a new method to the HttpChannel::getType and let all the implementations implement it.
For Netty4
|
@Gaganjuneja I sadly cannot look at the issue today, travelling back home, but it does not look right - the HTTP transport should have no dependency on the engine behind it, if any, I will take a look tomorrow morning |
Sure, sorry for bothering you! |
Other simplest option could be is to have TraceableHttpChannel::getDelegateChannel method and inside the security plugin we can put an additional check
|
Signed-off-by: Gagan Juneja <[email protected]>
6eb05ba
to
d8451c7
Compare
…PIs instead of typecasting Signed-off-by: Andriy Redko <[email protected]>
d8451c7
to
49eda17
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks - this change looks much cleaner. Happy to re-review after an integration test has been added.
@Gaganjuneja are you working on that? |
@reta, I will pick this up very first thing next week. |
Signed-off-by: Gagan Juneja <[email protected]>
e16fe56
There is nothing we can test about telemetry. Only thing is to enable it and see if cluster is coming up fine as far as the integration with security plugin is concerned. I have enable it for SearchOperationTest.java and now it should be able to catch if anything is breaking due to change in the Telemetry side. |
Signed-off-by: Gagan Juneja <[email protected]>
Signed-off-by: Gagan Juneja <[email protected]>
@@ -371,6 +373,16 @@ public class SearchOperationTest { | |||
USER_ALLOWED_TO_PERFORM_INDEX_OPERATIONS_ON_SELECTED_INDICES, | |||
USER_ALLOWED_TO_CREATE_INDEX | |||
) | |||
.nodeSettings( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag is noop if plugin is not installed, right? I think what we could do is to enhance (or add new) GA which does full plugin install test to have security and opentelemetry plugin installed together, @peternied does it sound right to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I would expect that behavior. @Gaganjuneja can you help us understand what this setting controls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes @reta you are right. but this would at least create the Traceable wrappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes @reta you are right. but this would at least create the Traceable wrappers.
how? it will use NOOP tracer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes. After this recent change tracer:isTracingEnabled, it wont.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved this flag to its own class - but I was not able to reproduce a failure with the other changes in this CR reverted.
Please pull the latest changes from your fork and test locally until the test can fail if you'd reverted the other changes - its possible I've got the feature flag configured incorrectly.
@@ -371,6 +373,16 @@ public class SearchOperationTest { | |||
USER_ALLOWED_TO_PERFORM_INDEX_OPERATIONS_ON_SELECTED_INDICES, | |||
USER_ALLOWED_TO_CREATE_INDEX | |||
) | |||
.nodeSettings( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I would expect that behavior. @Gaganjuneja can you help us understand what this setting controls?
@@ -371,6 +373,9 @@ public class SearchOperationTest { | |||
USER_ALLOWED_TO_PERFORM_INDEX_OPERATIONS_ON_SELECTED_INDICES, | |||
USER_ALLOWED_TO_CREATE_INDEX | |||
) | |||
.nodeSettings( | |||
Map.of(FeatureFlags.TELEMETRY_SETTING.getKey(), true, TelemetrySettings.TRACER_FEATURE_ENABLED_SETTING.getKey(), true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should these tests use tracing vs not use tracing?
As I see it the tracing platform feature certainly could have tests that are in the security plugin, but I would expect all tests to behave the same if the tracing plugin is operational vs noop. If that was the case, then I'd expect just a couple of tests that verify specific security plugin + tracing behavior works as opposed to flagging different behavior for whole test suites.
If we believe that we need whole suites of duplicate testing to ensure confidence I might not understand how tracing is interacting with the security plugin and would love more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peternied, As mentioned by @reta these settings are not going to work and tracing would be noop in this scenario. In case of NoopTracer we tried to implement in a way that its shouldn't alter the execution pattern by unnecessarily introducing the tracer related core changes (Tracing being an experimental feature eg. is TraceableHttpChannel here. If tracer is noop we are not wrapping the HttpChannel hence its working with FeatureFlag disabled).
Now to test the telemetry plugin and security plugin we need to install them together and execute the tests.
To validate the telemetry working we need to have some search/indexing requests and validate if traces are being generated and which is not very straight forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To validate the telemetry working we need to have some search/indexing requests and validate if traces are being generated and which is not very straight forward.
While it might not be, I think that is where we will need to land before - think of it as a requirement to release the tracing feature. Could you create a new issue for end to end validation and include what sanity tests might be needed alongside tests that are security specific, then link it to where all the tracing deliverables are being tracked?
I don't want to bog down this fix if its more appropriate to step back and confirm the approach to then execute on it. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we testing security with other plugins today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, Even if we write the integration test, it's still going to fail bcauase the issue is partially fixed and the check inside NettyAttribute.java is still there -
public static <T> Optional<T> popFrom(final RestRequest request, final AttributeKey<T> attribute) {
if (request.getHttpChannel() instanceof Netty4HttpChannel) {
Channel nettyChannel = ((Netty4HttpChannel) request.getHttpChannel()).getNettyChannel();
return Optional.ofNullable(nettyChannel.attr(attribute).getAndSet(null));
}
return Optional.empty();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, correct, this part is on me, thanks for reminding @Gaganjuneja
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gaganjuneja You can see an example of how anomaly-detection tests with security here: https://github.com/opensearch-project/anomaly-detection/blob/main/.github/workflows/test_security.yml
They run their full integ test suite in a cluster w/ the security plugin installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing, Even if we write the integration test, it's still going to fail bcauase the issue is partially fixed and the check inside NettyAttribute.java is still there -
public static <T> Optional<T> popFrom(final RestRequest request, final AttributeKey<T> attribute) { if (request.getHttpChannel() instanceof Netty4HttpChannel) { Channel nettyChannel = ((Netty4HttpChannel) request.getHttpChannel()).getNettyChannel(); return Optional.ofNullable(nettyChannel.attr(attribute).getAndSet(null)); } return Optional.empty(); }
Ah, correct, this part is on me, thanks for reminding @Gaganjuneja
@reta, What is your opinion, this should be addressed here on in another PR.
@DarshitChanpura, Once the above issue is addressed, I will write the integration test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reta, What is your opinion, this should be addressed here on in another PR.
Up to you @Gaganjuneja
Signed-off-by: Peter Nied <[email protected]>
src/main/java/org/opensearch/security/ssl/transport/SecuritySSLRequestHandler.java
Show resolved
Hide resolved
@Gaganjuneja would you need any helping getting this across? |
@Gaganjuneja are you still working on this? |
Doesn't look like this is being worked on anymore, going to close this out. Please feel free to reopen if you re-engage on this @Gaganjuneja |
…PIs instead of typecasting (#3917) ### Description This is cherry-pick from #3514 to use the channel properties instead of type-casting ### Issues Resolved Closes #3911 Is this a backport? If so, please add backport PR # and/or commits # ### Testing The change is covered by existing test suites ### Check List - [X] New functionality includes testing - [X] New functionality has been documented - [X] Commits are signed per the DCO using --signoff By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). Signed-off-by: Andriy Redko <[email protected]>
…PIs instead of typecasting (opensearch-project#3917) This is cherry-pick from opensearch-project#3514 to use the channel properties instead of type-casting Closes opensearch-project#3911 Is this a backport? If so, please add backport PR # and/or commits # The change is covered by existing test suites - [X] New functionality includes testing - [X] New functionality has been documented - [X] Commits are signed per the DCO using --signoff By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). Signed-off-by: Andriy Redko <[email protected]> (cherry picked from commit b038f93)
…PIs instead of typecasting (opensearch-project#3917) ### Description This is cherry-pick from opensearch-project#3514 to use the channel properties instead of type-casting ### Issues Resolved Closes opensearch-project#3911 Is this a backport? If so, please add backport PR # and/or commits # ### Testing The change is covered by existing test suites ### Check List - [X] New functionality includes testing - [X] New functionality has been documented - [X] Commits are signed per the DCO using --signoff By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check [here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin). Signed-off-by: Andriy Redko <[email protected]>
Description
Fix the TraceableTcpTransportChannel class cast exception when enabling the feature opensearch.experimental.feature.telemetry.enabled. Issue reported here - #3464 (comment)
Issues Resolved
#3515
Reference Chnage - https://github.com/opensearch-project/OpenSearch/blob/8e5e54e87f6f86a2425a5c767e208767ba3df14c/server/src/main/java/org/opensearch/transport/RequestHandlerRegistry.java#L101
Is this a backport? If so, please add backport PR # and/or commits #
Testing
[Please provide details of testing done: unit testing, integration testing and manual testing]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.