[BUG] Failing to start OpenSearch from Search Processor #162

Closed
sejli opened this issue Jul 10, 2023 · 3 comments

@sejli
Member

sejli commented Jul 10, 2023

Overview

Following opensearch-project/dashboards-search-relevance#177 (about two months ago), every run of the Remote integ tests workflow has failed with varying errors. That PR implemented an automated version bump to match OpenSearch and OpenSearch Dashboards versions. In particular, .github/workflows/remote-integ-tests-workflow.yml was updated to fetch versions from the other opensearch-project repos. The PR also switched the dependency used to start OpenSearch from anomaly-detection to search-processor. Our Cypress integration tests require a running OpenSearch Dashboards instance, which in turn requires a running OpenSearch instance. To avoid the long OpenSearch setup that comes with checking out the main repo, we can run OpenSearch from a plugin. We switched from anomaly-detection to search-processor because many of the maintainers of dashboards-search-relevance are also maintainers of search-processor. With anomaly-detection as the dependency, a failing anomaly-detection build would block this repo until its maintainers fixed the issue.

Cause of Failure

Taking a look at this run, OpenSearch and OpenSearch Dashboards seemingly build successfully. However, as soon as the Cypress tests start, they fail with a connection refused error. Typically, this error occurs when OpenSearch Dashboards cannot connect to OpenSearch (most likely because OpenSearch isn't running). However, we do have a check that the OpenSearch instance started successfully before proceeding to the other steps.
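To illustrate the kind of startup check described above, here is a minimal Python sketch that polls an endpoint until it answers. The function names (`http_probe`, `wait_until_ready`), the attempt count, and the delay are assumptions made for illustration; this is not the check the workflow actually uses.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with 401 or 503), so the process is up.
        return True
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, DNS failures, and malformed replies.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A call such as `wait_until_ready(lambda: http_probe("http://localhost:9200"))` only proves the node was reachable at that moment. Since the failure here shows up after the initial check has passed, repeating the probe immediately before the Cypress step would presumably catch a node that exited in between.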

After some testing on my local machine, I found that search-processor can start an instance of OpenSearch successfully with ./gradlew run -Dopensearch.version=<version>, and that OpenSearch Dashboards connects to it successfully upon yarn start, but OpenSearch fails as soon as there is any interaction with OpenSearch Dashboards. To replicate the above run on version 2.8.1, I first set up OpenSearch with search-processor using the command ./gradlew run -Dopensearch.version=2.8.1 and verified that it started with curl localhost:9200. Then, I started a 2.8.1 instance of OpenSearch Dashboards. After I ran curl localhost:5601 to verify the OpenSearch Dashboards setup, the search-processor instance of OpenSearch failed. Below is a portion of the search-processor logs.

» ERROR][o.o.ExceptionsHelper     ] [integTest-0] fatal error
»       at org.opensearch.ExceptionsHelper.lambda$maybeDieOnAnotherThread$4(ExceptionsHelper.java:332)
»       at java.base/java.util.Optional.ifPresent(Optional.java:183)
»       at org.opensearch.ExceptionsHelper.maybeDieOnAnotherThread(ExceptionsHelper.java:322)
»       at org.opensearch.http.netty4.Netty4HttpRequestHandler.exceptionCaught(Netty4HttpRequestHandler.java:66)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:447)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at org.opensearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:71)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
»       at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
»       at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»       at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»       at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
»       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
»       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
»       at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
»       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
»       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
»       at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
»       at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
»       at java.base/java.lang.Thread.run(Thread.java:829)
» ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [integTest-0] fatal error in thread [Thread-3], exiting
»  java.lang.AssertionError: Expected current thread [Thread[opensearch[integTest-0][transport_worker][T#8],5,main]] to not be a transport thread. Reason: [Blocking operation]
»       at org.opensearch.transport.Transports.assertNotTransportThread(Transports.java:79) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.common.util.concurrent.BaseFuture.blockingAllowed(BaseFuture.java:109) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:103) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:74) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:55) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.search.relevance.client.OpenSearchClient.getIndexSettings(OpenSearchClient.java:29) ~[?:?]
»       at org.opensearch.search.relevance.actionfilter.SearchActionFilter.getResultTransformerConfigurations(SearchActionFilter.java:163) ~[?:?]
»       at org.opensearch.search.relevance.actionfilter.SearchActionFilter.apply(SearchActionFilter.java:111) ~[?:?]
»       at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.action.support.TransportAction.execute(TransportAction.java:107) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:110) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.rest.action.RestCancellableNodeClient.doExecute(RestCancellableNodeClient.java:106) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:476) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.rest.action.search.RestSearchAction.lambda$prepareRequest$2(RestSearchAction.java:135) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:127) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.rest.RestController.dispatchRequest(RestController.java:320) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.rest.RestController.tryAllHandlers(RestController.java:411) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.rest.RestController.dispatchRequest(RestController.java:249) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.http.AbstractHttpServerTransport.dispatchRequest(AbstractHttpServerTransport.java:366) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.http.AbstractHttpServerTransport.handleIncomingRequest(AbstractHttpServerTransport.java:445) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.http.AbstractHttpServerTransport.incomingRequest(AbstractHttpServerTransport.java:356) ~[opensearch-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
»       at org.opensearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:55) ~[?:?]
»       at org.opensearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:41) ~[?:?]
»       at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at org.opensearch.http.netty4.Netty4HttpPipeliningHandler.channelRead(Netty4HttpPipeliningHandler.java:71) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
»       at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[?:?]
»       at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»       at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[?:?]
»       at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»       at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
»       at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
»       at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[?:?]
»       at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) ~[?:?]
»       at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) ~[?:?]
»       at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
»       at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
»       at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
»       at java.lang.Thread.run(Thread.java:829) [?:?]

Full logs can be viewed here
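Most of the trace above is Netty pipeline frames. As a rough triage aid, the following Python sketch keeps only the frames from the plugin's own package. The helper name `plugin_frames` is hypothetical, and the default prefix is simply the package seen in the trace (`org.opensearch.search.relevance`).

```python
from typing import Iterable, List


def plugin_frames(
    log_lines: Iterable[str],
    package_prefix: str = "org.opensearch.search.relevance",
) -> List[str]:
    """Return the stack frames whose class lives under package_prefix."""
    frames = []
    for line in log_lines:
        # Drop the leading "»" marker and indentation used in the Gradle output.
        text = line.lstrip("» \t").strip()
        if text.startswith("at " + package_prefix):
            frames.append(text)
    return frames
```

Applied to the excerpt above, this leaves the three frames in OpenSearchClient.getIndexSettings and SearchActionFilter, which are the calls that reach the blocking actionGet on a transport thread.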

Resolution/Mitigation

For the purposes of the release, I rolled back our dependency from search-processor to anomaly-detection for the time being. This change to the workflow is in opensearch-project/dashboards-search-relevance#230. To satisfy the OpenSearch version dependencies, I configured the workflow to pass a -SNAPSHOT suffix when starting OpenSearch, since unreleased versions of OpenSearch (3.0.0, 2.8.1, etc.) require a -SNAPSHOT tag.
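The suffix handling described above can be sketched in Python as follows. The helper names (`snapshot_version`, `gradle_run_command`) are hypothetical, and the sketch assumes the suffix is passed through the same -Dopensearch.version property shown earlier in this issue; the actual workflow builds its command in YAML and shell.

```python
import re
from typing import List

_SUFFIX = "-SNAPSHOT"
_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def snapshot_version(version: str) -> str:
    """Append -SNAPSHOT to a major.minor.patch version unless already present."""
    base = version[: -len(_SUFFIX)] if version.endswith(_SUFFIX) else version
    if not _VERSION_RE.match(base):
        raise ValueError(f"expected major.minor.patch, got {version!r}")
    return base + _SUFFIX


def gradle_run_command(version: str) -> List[str]:
    """Build the argument list for starting OpenSearch from a plugin checkout."""
    return ["./gradlew", "run", f"-Dopensearch.version={snapshot_version(version)}"]
```

Making the helper idempotent means a version that already carries the suffix is passed through unchanged instead of becoming 2.8.1-SNAPSHOT-SNAPSHOT.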

This issue tracks progress on the OpenSearch dependency. Currently, our options are to:

  1. Continue to use the anomaly-detection plugin and rely on its maintainers to keep it in a working state.
  2. Fix the issue with running OpenSearch from search-processor.
@sejli sejli added the bug Something isn't working label Jul 10, 2023
@sejli sejli self-assigned this Jul 10, 2023
@macohen
Collaborator

macohen commented Jul 11, 2023

Very thorough explanation. Do other plugins take a dependency on another plugin to shorten build times?

@sejli
Member Author

sejli commented Jul 11, 2023

Very thorough explanation. Do other plugins take a dependency on another plugin to shorten build times?

Looks like it. Going through the other dashboards plugins, all of them start OpenSearch from a single plugin. Typically, each one uses the plugin that is its backend. For example, anomaly-detection-dashboards-plugin uses anomaly-detection to start OpenSearch. Below is a list of dashboards plugins and how they start OpenSearch to conduct Cypress tests within a GitHub workflow:

In order for dashboards-search-relevance to depend on search-processor instead of an external plugin, the issues with search-processor need to be addressed.

Side note: many of these plugins don't follow the plugin repository naming conventions. Dashboards plugin names should be prefixed with dashboards-.

@macohen macohen removed the untriaged label Jul 12, 2023
@macohen macohen transferred this issue from opensearch-project/dashboards-search-relevance Jul 12, 2023
@macohen macohen moved this from 🆕 New to 👀 In review in Search Project Board Jul 12, 2023
@macohen macohen removed the untriaged label Jul 12, 2023
@macohen macohen changed the title [BUG] Failing Integ Test Workflow [BUG] Failing to start OpenSearch from Search Processor Jul 12, 2023
@sejli
Member Author

sejli commented Aug 2, 2023

As it turns out, OpenSearch was failing when adding sample data because analyzers were missing. The issue stemmed from the integration tests, which were using integ-test distributions. The run task uses the same distributions to spin up an OpenSearch cluster. These test distributions don't install modules when run with ./gradlew run, so the cluster would exit when attempting to add sample data. The testDistribution was changed in #188; the issue has been resolved, so I'm closing it.
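One way to spot this class of problem earlier is to compare the modules a node reports against the ones the tests rely on. The Python sketch below assumes a response shaped like the nodes info API output (`GET _nodes/plugins`), where each node entry carries a `modules` list of objects with a `name` field. The helper name `missing_modules` and any module names passed in (for example `analysis-common`) are illustrative assumptions and are not taken from this issue.

```python
from typing import Any, Dict, Iterable, Set


def missing_modules(
    nodes_info: Dict[str, Any], required: Iterable[str]
) -> Dict[str, Set[str]]:
    """Map each node id to the required modules that it does not report."""
    required_set = set(required)
    missing: Dict[str, Set[str]] = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        installed = {module.get("name") for module in node.get("modules", [])}
        absent = required_set - installed
        if absent:
            missing[node_id] = absent
    return missing
```

An empty result means every node reports all of the required modules; a non-empty result names the nodes and the modules they lack.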

@sejli sejli closed this as completed Aug 2, 2023
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Search Project Board Aug 2, 2023