Allow mapping service to be null for scenarios of shard recovery from translog #685

@martin-gaievski martin-gaievski commented Dec 20, 2022

Signed-off-by: Martin Gaievski [email protected]

Description

In certain scenarios the codec can be initialized as part of the shard recovery process (e.g. index close and open operations). In such cases mapperService can be passed as null, which currently causes a NullPointerException on the kNN side with the stack trace below. It is OK to allow a null value here because the codec is not actually required: it is initialized only to create a new engine and read segment information. In addition, the kNN code has another check for null references.

[WARN ][o.o.i.c.IndicesClusterStateService] [integTest-0] [target_index][5] marking and sending shard failed due to [failed recovery]
org.opensearch.indices.recovery.RecoveryFailedException: [target_index][5]: Recovery failed on {integTest-0}{eHellPqISveDSsP3t1W3kw}{V5PeDGWOQK-xl0mUnKEnuQ}{127.0.0.1}{127.0.0.1:9300}{dimr}{testattr=test, shard_indexing_pressure_enabled=true}
        at org.opensearch.index.shard.IndexShard.lambda$executeRecovery$25(IndexShard.java:3181) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.shard.StoreRecovery.lambda$recoveryListener$7(StoreRecovery.java:436) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionListener.completeWith(ActionListener.java:345) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.shard.StoreRecovery.recoverFromStore(StoreRecovery.java:111) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.shard.IndexShard.recoverFromStore(IndexShard.java:2303) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.opensearch.index.shard.IndexShardRecoveryException: failed recovery
        ... 11 more
Caused by: java.lang.NullPointerException
        at java.util.Objects.requireNonNull(Objects.java:208) ~[?:?]
        at java.util.Optional.of(Optional.java:113) ~[?:?]
        at org.opensearch.knn.index.codec.KNNCodecVersion.lambda$static$5(KNNCodecVersion.java:74) ~[?:?]
        at org.opensearch.knn.index.codec.KNNCodecService.codec(KNNCodecService.java:33) ~[?:?]
        at org.opensearch.index.engine.EngineConfig.getCodec(EngineConfig.java:378) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.engine.Engine.getSegmentFileSizes(Engine.java:916) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.engine.Engine.fillSegmentStats(Engine.java:906) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.engine.NoOpEngine.<init>(NoOpEngine.java:82) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:2036) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1999) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.shard.StoreRecovery.internalRecoverFromStore(StoreRecovery.java:584) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.index.shard.StoreRecovery.lambda$recoverFromStore$0(StoreRecovery.java:113) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.ActionListener.completeWith(ActionListener.java:342) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
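
The innermost frames show the NullPointerException coming from java.util.Optional.of, which rejects null arguments. The following is a minimal, self-contained sketch of that behaviour and of one way to tolerate a null value, using Optional.ofNullable. The class FakeMapperService and the methods strictWrap, lenientWrap and throwsNpe are hypothetical names made up for illustration; this is not the plugin's actual code and may differ from the change merged in this PR.

```java
import java.util.Optional;

public class NullableMapperServiceSketch {

    /** Hypothetical stand-in for the real MapperService; only its nullness matters here. */
    static final class FakeMapperService {}

    /** Strict wrapping: Optional.of throws NullPointerException when given null. */
    static Optional<FakeMapperService> strictWrap(FakeMapperService mapperService) {
        return Optional.of(mapperService);
    }

    /** Lenient wrapping: Optional.ofNullable maps null to Optional.empty(). */
    static Optional<FakeMapperService> lenientWrap(FakeMapperService mapperService) {
        return Optional.ofNullable(mapperService);
    }

    /** Returns true if running the action raises a NullPointerException. */
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Simulates the recovery path, where the mapper service is null.
        System.out.println("strict wrap of null throws NPE: "
                + throwsNpe(() -> strictWrap(null)));
        System.out.println("lenient wrap of null is present: "
                + lenientWrap(null).isPresent());
    }
}
```

With the lenient variant, callers that only need the codec to open an engine and read segment information can proceed, and any code that does need the mapper service can check the Optional before using it.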

The issue can be reproduced with the following steps:

  • create an index with a knn field
  • ingest some vector data
  • close the index by calling POST /<index>/_close
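
The reproduction steps can be sketched as three REST requests. The block below only builds the requests with the JDK's java.net.http API and does not send them. Everything specific in it is an assumption made for illustration: an unsecured local cluster at http://localhost:9200, the index name target_index (borrowed from the log above), a field called my_vector, and a vector dimension of 2.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.List;

public class ReproduceCloseFailureSketch {

    // Assumed values, not taken from the PR: adjust for the cluster under test.
    static final String BASE = "http://localhost:9200";
    static final String INDEX = "target_index";
    static final String FIELD = "my_vector";

    /** Builds a JSON request; a null body means the request carries no payload. */
    static HttpRequest json(String method, String path, String body) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .method(method, body == null
                        ? BodyPublishers.noBody()
                        : BodyPublishers.ofString(body))
                .build();
    }

    /** Returns the three requests in order: create index, ingest one vector, close index. */
    static List<HttpRequest> reproductionRequests() {
        String createBody = "{\"settings\":{\"index\":{\"knn\":true}},"
                + "\"mappings\":{\"properties\":{\"" + FIELD + "\":"
                + "{\"type\":\"knn_vector\",\"dimension\":2}}}}";
        String docBody = "{\"" + FIELD + "\":[1.0,2.0]}";
        return List.of(
                json("PUT", "/" + INDEX, createBody),
                json("POST", "/" + INDEX + "/_doc", docBody),
                json("POST", "/" + INDEX + "/_close", null));
    }

    public static void main(String[] args) {
        // Print the requests instead of sending them, so the sketch runs without a cluster.
        for (HttpRequest request : reproductionRequests()) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To run the steps against a live test cluster, each request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). On an affected build, the failure would be expected to show up in the node log during the close step rather than in the HTTP responses themselves.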

Check List

  • New functionality includes testing.
    • All tests pass
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski martin-gaievski requested a review from a team December 20, 2022 23:45
@martin-gaievski martin-gaievski added the bug, backport 2.x, and v2.5.0 labels Dec 20, 2022
@martin-gaievski martin-gaievski merged commit c412c8a into opensearch-project:main Dec 21, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 21, 2022
…m translog (#685)

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit c412c8a)
martin-gaievski added a commit that referenced this pull request Dec 21, 2022
…m translog (#685) (#687)

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit c412c8a)

Co-authored-by: Martin Gaievski <[email protected]>
opensearch-trigger-bot bot added a commit that referenced this pull request Dec 21, 2022
…m translog (#685) (#687)

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit c412c8a)

Co-authored-by: Martin Gaievski <[email protected]>
(cherry picked from commit 04f677e)
martin-gaievski pushed a commit that referenced this pull request Dec 21, 2022
…m translog (#685) (#687) (#688)

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit c412c8a)

Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com>
martin-gaievski added a commit to martin-gaievski/k-NN that referenced this pull request Dec 21, 2022
…m translog (opensearch-project#685) (opensearch-project#687)

    Signed-off-by: Martin Gaievski <[email protected]>
    (cherry picked from commit c412c8a)

    Co-authored-by: Martin Gaievski <[email protected]>

Signed-off-by: Martin Gaievski <[email protected]>
martin-gaievski added a commit to martin-gaievski/k-NN that referenced this pull request Dec 21, 2022
…m translog (opensearch-project#685) (opensearch-project#687)

    Signed-off-by: Martin Gaievski <[email protected]>
    (cherry picked from commit c412c8a)

    Co-authored-by: Martin Gaievski <[email protected]>

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 61f6346)
martin-gaievski added a commit that referenced this pull request Dec 21, 2022
…m translog (#685) (#687) (#689)

Signed-off-by: Martin Gaievski <[email protected]>
    (cherry picked from commit c412c8a)

    Co-authored-by: Martin Gaievski <[email protected]>

Signed-off-by: Martin Gaievski <[email protected]>

Signed-off-by: Martin Gaievski <[email protected]>
martin-gaievski added a commit that referenced this pull request Dec 21, 2022
…m translog (#685) (#687) (#690)

Signed-off-by: Martin Gaievski <[email protected]>
    (cherry picked from commit c412c8a)

    Co-authored-by: Martin Gaievski <[email protected]>

Signed-off-by: Martin Gaievski <[email protected]>
(cherry picked from commit 61f6346)
@jmazanec15 jmazanec15 added the Bug Fixes label and removed the bug label Jan 6, 2023
Labels
backport 2.x, Bug Fixes, v2.5.0