[BUG] OpenSearch non-deterministically returning fewer than size hits when search_after is configured even if hits exist #9537

Closed
calebmer opened this issue Aug 24, 2023 · 14 comments
Labels: bug (Something isn't working), Search (Search query, autocomplete ...etc), v2.11.0 (Issues and PRs related to version 2.11.0)

calebmer commented Aug 24, 2023

Describe the bug
I think a bug in OpenSearch is making one of my tests flaky.

I have five items:

[
    {createdTime: 1, priority: 2},
    {createdTime: 2, priority: null},
    {createdTime: 3, priority: 2},
    {createdTime: 4, priority: null},
    {createdTime: 5, priority: null},
]

I sort them by [priority, createdTime]. I issue a search with size of 1 and search_after for the third item ([2147483647, 3]). I’d expect to get a response containing just {createdTime: 4, priority: null} which OpenSearch does give me most of the time. However, occasionally it returns zero items.
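The expected result can be modeled outside OpenSearch. This is a minimal sketch, assuming a missing priority is represented by 2147483647 (the value used in the search_after cursor above) and ignoring the _id tiebreaker. The `next_after` helper is hypothetical and only illustrates the expected ordering; it is not how OpenSearch or Lucene evaluates the query.

```shell
#!/bin/bash
# Model of the expected ordering. Each line is "priority createdTime";
# a missing priority is replaced by 2147483647 so that it sorts last
# (assumption taken from the search_after cursor in this report).
MISSING=2147483647

items="2 1
$MISSING 2
2 3
$MISSING 4
$MISSING 5"

# Hypothetical helper: print the first item that sorts strictly after
# the cursor (priority, createdTime), or nothing if no item remains.
next_after() {
    local after_priority=$1 after_created=$2
    printf '%s\n' "$items" \
        | sort -n -k1,1 -k2,2 \
        | awk -v p="$after_priority" -v c="$after_created" \
            '$1 > p || ($1 == p && $2 > c) { print; exit }'
}

# Cursor from the report: [2147483647, 3]
next_after "$MISSING" 3
```

With the cursor from this report, the model prints `2147483647 4`, which corresponds to {createdTime: 4, priority: null}. A correct search should therefore never come back empty for this cursor.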

To Reproduce
Here’s a minimal reproduction script for the issue on OpenSearch 2.9.0:

#!/bin/bash

opensearch_index_url=http://localhost:3006/search_after_bug_repro

curl -X DELETE "$opensearch_index_url"

echo

curl -X PUT "$opensearch_index_url" \
    -H "content-type: application/x-ndjson" \
    -d @- << EOF
{
    "settings": {
        "index": {
            "refresh_interval": "-1"
        }
    },
    "mappings": {
        "properties": {
            "createdTime": {"type": "long", "doc_values": true},
            "priority": {"type": "byte", "doc_values": true}
        }
    }
}
EOF

echo

for i in {1..100}; do
    nl=$'\n'
    id1="$RANDOM$RANDOM"
    id2="$RANDOM$RANDOM"
    id3="$RANDOM$RANDOM"
    id4="$RANDOM$RANDOM"
    id5="$RANDOM$RANDOM"

    curl -s -X POST "$opensearch_index_url/_bulk" \
        -H "content-type: application/x-ndjson" \
        -d "{\"create\":{\"_id\":\"$id1\"}}$nl \
            {\"createdTime\":\"1\",\"priority\":2}$nl \
            {\"create\":{\"_id\":\"$id2\"}}$nl \
            {\"createdTime\":\"2\",\"priority\":null}$nl \
            {\"create\":{\"_id\":\"$id3\"}}$nl \
            {\"createdTime\":\"3\",\"priority\":2}$nl \
            {\"create\":{\"_id\":\"$id4\"}}$nl \
            {\"createdTime\":\"4\",\"priority\":null}$nl \
            {\"create\":{\"_id\":\"$id5\"}}$nl \
            {\"createdTime\":\"5\",\"priority\":null}$nl" \
        > /dev/null

    curl -s -X POST "$opensearch_index_url/_refresh" > /dev/null

    hits=$(curl -s -X POST "$opensearch_index_url/_search?size=1" \
        -H "content-type: application/json" \
        -d "{\"sort\":[{\"priority\":{\"order\":\"asc\",\"missing\":\"_last\"}},{\"createdTime\":{\"order\":\"asc\",\"missing\":\"_last\"}},{\"_id\":{\"order\":\"asc\"}}],\"search_after\":[2147483647,3,\"$id3\"]}" \
        | jq '.hits.hits | length')

    echo "Hits: $hits"

    if [[ "$hits" != "1" ]]; then
        exit 1
    fi
done

An example output for me:

{"acknowledged":true}
{"acknowledged":true,"shards_acknowledged":true,"index":"search_after_bug_repro"}
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 1
Hits: 0

This script runs the test scenario up to 100 times and exits on the first failure. As you can see, most of the time I get the correct result (Hits: 1), but fairly frequently I get an incorrect result (Hits: 0).

Expected behavior
I’d expect OpenSearch to always return a hit that matches the search if one exists.

A little more color on my use case: I perform these queries as a part of pagination. My system will sometimes do a size of 1 search to check if there is more data on the next page. If it gets back zero results I consider pagination to be over. So it’s kinda bad if I don’t show the user all results because OpenSearch failed to return a hit.
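The probe described above can be sketched as two small helpers. Both `probe_body` and `has_next_page` are hypothetical names, and the sort clause is copied from the repro script. The sketch assumes the _id contains no characters that need JSON escaping.

```shell
#!/bin/bash
# Hypothetical helper: print the JSON body for a size-1 probe that asks
# whether anything sorts after the last hit of the current page.
# Arguments are that hit's three sort values: priority, createdTime, _id.
probe_body() {
    local priority=$1 created_time=$2 id=$3
    printf '{"sort":[%s,%s,%s],"search_after":[%s,%s,"%s"]}\n' \
        '{"priority":{"order":"asc","missing":"_last"}}' \
        '{"createdTime":{"order":"asc","missing":"_last"}}' \
        '{"_id":{"order":"asc"}}' \
        "$priority" "$created_time" "$id"
}

# Hypothetical helper: pagination continues only if the probe returned a hit.
has_next_page() {
    local hit_count=$1
    [[ "$hit_count" -ge 1 ]]
}

probe_body 2147483647 3 "1234567"
```

The body would be sent to `_search?size=1` exactly as in the repro script, and the resulting hit count passed to `has_next_page`. Because a zero count ends pagination, a spurious `Hits: 0` silently drops every remaining result.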

Plugins
I remove all plugins on installation (rm -rf plugins).

Host/Environment (please complete the following information):

  • OS: macOS (I’m running the Linux download directly on macOS, not in a Docker container. There aren’t any official docs for this, but the Linux download appears to support darwin architectures. This is also how Homebrew installs OpenSearch on macOS.)
  • Version: 2.9.0

Additional context
Since I’m running OpenSearch locally I’ve configured discovery.type: single-node.

calebmer added the bug and untriaged labels Aug 24, 2023
@dblock
Member

dblock commented Aug 24, 2023

Is this related/similar/dup of #9013? Possibly related to #6596.

@dblock
Member

dblock commented Aug 24, 2023

@gashutos care to take a quick look?

@gashutos
Contributor

@dblock looking.

@gashutos
Contributor

gashutos commented Aug 25, 2023

@dblock This must be a different root cause, because we specifically don't invoke the search_after optimizations introduced in #7453 after the fix here. We check whether missing:_last is specified or not.

@calebmer Just double-checking: are you testing on 2.9.0, and not 2.8.0?

@gashutos
Contributor

@calebmer I've run the script you provided 5-6 times and was not able to reproduce this bug on 2.9.0.

@calebmer
Author

calebmer commented Aug 25, 2023

@gashutos Hmm. It’s still reproing for me, confirmed I’m running 2.9.0. Let me give more system details…

I installed OpenSearch fresh, no extra configuration, with Homebrew on my macOS 12.3 computer running an Apple M1 chip. The commands I ran were effectively:

$ brew install opensearch
$ opensearch
$ ./repro.sh

…and it reproduced on the first run. Confirmed from opensearch logs that it’s version 2.9.0.

@calebmer
Author

I borrowed a friend’s laptop which does not have OpenSearch installed and was able to reproduce using the same steps. So it’s not some configuration unique to my setup.

They also are running macOS on an Apple M1 chip.

@gashutos
Contributor

gashutos commented Aug 27, 2023

@calebmer Yes, I am able to repro the bug now. It came in with the Lucene 9.7.0 upgrade. I will send a fix soon.

@gashutos
Contributor

Not sure about the exact root cause yet, but with Lucene PR apache/lucene#12334 reverted, everything works fine.

@gashutos
Contributor

PR raised with a fix here:
apache/lucene#12520

This scenario specifically happens when search_after is given the missing value as a search_after value, as in this case with Int.MAX_VALUE.

@dblock
Member

dblock commented Aug 28, 2023

@gashutos very nice job!

@gashutos
Contributor

gashutos commented Sep 5, 2023

This should be fixed after the Lucene 9.8 upgrade.

kotwanikunal added the Search label Sep 19, 2023
msfroh added the v2.11.0 label and removed the untriaged label Sep 20, 2023
@msfroh
Collaborator

msfroh commented Sep 20, 2023

Closing this issue, since the bug was fixed in Lucene. We should expect the fix to be available in OpenSearch 2.11.
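For test suites that need to know whether they are running against a version expected to carry the fix, a numeric version comparison can be sketched as below. `version_at_least` is a hypothetical helper. It assumes plain `major.minor.patch` strings with no suffix such as `-SNAPSHOT`, and the 2.11.0 threshold is taken from the expectation stated in this comment.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 is at least version $2.
# Components are compared numerically, so 2.9.0 sorts before 2.11.0
# (a plain string comparison would get this wrong).
version_at_least() {
    local IFS=.
    local -a have=($1) want=($2)
    local i h w
    for i in 0 1 2; do
        h=${have[i]:-0}
        w=${want[i]:-0}
        if (( h > w )); then return 0; fi
        if (( h < w )); then return 1; fi
    done
    return 0
}

if version_at_least "2.9.0" "2.11.0"; then
    echo "fix expected"
else
    echo "fix not expected"
fi
```

For 2.9.0, the version in this report, the check fails and the script prints `fix not expected`.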

msfroh closed this as completed Sep 20, 2023
github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Sep 20, 2023
@gashutos
Contributor

Closing this issue, since the bug was fixed in Lucene. We should expect the fix to be available in OpenSearch 2.11.

Thanks @msfroh
