Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"IN" (Set condition) for small namespaces after v2.9.2 #61

Closed
oruchreis opened this issue Aug 7, 2020 · 7 comments
Closed

"IN" (Set condition) for small namespaces after v2.9.2 #61

oruchreis opened this issue Aug 7, 2020 · 7 comments

Comments

@oruchreis
Copy link

Hi,
After v2.9.2 we can't search small namespaces with IN condition but bigger namespaces works expected.
This works on small namespaces:
Id IN (1)
But this doesn't work:
Id IN (1,1,1,1)
If items count in the namespace is bigger than 50 or something it works expected.
I've figured out that if I set kMaxSelectivityPercentForIdset to 100 in indexunorderd.cc, it works like before v2.9.2. I think there is a bug in this condition in indexunorderd.cc at line 192:

return res.size() > 1 && (100 * idsCount / ctx.opts.itemsCountInNamespace > kMaxSelectivityPercentForIdset);

Best Regards.

@slowcheetahzzz
Copy link
Contributor

Hello oruchreis, thanks for reporting the issue. We'll take a look at it in the nearest future.

@olegator77
Copy link
Contributor

Hi,
We can't reproduce this issue with simple data set.
Plz provide more details about this issue: Indexes structure, data structure, exact query, query explain ouptut

@oruchreis
Copy link
Author

oruchreis commented Aug 7, 2020

Hi,
Sorry I forgot to mention about built environment. I could reproduce this issue with windows build(MSVC) of standalone reindexer server. In linux and osx build I can't reproduce this issue neither. I simply cloned latest master branch to my local and built x64-Debug(I've tried Release also) with Visual Studio 2019. I've only enabled "LINK_RESOURCES" option for face and swagger. Here are the structures:
Namespace: Test
Indexes:

{
  "total_items": 1,
  "items": [
    {
      "name": "Id",
      "field_type": "int64",
      "index_type": "hash",
      "is_pk": true,
      "is_array": false,
      "is_dense": true,
      "is_sparse": false,
      "collate_mode": "none",
      "sort_order_letters": "",
      "expire_after": 0,
      "config": {},
      "json_paths": [
        "Id"
      ]
    }
  ]
}

Data:

{
  "items": [
    {
      "Id": 1
    },
    {
      "Id": 2
    }
  ],
  "namespaces": [
    "Test"
  ],
  "cache_enabled": true,
  "total_items": 2
}

The query that fails:

SELECT Id FROM Test WHERE Id IN (1,1,1,1)

If I add EXPLAIN to start of the query, the application crashes.

@oruchreis
Copy link
Author

oruchreis commented Aug 7, 2020

I've also tested x64-release build and the setup file from appveyor artifacts (which afaik it is VS 2017 and x64-release). In both release version the query failed like before, but I could get the explain result without crash. Here are explain outputs of the failed query:

{
  "items": [],
  "namespaces": [
    "Test"
  ],
  "cache_enabled": true,
  "explain": {
    "total_us": 56,
    "prepare_us": 25,
    "indexes_us": 21,
    "postprocess_us": 6,
    "loop_us": 3,
    "general_sort_us": 0,
    "sort_index": "-",
    "sort_by_uncommitted_index": false,
    "selectors": [
      {
        "items": 2,
        "matched": 2,
        "method": "scan",
        "type": "SingleRange"
      },
      {
        "field": "Id",
        "keys": 0,
        "comparators": 1,
        "cost": 3,
        "matched": 0,
        "method": "scan",
        "type": "OnlyComparator"
      }
    ]
  }
}

@olegator77
Copy link
Contributor

olegator77 commented Aug 7, 2020

Actually problem looks like tsl::hopscotch_set or MSVC compiler bug. I wrote issue to tsl::hopscotch_set repo - Tessil/hopscotch-map#52

In the next reindexer release we will introduce workaround: just replace emplace with insert here: https://github.com/Restream/reindexer/blob/master/cpp_src/core/comparatorimpl.h#L102

@olegator77
Copy link
Contributor

olegator77 commented Aug 7, 2020

I've figured out that if I set kMaxSelectivityPercentForIdset to 100 in indexunorderd.cc, it works like before v2.9.2. I think there is a bug in this condition in indexunorderd.cc at line 192:

This change can seriously slow down queries execution with wide IN () condition to low selectivity indexes.
Actually, kMaxSelectivityPercentForIdset mean, that 'choose scan method instead of index, if scan will be more efficient'

@oruchreis
Copy link
Author

I confirm that it is fixed in v2.11.1. Thanks for your effort.
Best Regards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants