Multiple OR statements, such as a list of IPs very slow (due to nested query?) #69649

Closed
archon810 opened this issue Jun 22, 2020 · 6 comments · Fixed by #93506
Assignees
Labels
enhancement, Feature:KQL, impact:low, loe:medium, PR sent

Comments

@archon810

Kibana version:
7.8

Elasticsearch version:
7.8

Server OS version:
OpenSUSE 15.1

This is a follow-up to #4673 (comment). I'll quote it here again.

I just tried to query nginx logs in Kibana for the top 1000 IPs from https://www.abuseipdb.com, using the source.ip:IP1 or source.ip:IP2... format, and the search is still going after many minutes. It'd be great to implement not just this feature but also any related speed optimizations, so that a query for 1000 IPs doesn't run seemingly without end.

Here's an example of what Inspect shows for such a query with only 4 IPs. No wonder ES has trouble with it when we get to 1000 values due to all the nesting: https://gist.github.com/6dc0915601699a37a763d4dc99263a16.

Is there really no equivalent of MySQL's IN operator, so that we could say something like source.ip:(IP1, IP2, IP3, etc)?
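
At the Elasticsearch Query DSL level, the closest counterpart to SQL's IN is the `terms` query, which matches documents whose field equals any value in a list. A minimal sketch in Python, building the request body as a plain dict; the helper names `terms_filter` and `build_query` are made up for illustration, and this bypasses KQL rather than showing anything Kibana generates:

```python
import ipaddress


def terms_filter(field, values):
    """Build a `terms` clause that matches any of `values` on `field`."""
    # Parse each address first so a malformed entry fails here, not in the query.
    ips = [str(ipaddress.ip_address(v)) for v in values]
    return {"terms": {field: ips}}


def build_query(field, values, gte, lte):
    """Combine the IP list with a time range, both in filter context."""
    return {
        "query": {
            "bool": {
                "filter": [
                    terms_filter(field, values),
                    {
                        "range": {
                            "@timestamp": {
                                "gte": gte,
                                "lte": lte,
                                "format": "strict_date_optional_time",
                            }
                        }
                    },
                ]
            }
        }
    }
```

For example, `build_query("source.ip", ["1.6.103.18", "1.6.182.218"], "2020-06-19T00:17:56.518Z", "2020-06-22T00:17:56.518Z")` yields a body with one flat `terms` clause instead of one clause per address.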

@elasticmachine
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@timroes
Contributor

timroes commented Jun 23, 2020

Just to have it directly available, here is a copy of the query that was sent:

Query sent
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "bool": {
            "should": [
              {
                "bool": {
                  "should": [
                    {
                      "match": {
                        "source.ip": "1.6.103.18"
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              },
              {
                "bool": {
                  "should": [
                    {
                      "bool": {
                        "should": [
                          {
                            "match": {
                              "source.ip": "1.6.182.218"
                            }
                          }
                        ],
                        "minimum_should_match": 1
                      }
                    },
                    {
                      "bool": {
                        "should": [
                          {
                            "bool": {
                              "should": [
                                {
                                  "match": {
                                    "source.ip": "1.9.46.177"
                                  }
                                }
                              ],
                              "minimum_should_match": 1
                            }
                          },
                          {
                            "bool": {
                              "should": [
                                {
                                  "match": {
                                    "source.ip": "1.9.78.242"
                                  }
                                }
                              ],
                              "minimum_should_match": 1
                            }
                          }
                        ],
                        "minimum_should_match": 1
                      }
                    }
                  ],
                  "minimum_should_match": 1
                }
              }
            ],
            "minimum_should_match": 1
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2020-06-19T00:17:56.518Z",
              "lte": "2020-06-22T00:17:56.518Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },

This does indeed look weird: we convert a flat query like source.ip:IP1 or source.ip:IP2 or source.ip:IP3 or source.ip:IP4 into an arbitrarily nested query like that. Did you use any parentheses in the query, or was the query really as linear as my example in the previous sentence?
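
The shape of the query above is what a chain of binary `or` operators produces if each `or` becomes its own two-clause bool and the right operand holds the rest of the chain. The sketch below reproduces that shape in Python to show that the nesting depth grows by one level per term. It mirrors the structure of the logged query only, not Kibana's actual parser code, and all function names are hypothetical.

```python
def match_leaf(field, value):
    """Wrap one term in its own single-clause bool, as in the logged query."""
    return {"bool": {"should": [{"match": {field: value}}], "minimum_should_match": 1}}


def or_node(left, right):
    """A binary OR: a bool with exactly two `should` clauses."""
    return {"bool": {"should": [left, right], "minimum_should_match": 1}}


def build_right_nested(field, values):
    """Fold `a or b or c or d` as `a or (b or (c or d))`."""
    node = match_leaf(field, values[-1])
    for value in reversed(values[:-1]):
        node = or_node(match_leaf(field, value), node)
    return node


def bool_depth(node):
    """Count bool levels along the deepest path, iteratively."""
    depth = 0
    frontier = [node]
    while frontier:
        bools = [n["bool"] for n in frontier if isinstance(n, dict) and "bool" in n]
        if not bools:
            break
        depth += 1
        # Descend into every `should` clause of every bool on this level.
        frontier = [child for b in bools for child in b.get("should", [])]
    return depth
```

With the four IPs from the query above, `bool_depth(build_right_nested("source.ip", ips))` gives 4; with 1000 addresses it gives 1000.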

@lukasolson
Member

Just FYI, the following syntax is supported:

source.ip:(IP1 or IP2 or IP3 or...)

However, I believe it still generates the nested structure. Are you sure the bottleneck is caused by how the query is written (i.e., does the query run more quickly when sent as a single bool)? If so, we could potentially change things so that it doesn't nest, but it seems like Elasticsearch should be more intelligent in how it processes the query.
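
One way to check that would be to run the same filter in both shapes. As a sketch, the Python below collapses nested OR-only `bool` clauses into a single `bool` with one `should` list, so the two versions can be timed against each other. It assumes the clauses sit in filter context, where scoring does not matter, and it only merges bools that contain nothing but `should` with `minimum_should_match` set to 1. `flatten_or` and `is_pure_or` are hypothetical names, not Kibana functions.

```python
def is_pure_or(body):
    """True if a bool body is a plain OR: only `should`, at least one match required."""
    return (
        set(body) == {"should", "minimum_should_match"}
        and isinstance(body["should"], list)
        and len(body["should"]) > 0
        and body["minimum_should_match"] == 1
    )


def flatten_or(node):
    """Collapse nested OR-only bools under `node` into one flat `should` list."""
    leaves = []
    stack = [node]
    while stack:
        current = stack.pop()
        body = current.get("bool") if isinstance(current, dict) else None
        if isinstance(body, dict) and is_pure_or(body):
            # Push children in reverse so they are visited in original order.
            stack.extend(reversed(body["should"]))
        else:
            # Anything else (match, range, a bool with must/must_not) stays intact.
            leaves.append(current)
    return {"bool": {"should": leaves, "minimum_should_match": 1}}
```

Applied to the four-IP query quoted earlier in this thread, this returns a single bool whose `should` list holds the four `match` clauses in their original order.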

We have discussed an optimization step in how KQL queries are written (for example, to also combine range queries that have multiple conditions). It might make sense to consider this as part of that effort, if only for the purpose of sending a smaller payload to the server/Elasticsearch.
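
To make the range example concrete, here is a rough Python sketch of such a step, under the assumption that two ANDed conditions on one field arrive as two separate `range` clauses in a filter list. It merges them only when the bounds cannot conflict (one lower, one upper) and leaves everything else untouched. The function names and the `bytes` field used in the note below are hypothetical, and this is not how Kibana implements it.

```python
LOWER = {"gt", "gte"}
UPPER = {"lt", "lte"}


def _mergeable(bounds):
    """Only plain bound keys are merged; clauses with format, boost, etc. are skipped."""
    return isinstance(bounds, dict) and bool(bounds) and set(bounds) <= LOWER | UPPER


def _collide(a, b):
    """Two bound sets collide if both set a lower bound or both set an upper bound."""
    keys_a, keys_b = set(a), set(b)
    return bool(keys_a & LOWER and keys_b & LOWER) or bool(keys_a & UPPER and keys_b & UPPER)


def merge_ranges(filters):
    """Merge ANDed `range` clauses on the same field; input clauses are not modified."""
    merged = {}  # field name -> combined bounds dict (shared with the output clause)
    out = []
    for clause in filters:
        rng = clause.get("range") if isinstance(clause, dict) else None
        if not isinstance(rng, dict) or len(rng) != 1:
            out.append(clause)
            continue
        field, bounds = next(iter(rng.items()))
        if not _mergeable(bounds):
            out.append(clause)
        elif field not in merged:
            merged[field] = dict(bounds)
            out.append({"range": {field: merged[field]}})
        elif _collide(merged[field], bounds):
            # Picking the tighter bound needs type-aware comparison; keep both clauses.
            out.append(clause)
        else:
            merged[field].update(bounds)
    return out
```

Given `{"range": {"bytes": {"gte": 100}}}` and `{"range": {"bytes": {"lt": 500}}}` in the same filter list, the result holds one clause with both bounds, which is the smaller payload mentioned above.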

lukasolson added the enhancement, impact:low, and loe:medium labels on Jun 23, 2020

@archon810
Author

I also tried this syntax, source.ip:(IP1 or IP2 or IP3 or...), before reporting and just forgot to mention it. I observed the same thing you did: it didn't change the generated query.

@seanbirdsell

I just discovered this problem today. Any ETA on a resolution?

@archon810
Author

Hooray!

5 participants