MultiPoint query yields more false positives when points are further apart #27954

Closed
mikeurbach opened this issue Dec 21, 2017 · 5 comments
Labels
:Analytics/Geo (Indexing, search aggregations of geo points and shapes), >bug, stalled, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))

Comments

@mikeurbach
Elasticsearch version: Version: 6.1.1, Build: bd92e7f/2017-12-17T20:23:25.338Z, JVM: 1.8.0_45

Plugins installed: []

JVM version: Java(TM) SE Runtime Environment (build 1.8.0_45-b14)

OS version: Darwin Kernel Version 14.5.0: Sun Jun 4 21:40:08 PDT 2017; root:xnu-2782.70.3~1/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:

I am using a geo_shape mapping to index polygons and querying them with a MultiPoint shape. I get false positives when the MultiPoint contains points that are very far apart, none of which intersects any polygon. If I use non-matching points that are sufficiently close together, there is no false positive, as expected.

This appears to only be happening when I enable the "tree": "quadtree" option for this field. I couldn't reproduce with the default geohash tree. I've tried with different precision and distance error percent settings, but the issue remains, and it occurs with the default settings.

I don't think this is a duplicate of #27123, as this seems to be related to the quadtree implementation, multi points, and much larger distances than the example in #27123.

Steps to reproduce:

  1. Create an index with a geo_shape field and quadtree enabled.
curl -XDELETE localhost:9200/test

curl -XPUT -H'Content-Type: application/json' localhost:9200/test -d'{
  "mappings": {
    "test": {
      "properties": {
        "location": {
          "type": "geo_shape",
          "tree": "quadtree",
          "precision": "100m",
          "distance_error_pct": 0.05
        }
      }
    }
  }
}'
  2. Index a polygon.
curl -XPUT -H'Content-Type: application/json' localhost:9200/test/test/1 -d'{
  "location": {
    "type": "Polygon",
    "coordinates": [
      [
        [-0.1263, 51.5016],
        [-0.1263, 51.4996],
        [-0.1228, 51.4996],
        [-0.1228, 51.5016],
        [-0.1263, 51.5016]
      ]
    ]
  }
}'
  3. Query with non-matching points that are "close" together. Nothing is returned, as expected. Screenshot of geojson.io: http://take.ms/DYuU8m
curl -XPOST -H'Content-Type: application/json' localhost:9200/test/_search -d'{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "MultiPoint",
          "coordinates": [
            [-0.1264, 51.5062],
            [-0.1235, 51.4935]
          ]
        }
      }
    }
  }
}'
  4. Query with non-matching points that are "far apart". The document is returned, even though neither of the points intersects it. Screenshot of geojson.io: http://take.ms/Xjp15
curl -XPOST -H'Content-Type: application/json' localhost:9200/test/_search -d'{
  "query": {
    "geo_shape": {
      "location": {
        "shape": {
          "type": "MultiPoint",
          "coordinates": [
            [-0.1264, 51.5062],
            [-117.7679, 35.5330]
          ]
        }
      }
    }
  }
}'

I haven't played around too much with exactly how "far apart" the points need to be for this to happen.

@jkakavas added the :Analytics/Geo label (Indexing, search aggregations of geo points and shapes) on Dec 22, 2017
@DaveCTurner
Contributor

Hi @mikeurbach. Firstly, thanks so much for putting the effort into providing a small reproduction for this. It makes it so much easier to see what's going on. I can reproduce this undesirable hit with version 6.1.1 and on master.

In some sense, I think this is expected behaviour: distance_error_pct applies at both index and search time, and allows for an error which is a proportion of the size of the shape in question. The query containing points which are far apart is much larger and therefore includes a wider margin of error, which intersects the returned document.
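To get a feel for the scale involved, here is a back-of-the-envelope sketch in Python. It assumes the error margin is roughly distance_error_pct times the distance from the centre of the query's bounding box to one of its corners; that model, the helper names `haversine_km` and `approx_error_km`, and the Earth-radius constant are illustrative assumptions, not the actual Elasticsearch implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def approx_error_km(points, distance_error_pct):
    """Assumed model: error = pct * distance from bbox centre to a bbox corner."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    centre_lon = (min(lons) + max(lons)) / 2
    centre_lat = (min(lats) + max(lats)) / 2
    half_diagonal = haversine_km(centre_lon, centre_lat, max(lons), max(lats))
    return distance_error_pct * half_diagonal


# The two MultiPoint queries from the reproduction steps, as (lon, lat) pairs.
close_query = [(-0.1264, 51.5062), (-0.1235, 51.4935)]
far_query = [(-0.1264, 51.5062), (-117.7679, 35.5330)]

for name, pts in (("close", close_query), ("far", far_query)):
    print(name, round(approx_error_km(pts, 0.05), 3), "km")
```

Under this model the "close" query gets a margin on the order of tens of metres, while the "far" query gets one on the order of hundreds of kilometres. The nearest query point is roughly half a kilometre from the indexed polygon, so only the second margin is large enough to reach it.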

However, regarding the following:

> This appears to only be happening when I enable the "tree": "quadtree" option for this field. I couldn't reproduce with the default geohash tree. I've tried with different precision and distance error percent settings, but the issue remains, and it occurs with the default settings.

I can't reproduce this effect. Reducing distance_error_pct from 0.05 to 0.005 is enough to prevent the test case you give from returning any hits, and if I switch tree from quadtree to geohash then I still get the hit.
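For anyone repeating that experiment, the following Python sketch prints a mapping body with the reduced value, suitable for the index-creation request in step 1 of the report. The helper name `build_mapping` is hypothetical; the field name, mapping type name, and other settings are copied from the original reproduction.

```python
import json


def build_mapping(tree="quadtree", precision="100m", distance_error_pct=0.005):
    """Build the index-creation body from the report with tunable geo_shape settings."""
    return {
        "mappings": {
            "test": {  # mapping type name used in the original reproduction
                "properties": {
                    "location": {
                        "type": "geo_shape",
                        "tree": tree,
                        "precision": precision,
                        "distance_error_pct": distance_error_pct,
                    }
                }
            }
        }
    }


# Print the JSON body; pass tree="geohash" to compare the two tree types.
print(json.dumps(build_mapping(), indent=2))
```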

@nknize I'm wondering if it makes sense to apply distance_error_pct in this way for MultiPoint shapes (and indeed for the other Multi* shapes), as opposed to applying it to each individual Point (or, for the other Multi* shapes, to each individual component shape). What do you think?
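Until something like that exists server-side, one client-side approximation (an assumption on my part, not something proposed or verified in this thread) would be to split the MultiPoint into one Point clause per coordinate and combine them with a bool/should query, so that no clause spans a large bounding box. A minimal Python sketch, with the hypothetical helper `split_multipoint_query`:

```python
import json


def split_multipoint_query(field, coordinates):
    """Build a bool/should search body with one geo_shape Point clause per coordinate."""
    clauses = [
        {
            "geo_shape": {
                field: {
                    "shape": {"type": "Point", "coordinates": list(coord)}
                }
            }
        }
        for coord in coordinates
    ]
    return {
        "query": {
            "bool": {
                "should": clauses,
                "minimum_should_match": 1,  # match if any single point intersects
            }
        }
    }


# The "far apart" coordinates from the reproduction steps.
body = split_multipoint_query(
    "location", [[-0.1264, 51.5062], [-117.7679, 35.5330]]
)
print(json.dumps(body, indent=2))
```

Whether this actually avoids the false positive on a prefix-tree-backed field has not been tested here; it only illustrates the per-point idea.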

@DaveCTurner changed the title from "Geo Shape field with quadtree returns documents when their shapes do not intersect" to "MultiPoint query yields more false positives when points are further apart" on Dec 22, 2017
@mikeurbach
Author

Thanks for the explanation. I was wondering if distance_error_pct applied at search time, but I was not completely sure from the docs. Re-reading the geo_shape query documentation, I can see how the distance_error_pct configured in the mapping will be applied at search time as well.

> The geo_shape query uses the same grid square representation as the geo_shape mapping to find documents that have a shape that intersects with the query shape. It will also use the same PrefixTree configuration as defined for the field mapping.

Regarding the difference between using the geohash and quadtree tree implementation, I have a more specific example. If I use the default settings, I don't get the false positive with the following mapping:

{
  "mappings": {
    "test": {
      "properties": {
        "location": {
          "type": "geo_shape",
          "tree": "geohash"
        }
      }
    }
  }
}

But I do get the false positive when I switch to the following:

{
  "mappings": {
    "test": {
      "properties": {
        "location": {
          "type": "geo_shape",
          "tree": "quadtree"
        }
      }
    }
  }
}

I understand that this is a pretty outlandish case, but it was surprising to me that one tree implementation returns a false positive with the default precision settings while the other does not.

It seems like the behavior is indeed expected, and if I really need more accuracy for these huge MultiPoints, I can tune the parameters as you mentioned. Thanks again for the clarification.

@imotov
Contributor

imotov commented Dec 13, 2018

Depends on #32039

@imotov added the stalled label on Dec 13, 2018
@imotov
Contributor

imotov commented Jan 7, 2019

We still cannot resolve this one since BKD-tree backed geoshapes don't support MultiPoint queries.

@rjernst added the Team:Analytics label (Meta label for analytical engine team (ESQL/Aggs/Geo)) on May 4, 2020
@iverase
Contributor

iverase commented Oct 23, 2020

I am closing this issue as BKD-backed geoshapes support multi-points and they do not show this behaviour.


8 participants