Integer/long mapping is trappy for ID-based fields #49538

Closed
markharwood opened this issue Nov 25, 2019 · 7 comments · Fixed by #49933
Assignees
Labels
>docs General docs changes :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@markharwood
Contributor

markharwood commented Nov 25, 2019

Saw another example of someone falling into the trap of mapping fields as longs when they are queried for exact values rather than ranges. These are better mapped as keyword fields.

It's a trap that is too easy to fall into - "I have a long field so I just map it as a long-value, right?"

I can think of 4 ways to address the problem:

  1. Change documentation - provide bigger, bolder notices that not all longs should be mapped as longs. The numbers docs currently say nothing about this; it is only mentioned on a speed-tuning page.
  2. Change field type names - use long_quantity or long_identifier to express the usage of a long number (quantities are things like prices or durations and are optimised for range queries whereas identifiers are looked up by exact matches)
  3. Better tooling - mapping-generation tools that suggest appropriate mappings given a sample of data (the ML team and I have both worked on this sort of data-profiling tool before now).
  4. Automated indexing of numerics for both range and exact-match purposes (would require extra disk space)

An advantage of 2) is that I imagine we can have more optimised storage than string-based keyword for long_identifier types?
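
For illustration, here is a minimal sketch of the "map identifiers as keyword" advice, written as plain Python dicts that mirror the JSON request bodies. The field names `order_id` and `price` are hypothetical and not taken from any real index; the point is only that the identifier is mapped as keyword while the quantity stays a long.

```python
import json

# Hypothetical index body: "order_id" is an identifier that is only ever
# looked up by exact value, "price" is a quantity used in range queries.
# Both field names are assumptions made for this example.
index_body = {
    "mappings": {
        "properties": {
            "order_id": {"type": "keyword"},  # exact-match lookups
            "price": {"type": "long"},        # range queries
        }
    }
}


def exact_match_query(order_id: int) -> dict:
    """Build a term query for one identifier.

    The ID is sent as a string here, matching how a keyword field stores it.
    """
    return {"query": {"term": {"order_id": str(order_id)}}}


def price_range_query(low: int, high: int) -> dict:
    """Build a range query on the numeric quantity field."""
    return {"query": {"range": {"price": {"gte": low, "lte": high}}}}


if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print(json.dumps(exact_match_query(1234567890123)))
    print(json.dumps(price_range_query(10, 100)))
```

The bodies are only built and printed, not sent to a cluster, so no client library is assumed.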

@markharwood markharwood added the :Search Foundations/Mapping Index mappings, including merging and defining field types label Nov 25, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@markharwood
Contributor Author

markharwood commented Nov 27, 2019

I ran some quick benchmarks and keyword-based searches looked to be 4 times faster than long-based searches, so using the right mapping can be an important enhancement.
That benefit is perhaps not worth paying for through a change to our default template mapping. Option 4 above (indexing by default for both range and keyword lookups) adds disk storage cost - my benchmark index of 32 MB (just longs) grew to 43 MB when I added the keyword subfield type.
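
As a rough sketch of what option 4 looks like in a mapping, the snippet below shows a long field with a keyword sub-field (a multi-field) and works out the size increase from the two index sizes quoted above. The field name `order_id`, the sub-field name `keyword` and the helper `storage_overhead` are assumptions for illustration only.

```python
import json

# Hypothetical mapping: the long field serves range queries, and the
# "keyword" sub-field (addressed as "order_id.keyword") serves exact matches.
dual_mapping = {
    "mappings": {
        "properties": {
            "order_id": {
                "type": "long",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }
}


def storage_overhead(base_mb: float, with_subfield_mb: float) -> tuple:
    """Return (extra megabytes, percentage growth) between two index sizes."""
    extra = with_subfield_mb - base_mb
    return extra, 100.0 * extra / base_mb


if __name__ == "__main__":
    print(json.dumps(dual_mapping, indent=2))
    extra_mb, pct = storage_overhead(32, 43)
    # Sizes are the ones reported in the benchmark above: 32 MB -> 43 MB.
    print(f"extra: {extra_mb} MB ({pct:.1f}% larger)")  # extra: 11 MB (34.4% larger)
```

So on this one benchmark the keyword sub-field cost about a third more disk space; other data sets may behave differently.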

@markharwood
Contributor Author

We discussed this in FixItFriday.
The option of automatically indexing numerics optimised for both range and term queries would add to disk storage, and we chose not to add this overhead because there have been too few complaints about exact-match performance. In my own benchmarks the overhead of exact-match on long types wasn't noticeable until I ran multi-term terms queries - the slowdown is otherwise lost in the other costs of running a query (parsing JSON, etc.).
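
For reference, a multi-term terms query of the kind mentioned above can be sketched as follows. The helper `multi_id_query` and the field name `order_id` are hypothetical, and the body is only built as a Python dict rather than sent to a cluster.

```python
import json


def multi_id_query(field: str, ids) -> dict:
    """Build a terms query that matches any of the given identifiers.

    IDs are converted to strings on the assumption that the field is
    mapped as keyword.
    """
    return {"query": {"terms": {field: [str(i) for i in ids]}}}


if __name__ == "__main__":
    # "order_id" is an illustrative field name, not one from the benchmark.
    print(json.dumps(multi_id_query("order_id", [101, 102, 103])))
```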

As far as changes go we should update the following docs to share the "map numerics as keyword" advice for improving exact-matching:

  1. The page for numeric types
  2. The page for keyword types

@markharwood markharwood added the >docs General docs changes label Nov 29, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-docs (>docs)

@jrodewig jrodewig self-assigned this Dec 4, 2019
@jrodewig
Contributor

jrodewig commented Dec 4, 2019

I'll work on adding this documentation.

@jdcohen220

It seems like it would be beneficial to add a tip to the Terms Aggregation docs that the long type is not fully supported for bucketing in a terms aggregation.

@jrodewig
Contributor

Hi @jdcohen220

Thank you for your feedback. Do you mind creating a separate issue with some steps for reproduction?

While both involve the long mapping datatype, I don't believe the problem you pointed out is directly related. Also feel free to reach out via https://discuss.elastic.co/c/elasticsearch/ if you have questions.

Thanks again.
