Commit

Update numbers to reflect 4-byte UTF-8-encoded characters (#27083)
You need 4 bytes for characters outside the BMP, which includes many emoji and
a bunch of less-common writing characters too.
DaveCTurner committed Jul 2, 2018
1 parent bffb32e commit cbbb0ca
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/reference/mapping/params/ignore-above.asciidoc
@@ -56,5 +56,5 @@ limit of `32766`.

 NOTE: The value for `ignore_above` is the _character count_, but Lucene counts
 bytes. If you use UTF-8 text with many non-ASCII characters, you may want to
-set the limit to `32766 / 3 = 10922` since UTF-8 characters may occupy at most
-3 bytes.
+set the limit to `32766 / 4 = 8191` since UTF-8 characters may occupy at most
+4 bytes.
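
The arithmetic behind this change can be checked with a short Python sketch. It is an illustration only, not part of the commit: the helper names `utf8_len` and `safe_ignore_above` are hypothetical, the `32766` byte limit is taken from the note above, and the sketch counts Unicode code points as "characters".

```python
# Byte limit quoted in the documentation note above.
LUCENE_MAX_TERM_BYTES = 32766


def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def safe_ignore_above(max_bytes_per_char: int = 4) -> int:
    """Largest character count whose worst-case UTF-8 size fits the byte limit."""
    return LUCENE_MAX_TERM_BYTES // max_bytes_per_char


# One sample character per UTF-8 width: ASCII, Latin-1 supplement,
# a 3-byte BMP character, and an emoji from outside the BMP.
widths = {ch: utf8_len(ch) for ch in ("a", "é", "€", "😀")}

old_limit = safe_ignore_above(3)  # assumes at most 3 bytes per character
new_limit = safe_ignore_above(4)  # allows for characters outside the BMP

# A string of emoji at the old limit overflows the byte budget;
# the same string at the new limit fits.
overflows_at_old_limit = utf8_len("😀" * old_limit) > LUCENE_MAX_TERM_BYTES
fits_at_new_limit = utf8_len("😀" * new_limit) <= LUCENE_MAX_TERM_BYTES
```

Integer division rounds `32766 / 4 = 8191.5` down to `8191`, which is why the documented value is not a clean quotient like the earlier `10922`.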
