Docs: UTF-8 character can take up 4 bytes #27060

ChrisGreenaway · 2017-10-20T13:54:32Z

Elasticsearch version (bin/elasticsearch --version): master

Plugins installed: N/A

JVM version (java -version): N/A

OS version (uname -a if on a Unix-like system): N/A

Description of the problem including expected versus actual behavior:

ignore-above.asciidoc says "If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 3 = 10922 since UTF-8 characters may occupy at most 3 bytes." However, a UTF-8 character can take up to 4 bytes.
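
For illustration, here is a minimal Python sketch of the arithmetic behind the report. It is not taken from the Elasticsearch docs or code base: the names `MAX_TERM_BYTES`, `utf8_len`, and `safe_char_limit` are made up for this example, and the 32766 figure is simply the byte limit quoted above.

```python
# Byte limit quoted in the sentence from ignore-above.asciidoc.
MAX_TERM_BYTES = 32766


def utf8_len(ch):
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample character for each possible UTF-8 length.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji
byte_lengths = {ch: utf8_len(ch) for ch in samples}


def safe_char_limit(max_bytes_per_char):
    """Largest character count that cannot exceed MAX_TERM_BYTES
    if every character needs max_bytes_per_char bytes."""
    return MAX_TERM_BYTES // max_bytes_per_char


print(byte_lengths)        # per-character byte counts
print(safe_char_limit(3))  # figure currently in the docs
print(safe_char_limit(4))  # figure when 4-byte characters are allowed for
```

Under the 4-byte worst case, the conservative limit works out to 32766 / 4, rounded down to a whole number of characters.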

Steps to reproduce:

Look at ignore-above.asciidoc

Provide logs (if relevant):

nik9000 · 2017-10-20T14:01:46Z

@clintongormley git blame says you wrote this. @ChrisGreenaway is right: 4 bytes is possible, though it looks fairly rare unless you are dealing with fairly specialized text. Or emoji.
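
One way to check whether a given text is affected is to look for code points above U+FFFF, since those are the ones UTF-8 encodes in 4 bytes. The helper below is a hypothetical sketch for that check (`four_byte_chars` is a name invented here, not an Elasticsearch or Lucene function).

```python
def four_byte_chars(text):
    """Return the characters in text that need 4 bytes in UTF-8.

    Code points above U+FFFF (outside the Basic Multilingual Plane)
    are exactly the ones encoded as 4-byte sequences.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Sample text: accented Latin, an emoji, and a CJK Extension B ideograph.
sample = "caf\u00e9 \U0001F600 \U00020000"
found = four_byte_chars(sample)
print(len(found), "of", len(sample), "characters need 4 bytes")
```

If a representative sample of the indexed text returns an empty list, the 3-byte assumption holds for that sample; any emoji or supplementary-plane character breaks it.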

DaveCTurner · 2017-10-23T16:05:12Z

I agree - pretty common to see emoji these days. I opened #27083.

DaveCTurner · 2017-10-24T09:49:44Z

Thanks for the report, @ChrisGreenaway, much appreciated.

ChrisGreenaway · 2017-10-24T09:51:39Z

You are welcome. Thanks for fixing it.

nik9000 added the >docs General docs changes label Oct 20, 2017

DaveCTurner mentioned this issue Oct 23, 2017

Update numbers to reflect 4-byte UTF-8-encoded characters #27083

Merged

DaveCTurner closed this as completed in #27083 Oct 24, 2017
