Support for IPv6 #17007

clintongormley · 2016-03-08T14:27:13Z

Lucene now has sandbox support for IPv6 (LUCENE-7043).

InetAddressPoint is 1-dimensional by nature: of course you can have multiple values per field, thats different. Since we prefix-compress values, we can just map IPv4 addresses to IPv6 space and it works for both types. This is consistent with what InetAddress does itself anyway.

We should look at what needs to be done to support IPv6 in Elasticsearch. We should add a new ipv6 field which handles both IPv4 and IPv6.

The only problem I can see is that users might expect to get an IPv4 address back (eg 192.168.1.5) and instead it'd be returned as 0:0:0:0:0:ffff:c0a8:105. Perhaps we should have an ipv4 field as well that handles only IPv4 addresses, or add some option to render as IPv4 when possible?

The text was updated successfully, but these errors were encountered:

rmuir · 2016-03-08T14:32:11Z

The only problem I can see is that users might expect to get an IPv4 address back (eg 192.168.1.5) and instead it'd be returned as 0:0:0:0:0:ffff:c0a8:105. Perhaps we should have an ipv4 field as well that handles only IPv4 addresses, or add some option to render as IPv4 when possible?

That is not true. This is only something that happens internally in a BKD tree. Users do not see it. You do not "get things back" from that API.

Also note: if you follow the RFC correctly, then calling https://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#getByAddress%28byte[]%29 with a 128-bit mapped IPv4 address returns back a 32-bit Inet4Address: it does this round tripping for you. That is why all Query.toString()'s and so on work intuitively with the lucene support.

jpountz · 2016-03-08T14:34:34Z

While we don't have this issue with the bkd tree, I think it's something we will have to think about for doc values, eg. for computing the top ip addresses that hit a web server.

rmuir · 2016-03-08T14:36:42Z

its really not at all. just encode the same way, and use InetAddress to decode for any presentation form.

rmuir · 2016-03-08T14:38:04Z

Its this part that is super-important: calling https://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#getByAddress%28byte[]%29 with a 128-bit mapped IPv4 address returns back a 32-bit Inet4Address: it does this round tripping for you

so if you want to display to the user in any way, make an InetAddress from the bytes. it does the right thing.

Please don't make ipv4 vs ipv6 types or any other craziness when this all works exactly as the user expects using basic java apis!

jpountz · 2016-03-08T14:57:33Z

I agree there should not be two types and I'm not concerned about the Java API. The question was more about what we do on the rest layer eg. is it ok for aggs to render ip addresses differently from the way they were formatted at index time. I think it is, but it did not happen with the v4-only field so I think it's also important that we raise the question and document the behaviour.

rmuir · 2016-03-08T14:59:25Z

I don't think you guys get how the transition mechanism works inside the java InetAddress class. I encourage you to play with it.

clintongormley · 2016-03-08T15:05:11Z

Please don't make ipv4 vs ipv6 types or any other craziness when this all works exactly as the user expects using basic java apis!

Relax, Rob. The only reason I suggested it was because I was unaware of the information that you have provided. Thanks for the link.

Isn't it true though that if a user indexes an IPv4 address in IPv6 format, doc values will return the address in IPv4 format? (ie not what they indexed?).

rmuir · 2016-03-08T15:09:45Z

If someone supplies an ipv4 address (mapped or not), its an ipv4 address. thats what that reserved space in the ipv4 region is for. If someone supplies an ipv6 address, its an ipv6 address.

note that ipv6 addresses can be represented in many ways. i recommend emitting canonical representations. So if someone supplies 0:0:0:0:0:0:0:1 they can expect to get it back as ::1.

Its always "what they indexed". its the same address. Its like returning 16.0 vs returning 16.

Please, lets not try to overcomplicate this. Its not complicated.

clintongormley · 2016-03-08T15:11:25Z

Please, lets not try to overcomplicate this. Its not complicated.

I wasn't trying to overcomplicate it, I was trying to understand it. Thank you for explaining.

rmuir · 2016-03-08T15:14:51Z

as far as docvalues, i would just use SortedSetDocValues class for internet addresses. This is critical for ipv6 data because (like the BKD tree) it will provide prefix compression.

For network addresses, "range" operations are a common operation (e.g. cidr/prefix operations), so having ordinal indirection is not useless, and can speed things up.

rmuir · 2016-03-08T15:21:36Z

And for that to work, docvalues needs a little tweak before the compression will happen: we currently always 'bail' on prefix-compression for fixed-length data: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/codecs/lucene54/Lucene54DocValuesConsumer.java#L423-L425

that is not always the best tradeoff...

rmuir · 2016-03-08T15:31:12Z

We may even want to turn that logic off completely. basically that logic was originally added to "save space" (= save on addresses) before we had prefix compression at all...

Unfortunately, we really need to solve this to avoid "my ip address field quadrupled in size!"

rmuir · 2016-03-08T15:44:00Z

I opened https://issues.apache.org/jira/browse/LUCENE-7081

This makes all numeric fields including `date`, `ip` and `token_count` use points instead of the inverted index as a lookup structure. This is expected to perform worse for exact queries, but faster for range queries. It also requires less storage. Notes about how the change works: - Numeric mappers have been split into a legacy version that is essentially the current mapper, and a new version that uses points, eg. LegacyDateFieldMapper and DateFieldMapper. - Since new and old fields have the same names, the decision about which one to use is made based on the index creation version. - If you try to force using a legacy field on a new index or a field that uses points on an old index, you will get an exception. - IP addresses now support IPv6 via Lucene's InetAddressPoint and store them in SORTED_SET doc values using the same encoding (fixed length of 16 bytes and sortable). - The internal MappedFieldType that is stored by the new mappers does not have any of the points-related properties set. Instead, it keeps setting the index options when parsing the `index` property of mappings and does `if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }` when parsing documents. Known issues that won't fix: - You can't use numeric fields in significant terms aggregations anymore since this requires document frequencies, which points do not record. - Term queries on numeric fields will now return constant scores instead of giving better scores to the rare values. Known issues that we could work around (in follow-up PRs, this one is too large already): - Range queries on `ip` addresses only work if both the lower and upper bounds are inclusive (exclusive bounds are not exposed in Lucene). We could either decide to implement it, or drop range support entirely and tell users to query subnets using the CIDR notation instead. - Since IP addresses now use a different representation for doc values, aggregations will fail when running a terms aggregation on an ip field on a list of indices that contains both pre-5.0 and 5.0 indices. - The ip range aggregation does not work on the new ip field. We need to either implement range aggs for SORTED_SET doc values or drop support for ip ranges and tell users to use filters instead. elastic#17700 Closes elastic#16751 Closes elastic#17007 Closes elastic#11513

clintongormley added >feature discuss :Search Foundations/Mapping Index mappings, including merging and defining field types labels Mar 8, 2016

clintongormley mentioned this issue Mar 8, 2016

add ipv6 field support #5758

Closed

jpountz mentioned this issue Apr 14, 2016

Use the new points API to index numeric fields. #17746

Merged

jpountz closed this as completed in #17746 Apr 14, 2016

javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for IPv6 #17007

Support for IPv6 #17007

clintongormley commented Mar 8, 2016

rmuir commented Mar 8, 2016

jpountz commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016

jpountz commented Mar 8, 2016

rmuir commented Mar 8, 2016

clintongormley commented Mar 8, 2016

rmuir commented Mar 8, 2016

clintongormley commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016

Support for IPv6 #17007

Support for IPv6 #17007

Comments

clintongormley commented Mar 8, 2016

rmuir commented Mar 8, 2016

jpountz commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016

jpountz commented Mar 8, 2016

rmuir commented Mar 8, 2016

clintongormley commented Mar 8, 2016

rmuir commented Mar 8, 2016

clintongormley commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016

rmuir commented Mar 8, 2016